GetHDFS

Description:

Fetches files from the Hadoop Distributed File System (HDFS) into FlowFiles. By default, this Processor deletes each file from HDFS after fetching it (see the Keep Source File property).

Tags:

hadoop, HCFS, HDFS, get, fetch, ingest, source, filesystem

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Hadoop Configuration Resources
  API Name: Hadoop Configuration Resources
  Description: A file or comma-separated list of files that contain the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. To use swebhdfs, see the 'Additional Details' section of the PutHDFS documentation.
  This property expects a comma-separated list of file resources.
  Supports Expression Language: true (will be evaluated using variable registry only)

Kerberos Credentials Service
  API Name: kerberos-credentials-service
  Controller Service API: KerberosCredentialsService
  Implementation: KeytabCredentialsService
  Description: Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos.

Kerberos User Service
  API Name: kerberos-user-service
  Controller Service API: KerberosUserService
  Implementations: KerberosPasswordUserService, KerberosKeytabUserService, KerberosTicketCacheUserService
  Description: Specifies the Kerberos User Controller Service that should be used for authenticating with Kerberos.

Kerberos Principal
  API Name: Kerberos Principal
  Description: Kerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
  Supports Expression Language: true (will be evaluated using variable registry only)

Kerberos Keytab
  API Name: Kerberos Keytab
  Description: Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
  This property requires exactly one file to be provided.
  Supports Expression Language: true (will be evaluated using variable registry only)

Kerberos Password
  API Name: Kerberos Password
  Description: Kerberos password associated with the principal.
  Sensitive Property: true

Kerberos Relogin Period
  API Name: Kerberos Relogin Period
  Default Value: 4 hours
  Description: Period of time which should pass before attempting a Kerberos relogin. This property has been deprecated and has no effect on processing. Relogins now occur automatically.
  Supports Expression Language: true (will be evaluated using variable registry only)

Additional Classpath Resources
  API Name: Additional Classpath Resources
  Description: A comma-separated list of paths to files and/or directories that will be added to the classpath and used for loading native libraries. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
  This property expects a comma-separated list of resources. Each of the resources may be of any of the following types: file, directory.

Directory
  API Name: Directory
  Description: The HDFS directory from which files should be read.
  Supports Expression Language: true (will be evaluated using variable registry only)

Recurse Subdirectories
  API Name: Recurse Subdirectories
  Default Value: true
  Allowable Values:
  • true
  • false
  Description: Indicates whether to pull files from subdirectories of the HDFS directory.

Keep Source File
  API Name: Keep Source File
  Default Value: false
  Allowable Values:
  • true
  • false
  Description: If false, the file is deleted from HDFS after it has been successfully transferred. If true, the file is kept and will be fetched repeatedly. This is intended for testing only.

File Filter Regex
  API Name: File Filter Regex
  Description: A Java Regular Expression for filtering filenames. If a filter is supplied, only files whose names match that Regular Expression will be fetched; otherwise all files will be fetched.

Filter Match Name Only
  API Name: Filter Match Name Only
  Default Value: true
  Allowable Values:
  • true
  • false
  Description: If true, File Filter Regex will match on just the filename; otherwise subdirectory names will be included with the filename in the regex comparison.

Ignore Dotted Files
  API Name: Ignore Dotted Files
  Default Value: true
  Allowable Values:
  • true
  • false
  Description: If true, files whose names begin with a dot (".") will be ignored.

Minimum File Age
  API Name: Minimum File Age
  Default Value: 0 sec
  Description: The minimum age that a file must have in order to be pulled; any file younger than this amount of time (based on last modification date) will be ignored.

Maximum File Age
  API Name: Maximum File Age
  Description: The maximum age that a file may have in order to be pulled; any file older than this amount of time (based on last modification date) will be ignored.

Polling Interval
  API Name: Polling Interval
  Default Value: 0 sec
  Description: Indicates how long to wait between performing directory listings.

Batch Size
  API Name: Batch Size
  Default Value: 100
  Description: The maximum number of files to pull in each iteration, based on run schedule.

IO Buffer Size
  API Name: IO Buffer Size
  Description: Amount of memory to use to buffer file contents during IO. This overrides the Hadoop Configuration.

Compression codec
  API Name: Compression codec
  Default Value: NONE
  Allowable Values:
  • NONE: No compression
  • DEFAULT: Default ZLIB compression
  • BZIP: BZIP compression
  • GZIP: GZIP compression
  • LZ4: LZ4 compression
  • LZO: LZO compression - it assumes LD_LIBRARY_PATH has been set and jar is available
  • SNAPPY: Snappy compression
  • AUTOMATIC: Will attempt to automatically detect the compression codec.
  Description: No Description Provided.
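
The way the File Filter Regex, Filter Match Name Only, Ignore Dotted Files, Minimum File Age, and Maximum File Age properties combine can be illustrated with a small sketch. This is not NiFi's implementation: the Candidate class and the is_selected function are hypothetical names, the sketch assumes the regex must match the whole name (or the whole path relative to Directory), and it uses Python's re module, whose syntax differs from Java regular expressions in some details.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a file found while listing the Directory."""
    relative_path: str   # path below Directory, e.g. "abc/data.csv"
    age_seconds: float   # time since the file's last modification


def is_selected(
    candidate: Candidate,
    file_filter_regex: Optional[str] = None,
    filter_match_name_only: bool = True,
    ignore_dotted_files: bool = True,
    min_age_seconds: float = 0.0,
    max_age_seconds: Optional[float] = None,
) -> bool:
    """Return True if the candidate passes every configured filter."""
    name = candidate.relative_path.rsplit("/", 1)[-1]

    # Ignore Dotted Files: skip names that begin with ".".
    if ignore_dotted_files and name.startswith("."):
        return False

    # Minimum / Maximum File Age, based on last modification time.
    if candidate.age_seconds < min_age_seconds:
        return False
    if max_age_seconds is not None and candidate.age_seconds > max_age_seconds:
        return False

    # File Filter Regex: compare against the bare name, or against the
    # path including subdirectories when Filter Match Name Only is false.
    # Assumption: the whole string must match, not just a substring.
    if file_filter_regex is not None:
        target = name if filter_match_name_only else candidate.relative_path
        if re.fullmatch(file_filter_regex, target) is None:
            return False

    return True


if __name__ == "__main__":
    csv_file = Candidate("abc/data.csv", age_seconds=120)
    print(is_selected(csv_file, r".*\.csv"))
    print(is_selected(csv_file, r"abc/.*\.csv", filter_match_name_only=False))
```

With the name-only default, a pattern that contains a subdirectory such as abc/.*\.csv never matches in this sketch, because only data.csv is compared against it.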

Relationships:

success
  Description: All files retrieved from HDFS are transferred to this relationship.

Reads Attributes:

None specified.

Writes Attributes:

filename
  Description: The name of the file that was read from HDFS.

path
  Description: The path is set to the relative path of the file's directory on HDFS. For example, if the Directory property is set to /tmp, then files picked up from /tmp will have the path attribute set to "./". If the Recurse Subdirectories property is set to true and a file is picked up from /tmp/abc/1/2/3, then the path attribute will be set to "abc/1/2/3".
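
The two path examples above can be reproduced with a short sketch. The function name path_attribute and the sample file name data.txt are hypothetical, and the code only mirrors the documented examples; it is not taken from the Processor's source.

```python
import posixpath


def path_attribute(directory: str, file_path: str) -> str:
    """Derive the 'path' attribute for a file found under 'directory'.

    Both arguments are absolute HDFS-style paths using "/" separators.
    """
    # Directory that contains the file, e.g. "/tmp/abc/1/2/3".
    parent = posixpath.dirname(posixpath.normpath(file_path))
    # The configured Directory property, e.g. "/tmp".
    root = posixpath.normpath(directory)
    relative = posixpath.relpath(parent, root)
    # Files directly in the configured directory are reported as "./".
    return "./" if relative == "." else relative


if __name__ == "__main__":
    print(path_attribute("/tmp", "/tmp/data.txt"))
    print(path_attribute("/tmp", "/tmp/abc/1/2/3/data.txt"))
```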

State management:

This component does not store state.

Restricted:

read distributed filesystem
  Explanation: Provides operator the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem.

write distributed filesystem
  Explanation: Provides operator the ability to delete any file that NiFi has access to in HDFS or the local filesystem.

Input requirement:

This component does not allow an incoming relationship.

System Resource Considerations:

None specified.

See Also:

PutHDFS, ListHDFS