Reads records from a given Parquet file and writes them to the FlowFile content using the selected Record Writer. The original Parquet file remains unchanged, and the FlowFile content is replaced with records in the format produced by the selected writer. This processor can be used with ListHDFS or ListFile to obtain a listing of files to fetch.
parquet, hadoop, HDFS, get, ingest, fetch, source, record
In the table below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Hadoop Configuration Resources | Hadoop Configuration Resources | | | A file or comma-separated list of files that contain the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. To use swebhdfs, see the 'Additional Details' section of the PutHDFS documentation. This property expects a comma-separated list of file resources. Supports Expression Language: true (will be evaluated using variable registry only) |
Kerberos Credentials Service | kerberos-credentials-service | | Controller Service API: KerberosCredentialsService; Implementation: KeytabCredentialsService | Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos. |
Kerberos User Service | kerberos-user-service | | Controller Service API: KerberosUserService; Implementations: KerberosPasswordUserService, KerberosKeytabUserService, KerberosTicketCacheUserService | Specifies the Kerberos User Controller Service that should be used for authenticating with Kerberos. |
Kerberos Principal | Kerberos Principal | | | Kerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties. Supports Expression Language: true (will be evaluated using variable registry only) |
Kerberos Keytab | Kerberos Keytab | | | Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties. This property requires exactly one file to be provided. Supports Expression Language: true (will be evaluated using variable registry only) |
Kerberos Password | Kerberos Password | | | Kerberos password associated with the principal. Sensitive Property: true |
Kerberos Relogin Period | Kerberos Relogin Period | 4 hours | | Period of time that should pass before attempting a Kerberos relogin. This property has been deprecated and has no effect on processing; relogins now occur automatically. Supports Expression Language: true (will be evaluated using variable registry only) |
Additional Classpath Resources | Additional Classpath Resources | | | A comma-separated list of paths to files and/or directories that will be added to the classpath and used for loading native libraries. When a directory is specified, all files within the directory are added to the classpath, but sub-directories are not included. Each resource may be either a file or a directory. |
**Filename** | filename | ${path}/${filename} | | The name of the file to retrieve. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry) |
**Record Writer** | record-writer | | Controller Service API: RecordSetWriterFactory; Implementations: JsonRecordSetWriter, RecordSetWriterLookup, AvroRecordSetWriter, XMLRecordSetWriter, FreeFormTextRecordSetWriter, CSVRecordSetWriter, ParquetRecordSetWriter, ScriptedRecordSetWriter | The service for writing records to the FlowFile content. |
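The directory rule for Additional Classpath Resources (files directly inside a listed directory are added, sub-directories are not) can be made concrete with a minimal Java sketch. The class and method names are hypothetical, and this is not NiFi's own loader. It only shows how a comma-separated value could expand into a flat, non-recursive list of files.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the documented expansion rule; not NiFi code.
final class ClasspathExpansion {

    private ClasspathExpansion() {}

    /**
     * Expands a comma-separated list of files and directories.
     * Non-directory entries are kept as-is. For a directory, only the regular
     * files directly inside it are added; sub-directories are skipped, not descended into.
     */
    static List<File> expand(String commaSeparated) {
        List<File> result = new ArrayList<>();
        if (commaSeparated == null || commaSeparated.isBlank()) {
            return result;
        }
        for (String raw : commaSeparated.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            File resource = new File(entry);
            if (resource.isDirectory()) {
                File[] children = resource.listFiles();
                if (children != null) {
                    // Sort for deterministic output; listFiles() order is unspecified.
                    Arrays.sort(children);
                    for (File child : children) {
                        if (child.isFile()) {
                            result.add(child);
                        }
                    }
                }
            } else {
                result.add(resource);
            }
        }
        return result;
    }
}
```

For example, with the hypothetical value `/opt/hadoop-libs, /opt/extra/native.so`, the sketch returns the regular files directly under `/opt/hadoop-libs` followed by `/opt/extra/native.so`. Anything inside `/opt/hadoop-libs/sub/` is left out.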
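With the default Filename value `${path}/${filename}`, the file to fetch is built from the `path` and `filename` attributes of the incoming FlowFile, which an upstream listing processor typically sets. The Java sketch below imitates only that `${attribute}` substitution. It is a hypothetical, deliberately simplified stand-in and not NiFi's Expression Language engine, which also supports functions, escaping, and the variable registry. The attribute values in the usage note are made up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for Expression Language evaluation; not NiFi code.
// It only replaces ${name} references with attribute values.
final class FilenameResolver {

    // Matches ${name} and captures the attribute name.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");

    private FilenameResolver() {}

    static String resolve(String template, Map<String, String> attributes) {
        Matcher matcher = REFERENCE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // Assumption for this sketch: an unknown attribute becomes an empty string.
            String value = attributes.getOrDefault(matcher.group(1), "");
            // quoteReplacement keeps '$' or '\' in attribute values from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For instance, `FilenameResolver.resolve("${path}/${filename}", Map.of("path", "/data/landing", "filename", "part-0001.parquet"))` yields `/data/landing/part-0001.parquet`.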
Relationship | Description |
---|---|
retry | FlowFiles will be routed to this relationship if the content of the file cannot be retrieved but might be retrievable if tried again later. This generally indicates that the fetch should be retried. |
success | FlowFiles will be routed to this relationship once they have been updated with the content of the file. |
failure | FlowFiles will be routed to this relationship if the content of the file cannot be retrieved and trying again is unlikely to help. This would occur, for instance, if the file is not found or if there is a permissions issue. |
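The split between retry and failure comes down to whether a later attempt could plausibly succeed. A minimal Java sketch of that decision follows. The names are hypothetical, and the standard-library `FileNotFoundException` and `AccessDeniedException` stand in for whatever exceptions the Hadoop client actually raises, so treat it as an illustration of the rule rather than the processor's code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;

// Hypothetical relationship names mirroring the table above; not NiFi code.
enum FetchRelationship { SUCCESS, RETRY, FAILURE }

final class FetchRouting {

    private FetchRouting() {}

    /** Chooses a relationship for a fetch that failed with the given error. */
    static FetchRelationship forError(IOException error) {
        // A missing file or a permission problem will not fix itself on retry.
        if (error instanceof FileNotFoundException || error instanceof AccessDeniedException) {
            return FetchRelationship.FAILURE;
        }
        // Other I/O errors (for example, a briefly unreachable filesystem) may succeed later.
        return FetchRelationship.RETRY;
    }
}
```

Under this sketch, a missing file or a permission error maps to failure, while a generic `IOException` such as a dropped connection maps to retry.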
Attribute Read | Description |
---|---|
record.offset | The index of the first record in the input. |
record.count | The number of records in the input. |
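The `record.offset` and `record.count` attributes read from the incoming FlowFile describe a range of records. Assuming they select a contiguous slice of the file's records (starting at the offset and spanning the count), the selection could look like the Java sketch below. A `List` stands in for the Parquet reader, the class name is hypothetical, and treating missing attributes as "read the whole file" is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of selecting records by offset and count; not NiFi code.
// A List stands in for the records read from the Parquet file.
final class RecordRange {

    private RecordRange() {}

    static <T> List<T> select(List<T> records, Map<String, String> attributes) {
        String offsetValue = attributes.get("record.offset");
        String countValue = attributes.get("record.count");
        // Assumption: a missing attribute means "start at the beginning" / "read to the end".
        long offset = offsetValue == null ? 0L : Long.parseLong(offsetValue);
        long count = countValue == null ? Long.MAX_VALUE : Long.parseLong(countValue);
        if (offset < 0 || count < 0) {
            throw new IllegalArgumentException("record.offset and record.count must be non-negative");
        }
        // Clamp both ends so a range past the end yields a shorter (possibly empty) slice.
        int start = (int) Math.min(offset, records.size());
        long remaining = records.size() - start;
        int end = start + (int) Math.min(count, remaining);
        return records.subList(start, end);
    }
}
```

With five records `r0` through `r4`, an offset of `1` and a count of `3` select `r1`, `r2`, and `r3`.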
Attribute Written | Description |
---|---|
fetch.failure.reason | When a FlowFile is routed to 'failure', this attribute is added indicating why the file could not be fetched from the given filesystem. |
record.count | The number of records in the resulting FlowFile. |
hadoop.file.url | The Hadoop URL of the file. |
Required Permission | Explanation |
---|---|
read distributed filesystem | Provides the operator with the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem. |