FetchParquet

Description:

Reads from a given Parquet file and writes records to the content of the flow file using the selected record writer. The original Parquet file will remain unchanged, and the content of the flow file will be replaced with records of the selected type. This processor can be used with ListHDFS or ListFile to obtain a listing of files to fetch.

Tags:

parquet, hadoop, HDFS, get, ingest, fetch, source, restricted

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Name | Default Value | Allowable Values | Description
---- | ------------- | ---------------- | -----------
Hadoop Configuration Resources | | | A file or comma-separated list of files that contain the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration.
Kerberos Principal | | | Kerberos principal to authenticate as. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
Kerberos Keytab | | | Kerberos keytab associated with the principal. Requires nifi.kerberos.krb5.file to be set in your nifi.properties.
Kerberos Relogin Period | 4 hours | | Period of time that should pass before attempting a Kerberos relogin.
Additional Classpath Resources | | | A comma-separated list of paths to files and/or directories that will be added to the classpath. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.
Filename | ${path}/${filename} | | The name of the file to retrieve. Supports Expression Language: true
Record Writer | | Controller Service API: RecordSetWriterFactory. Implementations: FreeFormTextRecordSetWriter, CSVRecordSetWriter, JsonRecordSetWriter, ScriptedRecordSetWriter, AvroRecordSetWriter | The service for writing records to the FlowFile content.
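
The default Filename value, ${path}/${filename}, is built from two FlowFile attributes. The Python sketch below illustrates only the simplest case of that substitution, plain ${name} references. It is not NiFi's Expression Language implementation, which also supports functions and other syntax. The helper name resolve_simple_el, the sample attribute values, and the choice to resolve a missing attribute to an empty string are assumptions made for illustration.

```python
import re

# Matches a plain attribute reference such as ${filename}.
# Expression Language function calls are deliberately not handled.
_ATTR_REF = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def resolve_simple_el(expression, attributes):
    """Substitute ${name} references with FlowFile attribute values.

    Assumption of this sketch: a missing attribute resolves to "".
    """
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), expression)


# Hypothetical attributes, as an upstream listing processor might set them.
flowfile_attributes = {
    "path": "/data/warehouse/events",
    "filename": "part-00000.parquet",
}

resolved = resolve_simple_el("${path}/${filename}", flowfile_attributes)
print(resolved)  # /data/warehouse/events/part-00000.parquet
```

If the incoming FlowFile lacks either attribute, the resolved name will not point at a real file, so it is worth checking that the upstream processor sets both.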

Relationships:

Name | Description
---- | -----------
retry | FlowFiles will be routed to this relationship if the content of the file cannot be retrieved now but might be retrievable on a later attempt. This generally indicates that the fetch should be tried again.
success | FlowFiles will be routed to this relationship once they have been updated with the content of the file.
failure | FlowFiles will be routed to this relationship if the content of the file cannot be retrieved and trying again will likely not be helpful. This would occur, for instance, if the file is not found or if there is a permissions issue.

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
---- | -----------
fetch.failure.reason | When a FlowFile is routed to 'failure', this attribute is added indicating why the file could not be fetched from the given filesystem.
record.count | The number of records in the resulting flow file.
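
The relationships and attributes listed above can be read together as a routing decision. The Python sketch below models that decision for a single fetch attempt. It is an illustration, not the processor's actual code: the function route_fetch, the fetch callable, the mapping of Python exception types to relationships, and the use of the exception message as the failure reason are all assumptions.

```python
def route_fetch(fetch, attributes):
    """Return (relationship, updated_attributes) for one fetch attempt.

    `fetch` is a hypothetical callable that returns a list of records
    or raises an exception.
    """
    updated = dict(attributes)
    try:
        records = fetch()
    except (FileNotFoundError, PermissionError) as exc:
        # Retrying is unlikely to help: record the reason, route to failure.
        updated["fetch.failure.reason"] = str(exc) or type(exc).__name__
        return "failure", updated
    except OSError:
        # Other I/O errors may be transient, so the fetch can be retried.
        return "retry", updated
    # The fetch worked: note how many records were written.
    updated["record.count"] = str(len(records))
    return "success", updated


def missing_file():
    raise FileNotFoundError("no such file: /data/x.parquet")


relationship, attrs = route_fetch(missing_file, {"filename": "x.parquet"})
print(relationship, attrs)
```

In a flow, 'retry' is typically looped back to the processor, while 'failure' is sent somewhere the fetch.failure.reason attribute can be inspected.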

State management:

This component does not store state.

Restricted:

Provides the operator with the ability to retrieve any file that NiFi has access to in HDFS or the local filesystem.

Input requirement:

This component requires an incoming relationship.

See Also:

PutParquet