ConvertAvroToParquet

Description:

Converts Avro records into the Parquet file format. The incoming FlowFile should be a valid Avro file. If an incoming FlowFile does not contain any records, the output is an empty Parquet file. NOTE: Many Avro data types (for example collections, primitives, and unions of primitives) can be converted to Parquet, but unions of collections and other complex data types may not be convertible.
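
To make the note about unions concrete, here is a minimal sketch that scans an Avro schema (as parsed JSON) and lists the fields whose type is a union with a non-primitive branch. The schema, the field names, and the helper functions are hypothetical and only illustrate the distinction. The check does not reproduce the processor's own logic, so a flagged field is only a candidate for testing, not a guaranteed failure.

```python
# Avro primitive type names, per the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def is_primitive(branch) -> bool:
    """True for a primitive written as "string" or as {"type": "string"}."""
    if isinstance(branch, str):
        return branch in PRIMITIVES
    if isinstance(branch, dict):
        inner = branch.get("type")
        return isinstance(inner, str) and inner in PRIMITIVES
    return False


def is_union_with_complex_branch(field_type) -> bool:
    """An Avro union is a JSON array; flag it if any branch is not primitive."""
    return isinstance(field_type, list) and any(
        not is_primitive(branch) for branch in field_type
    )


# Hypothetical schema used only for illustration.
schema = {
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "id", "type": "long"},                                  # primitive
        {"name": "label", "type": ["null", "string"]},                   # union of primitives
        {"name": "tags", "type": {"type": "array", "items": "string"}},  # collection
        {"name": "payload", "type": [                                    # union of collections
            {"type": "array", "items": "string"},
            {"type": "map", "values": "long"},
        ]},
    ],
}

flagged = [f["name"] for f in schema["fields"] if is_union_with_complex_branch(f["type"])]
```

With this schema, only the `payload` field is flagged. Branches that refer to a named type by string (for example a record name) are also treated as non-primitive by this sketch.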

Tags:

avro, parquet, convert

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Compression Type | compression-type | UNCOMPRESSED | UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD | The type of compression for the file being written.
Row Group Size | row-group-size | | | The row group size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Page Size | page-size | | | The page size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Dictionary Page Size | dictionary-page-size | | | The dictionary page size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Max Padding Size | max-padding-size | | | The maximum amount of padding that will be used to align row groups with blocks in the underlying filesystem. If the underlying filesystem is not a block filesystem like HDFS, this has no effect. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Enable Dictionary Encoding | enable-dictionary-encoding | | true, false | Specifies whether dictionary encoding should be enabled for the Parquet writer.
Enable Validation | enable-validation | | true, false | Specifies whether validation should be enabled for the Parquet writer.
Writer Version | writer-version | | PARQUET_1_0, PARQUET_2_0 | Specifies the version used by the Parquet writer.
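
The size properties above (Row Group Size, Page Size, Dictionary Page Size, Max Padding Size) all take a value of the form <Data Size> <Data Unit>. The sketch below shows one way such a string could be checked and turned into a byte count before it is placed in a flow. The function name `parse_data_size` is hypothetical, and two choices are assumptions of this sketch rather than documented behaviour: units are treated as binary multiples (1 KB = 1024 B), and unit names are matched case-insensitively.

```python
import re

# Assumption: binary multiples (1 KB = 1024 B). Verify against your NiFi version.
_UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# <Data Size> followed by optional whitespace and one of the documented units.
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*$", re.IGNORECASE)


def parse_data_size(text: str) -> int:
    """Return the number of bytes described by a string such as '128 MB'."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid data size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNIT_BYTES[unit.upper()])


row_group_bytes = parse_data_size("128 MB")
page_bytes = parse_data_size("1 MB")
```

A value with an unknown unit, such as "12 XB", raises `ValueError` here, which makes it easy to catch a malformed setting before the processor is started.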

Relationships:

Name | Description
success | Parquet file that was converted successfully from Avro
failure | Avro content that could not be processed

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
filename | Sets the filename to the existing filename with its extension replaced by .parquet, or with .parquet appended if the filename has no extension.
record.count | Sets the number of records in the Parquet file.
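
The filename rule can be sketched as follows. This is an illustration of the rule as described, not the processor's source code; the function name `parquet_filename` is hypothetical, and the handling of edge cases such as multi-part extensions or dot-files is an assumption that follows Python's `os.path.splitext`.

```python
import os


def parquet_filename(filename: str) -> str:
    """Replace the last extension with .parquet, or append it if none exists."""
    base, _extension = os.path.splitext(filename)
    return base + ".parquet"


renamed = parquet_filename("events.avro")    # extension replaced
appended = parquet_filename("events")        # extension added
```

Only the last extension is replaced in this sketch, so a name like "events.tar.gz" would become "events.tar.parquet". Check how the processor treats such names before relying on a particular result downstream.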

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.