ParquetRecordSetWriter

Description:

Writes the contents of a RecordSet in Parquet format.

Tags:

parquet, result, set, writer, serializer, record, recordset, row

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Schema Write Strategy | Schema Write Strategy | Do Not Write Schema
  • Do Not Write Schema: Do not add any schema-related information to the FlowFile.
  • Set 'schema.name' Attribute: The FlowFile will be given an attribute named 'schema.name' and this attribute will indicate the name of the schema in the Schema Registry. Note that if the schema for a record is not obtained from a Schema Registry, then no attribute will be added.
  • Set 'avro.schema' Attribute: The FlowFile will be given an attribute named 'avro.schema' and this attribute will contain the Avro Schema that describes the records in the FlowFile. The contents of the FlowFile need not be Avro, but the text of the schema will be used.
  • Schema Reference Writer: The schema reference information will be written through a configured Schema Reference Writer service implementation.
Specifies how the schema for a Record should be added to the data.
Schema Cache | schema-cache
Controller Service API: RecordSchemaCacheService
Implementation: VolatileSchemaCache
Specifies a Schema Cache to add the Record Schema to so that Record Readers can quickly look up the schema.
Schema Reference Writer | Schema Reference Writer
Controller Service API: SchemaReferenceWriter
Implementation: ConfluentEncodedSchemaReferenceWriter
Service implementation responsible for writing FlowFile attributes or content header with Schema reference information.

This Property is only considered if the [Schema Write Strategy] Property has a value of "Schema Reference Writer".
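As an illustration of what a "content header with Schema reference information" can look like, the sketch below follows the widely documented Confluent wire format: one magic byte (0) followed by a 4-byte big-endian schema ID. That ConfluentEncodedSchemaReferenceWriter produces exactly this layout is an assumption based on its name, and the function names are hypothetical.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0   # Confluent wire format: the first byte is always 0
HEADER_SIZE = 5  # 1 magic byte + 4-byte big-endian schema ID


def encode_header(schema_id: int) -> bytes:
    """Build the 5-byte header that precedes the serialized payload."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id)


def decode_header(content: bytes) -> Tuple[int, bytes]:
    """Split content into (schema_id, remaining payload)."""
    if len(content) < HEADER_SIZE:
        raise ValueError("content is shorter than the 5-byte header")
    magic, schema_id = struct.unpack(">bI", content[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, content[HEADER_SIZE:]
```

A reader configured with the matching Schema Reference Reader would strip the header and use the ID to fetch the schema from the registry.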
Schema Access Strategy | schema-access-strategy | Inherit Record Schema
  • Inherit Record Schema: The schema used to write records will be the same schema that was given to the Record when the Record was created.
  • Use 'Schema Name' Property: The name of the Schema to use is specified by the 'Schema Name' Property. The value of this property is used to look up the Schema in the configured Schema Registry service.
  • Use 'Schema Text' Property: The text of the Schema itself is specified by the 'Schema Text' Property. The value of this property must be a valid Avro Schema. If Expression Language is used, the value of the 'Schema Text' property must be valid after substituting the expressions.
Specifies how to obtain the schema that is to be used for interpreting the data.
Schema Registry | schema-registry
Controller Service API: SchemaRegistry
Implementations: ConfluentSchemaRegistry, AmazonGlueSchemaRegistry, DatabaseTableSchemaRegistry, AvroSchemaRegistry, ApicurioSchemaRegistry
Specifies the Controller Service to use for the Schema Registry.

This Property is only considered if the [Schema Access Strategy] Property is set to one of the following values: [Use 'Schema Name' Property]
Schema Name | schema-name | ${schema.name}
Specifies the name of the schema to look up in the Schema Registry property.
Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)

This Property is only considered if the [Schema Access Strategy] Property has a value of "Use 'Schema Name' Property".
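To make the default value ${schema.name} concrete, here is a deliberately simplified sketch of how a plain attribute reference might be resolved against FlowFile attributes, falling back to environment variables. It is not NiFi's Expression Language engine: it handles only bare ${name} references (no functions), and the resolve helper and the empty-string fallback for missing names are assumptions made for illustration.

```python
import os
import re

# Matches a bare reference such as ${schema.name}
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(value, attributes):
    """Replace each ${name} with the FlowFile attribute of that name,
    then an environment variable, then an empty string (assumed)."""
    def lookup(match):
        name = match.group(1)
        if name in attributes:
            return attributes[name]
        return os.environ.get(name, "")
    return _REFERENCE.sub(lookup, value)
```

With a FlowFile carrying the attribute schema.name=orders, resolve("${schema.name}", {"schema.name": "orders"}) yields the schema name that would be passed to the registry lookup.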
Schema Version | schema-version
Specifies the version of the schema to look up in the Schema Registry. If not specified, the latest version of the schema will be retrieved.
Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)

This Property is only considered if the [Schema Access Strategy] Property has a value of "Use 'Schema Name' Property".
Schema Branch | schema-branch
Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)

This Property is only considered if the [Schema Access Strategy] Property has a value of "Use 'Schema Name' Property".
Schema Text | schema-text | ${avro.schema}
The text of an Avro-formatted Schema.
Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)

This Property is only considered if the [Schema Access Strategy] Property has a value of "Use 'Schema Text' Property".
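For reference, a value suitable for the 'Schema Text' property is an Avro schema written as JSON. The sketch below shows a hypothetical record schema (the Order name, namespace, and fields are invented for illustration) together with a superficial structural check. The check only confirms that the text is JSON describing a record with named fields; it is not a full Avro validation.

```python
import json

# Hypothetical schema text, as it might be pasted into 'Schema Text'
schema_text = """
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "customer", "type": "string"},
    {"name": "notes", "type": ["null", "string"], "default": null}
  ]
}
"""


def check_record_schema(text):
    """Return the field names if the text looks like an Avro record schema."""
    schema = json.loads(text)
    if schema.get("type") != "record":
        raise ValueError("top-level type must be 'record'")
    if "name" not in schema or not isinstance(schema.get("fields"), list):
        raise ValueError("record schema needs 'name' and a 'fields' list")
    return [field["name"] for field in schema["fields"]]
```

If Expression Language is used in the property, a check like this only makes sense on the text after the expressions have been substituted.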
Schema Reference Reader | schema-reference-reader
Controller Service API: SchemaReferenceReader
Implementation: ConfluentEncodedSchemaReferenceReader
Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier.

This Property is only considered if the [Schema Access Strategy] Property
Cache Size | cache-size | 1000
Specifies how many Schemas should be cached.
Compression Type | compression-type | UNCOMPRESSED
  • UNCOMPRESSED
  • SNAPPY
  • GZIP
  • LZO
  • BROTLI
  • LZ4
  • ZSTD
  • LZ4_RAW
The type of compression for the file being written.
Row Group Size | row-group-size
The row group size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB.
Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Page Size | page-size
The page size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB.
Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Dictionary Page Size | dictionary-page-size
The dictionary page size used by the Parquet writer. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB.
Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
Max Padding Size | max-padding-size
The maximum amount of padding that will be used to align row groups with blocks in the underlying filesystem. If the underlying filesystem is not a block filesystem like HDFS, this has no effect. The value is specified in the format of <Data Size> <Data Unit> where Data Unit is one of B, KB, MB, GB, TB.
Supports Expression Language: true (will be evaluated using flow file attributes and Environment variables)
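The four size properties above share the <Data Size> <Data Unit> format. The sketch below shows one way such a value could be converted to a byte count. The parse_data_size helper is hypothetical, and the 1024-based multipliers are an assumption about how the units are interpreted.

```python
import re

# Assumed binary multipliers (1 KB = 1024 B)
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*$", re.IGNORECASE)


def parse_data_size(value):
    """Convert a string such as '128 MB' into a number of bytes."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError("expected '<Data Size> <Data Unit>', got %r" % value)
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])
```

Under these assumptions, a Row Group Size of "128 MB" corresponds to 134217728 bytes and a Page Size of "1 KB" to 1024 bytes.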
Enable Dictionary Encoding | enable-dictionary-encoding
  • true
  • false
Specifies whether dictionary encoding should be enabled for the Parquet writer
Enable Validation | enable-validation
  • true
  • false
Specifies whether validation should be enabled for the Parquet writer
Writer Version | writer-version
  • PARQUET_1_0
  • PARQUET_2_0
Specifies the version used by the Parquet writer.
Avro Write Old List Structure | avro-write-old-list-structure | true
  • true
  • false
Specifies the value for 'parquet.avro.write-old-list-structure' in the underlying Parquet library
Avro Add List Element Records | avro-add-list-element-records | true
  • true
  • false
Specifies the value for 'parquet.avro.add-list-element-records' in the underlying Parquet library
INT96 Fields | int96-fields
List of fields with full path that should be treated as INT96 timestamps.

State management:

This component does not store state.

Restricted:

This component is not restricted.

System Resource Considerations:

None specified.