CSVReader 2.0.0
- Bundle
- org.apache.nifi | nifi-record-serialization-services-nar
- Description
- Parses CSV-formatted data, returning each row in the CSV file as a separate record. This reader allows for inferring a schema based on the first line of the CSV, if a 'header line' is present, or providing an explicit schema for interpreting the values. See Controller Service's Usage for further documentation.
- Tags
- comma, csv, delimited, parse, reader, record, row, separated, values
- Input Requirement
- Supports Sensitive Dynamic Properties
- false
-
Additional Details for CSVReader 2.0.0
CSVReader
The CSVReader allows for interpreting input data as delimited Records. By default, a comma is used as the field separator, but this is configurable. It is common, for instance, to use a tab in order to read tab-separated values, or TSV.
There are pre-defined CSV formats in the reader like EXCEL. Further information regarding their settings can be found here: https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html
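For orientation only, the following standalone sketch uses the Apache Commons CSV library directly (outside of NiFi) to contrast a predefined format with a customized one; it illustrates the CSVFormat API linked above rather than how this reader configures it internally:

    import java.io.StringReader;
    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVRecord;

    public class CsvFormatSketch {
        public static void main(String[] args) throws Exception {
            // Predefined formats such as CSVFormat.EXCEL or CSVFormat.TDF (tab-delimited)
            // correspond to the reader's non-custom "CSV Format" choices. A customized
            // format mirrors the kinds of settings the "Custom Format" properties expose:
            // value separator, quote character, comment marker, trimming.
            CSVFormat custom = CSVFormat.DEFAULT
                    .withDelimiter(',')
                    .withQuote('"')
                    .withCommentMarker('#')
                    .withTrim()
                    .withFirstRecordAsHeader();

            String data = "name,age\nJohn,8\nJane,Ten\n";
            for (CSVRecord record : custom.parse(new StringReader(data))) {
                System.out.println(record.get("name") + " -> " + record.get("age"));
            }
        }
    }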
The reader allows for customization of the CSV Format, such as which character should be used to separate CSV fields, which character should be used for quoting and when to quote fields, which character should denote a comment, etc. The names of the fields may be specified either by having a “header line” as the first line in the CSV (in which case the Schema Access Strategy should be “Infer Schema” or “Use String Fields From Header”) or by supplying the schema explicitly via the Schema Text property or by looking it up in a Schema Registry.
Schemas and Type Coercion
When a record is parsed from incoming data, it is separated into fields. Each of these fields is then looked up against the configured schema (by field name) in order to determine what the type of the data should be. If the field is not present in the schema, that field is omitted from the Record. If the field is found in the schema, the data type of the received data is compared against the data type specified in the schema. If the types match, the value of that field is used as-is. If the schema indicates that the field should be of a different type, then the Controller Service will attempt to coerce the data into the type specified by the schema. If the field cannot be coerced into the specified type, an Exception will be thrown.
The following rules apply when attempting to coerce a field value from one data type to another:
- Any data type can be coerced into a String type.
- Any numeric data type (Byte, Short, Int, Long, Float, Double) can be coerced into any other numeric data type.
- Any numeric value can be coerced into a Date, Time, or Timestamp type, by assuming that the Long value is the number of milliseconds since epoch (Midnight GMT, January 1, 1970).
- A String value can be coerced into a Date, Time, or Timestamp type, if its format matches the configured “Date Format,” “Time Format,” or “Timestamp Format.”
- A String value can be coerced into a numeric value if the value is of the appropriate type. For example, the String value 8 can be coerced into any numeric type. However, the String value 8.2 can be coerced into a Double or Float type but not an Integer.
- A String value of “true” or “false” (regardless of case) can be coerced into a Boolean value.
- A String value that is not empty can be coerced into a Char type. If the String contains more than 1 character, the first character is used and the rest of the characters are ignored.
- Any “date/time” type (Date, Time, Timestamp) can be coerced into any other “date/time” type.
- Any “date/time” type can be coerced into a Long type, representing the number of milliseconds since epoch (Midnight GMT, January 1, 1970).
- Any “date/time” type can be coerced into a String. The format of the String is whatever DateFormat is configured for the corresponding property (Date Format, Time Format, Timestamp Format property).
If none of the above rules apply when attempting to coerce a value from one data type to another, the coercion will fail and an Exception will be thrown.
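As an illustration of these rules, here is a simplified, self-contained sketch of String coercion in plain Java; it is not NiFi's actual coercion code, and the method and type names are made up for the example:

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;

    public class CoercionSketch {
        // Simplified stand-in for the String-coercion rules listed above.
        static Object coerceString(String value, String targetType, DateTimeFormatter dateFormat) {
            switch (targetType) {
                case "INT":
                    return Integer.valueOf(value.trim());             // "8" -> 8; "8.2" would fail here
                case "DOUBLE":
                    return Double.valueOf(value.trim());              // "8.2" -> 8.2
                case "BOOLEAN":
                    return Boolean.valueOf(value.trim());             // "true"/"TRUE" -> true (simplified)
                case "CHAR":
                    return value.charAt(0);                           // first character only, rest ignored
                case "DATE":
                    return LocalDate.parse(value.trim(), dateFormat); // must match the configured Date Format
                default:
                    return value;                                     // any type can become a String
            }
        }

        public static void main(String[] args) {
            DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy");
            System.out.println(coerceString("8", "INT", fmt));           // 8
            System.out.println(coerceString("8.2", "DOUBLE", fmt));      // 8.2
            System.out.println(coerceString("04/03/2007", "DATE", fmt)); // 2007-04-03
        }
    }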
Schema Inference
While NiFi’s Record API does require that each Record have a schema, it is often convenient to infer the schema based on the values in the data, rather than having to manually create a schema. This is accomplished by selecting a value of “Infer Schema” for the “Schema Access Strategy” property. When using this strategy, the Reader will determine the schema by first parsing all data in the FlowFile, keeping track of all fields that it has encountered and the type of each field. Once all data has been parsed, a schema is formed that encompasses all fields that have been encountered.
A common concern when inferring schemas is how to handle the condition of two values that have different types. For example, consider a FlowFile with the following two records:
    name, age
    John, 8
    Jane, Ten
It is clear that the “name” field will be inferred as a STRING type. However, how should we handle the “age” field? Should the field be a CHOICE between INT and STRING? Should we prefer LONG over INT? Should we just use a STRING? Should the field be considered nullable?
To help understand how this Record Reader infers schemas, we have the following list of rules that are followed in the inference logic:
- All fields are inferred to be nullable.
- When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers to use a “wider” data type over using a CHOICE data type. A data type “A” is said to be wider than data type “B” if and only if data type “A” encompasses all values of “B” in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types except MAP, RECORD, ARRAY, and CHOICE.
- Before inferring the type of a value, leading and trailing whitespace are removed. Additionally, if the value is surrounded by double-quotes ("), the double-quotes are removed. Therefore, the value 16 is interpreted the same as "16". Both will be interpreted as an INT. However, the value " 16" will be inferred as a STRING type because the white space is enclosed within double-quotes, which means that the white space is considered part of the value.
- If the “Time Format,” “Timestamp Format,” or “Date Format” properties are configured, any value that would otherwise be considered a STRING type is first checked against the configured formats to see if it matches any of them. If the value matches the Timestamp Format, the value is considered a Timestamp field. If it matches the Date Format, it is considered a Date field. If it matches the Time Format, it is considered a Time field. In the unlikely event that the value matches more than one of the configured formats, they will be matched in the order: Timestamp, Date, Time. I.e., if a value matched both the Timestamp Format and the Date Format, the type that is inferred will be Timestamp. Because parsing dates and times can be expensive, it is advisable not to configure these formats if dates, times, and timestamps are not expected, or if processing the data as a STRING is acceptable. For use cases when this is important, though, the inference engine is intelligent enough to optimize the parsing by first checking several very cheap conditions. For example, the string’s length is examined to see if it is too long or too short to match the pattern. This results in far more efficient processing than would result if attempting to parse each string value as a timestamp.
- The MAP type is never inferred.
- The ARRAY type is never inferred.
- The RECORD type is never inferred.
- If a field exists but all values are null, then the field is inferred to be of type STRING.
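The “wider type” preference can be pictured with the following rough sketch; the type lattice and helper methods are hypothetical and much simpler than the real inference engine (for instance, it falls back to STRING where the real engine might produce a CHOICE):

    public class InferenceSketch {
        // Hypothetical lattice for a few primitive types, widest last.
        enum FieldType { BOOLEAN, INT, LONG, FLOAT, DOUBLE, STRING }

        static FieldType inferSingle(String raw) {
            String v = raw.trim().replaceAll("^\"|\"$", ""); // strip surrounding quotes, as described above
            if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) return FieldType.BOOLEAN;
            try { Integer.parseInt(v); return FieldType.INT; } catch (NumberFormatException ignored) { }
            try { Long.parseLong(v); return FieldType.LONG; } catch (NumberFormatException ignored) { }
            try { Double.parseDouble(v); return FieldType.DOUBLE; } catch (NumberFormatException ignored) { }
            return FieldType.STRING;
        }

        // Merge two observed types by preferring the wider one: numeric types widen to one
        // another, and in this simplified sketch everything else falls back to STRING
        // (the real engine may use a CHOICE type instead).
        static FieldType widen(FieldType a, FieldType b) {
            if (a == b) return a;
            if (isNumeric(a) && isNumeric(b)) return a.ordinal() > b.ordinal() ? a : b;
            return FieldType.STRING;
        }

        static boolean isNumeric(FieldType t) {
            return t == FieldType.INT || t == FieldType.LONG || t == FieldType.FLOAT || t == FieldType.DOUBLE;
        }

        public static void main(String[] args) {
            FieldType age = widen(inferSingle("8"), inferSingle("Ten"));
            System.out.println(age); // STRING -- "8" alone is INT, but "Ten" forces the wider STRING type
        }
    }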
Caching of Inferred Schemas
If a schema is to be inferred, this Record Reader requires that all records be read, in order to ensure that the inferred schema applies to every record in the FlowFile. This can become expensive, however, especially if the data undergoes many different transformations. To alleviate the cost of inferring schemas, the Record Reader can be configured with a “Schema Inference Cache” by populating the property of that name. This is a Controller Service that can be shared by Record Readers and Record Writers.
Whenever a Record Writer is used to write data, if it is configured with a “Schema Cache,” it will also add the schema to the Schema Cache. This will result in an identifier for that schema being added as an attribute to the FlowFile.
Whenever a Record Reader is used to read data, if it is configured with a “Schema Inference Cache”, it will first look for a “schema.cache.identifier” attribute on the FlowFile. If the attribute exists, it will use the value of that attribute to look up the schema in the schema cache. If it is able to find a schema in the cache with that identifier, then it will use that schema instead of reading, parsing, and analyzing the data to infer the schema. If the attribute is not available on the FlowFile, or if the attribute is available but the cache does not have a schema with that identifier, then the Record Reader will proceed to infer the schema as described above.
The end result is that users are able to chain together many different Processors to operate on Record-oriented data. Typically, only the first such Processor in the chain will incur the “penalty” of inferring the schema. For all other Processors in the chain, the Record Reader is able to simply look up the schema in the Schema Cache by identifier. This allows the Record Reader to infer a schema accurately, since it is inferred based on all data in the FlowFile, and still allows this to happen efficiently since the schema will typically only be inferred once, regardless of how many Processors handle the data.
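The lookup-before-infer behavior can be pictured with a toy cache keyed by the schema.cache.identifier attribute; the types below are hypothetical stand-ins, and only the attribute name comes from the behavior described above:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class SchemaCacheSketch {
        // Toy stand-in for NiFi's record schema and the shared schema-inference cache service.
        record Schema(String definition) { }

        static final Map<String, Schema> CACHE = new ConcurrentHashMap<>();

        // Try the FlowFile's schema.cache.identifier attribute first; only fall back
        // to full inference when there is no usable cache entry.
        static Schema resolveSchema(Map<String, String> flowFileAttributes, byte[] content) {
            String id = flowFileAttributes.get("schema.cache.identifier");
            if (id != null) {
                Schema cached = CACHE.get(id);
                if (cached != null) {
                    return cached;                // cheap: no parsing of the content needed
                }
            }
            return inferFromAllRecords(content);  // expensive: reads every record in the FlowFile
        }

        static Schema inferFromAllRecords(byte[] content) {
            // Placeholder for the full inference pass described under "Schema Inference".
            return new Schema("inferred");
        }

        public static void main(String[] args) {
            CACHE.put("abc-123", new Schema("cached schema definition"));
            System.out.println(resolveSchema(Map.of("schema.cache.identifier", "abc-123"), new byte[0]));
        }
    }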
Examples
Example 1
As an example, consider a FlowFile whose contents consists of the following:
    id, name, balance, join_date, notes
    1, John, 48.23, 04/03/2007, "Our very first customer!"
    2, Jane, 1245.89, 08/22/2009,
    3, Frank Franklin, "48481.29", 04/04/2016,
Additionally, let’s consider that this Controller Service is configured with the Schema Registry pointing to an AvroSchemaRegistry and the schema is configured as the following:
{ "namespace": "nifi", "name": "balances", "type": "record", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": "string" }, { "name": "balance", "type": "double" }, { "name": "join_date", "type": { "type": "int", "logicalType": "date" } }, { "name": "notes", "type": "string" } ] }
In the example above, we see that the ‘join_date’ column is a Date type. In order for the CSV Reader to be able to properly parse a value as a date, we need to provide the reader with the date format to use. In this example, we would configure the Date Format property to be MM/dd/yyyy to indicate that it is a two-digit month, followed by a two-digit day, followed by a four-digit year, each separated by a slash. In this case, the result will be that this FlowFile consists of 3 different records.
The first record will contain the following values:
    Field Name    Field Value
    id            1
    name          John
    balance       48.23
    join_date     04/03/2007
    notes         Our very first customer!
The second record will contain the following values:
    Field Name    Field Value
    id            2
    name          Jane
    balance       1245.89
    join_date     08/22/2009
    notes
The third record will contain the following values:
    Field Name    Field Value
    id            3
    name          Frank Franklin
    balance       48481.29
    join_date     04/04/2016
    notes
Example 2 - Schema with CSV Header Line
When CSV data consists of a header line that outlines the column names, the reader provides a couple of different properties for configuring how to handle these column names. The “Schema Access Strategy” property as well as the associated properties (“Schema Registry,” “Schema Text,” and “Schema Name” properties) can be used to specify how to obtain the schema. If the “Schema Access Strategy” is set to “Use String Fields From Header” then the header line of the CSV will be used to determine the schema. Otherwise, a schema will be referenced elsewhere. But what happens if a schema is obtained from a Schema Registry, for instance, and the CSV Header indicates a different set of column names?
For example, let’s say that the following schema is obtained from the Schema Registry:
{ "namespace": "nifi", "name": "balances", "type": "record", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": "string" }, { "name": "balance", "type": "double" }, { "name": "memo", "type": "string" } ] }
And the CSV contains the following data:
    id, name, balance, notes
    1, John Doe, 123.45, First Customer
Note here that our schema indicates that the final column is named “memo” whereas the CSV Header indicates that it is named “notes.”
In this case, the reader will look at the “Ignore CSV Header Column Names” property. If this property is set to “true” then the column names provided in the CSV will simply be ignored and the last column will be called “memo.” However, if the “Ignore CSV Header Column Names” property is set to “false” then the result will be that the last column will be named “notes” and each record will have a null value for the “memo” column.
With the “Ignore CSV Header Column Names” property set to “true”:
    Field Name    Field Value
    id            1
    name          John Doe
    balance       123.45
    memo          First Customer
With the “Ignore CSV Header Column Names” property set to “false”:
    Field Name    Field Value
    id            1
    name          John Doe
    balance       123.45
    notes         First Customer
    memo          null
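The naming rule in this example can be summarized with a small sketch; the helper below is hypothetical and only restates the behavior described above, not NiFi's implementation:

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class HeaderNamingSketch {
        // If header names are ignored, the schema's field names win; otherwise the header's
        // names are used and schema-only fields (such as "memo") come back null.
        static Map<String, String> mapRow(List<String> schemaNames, List<String> headerNames,
                                          List<String> values, boolean ignoreHeaderNames) {
            List<String> effectiveNames = ignoreHeaderNames ? schemaNames : headerNames;
            Map<String, String> mapped = new LinkedHashMap<>();
            for (int i = 0; i < effectiveNames.size() && i < values.size(); i++) {
                mapped.put(effectiveNames.get(i), values.get(i));
            }
            if (!ignoreHeaderNames) {
                schemaNames.forEach(name -> mapped.putIfAbsent(name, null));
            }
            return mapped;
        }

        public static void main(String[] args) {
            List<String> schema = List.of("id", "name", "balance", "memo");
            List<String> header = List.of("id", "name", "balance", "notes");
            List<String> row = List.of("1", "John Doe", "123.45", "First Customer");
            System.out.println(mapRow(schema, header, row, true));  // {id=1, name=John Doe, balance=123.45, memo=First Customer}
            System.out.println(mapRow(schema, header, row, false)); // {id=1, name=John Doe, balance=123.45, notes=First Customer, memo=null}
        }
    }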
-
Comment Marker
The character that is used to denote the start of a comment. Any line that begins with this character will be ignored.
- Display Name
- Comment Marker
- Description
- The character that is used to denote the start of a comment. Any line that begins with this character will be ignored.
- API Name
- Comment Marker
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
- Dependencies
-
- CSV Format is set to any of [custom]
-
CSV Format
Specifies which "format" the CSV data is in, or specifies if custom formatting should be used.
- Display Name
- CSV Format
- Description
- Specifies which "format" the CSV data is in, or specifies if custom formatting should be used.
- API Name
- CSV Format
- Default Value
- custom
- Allowable Values
-
- Custom Format
- RFC 4180
- Microsoft Excel
- Tab-Delimited
- MySQL Format
- Informix Unload
- Informix Unload Escape Disabled
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
CSV Parser
Specifies which parser to use to read CSV records. NOTE: Different parsers may support different subsets of functionality and may also exhibit different levels of performance.
- Display Name
- CSV Parser
- Description
- Specifies which parser to use to read CSV records. NOTE: Different parsers may support different subsets of functionality and may also exhibit different levels of performance.
- API Name
- csv-reader-csv-parser
- Default Value
- commons-csv
- Allowable Values
-
- Apache Commons CSV
- Jackson CSV
- FastCSV
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Allow Duplicate Header Names
Whether duplicate header names are allowed. Header names are case-sensitive, for example "name" and "Name" are treated as separate fields. Handling of duplicate header names is CSV Parser specific (where applicable):
- Apache Commons CSV - duplicate headers will result in column data "shifting" right with new fields created for "unknown_field_index_X" where "X" is the CSV column index number
- Jackson CSV - duplicate headers will be de-duplicated with the field value being that of the right-most duplicate CSV column
- FastCSV - duplicate headers will be de-duplicated with the field value being that of the left-most duplicate CSV column
- Display Name
- Allow Duplicate Header Names
- Description
- Whether duplicate header names are allowed. Header names are case-sensitive, for example "name" and "Name" are treated as separate fields. Handling of duplicate header names is CSV Parser specific (where applicable):
- Apache Commons CSV - duplicate headers will result in column data "shifting" right with new fields created for "unknown_field_index_X" where "X" is the CSV column index number
- Jackson CSV - duplicate headers will be de-duplicated with the field value being that of the right-most duplicate CSV column
- FastCSV - duplicate headers will be de-duplicated with the field value being that of the left-most duplicate CSV column
- API Name
- csvutils-allow-duplicate-header-names
- Default Value
- true
- Allowable Values
-
- true
- false
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
- Dependencies
-
- CSV Format is set to any of [custom]
-
Character Set
The Character Encoding that is used to encode/decode the CSV file
- Display Name
- Character Set
- Description
- The Character Encoding that is used to encode/decode the CSV file
- API Name
- csvutils-character-set
- Default Value
- UTF-8
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Date Format
Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017).
- Display Name
- Date Format
- Description
- Specifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters, as in 01/01/2017).
- API Name
- Date Format
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
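As a quick check of the pattern syntax referenced above, the same format string can be exercised with plain java.time, independent of NiFi:

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;

    public class DateFormatCheck {
        public static void main(String[] args) {
            // The pattern syntax expected by the Date Format property.
            DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy");
            LocalDate parsed = LocalDate.parse("04/03/2007", fmt);
            System.out.println(parsed); // 2007-04-03
        }
    }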
-
Escape Character
The character that is used to escape characters that would otherwise have a specific meaning to the CSV Parser. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Escape Character at runtime, then it will be skipped and the default Escape Character will be used. Setting it to an empty string means no escape character should be used.
- Display Name
- Escape Character
- Description
- The character that is used to escape characters that would otherwise have a specific meaning to the CSV Parser. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Escape Character at runtime, then it will be skipped and the default Escape Character will be used. Setting it to an empty string means no escape character should be used.
- API Name
- Escape Character
- Default Value
- \
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- true
- Dependencies
-
- CSV Format is set to any of [custom]
-
Ignore CSV Header Column Names
If the first line of a CSV is a header, and the configured schema does not match the fields named in the header line, this controls how the Reader will interpret the fields. If this property is true, then the field names mapped to each column are driven only by the configured schema and any fields not in the schema will be ignored. If this property is false, then the field names found in the CSV Header will be used as the names of the fields.
- Display Name
- Ignore CSV Header Column Names
- Description
- If the first line of a CSV is a header, and the configured schema does not match the fields named in the header line, this controls how the Reader will interpret the fields. If this property is true, then the field names mapped to each column are driven only by the configured schema and any fields not in the schema will be ignored. If this property is false, then the field names found in the CSV Header will be used as the names of the fields.
- API Name
- ignore-csv-header
- Default Value
- false
- Allowable Values
-
- true
- false
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
-
Null String
Specifies a String that, if present as a value in the CSV, should be considered a null field instead of using the literal value.
- Display Name
- Null String
- Description
- Specifies a String that, if present as a value in the CSV, should be considered a null field instead of using the literal value.
- API Name
- Null String
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
- Dependencies
-
- CSV Format is set to any of [custom]
-
Quote Character
The character that is used to quote values so that escape characters do not have to be used. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Quote Character at runtime, then it will be skipped and the default Quote Character will be used.
- Display Name
- Quote Character
- Description
- The character that is used to quote values so that escape characters do not have to be used. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Quote Character at runtime, then it will be skipped and the default Quote Character will be used.
- API Name
- Quote Character
- Default Value
- "
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- true
- Dependencies
-
- CSV Format is set to any of [custom]
-
Record Separator
Specifies the characters to use in order to separate CSV Records
- Display Name
- Record Separator
- Description
- Specifies the characters to use in order to separate CSV Records
- API Name
- Record Separator
- Default Value
- \n
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
- Dependencies
-
- CSV Format is set to any of [custom]
-
Schema Access Strategy
Specifies how to obtain the schema that is to be used for interpreting the data.
- Display Name
- Schema Access Strategy
- Description
- Specifies how to obtain the schema that is to be used for interpreting the data.
- API Name
- schema-access-strategy
- Default Value
- infer-schema
- Allowable Values
-
- Use 'Schema Name' Property
- Use 'Schema Text' Property
- Schema Reference Reader
- Use String Fields From Header
- Infer Schema
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Schema Branch
Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
- Display Name
- Schema Branch
- Description
- Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
- API Name
- schema-branch
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-name]
-
Schema Name
Specifies the name of the schema to lookup in the Schema Registry property
- Display Name
- Schema Name
- Description
- Specifies the name of the schema to lookup in the Schema Registry property
- API Name
- schema-name
- Default Value
- ${schema.name}
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-name]
-
Schema Reference Reader
Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
- Display Name
- Schema Reference Reader
- Description
- Service implementation responsible for reading FlowFile attributes or content to determine the Schema Reference Identifier
- API Name
- schema-reference-reader
- Service Interface
- org.apache.nifi.schemaregistry.services.SchemaReferenceReader
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
- Dependencies
-
- Schema Access Strategy is set to any of [schema-reference-reader]
-
Schema Registry
Specifies the Controller Service to use for the Schema Registry
- Display Name
- Schema Registry
- Description
- Specifies the Controller Service to use for the Schema Registry
- API Name
- schema-registry
- Service Interface
- org.apache.nifi.schemaregistry.services.SchemaRegistry
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-reference-reader, schema-name]
-
Schema Text
The text of an Avro-formatted Schema
- Display Name
- Schema Text
- Description
- The text of an Avro-formatted Schema
- API Name
- schema-text
- Default Value
- ${avro.schema}
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-text-property]
-
Schema Version
Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
- Display Name
- Schema Version
- Description
- Specifies the version of the schema to lookup in the Schema Registry. If not specified then the latest version of the schema will be retrieved.
- API Name
- schema-version
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-name]
-
Treat First Line as Header
Specifies whether or not the first line of CSV should be considered a Header or should be considered a record. If the Schema Access Strategy indicates that the columns must be defined in the header, then this property will be ignored, since the header must always be present and won't be processed as a Record. Otherwise, if 'true', the first line of CSV data will not be processed as a record, and if 'false', the first line will be interpreted as a record.
- Display Name
- Treat First Line as Header
- Description
- Specifies whether or not the first line of CSV should be considered a Header or should be considered a record. If the Schema Access Strategy indicates that the columns must be defined in the header, then this property will be ignored, since the header must always be present and won't be processed as a Record. Otherwise, if 'true', the first line of CSV data will not be processed as a record, and if 'false', the first line will be interpreted as a record.
- API Name
- Skip Header Line
- Default Value
- false
- Allowable Values
-
- true
- false
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Time Format
Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).
- Display Name
- Time Format
- Description
- Specifies the format to use when reading/writing Time fields. If not specified, Time fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, HH:mm:ss for a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 18:04:15).
- API Name
- Time Format
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
-
Timestamp Format
Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15).
- Display Name
- Timestamp Format
- Description
- Specifies the format to use when reading/writing Timestamp fields. If not specified, Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT). If specified, the value must match the Java java.time.format.DateTimeFormatter format (for example, MM/dd/yyyy HH:mm:ss for a two-digit month, followed by a two-digit day, followed by a four-digit year, all separated by '/' characters; and then followed by a two-digit hour in 24-hour format, followed by a two-digit minute, followed by a two-digit second, all separated by ':' characters, as in 01/01/2017 18:04:15).
- API Name
- Timestamp Format
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
-
Trim double quote
Whether or not to trim starting and ending double quotes. For example, with trimming the string '"test"' would be parsed to 'test'; without trimming it would be parsed to '"test"'. Setting this to 'false' means full compliance with RFC 4180. The default value is true (trimming enabled).
- Display Name
- Trim double quote
- Description
- Whether or not to trim starting and ending double quotes. For example, with trimming the string '"test"' would be parsed to 'test'; without trimming it would be parsed to '"test"'. Setting this to 'false' means full compliance with RFC 4180. The default value is true (trimming enabled).
- API Name
- Trim double quote
- Default Value
- true
- Allowable Values
-
- true
- false
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
- Dependencies
-
- CSV Format is set to any of [rfc-4180]
-
Trim Fields
Whether or not white space should be removed from the beginning and end of fields
- Display Name
- Trim Fields
- Description
- Whether or not white space should be removed from the beginning and end of fields
- API Name
- Trim Fields
- Default Value
- true
- Allowable Values
-
- true
- false
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
- Dependencies
-
- CSV Format is set to any of [custom]
-
Value Separator
The character that is used to separate values/fields in a CSV Record. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Value Separator at runtime, then it will be skipped and the default Value Separator will be used.
- Display Name
- Value Separator
- Description
- The character that is used to separate values/fields in a CSV Record. If the property has been specified via Expression Language but the expression gets evaluated to an invalid Value Separator at runtime, then it will be skipped and the default Value Separator will be used.
- API Name
- Value Separator
- Default Value
- ,
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- true
- Dependencies
-
- CSV Format is set to any of [custom]