ValidateRecord 2.0.0
- Bundle
- org.apache.nifi | nifi-standard-nar
- Description
- Validates the Records of an incoming FlowFile against a given schema. All records that adhere to the schema are routed to the "valid" relationship while records that do not adhere to the schema are routed to the "invalid" relationship. It is therefore possible for a single incoming FlowFile to be split into two individual FlowFiles if some records are valid according to the schema and others are not. Any FlowFile that is routed to the "invalid" relationship will emit a ROUTE Provenance Event with the Details field populated to explain why records were invalid. In addition, to gain further explanation of why records were invalid, DEBUG-level logging can be enabled for the "org.apache.nifi.processors.standard.ValidateRecord" logger.
- Tags
- record, schema, validate
- Input Requirement
- REQUIRED
- Supports Sensitive Dynamic Properties
- false
-
Additional Details for ValidateRecord 2.0.0
Examples for the effect of the Force Types From Reader’s Schema property
The processor first reads the data from the incoming FlowFile using the specified Record Reader, which uses a schema. Then, depending on the value of the Schema Access Strategy property, the processor validates the data against either the reader’s schema or a different schema. Finally, the processor writes the data to the outgoing FlowFile using the specified Record Writer: if the data is valid, the writer uses the validation schema; if the data is invalid, the writer uses the reader’s schema.
The Force Types From Reader’s Schema property affects the first step: how strictly the reader’s schema is applied when reading the data from the incoming FlowFile. Because it determines how the data is read, this property also influences what the processor’s output is and whether that output is forwarded to the valid or the invalid relationship. Below are two examples where the value of this property significantly affects the output.
In both examples the input is XML and the output is JSON, and the same schema is used for reading, validation, and writing.
Example 1
Schema:
{ "namespace": "nifi", "name": "test", "type": "record", "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "string" } ] }
Input:
<test>
  <field1>
    <sub_field>content</sub_field>
  </field1>
  <field2>content_of_field_2</field2>
</test>
Output if Force Types From Reader’s Schema = true (forwarded to the invalid relationship):
[ { "field2": "content_of_field_2" } ]
Output if Force Types From Reader’s Schema = false (forwarded to the invalid relationship):
[ { "field1": { "sub_field": "content" }, "field2": "content_of_field_2" } ]
As you can see, the FlowFile is forwarded to the invalid relationship in both cases, since the input data does not match the provided Avro schema. However, if Force Types From Reader’s Schema = true, only the fields that comply with the schema appear in the output. If Force Types From Reader’s Schema = false, all fields appear in the output, regardless of whether they comply with the schema.
Example 2
Schema:
{ "namespace": "nifi", "name": "test", "type": "record", "fields": [ { "name": "field1", "type": { "type": "array", "items": "string" } }, { "name": "field2", "type": { "type": "array", "items": "string" } } ] }
Input:
<test>
  <field1>content_1</field1>
  <field2>content_2</field2>
  <field2>content_3</field2>
</test>
Output if Force Types From Reader’s Schema = true (forwarded to the valid relationship):
[ { "field1": [ "content_1" ], "field2": [ "content_2", "content_3" ] } ]
Output if Force Types From Reader’s Schema = false (forwarded to the invalid relationship):
[ { "field1": "content_1", "field2": [ "content_2", "content_3" ] } ]
The schema expects two fields (field1 and field2), both of type ARRAY. field1 appears only once in the input XML document. If Force Types From Reader’s Schema = true, the processor coerces this field to a type that complies with the schema, so it is put into an array with one element. Since this type coercion succeeds, the output is routed to the valid relationship. If Force Types From Reader’s Schema = false, the processor does not attempt type coercion, so field1 appears in the output as a single value. The schema expects an array for field1 but a single value is received, so the output is routed to the invalid relationship.
Schema compliance (and thus routing to the valid or the invalid relationship) does not depend on which Writer is used to produce the output of the ValidateRecord processor. Suppose we used the same schema and input as in Example 2, but produced the output with XMLRecordSetWriter instead of JsonRecordSetWriter. With both Force Types From Reader’s Schema = true and Force Types From Reader’s Schema = false, the output is:
<test>
  <field1>content_1</field1>
  <field2>content_2</field2>
  <field2>content_3</field2>
</test>
However, if Force Types From Reader’s Schema = true this output is routed to the valid relationship, and if Force Types From Reader’s Schema = false it is routed to the invalid relationship.
-
Properties
Allow Extra Fields
If the incoming data has fields that are not present in the schema, this property determines whether or not the Record is valid. If true, the Record is still valid. If false, the Record will be invalid due to the extra fields.
- Display Name
- Allow Extra Fields
- Description
- If the incoming data has fields that are not present in the schema, this property determines whether or not the Record is valid. If true, the Record is still valid. If false, the Record will be invalid due to the extra fields.
- API Name
- allow-extra-fields
- Default Value
- true
- Allowable Values
-
- true
- false
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
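For example (a minimal illustration with hypothetical field names), suppose the validation schema declares only field1:
{
  "namespace": "nifi",
  "name": "test",
  "type": "record",
  "fields": [
    { "name": "field1", "type": "string" }
  ]
}
An incoming record such as
{ "field1": "a", "field2": "b" }
carries the extra field field2. With Allow Extra Fields = true the record is still considered valid; with Allow Extra Fields = false it is considered invalid because of the extra field.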
-
Force Types From Reader's Schema
If enabled, the processor will coerce every field to the type specified in the Reader's schema. If the value of a field cannot be coerced to that type, the field will be skipped (not read from the input data) and thus will not appear in the output. If not enabled, every field will appear in the output, but its type may differ from what is specified in the schema. For details, see the Additional Details page of the processor's Help. This property controls how the data is read by the specified Record Reader.
- Display Name
- Force Types From Reader's Schema
- Description
- If enabled, the processor will coerce every field to the type specified in the Reader's schema. If the value of a field cannot be coerced to that type, the field will be skipped (not read from the input data) and thus will not appear in the output. If not enabled, every field will appear in the output, but its type may differ from what is specified in the schema. For details, see the Additional Details page of the processor's Help. This property controls how the data is read by the specified Record Reader.
- API Name
- coerce-types
- Default Value
- false
- Allowable Values
-
- true
- false
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Record Writer for Invalid Records
If specified, this Controller Service will be used to write out any records that are invalid. If not specified, the writer specified by the "Record Writer" property will be used with the schema used to read the input records. This is useful, for example, when the configured Record Writer cannot write data that does not adhere to its schema (as is the case with Avro) or when it is desirable to keep invalid records in their original format while converting valid records to another format.
- Display Name
- Record Writer for Invalid Records
- Description
- If specified, this Controller Service will be used to write out any records that are invalid. If not specified, the writer specified by the "Record Writer" property will be used with the schema used to read the input records. This is useful, for example, when the configured Record Writer cannot write data that does not adhere to its schema (as is the case with Avro) or when it is desirable to keep invalid records in their original format while converting valid records to another format.
- API Name
- invalid-record-writer
- Service Interface
- org.apache.nifi.serialization.RecordSetWriterFactory
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
-
Maximum Validation Details Length
Specifies the maximum number of characters that the validation details value can have. Any characters beyond the maximum will be truncated. This property is used only if 'Validation Details Attribute Name' is set.
- Display Name
- Maximum Validation Details Length
- Description
- Specifies the maximum number of characters that the validation details value can have. Any characters beyond the maximum will be truncated. This property is used only if 'Validation Details Attribute Name' is set.
- API Name
- maximum-validation-details-length
- Default Value
- 1024
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
-
Record Reader
Specifies the Controller Service to use for reading incoming data
- Display Name
- Record Reader
- Description
- Specifies the Controller Service to use for reading incoming data
- API Name
- record-reader
- Service Interface
- org.apache.nifi.serialization.RecordReaderFactory
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Record Writer
Specifies the Controller Service to use for writing out the records. Regardless of the Controller Service's schema access configuration, the schema that is used to validate the records is used to write the valid results.
- Display Name
- Record Writer
- Description
- Specifies the Controller Service to use for writing out the records. Regardless of the Controller Service's schema access configuration, the schema that is used to validate the records is used to write the valid results.
- API Name
- record-writer
- Service Interface
- org.apache.nifi.serialization.RecordSetWriterFactory
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Schema Access Strategy
Specifies how to obtain the schema that should be used to validate records
- Display Name
- Schema Access Strategy
- Description
- Specifies how to obtain the schema that should be used to validate records
- API Name
- schema-access-strategy
- Default Value
- reader-schema
- Allowable Values
-
- Use Reader's Schema
- Use 'Schema Name' Property
- Use 'Schema Text' Property
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Schema Branch
Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
- Display Name
- Schema Branch
- Description
- Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
- API Name
- schema-branch
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-name]
-
Schema Name
Specifies the name of the schema to look up in the Schema Registry property
- Display Name
- Schema Name
- Description
- Specifies the name of the schema to look up in the Schema Registry property
- API Name
- schema-name
- Default Value
- ${schema.name}
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-name]
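Because this property supports Expression Language, the schema can be selected per FlowFile. A small illustration (the attribute value is hypothetical): with the default value of ${schema.name}, a FlowFile carrying the attribute schema.name = customer-schema causes the processor to look up the schema named customer-schema in the configured Schema Registry.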
-
Schema Registry
Specifies the Controller Service to use for the Schema Registry
- Display Name
- Schema Registry
- Description
- Specifies the Controller Service to use for the Schema Registry
- API Name
- schema-registry
- Service Interface
- org.apache.nifi.schemaregistry.services.SchemaRegistry
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-reference-reader, schema-name]
-
Schema Text
The text of an Avro-formatted Schema
- Display Name
- Schema Text
- Description
- The text of an Avro-formatted Schema
- API Name
- schema-text
- Default Value
- ${avro.schema}
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-text-property]
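When the 'Use Schema Text Property' strategy is selected, this property holds the Avro schema itself. A minimal sketch, in the same style as the schemas in the examples above:
{
  "namespace": "nifi",
  "name": "test",
  "type": "record",
  "fields": [
    { "name": "field1", "type": "string" },
    { "name": "field2", "type": "int" }
  ]
}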
-
Schema Version
Specifies the version of the schema to look up in the Schema Registry. If not specified, the latest version of the schema will be retrieved.
- Display Name
- Schema Version
- Description
- Specifies the version of the schema to look up in the Schema Registry. If not specified, the latest version of the schema will be retrieved.
- API Name
- schema-version
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
- Dependencies
-
- Schema Access Strategy is set to any of [schema-name]
-
Strict Type Checking
If the incoming data has a Record where a field is not of the correct type, this property determines how to handle the Record. If true, the Record will be considered invalid. If false, the Record will be considered valid and the field will be coerced into the correct type (if possible, according to the type coercion supported by the Record Writer). This property controls how the data is validated against the validation schema.
- Display Name
- Strict Type Checking
- Description
- If the incoming data has a Record where a field is not of the correct type, this property determines how to handle the Record. If true, the Record will be considered invalid. If false, the Record will be considered valid and the field will be coerced into the correct type (if possible, according to the type coercion supported by the Record Writer). This property controls how the data is validated against the validation schema.
- API Name
- strict-type-checking
- Default Value
- true
- Allowable Values
-
- true
- false
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
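For example (a minimal illustration with a hypothetical schema), suppose the validation schema declares field1 as an int and the incoming record carries it as a string:
{ "field1": "123" }
With Strict Type Checking = true the record is considered invalid because the field type does not match the schema. With Strict Type Checking = false the value can be coerced to the int 123, so the record is considered valid.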
-
Validation Details Attribute Name
If specified, when a validation error occurs, the validation details will be written to an attribute with this name. The number of characters will be limited by the 'Maximum Validation Details Length' property.
- Display Name
- Validation Details Attribute Name
- Description
- If specified, when a validation error occurs, the validation details will be written to an attribute with this name. The number of characters will be limited by the 'Maximum Validation Details Length' property.
- API Name
- validation-details-attribute-name
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
Relationships
Name | Description
---|---
valid | Records that are valid according to the schema will be routed to this relationship
invalid | Records that are not valid according to the schema will be routed to this relationship
failure | If the records cannot be read, validated, or written, for any reason, the original FlowFile will be routed to this relationship
Writes Attributes
Name | Description
---|---
mime.type | Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count | The number of records in the FlowFile routed to a relationship