CEFReader

Description:

Parses CEF (Common Event Format) events, returning each row as a record. This reader allows for inferring a schema based on the first event in the FlowFile or providing an explicit schema for interpreting the values.
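
As a rough illustration of what turning a CEF event into a record involves, the Python sketch below splits the pipe-delimited header from the key=value extension and then guesses a type for each field from a single event. It is not NiFi's implementation: the header field names, the parse_cef and infer_type helpers, and the sample event are assumptions made for this example, and escape sequences are left undecoded.

```python
import re

# Assumed names for the seven CEF header fields (illustrative only).
HEADER_FIELDS = ["version", "deviceVendor", "deviceProduct", "deviceVersion",
                 "deviceEventClassId", "name", "severity"]

def parse_cef(line):
    """Parse one CEF line into a flat dict of strings (sketch, not NiFi's parser)."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF event")
    # Split on pipes that are not preceded by a backslash; part 8 is the extension.
    parts = re.split(r"(?<!\\)\|", line[4:], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    record = dict(zip(HEADER_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    # A key is a word followed by '='; its value runs up to the next key or the end.
    for match in re.finditer(r"(\w+)=(.*?)(?=\s+\w+=|$)", extension):
        record[match.group(1)] = match.group(2)
    return record

def infer_type(value):
    """Guess a type name from a single string value."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "string"

event = "CEF:0|Vendor|Product|1.0|100|Login failed|5|src=10.0.0.1 msg=user not found"
record = parse_cef(event)
schema = {field: infer_type(value) for field, value in record.items()}
```

Because the schema here comes from one event only, a field that is absent from that event would be missing from the schema, which is one reason to supply an explicit schema instead.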

Additional Details...

Tags:

cef, record, reader, parser

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Schema Access Strategy | schema-access-strategy | Infer Schema
  • Use 'Schema Name' Property: The name of the Schema to use is specified by the 'Schema Name' Property. The value of this property is used to look up the Schema in the configured Schema Registry service.
  • Use 'Schema Text' Property: The text of the Schema itself is specified by the 'Schema Text' Property. The value of this property must be a valid Avro Schema. If Expression Language is used, the value of the 'Schema Text' property must be valid after substituting the expressions.
  • HWX Schema Reference Attributes: The FlowFile contains 3 Attributes that will be used to look up a Schema from the configured Schema Registry: 'schema.identifier', 'schema.version', and 'schema.protocol.version'.
  • HWX Content-Encoded Schema Reference: The content of the FlowFile contains a reference to a schema in the Schema Registry service. The reference is encoded as a single byte indicating the 'protocol version', followed by 8 bytes indicating the schema identifier, and finally 4 bytes indicating the schema version, as per the Hortonworks Schema Registry serializers and deserializers, found at https://github.com/hortonworks/registry
  • Confluent Content-Encoded Schema Reference: The content of the FlowFile contains a reference to a schema in the Schema Registry service. The reference is encoded as a single 'Magic Byte' followed by 4 bytes representing the identifier of the schema, as outlined at http://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html. This is based on version 3.2.x of the Confluent Schema Registry.
  • Infer Schema: The Schema of the data will be inferred automatically when the data is read. See component Usage and Additional Details for information about how the schema is inferred.
Specifies how to obtain the schema that is to be used for interpreting the data.
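
The two content-encoded schema reference layouts described for Schema Access Strategy can be sketched with Python's struct module. This is only an illustration of the byte layout: big-endian byte order and signed integers are assumptions (the text above does not state them), and decode_hwx and decode_confluent are hypothetical helper names, not part of any registry client.

```python
import struct

# Assumed big-endian layouts matching the byte counts given in the text.
HWX_LAYOUT = ">Bqi"       # protocol version (1 byte), schema id (8), schema version (4)
CONFLUENT_LAYOUT = ">Bi"  # magic byte (1 byte), schema id (4)

def decode_hwx(content):
    """Return (protocol_version, schema_id, schema_version, payload)."""
    protocol, schema_id, version = struct.unpack_from(HWX_LAYOUT, content)
    payload = content[struct.calcsize(HWX_LAYOUT):]
    return protocol, schema_id, version, payload

def decode_confluent(content):
    """Return (magic_byte, schema_id, payload)."""
    magic, schema_id = struct.unpack_from(CONFLUENT_LAYOUT, content)
    payload = content[struct.calcsize(CONFLUENT_LAYOUT):]
    return magic, schema_id, payload

# Build sample content by hand, then decode it again.
hwx_content = struct.pack(HWX_LAYOUT, 1, 42, 3) + b"CEF:0|..."
confluent_content = struct.pack(CONFLUENT_LAYOUT, 0, 42) + b"CEF:0|..."
```

In both cases the bytes after the reference are the actual event data, so a reader has to strip a 13-byte or 5-byte prefix before parsing.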
Schema Registry | schema-registry
Controller Service API: SchemaRegistry
Implementations: DatabaseTableSchemaRegistry, HortonworksSchemaRegistry, ConfluentSchemaRegistry, AvroSchemaRegistry, AmazonGlueSchemaRegistry
Specifies the Controller Service to use for the Schema Registry.

This Property is only considered if the [Schema Access Strategy] Property is set to one of the following values: [Confluent Content-Encoded Schema Reference], [Use 'Schema Name' Property], [HWX Schema Reference Attributes], [HWX Content-Encoded Schema Reference]
Schema Name | schema-name | ${schema.name}
Specifies the name of the schema to look up in the Schema Registry property.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

This Property is only considered if the [Schema Access Strategy] Property has a value of "Use 'Schema Name' Property".
Schema Version | schema-version
Specifies the version of the schema to look up in the Schema Registry. If not specified, the latest version of the schema will be retrieved.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

This Property is only considered if the [Schema Access Strategy] Property has a value of "Use 'Schema Name' Property".
Schema Branch | schema-branch
Specifies the name of the branch to use when looking up the schema in the Schema Registry property. If the chosen Schema Registry does not support branching, this value will be ignored.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

This Property is only considered if the [Schema Access Strategy] Property has a value of "Use 'Schema Name' Property".
Schema Text | schema-text | ${avro.schema}
The text of an Avro-formatted Schema.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

This Property is only considered if the [Schema Access Strategy] Property has a value of "Use 'Schema Text' Property".
Raw Message Field | raw-message-field
If set, the raw message is added to the record, using the property value as the field name. This is not the same as the "rawEvent" extension field.
Supports Expression Language: true (will be evaluated using variable registry only)
Invalid Field | invalid-message-field
Used when a line in the FlowFile cannot be parsed by the CEF parser. If set, the reader does not fail the FlowFile; instead, it adds a record containing a single field, whose name is the property value and whose value is the raw message.
Supports Expression Language: true (will be evaluated using variable registry only)
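
The effect of Raw Message Field and Invalid Field can be pictured with the sketch below. It is a simplified model, not NiFi code: read_records and toy_parse are hypothetical names, toy_parse stands in for a real CEF parser, and whether NiFi also attaches the raw message to an invalid-line record is not stated above, so the sketch does not.

```python
def read_records(lines, parse, raw_message_field=None, invalid_field=None):
    """Turn lines into records, mimicking the two optional field properties."""
    records = []
    for line in lines:
        try:
            record = dict(parse(line))
        except ValueError:
            if invalid_field is None:
                raise  # no Invalid Field configured: reading fails
            # One-field record: property value as name, raw message as value.
            records.append({invalid_field: line})
            continue
        if raw_message_field is not None:
            record[raw_message_field] = line
        records.append(record)
    return records

def toy_parse(line):
    """Hypothetical stand-in for a CEF parser: extracts only the event name."""
    parts = line.split("|")
    if not line.startswith("CEF:") or len(parts) < 7:
        raise ValueError("not a CEF event")
    return {"name": parts[5]}

good = "CEF:0|Vendor|Product|1.0|100|Login failed|5|src=10.0.0.1"
records = read_records([good, "garbage"], toy_parse,
                       raw_message_field="raw", invalid_field="invalid")
```

With these inputs the first record carries the parsed name plus the raw line under "raw", and the second record has the single field "invalid".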
DateTime Locale | datetime-representation | en-US
The IETF BCP 47 representation of the Locale to be used when parsing date fields with long or short month names (e.g. "may" for en-US vs. "mai." for fr-FR). The default value is generally safe. Only change it if you have issues parsing CEF messages.
Supports Expression Language: true (will be evaluated using variable registry only)
Inference Strategy | inference-strategy | With custom extensions inferred
  • Headers only: Includes only CEF header fields in the inferred schema.
  • Headers and extensions: Includes the CEF header and extension fields in the schema, but not the custom extensions.
  • With custom extensions as strings: Includes all fields in the inferred schema, with custom extension fields as string values.
  • With custom extensions inferred: Includes all fields in the inferred schema, with custom extension fields given inferred data types. The inference is based on the values in the FlowFile. In some scenarios this might produce unsatisfactory results. In these cases, use the "With custom extensions as strings" Inference Strategy or a predefined schema.
Defines the set of fields that should be included in the schema and how the fields are interpreted.

This Property is only considered if the [Schema Access Strategy] Property has a value of "Infer Schema".
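
The four Inference Strategy values differ only in which fields reach the schema and how custom extensions are typed. The sketch below models that selection under loud assumptions: the strategy identifiers, the tiny STANDARD_EXTENSIONS subset, and the schema_fields and guess_type helpers are all hypothetical, and header and standard extension fields are typed as "string" purely to keep the example short.

```python
# Assumed header field names and a tiny illustrative subset of standard extensions.
HEADER_FIELDS = {"version", "deviceVendor", "deviceProduct", "deviceVersion",
                 "deviceEventClassId", "name", "severity"}
STANDARD_EXTENSIONS = {"src", "dst", "spt", "dpt", "msg"}

def guess_type(value):
    """Very small type guesser used only for custom extensions."""
    try:
        int(value)
        return "int"
    except ValueError:
        return "string"

def schema_fields(record, strategy):
    """Pick and type schema fields for one record under a given strategy."""
    schema = {}
    for field, value in record.items():
        if field in HEADER_FIELDS:
            schema[field] = "string"           # headers are always included
        elif strategy == "headers-only":
            continue                           # drop every extension
        elif field in STANDARD_EXTENSIONS:
            schema[field] = "string"
        elif strategy == "custom-as-strings":
            schema[field] = "string"
        elif strategy == "custom-inferred":
            schema[field] = guess_type(value)  # type depends on the data seen
        # "headers-and-extensions": custom fields fall through and are dropped
    return schema

record = {"name": "Login failed", "severity": "5",
          "src": "10.0.0.1", "loginAttempts": "3"}
```

Here the hypothetical custom field loginAttempts is omitted, typed as a string, or typed as an int depending on the strategy, which shows why inferred typing can vary with the data in a given FlowFile.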
Schema Inference Cache | schema-inference-cache
Controller Service API: RecordSchemaCacheService
Implementation: VolatileSchemaCache
Specifies a Schema Cache to use when inferring the schema. If not populated, the schema will be inferred each time. However, if a cache is specified, the cache will first be consulted and if the applicable schema can be found, it will be used instead of inferring the schema.

This Property is only considered if the [Schema Access Strategy] Property has a value of "Infer Schema".
Accept empty extensions | accept-empty-extensions | false
  • true
  • false
If set to true, empty extensions will be accepted and will be associated with a null value.
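
A minimal sketch of the Accept empty extensions behaviour follows. The description above only covers the true case; treating an empty extension as an error when the property is false is an assumption of this example, and read_extension is a hypothetical helper name.

```python
def read_extension(pairs, accept_empty=False):
    """Build extension fields from (key, value) pairs of strings."""
    fields = {}
    for key, value in pairs:
        if value == "":
            if not accept_empty:
                # Assumed behaviour when the property is false.
                raise ValueError(f"extension '{key}' has no value")
            fields[key] = None  # empty extension becomes a null value
        else:
            fields[key] = value
    return fields

pairs = [("src", "10.0.0.1"), ("msg", "")]
fields = read_extension(pairs, accept_empty=True)
```

With accept_empty=True the msg field is present with a null (None) value rather than being dropped.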

State management:

This component does not store state.

Restricted:

This component is not restricted.

System Resource Considerations:

None specified.