Description

This Processor publishes the contents of a FlowFile to a Topic in Apache Kafka using the KafkaProducer API available with Kafka 2.6. The contents of the incoming FlowFile will be read using the configured Record Reader. Each record will then be serialized using the configured Record Writer, and this serialized form will be the content of a Kafka message. This message is optionally assigned a key by using the <Kafka Key> Property.
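
For example, assuming a JsonTreeReader is configured as the Record Reader and a JsonRecordSetWriter as the Record Writer, a FlowFile with the following content (values are illustrative) would be published as two separate Kafka messages, one per record:

    [
      {"id": 1, "name": "Acme"},
      {"id": 2, "name": "Globex"}
    ]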

Security Configuration

The Security Protocol property allows the user to specify the protocol for communicating with the Kafka broker. The following sections describe each of the protocols in further detail.

PLAINTEXT

This option provides an unsecured connection to the broker, with no client authentication and no encryption. In order to use this option the broker must be configured with a listener of the form:

    PLAINTEXT://host.name:port
            

SSL

This option provides an encrypted connection to the broker, with optional client authentication. In order to use this option the broker must be configured with a listener of the form:

    SSL://host.name:port
            
In addition, the processor must have an SSL Context Service selected.

If the broker specifies ssl.client.auth=none, or does not specify ssl.client.auth, then the client will not be required to present a certificate. In this case, the SSL Context Service selected may specify only a truststore containing the public key of the certificate authority used to sign the broker's key.

If the broker specifies ssl.client.auth=required then the client will be required to present a certificate. In this case, the SSL Context Service must also specify a keystore containing a client key, in addition to a truststore as described above.
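
For reference, a minimal sketch of the corresponding broker-side settings in server.properties might look like the following (host, paths, and passwords are placeholders, and the exact settings depend on your Kafka deployment):

    listeners=SSL://host.name:9093
    ssl.keystore.location=/path/to/broker.keystore.jks
    ssl.keystore.password=broker-keystore-password
    ssl.truststore.location=/path/to/broker.truststore.jks
    ssl.truststore.password=broker-truststore-password
    ssl.client.auth=required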

SASL_PLAINTEXT

This option uses SASL with a PLAINTEXT transport layer to authenticate to the broker. In order to use this option the broker must be configured with a listener of the form:

    SASL_PLAINTEXT://host.name:port
            
In addition, the Kerberos Service Name must be specified in the processor.

SASL_PLAINTEXT - GSSAPI

If the SASL mechanism is GSSAPI, then the client must provide a JAAS configuration to authenticate. The JAAS configuration can be provided by specifying the java.security.auth.login.config system property in NiFi's bootstrap.conf, such as:

    java.arg.16=-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf
            

An example of the JAAS config file would be the following:

    KafkaClient {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        storeKey=true
        keyTab="/path/to/nifi.keytab"
        serviceName="kafka"
        principal="nifi@YOURREALM.COM";
    };
            
NOTE: The serviceName in the JAAS file must match the Kerberos Service Name in the processor.

Alternatively, the JAAS configuration when using GSSAPI can be provided by specifying the Kerberos Principal and Kerberos Keytab directly in the processor properties. This will dynamically create a JAAS configuration like above, and will take precedence over the java.security.auth.login.config system property.
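
For example, the GSSAPI credentials could be supplied entirely through processor properties similar to the following (the principal, keytab path, and service name are illustrative and must match your environment):

    Kerberos Service Name : kafka
    Kerberos Principal    : nifi@YOURREALM.COM
    Kerberos Keytab       : /path/to/nifi.keytab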

SASL_PLAINTEXT - PLAIN

If the SASL mechanism is PLAIN, then the client must provide a JAAS configuration to authenticate, but the JAAS configuration must use Kafka's PlainLoginModule. An example of the JAAS config file would be the following:

    KafkaClient {
      org.apache.kafka.common.security.plain.PlainLoginModule required
      username="nifi"
      password="nifi-password";
    };
            
The JAAS configuration can be provided in either of the following ways:
  1. Specify the java.security.auth.login.config system property in NiFi's bootstrap.conf. This limits you to a single user credential across the cluster.

                java.arg.16=-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf

  2. Add the dynamic property 'sasl.jaas.config' in the processor configuration. This method allows multiple processors to use different user credentials, and gives the flexibility to publish to multiple Kafka clusters.

                sasl.jaas.config : org.apache.kafka.common.security.plain.PlainLoginModule required
                                   username="nifi"
                                   password="nifi-password";

    NOTE: The dynamic properties of this processor are not secured and as a result the password entered when utilizing sasl.jaas.config will be stored in the flow.xml.gz file in plain-text, and will be saved to NiFi Registry if using versioned flows.

NOTE: It is not recommended to use a SASL mechanism of PLAIN with SASL_PLAINTEXT, as it would transmit the username and password unencrypted.

NOTE: The Kerberos Service Name is not required for the PLAIN SASL mechanism. However, the processor will warn that this property must be filled with a non-empty string; any placeholder value, such as "null", can be used.

NOTE: Using the PlainLoginModule will cause it to be registered in the JVM's static list of Providers, making it visible to components in other NARs that may access the providers. There is currently a known issue where Kafka processors using the PlainLoginModule will cause HDFS processors with Kerberos to no longer work.

SASL_PLAINTEXT - SCRAM

If the SASL mechanism is SCRAM, then the client must provide a JAAS configuration to authenticate, but the JAAS configuration must use Kafka's ScramLoginModule. Ensure that you add the user-defined attribute 'sasl.mechanism' and assign 'SCRAM-SHA-256' or 'SCRAM-SHA-512' based on the Kafka broker configuration (see the snippet at the end of this section). An example of the JAAS config file would be the following:

    KafkaClient {
      org.apache.kafka.common.security.scram.ScramLoginModule required
      username="nifi"
      password="nifi-password";
    };
            
The JAAS configuration can be provided in either of the following ways:
  1. Specify the java.security.auth.login.config system property in NiFi's bootstrap.conf. This limits you to a single user credential across the cluster.

                java.arg.16=-Djava.security.auth.login.config=/path/to/kafka_client_jaas.conf

  2. Add the dynamic property 'sasl.jaas.config' in the processor configuration. This method allows multiple processors to use different user credentials, and gives the flexibility to publish to multiple Kafka clusters.

                sasl.jaas.config : org.apache.kafka.common.security.scram.ScramLoginModule required
                                   username="nifi"
                                   password="nifi-password";

    NOTE: The dynamic properties of this processor are not secured and as a result the password entered when utilizing sasl.jaas.config will be stored in the flow.xml.gz file in plain-text, and will be saved to NiFi Registry if using versioned flows.
NOTE: The Kerberos Service Name is not required for the SCRAM-SHA-256 or SCRAM-SHA-512 SASL mechanisms. However, the processor will warn that this property must be filled with a non-empty string; any placeholder value, such as "null", can be used.
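
For reference, the 'sasl.mechanism' dynamic property mentioned above would be configured similar to the following (the value must match the mechanism enabled on the broker):

    sasl.mechanism : SCRAM-SHA-512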

SASL_SSL

This option uses SASL with an SSL/TLS transport layer to authenticate to the broker. In order to use this option the broker must be configured with a listener of the form:

    SASL_SSL://host.name:port
            

See the SASL_PLAINTEXT section for a description of how to provide the proper JAAS configuration depending on the SASL mechanism (GSSAPI, PLAIN, or SCRAM).

See the SSL section for a description of how to configure the SSL Context Service based on the ssl.client.auth property.
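
As an illustration, a SASL_SSL setup using the SCRAM-SHA-512 mechanism might combine the SASL and SSL configuration like this (a minimal sketch; the SSL Context Service name is a placeholder for a controller service you have already created):

    Security Protocol   : SASL_SSL
    SSL Context Service : StandardSSLContextService
    sasl.mechanism      : SCRAM-SHA-512
    sasl.jaas.config    : org.apache.kafka.common.security.scram.ScramLoginModule required
                          username="nifi"
                          password="nifi-password";

Unlike SASL_PLAINTEXT, the username and password here are protected by TLS on the wire.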

Publish Strategy

This processor includes optional properties that control how a Kafka Record's key and headers are determined:

'Publish Strategy' controls the mode used to convert the FlowFile record into a Kafka record.

If Publish Strategy is set to 'Use Wrapper', two additional processor configuration properties are made available: 'Record Key Writer' and 'Record Metadata Strategy'.

The 'Record Key Writer' property determines the Record Writer that should be used to serialize the Kafka record's key. This may be used to emit the key as JSON, Avro, XML, or some other data format. If this property is not set and the NiFi Record indicates that the key itself is a Record, the FlowFile will be routed to the 'failure' relationship. If this property is not set and the NiFi Record's key is a Byte Array or a String, the Kafka record's key will still be set accordingly (Strings are encoded as UTF-8).

The 'Record Metadata Strategy' specifies whether the Kafka Topic and partition should come from the configured 'Topic Name' property and 'Partition' / 'Partitioner class' properties, or if they should come from the Record's optional metadata field. If the value is set to 'Metadata From Record', the incoming FlowFile record is expected to have a field named 'metadata'. That field is expected to be a Record with a 'topic' and a 'partition' field. If these fields are missing or invalid, the processor's 'Topic Name' and 'Partition' / 'Partitioner class' properties will still be used.

Using the metadata field to convey the topic and partition has two advantages. Firstly, it pairs well with the ConsumeKafkaRecord_* processor, which produces this same schema. This means that if data is consumed from one topic and pushed to another topic (or Kafka cluster), the data can be easily pinned to the same partition and topic name. If the data should be pushed to a different topic, it can be easily updated using an UpdateRecord processor, for instance.

Additionally, because a single FlowFile can be sent as a single Kafka transaction, this allows sending records to multiple Kafka topics in a single transaction.
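
For instance, with 'Publish Strategy' set to 'Use Wrapper' and 'Record Metadata Strategy' set to 'Metadata From Record', a single FlowFile such as the following (contents and topic names are illustrative) carries records destined for two different topics, and they can be published together in one transaction:

    [
      {
        "value": { "id": 1 },
        "metadata": { "topic": "orders" }
      },
      {
        "value": { "id": 2 },
        "metadata": { "topic": "invoices" }
      }
    ]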

Examples

The below examples illustrate what will be sent to Kafka, given different configurations and FlowFile contents. These examples all assume that a JsonTreeReader will be used as the Record Reader and a JsonRecordSetWriter as the Record Writer.

Publish Strategy = 'Use Content as Record Value'

Given the processor configuration:

Processor Property                         Configured Value
Message Key Field                          account
Attributes to Send as Headers (Regex)      attribute.*

And a FlowFile with the content:

                {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
            

And attributes:

Attribute Name      Attribute Value
attributeA          valueA
attributeB          valueB
otherAttribute      otherValue

The record that is produced to Kafka will have the following characteristics:

Record Key          {"name":"Acme","number":"AC1234"}
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers
    Header Name     Header Value
    attributeA      valueA
    attributeB      valueB

Publish Strategy = 'Use Wrapper'

When the Publish Strategy is configured to 'Use Wrapper', each FlowFile Record is expected to adhere to a specific schema. The Record must have three fields: key, value, and headers. There is a fourth, optional field named metadata. The key may be a String, a byte array, or a Record. The value can be any Record. The headers field is a Map whose values are Strings. The metadata field is a Record that has two fields of interest: topic and partition. If these fields are specified, they will take precedence over the configured 'Topic Name', 'Partition', and 'Partitioner class' processor properties.
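
Putting those pieces together, a record containing every field might look like the following (all values are illustrative; each field is discussed in the examples below):

    {
      "key": "some key",
      "value": { "field1": "value1" },
      "headers": { "headerA": "valueA" },
      "metadata": { "topic": "topic1", "partition": 0 }
    }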

Example 1 - Key as String

Given a FlowFile with the content:

{
    "key": "Acme Holdings",
    "value": {
        "address": "1234 First Street",
        "zip": "12345",
        "account": {
            "name": "Acme",
            "number":"AC1234"
        }
    },
    "headers": {
        "accountType": "enterprise",
        "test": "true"
    }
}
            

The record that is produced to Kafka will have the following characteristics:

Record Key          Acme Holdings
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers
    Header Name     Header Value
    accountType     enterprise
    test            true

Note that in this case, the headers and key come directly from the Record, not from FlowFile attributes. If there is a desire to include some FlowFile attributes in the headers, this should be accomplished by using a Processor upstream in order to inject those values into the headers field. For example, an UpdateRecord processor could be used to easily add new fields to the headers Map.
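
As a sketch of that approach, an upstream UpdateRecord processor (configured with the same Record Reader and Writer) could copy a FlowFile attribute into the headers map roughly as follows; the 'filename' attribute and record path are purely illustrative:

    Replacement Value Strategy : Literal Value
    /headers/filename          : ${filename}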

Example 2 - Key as Record

Additionally, we may choose to use a more complex value for the record key. The key itself may be a record. This is sometimes used to write the record key either as JSON or as Avro. In this example, we assume that the 'Record Key Writer' property is set to a JsonRecordSetWriter.

Given a FlowFile with the content:

{
    "key": {
        "accountName": "Acme Holdings",
        "accountHolder": "John Doe",
        "accountId": "280182830-A009"
    },
    "value": {
        "address": "1234 First Street",
        "zip": "12345",
        "account": {
            "name": "Acme",
            "number":"AC1234"
        }
    }
}
            

The record that is produced to Kafka will have the following characteristics:

Record Key          {"accountName":"Acme Holdings","accountHolder":"John Doe","accountId":"280182830-A009"}
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers      (none)

Note here that the Record Key is JSON, as the 'Record Key Writer' property is configured to write JSON. It could just as easily be Avro.

Also note that if the 'Record Key Writer' had not been set, the FlowFile would have been routed to the 'failure' relationship because the key is a Record.

Finally, note here that the headers field is missing. This is acceptable and no headers will be added to the Kafka record.

Example 3 - Key as Byte Array

We can also have a Record whose key field is an array of bytes. In this case, the 'Record Key Writer' property is not used.

Given a FlowFile with the content:

{
    "key": [65, 27, 10, 20, 11, 57, 88, 19, 65],
    "value": {
        "address": "1234 First Street",
        "zip": "12345",
        "account": {
            "name": "Acme",
            "number":"AC1234"
        }
    },
    "otherField": {
        "a": "b"
    }
}
            

The record that is produced to Kafka will have the following characteristics:

Record Key          0x411b0a140b39581341
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers      (none)

In this case, the byte array that is specified for the key is provided to the Kafka Record as a byte array without changes (in the table, it is simply represented as Hex).
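
For clarity, the hex shown above is simply each decimal byte value of the key rendered in hexadecimal:

    decimal: 65 27 10 20 11 57 88 19 65
    hex:     41 1b 0a 14 0b 39 58 13 41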

Finally, note here that the headers field is missing and an extraneous field, otherField, is present. This is acceptable: no headers will be added to the Kafka record, and the otherField is simply ignored.

Example 4 - No Key

We can also have a Record whose key field is null or missing. In this case, the 'Record Key Writer' property is not used.

Given a FlowFile with the content:

{
    "value": {
        "address": "1234 First Street",
        "zip": "12345",
        "account": {
            "name": "Acme",
            "number":"AC1234"
        }
    },
    "headers": {
        "a": "b",
        "c": {
            "d": "e"
        }
    }
}
            

The record that is produced to Kafka will have the following characteristics:

Record Key          (none)
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers
    Header Name     Header Value
    a               b
    c               MapRecord[{d=e}]

In this case, the key is not present, so the Kafka record that is produced has no key associated with it.

Note also that the headers field has the expected value for the a header, but the c header has a value of MapRecord[{d=e}]. This is because the headers field is expected always to be a Map with String values. By providing a Record for the c element, we have violated the contract. NiFi attempts to compensate for this by creating a String representation of the Record, even if it is unlikely to be the representation that the user expects.

Example 5 - Topic provided in Record

If the Metadata field is provided in the FlowFile's Record, it will be used to determine the Topic and the Partition that the Records are written to.

Given a FlowFile with the content:

{
    "value": {
        "address": "1234 First Street",
        "zip": "12345",
        "account": {
            "name": "Acme",
            "number":"AC1234"
        }
    },
    "headers": {
        "a": "b"
    },
    "metadata": {
        "topic": "topic1"
    }
}
            

And considering that the processor properties are configured as:

Property Name               Property Value
Topic Name                  My Topic
Partition                   2
Record Metadata Strategy    Metadata From Record

The record that is produced to Kafka will have the following characteristics:

Kafka Topic         topic1
Topic Partition     2
Record Key          (none)
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers
    Header Name     Header Value
    a               b

Note that the topic name comes directly from the FlowFile record, and the configured topic name ("My Topic") is ignored. However, if either the "metadata" field or its "topic" sub-field were missing, the configured topic name ("My Topic") would be used.

Example 6 - Partition provided in Record

Given a FlowFile with the content:

{
    "value": {
        "address": "1234 First Street",
        "zip": "12345",
        "account": {
            "name": "Acme",
            "number":"AC1234"
        }
    },
    "headers": {
        "a": "b"
    },
    "metadata": {
        "partition": 6
    }
}
            

And considering that the processor properties are configured as:

Property Name               Property Value
Topic Name                  My Topic
Partition                   2
Record Metadata Strategy    Metadata From Record

The record that is produced to Kafka will have the following characteristics:

Kafka Topic         My Topic
Topic Partition     6
Record Key          (none)
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers
    Header Name     Header Value
    a               b

Example 7 - Topic and Partition provided in Record

If the Metadata field is provided in the FlowFile's Record, it will be used to determine the Topic and the Partition that the Records are written to.

Given a FlowFile with the content:

{
    "value": {
        "address": "1234 First Street",
        "zip": "12345",
        "account": {
            "name": "Acme",
            "number":"AC1234"
        }
    },
    "headers": {
        "a": "b"
    },
    "metadata": {
        "topic": "topic1",
        "partition": 0
    }
}
            

And considering that the processor properties are configured as:

Property Name               Property Value
Topic Name                  My Topic
Partition                   2
Record Metadata Strategy    Metadata From Record

The record that is produced to Kafka will have the following characteristics:

Kafka Topic         topic1
Topic Partition     0
Record Key          (none)
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers
    Header Name     Header Value
    a               b

In this case, both the topic name and the partition are explicitly defined within the incoming Record, and those will be used.

Example 8 - Invalid metadata provided in Record

Given a FlowFile with the content:

{
    "value": {
        "address": "1234 First Street",
        "zip": "12345",
        "account": {
            "name": "Acme",
            "number":"AC1234"
        }
    },
    "headers": {
        "a": "b"
    },
    "metadata": "hello"
}
            

And considering that the processor properties are configured as:

Property Name               Property Value
Topic Name                  My Topic
Partition                   2
Record Metadata Strategy    Metadata From Record

The record that is produced to Kafka will have the following characteristics:

Kafka Topic         My Topic
Topic Partition     2
Record Key          (none)
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers
    Header Name     Header Value
    a               b

In this case, the "metadata" field in the Record is ignored because it is not itself a Record.

Example 9 - Use Configured Values for Metadata

Given a FlowFile with the content:

{
    "value": {
        "address": "1234 First Street",
        "zip": "12345",
        "account": {
            "name": "Acme",
            "number":"AC1234"
        }
    },
    "headers": {
        "a": "b"
    },
    "metadata": {
        "topic": "topic1",
        "partition": 6
    }
}
            

And considering that the processor properties are configured as:

Property Name               Property Value
Topic Name                  My Topic
Partition                   2
Record Metadata Strategy    Use Configured Values

The record that is produced to Kafka will have the following characteristics:

Kafka Topic         My Topic
Topic Partition     2
Record Key          (none)
Record Value        {"address":"1234 First Street","zip":"12345","account":{"name":"Acme","number":"AC1234"}}
Record Headers
    Header Name     Header Value
    a               b

In this case, the "metadata" field specifies both the topic and the partition. However, it is ignored in favor of the processor properties 'Topic' and 'Partition' because the property 'Record Metadata Strategy' is set to 'Use Configured Values'.