ConsumeKinesisStream 2.0.0

Bundle
org.apache.nifi | nifi-aws-nar
Description
Reads data from the specified AWS Kinesis stream and outputs a FlowFile for every processed Record (raw) or a FlowFile for a batch of processed records if a Record Reader and Record Writer are configured. At-least-once delivery of all Kinesis Records within the Stream while the processor is running. AWS Kinesis Client Library can take several seconds to initialise before starting to fetch data. Uses DynamoDB for check pointing and CloudWatch (optional) for metrics. Ensure that the credentials provided have access to DynamoDB and CloudWatch (optional) along with Kinesis.
Tags
amazon, aws, consume, kinesis, stream
Input Requirement
FORBIDDEN
Supports Sensitive Dynamic Properties
false
  • Additional Details for ConsumeKinesisStream 2.0.0

    ConsumeKinesisStream

    Streaming Versus Batch Processing

    ConsumeKinesisStream retrieves all Kinesis Records that it encounters in the configured Kinesis Stream. There are two common, broadly defined use cases.

    Per-Message Use Case

    By default, the Processor will create a separate FlowFile for each Kinesis Record (message) in the Stream and add attributes for shard id, sequence number, etc.

    Per-Batch Use Case

    Another common use case is the desire to process all Kinesis Records retrieved from the Stream in a batch as a single FlowFile.

    The ConsumeKinesisStream Processor can optionally be configured with a Record Reader and Record Writer. When a Record Reader and Record Writer are configured, a single FlowFile will be created that will contain a Record for each Record within the batch of Kinesis Records (messages), instead of a separate FlowFile per Kinesis Record.

    The FlowFiles emitted in this mode will include the standard record.* attributes along with the same Kinesis Shard ID, Sequence Number and Approximate Arrival Timestamp; but the values will relate to the last Kinesis Record that was processed in the batch of messages constituting the content of the FlowFile.

Properties
Dynamic Properties
System Resource Considerations
Resource Description
CPU Kinesis Client Library is used to create a Worker thread for consumption of Kinesis Records. The Worker is initialised and started when this Processor has been triggered. It runs continually, spawning Kinesis Record Processors as required to fetch Kinesis Records. The Worker Thread (and any child Record Processor threads) are not controlled by the normal NiFi scheduler as part of the Concurrent Thread pool and are not released until this processor is stopped.
NETWORK Kinesis Client Library will continually poll for new Records, requesting up to a maximum number of Records/bytes per call. This can result in sustained network usage.
Relationships
Name Description
success FlowFiles are routed to success relationship
Writes Attributes
Name Description
aws.kinesis.partition.key Partition key of the (last) Kinesis Record read from the Shard
aws.kinesis.shard.id Shard ID from which the Kinesis Record was read
aws.kinesis.sequence.number The unique identifier of the (last) Kinesis Record within its Shard
aws.kinesis.approximate.arrival.timestamp Approximate arrival timestamp of the (last) Kinesis Record read from the stream
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer (if configured)
record.count Number of records written to the FlowFiles by the Record Writer (if configured)
record.error.message This attribute provides on failure the error message encountered by the Record Reader or Record Writer (if configured)
See Also