PutDynamoDBRecord

PutDynamoDBRecord 2.4.0

Bundle: org.apache.nifi | nifi-aws-nar
Description: Inserts items into DynamoDB based on record-oriented data. The record fields are mapped into DynamoDB item fields, including partition and sort keys if set. Depending on the number of records, the processor might execute the insert in multiple chunks in order to work around DynamoDB's limit on batch writes. This might result in partially processed FlowFiles, in which case the FlowFile is transferred to the "unprocessed" relationship with the attribute needed to retry later without duplicating the inserts already executed.
Tags: AWS, Amazon, DynamoDB, Insert, Put, Record
Input Requirement: REQUIRED
Supports Sensitive Dynamic Properties: false

Additional Details for PutDynamoDBRecord 2.4.0
PutDynamoDBRecord

Description

PutDynamoDBRecord provides the capability to insert multiple Items into a DynamoDB table from a record-oriented FlowFile. Compared to PutDynamoDB, this processor can process data in formats other than JSON and can add multiple fields to a given Item. PutDynamoDBRecord is also designed to insert larger batches of data into the database.

Data types

The data types supported by DynamoDB do not fully overlap with the capabilities of the Record data structure. Some conversions and simplifications are necessary when inserting the data. These are:
- Numeric values are stored using a floating-point data structure within Items. In some cases this representation might cause accuracy issues.
- Char is not a supported type within DynamoDB; these fields are converted into String values.
- Enum types are stored as String fields, using the name of the given enum.
- DynamoDB stores time- and date-related information as Strings.
- Internal record structures are converted into maps.
- Choice is not a supported data type; regardless of the actual wrapped data type, values enveloped in Choice are handled as Strings.
- Unknown data types are handled as Strings.
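
As a rough illustration of the conversions listed above, the following Python sketch maps a few plain values to DynamoDB's low-level attribute-value notation (S for String, N for Number, M for Map, L for List, BOOL for Boolean). The function name `to_attribute_value` is hypothetical and this is not the processor's Java implementation. The handling of booleans and lists, and the ISO-8601 format used for date and time values, are assumptions made for the sketch.

```python
import datetime
import enum


def to_attribute_value(value):
    """Map a Python value to a DynamoDB-style attribute value (illustrative only)."""
    if isinstance(value, bool):  # check bool before numbers: bool is a subclass of int
        return {"BOOL": value}  # assumption: not covered by the list above
    if isinstance(value, enum.Enum):
        return {"S": value.name}  # enums are stored by name
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # DynamoDB transmits numbers as strings
    if isinstance(value, (datetime.date, datetime.time)):
        return {"S": value.isoformat()}  # date and time values become Strings
    if isinstance(value, dict):
        # nested record structures become maps
        return {"M": {key: to_attribute_value(item) for key, item in value.items()}}
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(item) for item in value]}  # assumption
    return {"S": str(value)}  # chars, strings and unknown types fall back to String
```

For instance, the record field `"subtype": 4` would become `{"N": "4"}` and `"type": "A"` would become `{"S": "A"}` under this sketch.
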
Limitations

Batch inserting into DynamoDB comes with two inherent limitations. First, the number of Items in a single batch insert is limited to 25. To work around this, PutDynamoDBRecord might attempt multiple insert calls towards the database server during one execution, depending on the number of records in the incoming FlowFile. With this approach, the flow does not have to deal with this limitation in most cases.

Having multiple external actions comes with the risk of an unforeseen result at one of the steps. For example, when the incoming FlowFile consists of 70 records, it is split into 3 chunks, with a single insert operation for every chunk. The first two chunks contain 25 Items each, and the third contains the remaining 20. It might occur that the first two insert operations succeed but the third one fails. In these cases the FlowFile is considered “partially processed” and is transferred to the “failure” or “unprocessed” Relationship according to the nature of the issue. To keep track of the successfully processed chunks, the processor assigns the “dynamodb.chunks.processed” attribute to the FlowFile, with the number of successfully processed chunks as its value.
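
The chunking arithmetic from the 70-record example can be sketched in a few lines of Python. This is only an illustration of the splitting rule, not the processor's code; `split_into_chunks` and `BATCH_LIMIT` are hypothetical names.

```python
BATCH_LIMIT = 25  # maximum number of Items accepted in one batch insert


def split_into_chunks(records, size=BATCH_LIMIT):
    """Split a list of records into consecutive chunks of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


records = [{"id": n} for n in range(70)]
chunks = split_into_chunks(records)
print([len(chunk) for chunk in chunks])  # [25, 25, 20]
```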

The most common reason for this behaviour is the other limitation of DynamoDB inserts: the database has built-in supervision over the amount of inserted data. When a client reaches the “throughput limit”, the server refuses to process insert requests for a certain amount of time. See the DynamoDB documentation for more information. PutDynamoDBRecord treats these cases as temporary issues: the FlowFile is transferred to the “unprocessed” Relationship, after which the processor yields in order to avoid further throughput issues. (Other kinds of failures result in a transfer to the “failure” Relationship.)

Retry

It is suggested to loop the “unprocessed” Relationship back to PutDynamoDBRecord in some way. FlowFiles transferred to that relationship are considered healthy and might be processed successfully at a later point. A FlowFile may contain so many records that more than two attempts are needed to insert them all. The “dynamodb.chunks.processed” attribute is “rolled” through the attempts: after each trigger it contains the total number of inserted chunks, making it possible for later attempts to continue from the right point without duplicated inserts.
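
The resume logic can be pictured with a small Python simulation. It is a sketch under stated assumptions, not the processor's implementation: `process_flowfile`, `ThrottledError` and `fake_insert` are hypothetical names, the throttling is simulated with a simple call budget, and writing the attribute on success is a choice made for the sketch.

```python
CHUNK_SIZE = 25
PROCESSED_ATTR = "dynamodb.chunks.processed"


class ThrottledError(Exception):
    """Hypothetical stand-in for a throughput-limit rejection."""


def process_flowfile(records, attributes, insert_chunk):
    """Insert the remaining chunks; return (relationship, updated attributes)."""
    done = int(attributes.get(PROCESSED_ATTR, 0))  # a missing attribute counts as 0
    chunks = [records[i:i + CHUNK_SIZE] for i in range(0, len(records), CHUNK_SIZE)]
    for chunk in chunks[done:]:  # skip chunks inserted by earlier attempts
        try:
            insert_chunk(chunk)
        except ThrottledError:
            return "unprocessed", {**attributes, PROCESSED_ATTR: str(done)}
        done += 1
    return "success", {**attributes, PROCESSED_ATTR: str(done)}


inserted = []
budget = {"calls": 2}  # assumption: the fake table accepts two batch calls per attempt


def fake_insert(chunk):
    if budget["calls"] == 0:
        raise ThrottledError("throughput limit reached")
    budget["calls"] -= 1
    inserted.extend(chunk)


records = list(range(70))
first = process_flowfile(records, {}, fake_insert)
budget["calls"] = 2  # the throttling window has passed before the retry
second = process_flowfile(records, first[1], fake_insert)
print(first)   # ('unprocessed', {'dynamodb.chunks.processed': '2'})
print(second)  # ('success', {'dynamodb.chunks.processed': '3'})
```

After the second attempt, `inserted` holds each of the 70 records exactly once, because the retry starts at the third chunk instead of the first.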

Partition and sort keys

The processor supports multiple strategies for assigning partition key and sort key to the inserted Items. These are:

Partition Key Strategies

Partition By Field

The processor assigns one of the record fields as partition key. The name of the record field is specified by the “Partition Key Field” property and the value will be the value of the record field with the same name.

Partition By Attribute

The processor assigns the value of a FlowFile attribute as partition key. With this strategy all the Items within a FlowFile share the same partition key value, so it is suggested for tables that also have a sort key, in order to meet DynamoDB's primary key requirements. The property “Partition Key Field” defines the name of the Item field and the property “Partition Key Attribute” specifies which attribute’s value is assigned to the partition key. With this strategy the “Partition Key Field” must differ from the names of the fields in the incoming records.

Generated UUID

With this strategy the processor generates a UUID for every single Item. This identifier is used as the value of the partition key. The name of the field used as partition key is defined by the property “Partition Key Field”. With this strategy the “Partition Key Field” must differ from the names of the fields in the incoming records. When using this strategy, the partition key in the DynamoDB table must have the String data type.
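
The three partition key strategies can be compared side by side in a short Python sketch. The function `assign_partition_key` and its parameters are hypothetical and only mirror the descriptions above; in particular, raising an error on a field-name clash is an assumption, since the text only says the names must differ.

```python
import uuid


def assign_partition_key(record, flowfile_attributes, strategy, key_field, key_attribute=None):
    """Return a copy of `record` that carries the partition key field (illustrative)."""
    item = dict(record)
    if strategy == "Partition By Field":
        if key_field not in record:
            raise KeyError(f"record has no field named {key_field!r}")
        # the key is an existing record field, so nothing has to be added
    elif strategy == "Partition By Attribute":
        if key_field in record:
            raise ValueError(f"{key_field!r} clashes with an existing record field")
        item[key_field] = flowfile_attributes[key_attribute]  # same value for every Item
    elif strategy == "Generated UUID":
        if key_field in record:
            raise ValueError(f"{key_field!r} clashes with an existing record field")
        item[key_field] = str(uuid.uuid4())  # String-typed key, new value per Item
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return item


record = {"type": "A", "subtype": 4, "class": "t", "size": 1}
by_attribute = assign_partition_key(
    record, {"filename": "data46362.json"}, "Partition By Attribute", "source", "filename"
)
print(by_attribute["source"])  # data46362.json
```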

Sort Key Strategies

None

No sort key will be assigned to the Item. If the table definition expects a sort key, using this strategy will result in unsuccessful inserts.

Sort By Field

The processor assigns one of the record fields as sort key. The name of the record field is specified by the “Sort Key Field” property and the value will be the value of the record field with the same name. With this strategy the field named by “Sort Key Field” must exist in the incoming records.

Generate Sequence

The processor assigns a generated value to every Item based on the original record’s position in the incoming FlowFile (regardless of the chunks). The first Item will have the sort key 1, the second will have sort key 2 and so on. The generated keys are unique within a given FlowFile. The name of the sort key field is specified by the “Sort Key Field” property. With this strategy the “Sort Key Field” must differ from the names of the fields in the incoming records. When using this strategy, the sort key in the DynamoDB table must have the Number data type.
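
A minimal Python sketch of the sort key strategies follows. `assign_sort_keys` is a hypothetical helper, not part of the processor. For “Sort By Field” the sketch expects the field to be present in the records, as the note in the first example of the Examples section requires. The error handling is an assumption.

```python
def assign_sort_keys(records, strategy, sort_field=None):
    """Return copies of `records` with the sort key strategy applied (illustrative)."""
    items = []
    # Positions are counted over the whole FlowFile, starting at 1,
    # so the sequence does not restart at chunk boundaries.
    for position, record in enumerate(records, start=1):
        item = dict(record)
        if strategy == "Sort By Field":
            if sort_field not in record:
                raise KeyError(f"record has no field named {sort_field!r}")
        elif strategy == "Generate Sequence":
            if sort_field in record:
                raise ValueError(f"{sort_field!r} clashes with an existing record field")
            item[sort_field] = position  # Number-typed key
        elif strategy != "None":
            raise ValueError(f"unknown strategy: {strategy!r}")
        items.append(item)
    return items


records = [{"type": "A"}, {"type": "B"}]
print(assign_sort_keys(records, "Generate Sequence", "sort"))
# [{'type': 'A', 'sort': 1}, {'type': 'B', 'sort': 2}]
```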

Examples

Using fields as partition and sort key

Setup
- Partition Key Strategy: Partition By Field
- Partition Key Field: class
- Sort Key Strategy: Sort By Field
- Sort Key Field: size
Note: both fields have to exist in the incoming records!

Result

Using this pair of strategies results in Items identical to the incoming records (apart from the representational changes from the conversion). The fields specified by the properties are added to the Items normally, the only difference being that they are flagged as (primary) key fields.

Input
```
[
  {
    "type": "A",
    "subtype": 4,
    "class": "t",
    "size": 1
  }
]
```
Output (stylized)
- type: String field with value “A”
- subtype: Number field with value 4
- class: String field with value “t” and serving as partition key
- size: Number field with value 1 and serving as sort key
Using FlowFile filename as partition key with generated sort key

Setup
- Partition Key Strategy: Partition By Attribute
- Partition Key Field: source
- Partition Key Attribute: filename
- Sort Key Strategy: Generate Sequence
- Sort Key Field: sort
Result

The FlowFile’s filename attribute will be used as partition key. In this case all the records within the same FlowFile share the same partition key. To avoid collisions when FlowFiles contain multiple records, using a sort key is suggested. Here a generated sequence is used, which is guaranteed to be unique within a given FlowFile.

Input
```
[
  {
    "type": "A",
    "subtype": 4,
    "class": "t",
    "size": 1
  },
  {
    "type": "B",
    "subtype": 5,
    "class": "m",
    "size": 2
  }
]
```
Output (stylized)

First Item
- source: String field with value “data46362.json” and serving as partition key
- type: String field with value “A”
- subtype: Number field with value 4
- class: String field with value “t”
- size: Number field with value 1
- sort: Number field with value 1 and serving as sort key
Second Item
- source: String field with value “data46362.json” and serving as partition key
- type: String field with value “B”
- subtype: Number field with value 5
- class: String field with value “m”
- size: Number field with value 2
- sort: Number field with value 2 and serving as sort key
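
For the setup above, the target table needs a String partition key named `source` and a Number sort key named `sort`. The Python sketch below builds a matching table definition as a plain dictionary. The helper `table_definition` and the table name `example-table` are hypothetical. The dictionary keys follow the shape of DynamoDB's CreateTable request, so the result could, for example, be passed to boto3's `create_table(**params)`, which is not called here.

```python
def table_definition(table_name, partition_key, sort_key=None):
    """Build create-table parameters; each key is a (name, type) pair, type 'S' or 'N'."""
    keys = [(partition_key, "HASH")]  # partition key is also known as hash key
    if sort_key is not None:
        keys.append((sort_key, "RANGE"))  # sort key is also known as range key
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": name, "KeyType": role} for (name, _type), role in keys
        ],
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": attr_type}
            for (name, attr_type), _role in keys
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


params = table_definition("example-table", ("source", "S"), ("sort", "N"))
print(params["KeySchema"])
```
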
Using generated partition key

Setup
- Partition Key Strategy: Generated UUID
- Partition Key Field: identifier
- Sort Key Strategy: None
Result

A generated UUID will be used as partition key. A different UUID will be generated for every Item.

Input
```
[
  {
    "type": "A",
    "subtype": 4,
    "class": "t",
    "size": 1
  }
]
```
Output (stylized)
- identifier: String field with value “872ab776-ed73-4d37-a04a-807f0297e06e” and serving as partition key
- type: String field with value “A”
- subtype: Number field with value 4
- class: String field with value “t”
- size: Number field with value 1

Properties

AWS Credentials Provider Service

Display Name

AWS Credentials Provider Service

Description

The Controller Service that is used to obtain the AWS credentials provider

API Name

AWS Credentials Provider service

Service Interface

org.apache.nifi.processors.aws.credentials.provider.service.AWSCredentialsProviderService

Service Implementations

org.apache.nifi.processors.aws.credentials.provider.service.AWSCredentialsProviderControllerService

Expression Language Scope

Not Supported

Sensitive

false

Required

true
Communications Timeout

Display Name

Communications Timeout

Description

API Name

Communications Timeout

Default Value

30 secs

Expression Language Scope

Not Supported

Sensitive

false

Required

true
Endpoint Override URL

Display Name

Endpoint Override URL

Description

Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.

API Name

Endpoint Override URL

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

false
Partition Key Attribute
Display Name

Partition Key Attribute

Description

Specifies the FlowFile attribute that will be used as the value of the partition key when using "Partition by attribute" partition key strategy.

API Name

partition-key-attribute

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

true

Dependencies
- Partition Key Strategy is set to any of [ByAttribute]
Partition Key Field

Display Name

Partition Key Field

Description

Defines the name of the partition key field in the DynamoDB table. Partition key is also known as hash key. Depending on the "Partition Key Strategy" the field value might come from the incoming Record or a generated one.

API Name

partition-key-field

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

true
Partition Key Strategy
Display Name

Partition Key Strategy

Description

Defines the strategy the processor uses to assign partition key value to the inserted Items.

API Name

partition-key-strategy

Default Value

ByField

Allowable Values
- Partition By Field
- Partition By Attribute
- Generated UUID
Expression Language Scope

Not Supported

Sensitive

false

Required

true
Proxy Configuration Service

Display Name

Proxy Configuration Service

Description

Specifies the Proxy Configuration Controller Service to proxy network requests.

API Name

proxy-configuration-service

Service Interface

org.apache.nifi.proxy.ProxyConfigurationService

Service Implementations

org.apache.nifi.proxy.StandardProxyConfigurationService

Expression Language Scope

Not Supported

Sensitive

false

Required

false
Record Reader

Display Name

Record Reader

Description

Specifies the Controller Service to use for parsing incoming data and determining the data's schema.

API Name

record-reader

Service Interface

org.apache.nifi.serialization.RecordReaderFactory

Service Implementations

org.apache.nifi.avro.AvroReader

org.apache.nifi.cef.CEFReader

org.apache.nifi.csv.CSVReader

org.apache.nifi.excel.ExcelReader

org.apache.nifi.grok.GrokReader

org.apache.nifi.json.JsonPathReader

org.apache.nifi.json.JsonTreeReader

org.apache.nifi.services.protobuf.ProtobufReader

org.apache.nifi.lookup.ReaderLookup

org.apache.nifi.record.script.ScriptedReader

org.apache.nifi.syslog.Syslog5424Reader

org.apache.nifi.syslog.SyslogReader

org.apache.nifi.windowsevent.WindowsEventLogReader

org.apache.nifi.xml.XMLReader

org.apache.nifi.yaml.YamlTreeReader

Expression Language Scope

Not Supported

Sensitive

false

Required

true
Region
Display Name

Region

Description

API Name

Region

Default Value

us-west-2

Allowable Values
- AWS GovCloud (US-East)
- AWS GovCloud (US-West)
- Africa (Cape Town)
- Asia Pacific (Hong Kong)
- Asia Pacific (Hyderabad)
- Asia Pacific (Jakarta)
- Asia Pacific (Malaysia)
- Asia Pacific (Melbourne)
- Asia Pacific (Mumbai)
- Asia Pacific (Osaka)
- Asia Pacific (Seoul)
- Asia Pacific (Singapore)
- Asia Pacific (Sydney)
- Asia Pacific (Thailand)
- Asia Pacific (Tokyo)
- Canada (Central)
- Canada West (Calgary)
- China (Beijing)
- China (Ningxia)
- EU (Germany)
- EU ISOE West
- Europe (Frankfurt)
- Europe (Ireland)
- Europe (London)
- Europe (Milan)
- Europe (Paris)
- Europe (Spain)
- Europe (Stockholm)
- Europe (Zurich)
- Israel (Tel Aviv)
- Mexico (Central)
- Middle East (Bahrain)
- Middle East (UAE)
- South America (Sao Paulo)
- US East (N. Virginia)
- US East (Ohio)
- US ISO East
- US ISO WEST
- US ISOB East (Ohio)
- US ISOF EAST
- US ISOF SOUTH
- US West (N. California)
- US West (Oregon)
- aws-cn-global
- aws-global
- aws-iso-b-global
- aws-iso-global
- aws-us-gov-global
Expression Language Scope

Not Supported

Sensitive

false

Required

true
Sort Key Field
Display Name

Sort Key Field

Description

Defines the name of the sort key field in the DynamoDB table. Sort key is also known as range key.

API Name

sort-key-field

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

true

Dependencies
- Sort Key Strategy is set to any of [ByField, BySequence]
Sort Key Strategy
Display Name

Sort Key Strategy

Description

Defines the strategy the processor uses to assign sort key to the inserted Items.

API Name

sort-key-strategy

Default Value

None

Allowable Values
- None
- Sort By Field
- Generate Sequence
Expression Language Scope

Not Supported

Sensitive

false

Required

true
SSL Context Service

Display Name

SSL Context Service

Description

Specifies an optional SSL Context Service that, if provided, will be used to create connections

API Name

SSL Context Service

Service Interface

org.apache.nifi.ssl.SSLContextProvider

Service Implementations

org.apache.nifi.ssl.PEMEncodedSSLContextProvider

org.apache.nifi.ssl.StandardRestrictedSSLContextService

org.apache.nifi.ssl.StandardSSLContextService

Expression Language Scope

Not Supported

Sensitive

false

Required

false
Table Name

Display Name

Table Name

Description

The DynamoDB table name

API Name

Table Name

Expression Language Scope

Environment variables defined at JVM level and system properties

Sensitive

false

Required

true

System Resource Considerations

Resource	Description
MEMORY	An instance of this component can cause high usage of this system resource. Multiple instances or high concurrency settings may result in a degradation of performance.
NETWORK	An instance of this component can cause high usage of this system resource. Multiple instances or high concurrency settings may result in a degradation of performance.

Relationships

Name	Description
failure	FlowFiles are routed to failure relationship
success	FlowFiles are routed to success relationship
unprocessed	FlowFiles are routed to unprocessed relationship when DynamoDB is not able to process all the items in the request. Typical reasons are insufficient table throughput capacity and exceeding the maximum bytes per request. Unprocessed FlowFiles can be retried with a new request.

Reads Attributes

Name	Description
dynamodb.chunks.processed	Number of chunks successfully inserted into DynamoDB. If not set, it is considered as 0

Writes Attributes

Name	Description
dynamodb.chunks.processed	Number of chunks successfully inserted into DynamoDB. If not set, it is considered as 0
dynamodb.key.error.unprocessed	DynamoDB unprocessed keys
dynmodb.range.key.value.error	DynamoDB range key error
dynamodb.key.error.not.found	DynamoDB key not found
dynamodb.error.exception.message	DynamoDB exception message
dynamodb.error.code	DynamoDB error code
dynamodb.error.message	DynamoDB error message
dynamodb.error.service	DynamoDB error service
dynamodb.error.retryable	DynamoDB error is retryable
dynamodb.error.request.id	DynamoDB error request id
dynamodb.error.status.code	DynamoDB error status code
dynamodb.item.io.error	IO exception message on creating item
