PutHBaseJSON

Description:

Adds rows to HBase based on the contents of incoming JSON documents. Each FlowFile must contain a single UTF-8 encoded JSON document, and any FlowFiles where the root element is not a single document will be routed to failure. Each JSON field name and value will become a column qualifier and value of the HBase row. Any fields with a null value will be skipped, and fields with a complex value will be handled according to the Complex Field Strategy. The row id can be specified either directly on the processor through the Row Identifier property, or can be extracted from the JSON document by specifying the Row Identifier Field Name property. This processor will hold the contents of all FlowFiles for the given batch in memory at one time.
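
The mapping described above can be illustrated with a small Python sketch. It is not the processor's code: the function name `json_to_row` and its parameters are hypothetical, and the sketch assumes that the field named by Row Identifier Field Name is used only as the row id and is not also written as a column. Complex values are passed through untouched here, since their handling depends on the Complex Field Strategy.

```python
import json


def json_to_row(flowfile_content, row_id=None, row_id_field=None):
    """Illustrative only: turn one UTF-8 JSON document into (row_id, columns).

    row_id       stands in for the Row Identifier property.
    row_id_field stands in for the Row Identifier Field Name property.
    Raises ValueError where the processor would route to failure.
    """
    doc = json.loads(flowfile_content.decode("utf-8"))
    if not isinstance(doc, dict):
        # The root element must be a single JSON document (object).
        raise ValueError("root element is not a single JSON document")

    columns = {}
    for name, value in doc.items():
        if value is None:
            # Fields with a null value are skipped.
            continue
        if row_id_field is not None and name == row_id_field:
            # Assumption: the row id field is not also stored as a column.
            row_id = str(value)
            continue
        # Field name becomes the column qualifier, field value the cell value.
        columns[name] = value

    if row_id is None:
        raise ValueError("no row id could be determined")
    return row_id, columns


row, cols = json_to_row(
    b'{"id": "user-1", "name": "Ada", "email": null}',
    row_id_field="id",
)
print(row, cols)
```

In this sketch the `email` field is dropped because it is null, and `id` supplies the row id rather than a column.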


Tags:

hadoop, hbase, put, json

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Name | Default Value | Allowable Values | Description
HBase Client Service | | Controller Service API: HBaseClientService; Implementation: HBase_1_1_2_ClientService | Specifies the Controller Service to use for accessing HBase.
Table Name | | | The name of the HBase table to put data into. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry).
Row Identifier | | | Specifies the row id to use when inserting data into HBase. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry).
Row Identifier Field Name | | | Specifies the name of a JSON element whose value should be used as the row id for the given JSON document. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry).
Row Identifier Encoding Strategy | String | String: stores the row id as a UTF-8 String. Binary: stores the row id as a binary byte array; expects the row id to be a binary formatted string. | Specifies the data type of the row id used when inserting data into HBase. The default behavior is to convert the row id to a UTF-8 byte array. Choosing Binary will convert a binary formatted string to the correct byte[] representation. Use the Binary option if you are using binary row keys in HBase.
Column Family | | | The Column Family to use when inserting data into HBase. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry).
Timestamp | | | The timestamp for the cells being created in HBase. If this field is left blank, HBase will use the current time. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry).
Batch Size | 25 | | The maximum number of FlowFiles to process in a single execution. The FlowFiles will be grouped by table, and a single Put per table will be performed.
Complex Field Strategy | Text | Fail: route the entire FlowFile to failure if any elements contain complex values. Warn: provide a warning and do not include the field in the row sent to HBase. Ignore: silently ignore the field and do not include it in the row sent to HBase. Text: use the string representation of the complex field as the value of the given column. | Indicates how to handle complex fields, i.e. fields that do not have a single text value.
Field Encoding Strategy | String | String: stores the value of each field as a UTF-8 String. Bytes: stores the value of each field as the byte representation of the type derived from the JSON. | Indicates how to store the value of each field in HBase. The default behavior is to convert each value from the JSON to a String and store the UTF-8 bytes. Choosing Bytes will interpret the type of each field from the JSON and convert the value to the byte representation of that type, meaning an integer will be stored as the byte representation of that integer.
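
As a rough illustration of how Complex Field Strategy and Field Encoding Strategy interact, the following Python sketch encodes a single JSON field. It is a sketch under stated assumptions, not the processor's implementation: the function and exception names are hypothetical, "string representation" is taken to mean the JSON text of the value, and the Bytes layout (8-byte big-endian integers and doubles, one byte for booleans) is assumed to follow the conventions of HBase's `Bytes` utility.

```python
import json
import logging
import struct

log = logging.getLogger("put_hbase_json_sketch")


class ComplexFieldFailure(Exception):
    """Hypothetical signal that the whole FlowFile should go to failure."""


def encode_field(name, value, complex_strategy="Text", field_encoding="String"):
    """Return the cell bytes for one field, or None if the field is omitted."""
    if isinstance(value, (dict, list)):
        # Complex field: an object or array rather than a single value.
        if complex_strategy == "Fail":
            raise ComplexFieldFailure(name)
        if complex_strategy == "Warn":
            log.warning("skipping complex field %s", name)
            return None
        if complex_strategy == "Ignore":
            return None
        # "Text": assumed here to be the JSON text of the value.
        return json.dumps(value).encode("utf-8")

    if field_encoding == "String":
        text = value if isinstance(value, str) else json.dumps(value)
        return text.encode("utf-8")

    # "Bytes": encode according to the type derived from the JSON.
    # The exact byte layouts below are assumptions, not taken from the source.
    if isinstance(value, bool):  # checked before int, since bool is an int
        return b"\xff" if value else b"\x00"
    if isinstance(value, int):
        return struct.pack(">q", value)
    if isinstance(value, float):
        return struct.pack(">d", value)
    return str(value).encode("utf-8")


print(encode_field("count", 1, field_encoding="String"))
print(encode_field("count", 1, field_encoding="Bytes"))
print(encode_field("tags", ["a", "b"], complex_strategy="Text"))
```

The practical difference shows up for numbers: with String the integer 1 is stored as the single character "1", while with Bytes it is stored as a fixed-width binary value that readers must decode with the matching type.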

Dynamic Properties:

Dynamic Properties allow the user to specify both the name and value of a property.

Name | Value | Description
visibility.<COLUMN FAMILY> | visibility label for <COLUMN FAMILY> | Visibility label for everything under that column family when a specific label for a particular column qualifier is not available. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry).
visibility.<COLUMN FAMILY>.<COLUMN QUALIFIER> | visibility label for <COLUMN FAMILY>:<COLUMN QUALIFIER> | Visibility label for the specified column qualifier qualified by a configured column family. Supports Expression Language: true (will be evaluated using flow file attributes and variable registry).
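
The precedence between the two dynamic property forms can be sketched as a simple lookup: a label set for a specific column qualifier wins, and the column family label is the fallback. The function name `resolve_visibility` and the example labels are hypothetical, and Expression Language evaluation is left out.

```python
def resolve_visibility(dynamic_props, family, qualifier):
    """Illustrative lookup: qualifier-specific label first, then family label.

    dynamic_props is a plain dict standing in for the processor's dynamic
    properties. Returns None when no label applies to the column.
    """
    specific = dynamic_props.get(f"visibility.{family}.{qualifier}")
    if specific is not None:
        return specific
    return dynamic_props.get(f"visibility.{family}")


props = {
    "visibility.cf": "PUBLIC",
    "visibility.cf.ssn": "PII",
}
print(resolve_visibility(props, "cf", "ssn"))    # qualifier-specific label
print(resolve_visibility(props, "cf", "name"))   # falls back to family label
print(resolve_visibility(props, "other", "x"))   # no label configured
```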

Relationships:

Name | Description
success | A FlowFile is routed to this relationship after it has been successfully stored in HBase.
failure | A FlowFile is routed to this relationship if it cannot be sent to HBase.

Reads Attributes:

None specified.

Writes Attributes:

None specified.

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.