PutKudu

Description:

Reads records from an incoming FlowFile using the provided Record Reader, and writes those records to the specified Kudu table. The schema for the Kudu table is inferred from the schema of the Record Reader. If any error occurs while reading records from the input or writing records to Kudu, the FlowFile is routed to failure.


Tags:

put, database, NoSQL, kudu, HDFS, record

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, whether a property supports the NiFi Expression Language, and whether a property is considered "sensitive", meaning that its value will be encrypted. Before entering a value in a sensitive property, ensure that the nifi.properties file has an entry for the property nifi.sensitive.props.key.

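Each property is listed as Name (default: value): Description, followed by its allowable values and Expression Language support where applicable.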
Kudu Masters: Comma-separated addresses of the Kudu masters to connect to.
Supports Expression Language: true (will be evaluated using variable registry only)
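As an illustration of the expected value format, the sketch below splits a Kudu Masters string into host and port pairs. The helper name `parse_kudu_masters` is hypothetical, and the fallback to port 7051 (the default Kudu master RPC port) when an entry omits its port is an assumption about how the Kudu client treats such entries. IPv6 literals are not handled.

```python
# Hypothetical helper, not part of NiFi or the Kudu client.
DEFAULT_MASTER_PORT = 7051  # default Kudu master RPC port (assumed fallback)


def parse_kudu_masters(value: str) -> list[tuple[str, int]]:
    """Split a "Kudu Masters" value into (host, port) pairs."""
    masters = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries left by stray commas
        host, sep, port = entry.rpartition(":")
        if sep:
            masters.append((host, int(port)))
        else:
            # No explicit port: fall back to the default master port.
            masters.append((entry, DEFAULT_MASTER_PORT))
    return masters


print(parse_kudu_masters("kudu-master-1:7051, kudu-master-2, kudu-master-3:7151"))
# [('kudu-master-1', 7051), ('kudu-master-2', 7051), ('kudu-master-3', 7151)]
```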
Table Name: The name of the Kudu table to put data into.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
Failure Strategy (default: route-to-failure): If one or more Records in a batch cannot be transferred to Kudu, specifies how to handle the failure. Allowable values:
  • Route to Failure: The FlowFile containing the Records that failed to insert is routed to the 'failure' relationship.
  • Rollback Session: If any Record cannot be inserted, all FlowFiles in the session are rolled back to their input queue. This means that if data cannot be pushed, it blocks any subsequent data from being pushed to Kudu until the issue is resolved. However, this may be advantageous if strict ordering is required.
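The two strategies differ in how far a single failure reaches. The following sketch models only the routing decision described above; the function `route_flowfiles` and the strategy strings "route-to-failure" and "rollback" are illustrative names, not the processor's internal API.

```python
# Hypothetical model of the two Failure Strategy values.
def route_flowfiles(failed_counts: dict[str, int], strategy: str) -> dict[str, list[str]]:
    """failed_counts maps each FlowFile name to the number of its Records that
    failed to insert. Returns where each FlowFile ends up."""
    names = sorted(failed_counts)
    outcome = {"success": [], "failure": [], "rollback": []}
    if strategy == "route-to-failure":
        # Only FlowFiles with failed Records go to 'failure'; the rest succeed.
        for name in names:
            key = "failure" if failed_counts[name] > 0 else "success"
            outcome[key].append(name)
    elif strategy == "rollback":
        # One failed Record returns every FlowFile in the session to its input queue.
        if any(count > 0 for count in failed_counts.values()):
            outcome["rollback"] = names
        else:
            outcome["success"] = names
    else:
        raise ValueError(f"unknown failure strategy: {strategy!r}")
    return outcome


batch = {"ff-1": 0, "ff-2": 3, "ff-3": 0}
print(route_flowfiles(batch, "route-to-failure"))
# {'success': ['ff-1', 'ff-3'], 'failure': ['ff-2'], 'rollback': []}
print(route_flowfiles(batch, "rollback"))
# {'success': [], 'failure': [], 'rollback': ['ff-1', 'ff-2', 'ff-3']}
```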
Kerberos Credentials Service: Specifies the Kerberos Credentials to use for authentication.
Controller Service API: KerberosCredentialsService
Implementation: KeytabCredentialsService
Kerberos Principal: The principal to use when specifying the principal and password directly in the processor for authenticating via Kerberos.
Supports Expression Language: true (will be evaluated using variable registry only)
Kerberos Password: The password to use when specifying the principal and password directly in the processor for authenticating via Kerberos.
Sensitive Property: true
Skip head line (default: false): Deprecated. Used to ignore header lines, but this should be handled by a RecordReader (e.g. the "Treat First Line as Header" property of CSVReader). Allowable values:
  • true
  • false
Lowercase Field Names (default: false): Converts field names to lowercase when finding the index of the corresponding Kudu table columns.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
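A minimal sketch of the lookup this property affects, assuming the record's field names are matched against Kudu column names by exact string comparison. The function `column_indexes` is hypothetical, and raising `KeyError` for an unmatched field is this sketch's choice, not documented processor behavior.

```python
def column_indexes(field_names, table_columns, lowercase_field_names=False):
    """Map each record field name to the index of its Kudu table column."""
    index_by_column = {column: i for i, column in enumerate(table_columns)}
    indexes = {}
    for field_name in field_names:
        # With "Lowercase Field Names" enabled, the lookup key is lowercased first.
        key = field_name.lower() if lowercase_field_names else field_name
        if key not in index_by_column:
            raise KeyError(f"no Kudu column named {key!r}")
        indexes[field_name] = index_by_column[key]
    return indexes


columns = ["id", "user_name", "created_at"]
print(column_indexes(["ID", "User_Name"], columns, lowercase_field_names=True))
# {'ID': 0, 'User_Name': 1}
```

Without the property, the same call fails because "ID" and "id" are different names.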
Handle Schema Drift (default: false): If set to true, when fields with names that are not in the target Kudu table are encountered, the Kudu table is altered to include new columns for those fields.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
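Detecting drift amounts to comparing the record schema with the table's existing columns. The sketch below shows only that comparison; `find_new_columns` is a hypothetical helper, field types are represented as plain strings for illustration, and the subsequent table alteration through the Kudu client is not shown.

```python
def find_new_columns(record_schema: dict[str, str], table_columns: list[str]) -> list[tuple[str, str]]:
    """Return (field name, type) pairs present in the record schema but not in the table.

    record_schema preserves the record's field order (dicts are insertion-ordered).
    """
    existing = set(table_columns)
    return [(name, ftype) for name, ftype in record_schema.items() if name not in existing]


record_schema = {"id": "INT64", "name": "STRING", "email": "STRING"}
print(find_new_columns(record_schema, ["id", "name"]))
# [('email', 'STRING')]
```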
Record Reader: The service for reading records from incoming FlowFiles.
Controller Service API: RecordReaderFactory
Implementations: ScriptedReader, XMLReader, GrokReader, AvroReader, WindowsEventLogReader, JsonPathReader, ReaderLookup, CSVReader, Syslog5424Reader, SyslogReader, JsonTreeReader, ParquetReader
Data RecordPath: If specified, this property denotes a RecordPath that is evaluated against each incoming Record; the Record that results from the evaluation is sent to Kudu instead of the entire incoming Record. If not specified, the entire incoming Record is published to Kudu.
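For example, with a hypothetical incoming record such as `{"op": "UPSERT", "data": {"id": 7, "name": "alice"}}`, a Data RecordPath of `/data` sends only the nested record. The sketch below imitates this with a tiny evaluator that understands only `/a/b`-style child steps; real RecordPath syntax is far richer. The function names are hypothetical, and raising `ValueError` when the path does not yield a record is this sketch's choice.

```python
from typing import Optional


def evaluate_child_path(record: dict, path: str):
    """Follow '/a/b'-style child steps; returns None when the path matches nothing."""
    value = record
    for step in path.strip("/").split("/"):
        if not isinstance(value, dict) or step not in value:
            return None
        value = value[step]
    return value


def record_to_send(record: dict, data_record_path: Optional[str] = None) -> dict:
    if not data_record_path:
        return record  # no Data RecordPath: the entire incoming Record is sent
    nested = evaluate_child_path(record, data_record_path)
    if not isinstance(nested, dict):
        raise ValueError(f"{data_record_path!r} did not resolve to a record")
    return nested


incoming = {"op": "UPSERT", "data": {"id": 7, "name": "alice"}}
print(record_to_send(incoming, "/data"))
# {'id': 7, 'name': 'alice'}
```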
Operation RecordPath: If specified, this property denotes a RecordPath that is evaluated against each incoming Record to determine the Kudu Operation Type. The RecordPath must evaluate to one of the valid Kudu Operation Types; otherwise the incoming FlowFile is routed to failure. If this property is specified, the <Kudu Operation Type> property is ignored.
Kudu Operation Type (default: INSERT): Specifies the operation type for this processor. Valid values are: INSERT, INSERT_IGNORE, UPSERT, UPDATE, DELETE, UPDATE_IGNORE, DELETE_IGNORE. This property is ignored if the <Operation RecordPath> property is set.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
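The precedence between Operation RecordPath and Kudu Operation Type can be sketched as follows. This is a hypothetical model: `resolve_operation_type` is an illustrative name, the Operation RecordPath is reduced to a single top-level field such as `/op`, and the case-sensitive comparison is an assumption. An invalid value raises here, where the processor would route the FlowFile to failure.

```python
from typing import Optional

VALID_OPERATION_TYPES = {
    "INSERT", "INSERT_IGNORE", "UPSERT", "UPDATE",
    "DELETE", "UPDATE_IGNORE", "DELETE_IGNORE",
}


def resolve_operation_type(record: dict, operation_field: Optional[str] = None,
                           kudu_operation_type: str = "INSERT") -> str:
    if operation_field:
        # Operation RecordPath is set: read the type from the record and
        # ignore the Kudu Operation Type property entirely.
        op = record.get(operation_field.lstrip("/"))
    else:
        op = kudu_operation_type
    if not isinstance(op, str) or op not in VALID_OPERATION_TYPES:
        # The processor routes the FlowFile to 'failure'; the sketch raises.
        raise ValueError(f"invalid Kudu operation type: {op!r}")
    return op


print(resolve_operation_type({"op": "UPSERT", "id": 1}, "/op"))  # UPSERT
print(resolve_operation_type({"op": "UPSERT", "id": 1}))         # INSERT
```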
Flush Mode (default: AUTO_FLUSH_BACKGROUND): Sets the flush mode for the Kudu session. Allowable values:
  • AUTO_FLUSH_SYNC: Each call to apply an operation returns only when the operation has been persisted; otherwise it throws an exception.
  • AUTO_FLUSH_BACKGROUND: The call returns when the operation has been added to the buffer. This call should normally perform only fast in-memory operations, but it may have to wait when the buffer is full and another buffer is being flushed.
  • MANUAL_FLUSH: The call returns when the operation has been added to the buffer; if the buffer is full, it throws a KuduException.
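The difference between the three modes lies in when an apply call returns and what happens when the buffer fills. The toy class below models only those documented behaviors. It is not the Kudu client API: `ToySession`, its small fixed `buffer_capacity`, and the use of `OverflowError` in place of KuduException are all illustrative, and its "background" flush runs synchronously for simplicity.

```python
class ToySession:
    """Toy model of the three documented flush modes (not the Kudu client API)."""

    MODES = {"AUTO_FLUSH_SYNC", "AUTO_FLUSH_BACKGROUND", "MANUAL_FLUSH"}

    def __init__(self, mode: str, buffer_capacity: int = 3):
        if mode not in self.MODES:
            raise ValueError(f"unknown flush mode: {mode!r}")
        self.mode = mode
        self.capacity = buffer_capacity
        self.buffer = []     # operations accepted but not yet persisted
        self.persisted = []  # operations written to the (simulated) server

    def flush(self):
        self.persisted.extend(self.buffer)
        self.buffer.clear()

    def apply(self, op):
        if self.mode == "AUTO_FLUSH_SYNC":
            # Returns only after the operation is persisted; no buffering.
            self.persisted.append(op)
            return
        if len(self.buffer) >= self.capacity:
            if self.mode == "MANUAL_FLUSH":
                # Stands in for the KuduException thrown when the buffer is full.
                raise OverflowError("buffer full: call flush() first")
            # AUTO_FLUSH_BACKGROUND: the real client may wait for a flush here.
            self.flush()
        self.buffer.append(op)


session = ToySession("MANUAL_FLUSH", buffer_capacity=2)
session.apply("row-1")
session.apply("row-2")
session.flush()
print(session.persisted)  # ['row-1', 'row-2']
```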
FlowFiles per Batch (default: 1): The maximum number of FlowFiles to process in a single execution, between 1 and 100000. Depending on your memory size and the data size per row, set an appropriate batch size for the number of FlowFiles to process per client connection setup. Increase this number gradually, and only if your FlowFiles typically contain only a few records.
Supports Expression Language: true (will be evaluated using variable registry only)
Max Records per Batch (default: 100): The maximum number of Records to process in a single Kudu client batch, between 1 and 100000. Depending on your memory size and the data size per row, set an appropriate batch size. Increase this number gradually to find the value that gives the best performance.
Supports Expression Language: true (will be evaluated using variable registry only)
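To see how the two batch settings interact, the sketch below estimates, for one execution, how many FlowFiles are taken and how the resulting Records split into Kudu client batches. It assumes, without confirmation from this page, that Records from several FlowFiles in the same execution share batches; `plan_batches` is a hypothetical name.

```python
def plan_batches(flowfile_record_counts: list[int], flowfiles_per_batch: int = 1,
                 max_records_per_batch: int = 100) -> tuple[int, list[int]]:
    """Return (FlowFiles taken this execution, sizes of the Kudu client batches)."""
    for name, value in (("FlowFiles per Batch", flowfiles_per_batch),
                        ("Max Records per Batch", max_records_per_batch)):
        if not 1 <= value <= 100_000:
            raise ValueError(f"{name} must be between 1 and 100000")
    taken = flowfile_record_counts[:flowfiles_per_batch]
    total = sum(taken)
    full, rest = divmod(total, max_records_per_batch)
    # Full batches first, then one partial batch for any remaining Records.
    return len(taken), [max_records_per_batch] * full + ([rest] if rest else [])


# Three queued FlowFiles with 250, 40 and 10 Records.
print(plan_batches([250, 40, 10], flowfiles_per_batch=2, max_records_per_batch=100))
# (2, [100, 100, 90])
```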
Ignore NULL (default: false): If set to true, NULL values are ignored in the Kudu put operation, so only non-NULL columns are updated.
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)
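A minimal sketch of the effect on a single row, assuming that with the property disabled a NULL field explicitly sets its column to NULL. The helper `columns_to_write` is hypothetical and models only which column values are sent.

```python
def columns_to_write(record: dict, ignore_null: bool = False) -> dict:
    """Return the column -> value pairs that would be set on the Kudu row."""
    if ignore_null:
        # Skip NULL fields so the stored values of those columns stay unchanged.
        return {column: value for column, value in record.items() if value is not None}
    # Otherwise every field is written, and None means "set this column to NULL".
    return dict(record)


update = {"id": 1, "name": None, "age": 31}
print(columns_to_write(update, ignore_null=True))   # {'id': 1, 'age': 31}
print(columns_to_write(update, ignore_null=False))  # {'id': 1, 'name': None, 'age': 31}
```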
Kudu Operation Timeout (default: 30000ms): Default timeout used for user operations (using sessions and scanners).
Supports Expression Language: true (will be evaluated using variable registry only)
Kudu Keep Alive Period Timeout (default: 15000ms): Default timeout used for user operations.
Supports Expression Language: true (will be evaluated using variable registry only)
Kudu Client Worker Count (default: 24): The maximum number of worker threads handling Kudu client read and write operations. Defaults to the number of available processors multiplied by 2, so the listed default of 24 corresponds to a host with 12 available processors.
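The default is simple arithmetic on the host's processor count. In this sketch, `default_worker_count` is a hypothetical helper, and Python's `os.cpu_count()` is used as an approximation of the JVM's available-processor count, which can differ (for example, under CPU quotas).

```python
import os
from typing import Optional


def default_worker_count(available_processors: Optional[int] = None) -> int:
    """Twice the number of available processors, per the property description."""
    if available_processors is None:
        # Approximation of the JVM's processor count; cpu_count() may return None.
        available_processors = os.cpu_count() or 1
    return 2 * available_processors


print(default_worker_count(12))  # 24
```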
Kudu SASL Protocol Name (default: kudu): The SASL protocol name to use for authenticating via Kerberos. Must match the service principal name.
Supports Expression Language: true (will be evaluated using variable registry only)
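Kerberos principals have the form primary/instance@REALM. One reading of "must match the service principal name" is that the SASL protocol name equals the primary component of the Kudu servers' principal (for example, `kudu` in a hypothetical `kudu/master-1.example.com@EXAMPLE.COM`). The sketch below checks that reading; both function names are hypothetical.

```python
def service_name(principal: str) -> str:
    """Return the primary (service) component of a Kerberos principal."""
    primary_and_instance = principal.split("@", 1)[0]  # drop the realm
    return primary_and_instance.split("/", 1)[0]       # drop the instance (host)


def sasl_name_matches(sasl_protocol_name: str, kudu_service_principal: str) -> bool:
    return service_name(kudu_service_principal) == sasl_protocol_name


print(sasl_name_matches("kudu", "kudu/master-1.example.com@EXAMPLE.COM"))  # True
```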

Relationships:

Name: Description
success: A FlowFile is routed to this relationship after it has been successfully stored in Kudu.
failure: A FlowFile is routed to this relationship if it cannot be sent to Kudu.

Reads Attributes:

None specified.

Writes Attributes:

Name: Description
record.count: Number of records written to Kudu.

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

Resource: Description
MEMORY: An instance of this component can cause high usage of this system resource. Multiple instances or high concurrency settings may result in a degradation of performance.