PutBigQuery 2.0.0

Bundle
org.apache.nifi | nifi-gcp-nar
Description
Writes the contents of a FlowFile to a Google BigQuery table. The processor is record based, so the schema it uses is driven by the RecordReader. Attributes that are not matched to the target schema are skipped. Exactly-once delivery semantics are achieved via stream offsets.
Tags
bigquery, bq, google, google cloud
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
  • Additional Details for PutBigQuery 2.0.0

    PutBigQuery

    Streaming Versus Batching Data

    PutBigQuery is record based and relies on the gRPC-based Write API, which uses protocol buffers. The underlying stream supports both streaming and batching approaches.

    Streaming

    With streaming, data appended to the stream is instantly available in BigQuery for reading. The number of records (rows) appended at once is configurable. Only one stream is established per FlowFile, so when processing of a FlowFile concludes, the stream is closed and a new one is opened for the next FlowFile. Streaming supports exactly-once delivery semantics via stream offsets.
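
    The role of stream offsets can be illustrated with a small in-memory model. This is only a sketch: ToyStream, write_records and the status strings are hypothetical names chosen for illustration, not the BigQuery client API or the processor's implementation. The assumption is that each append declares the offset at which its rows should start, so a retried batch is recognized instead of being written twice.

```python
class ToyStream:
    """In-memory stand-in for a write stream (hypothetical, not the BigQuery client)."""

    def __init__(self):
        self.rows = []

    def append(self, batch, offset):
        # Accept the batch only if it starts exactly where the stream currently ends.
        if offset < len(self.rows):
            return "ALREADY_EXISTS"  # retried batch: nothing is written a second time
        if offset > len(self.rows):
            return "OUT_OF_RANGE"    # gap: an earlier batch never arrived
        self.rows.extend(batch)
        return "OK"


def write_records(stream, records, batch_size):
    """Append records in batches of batch_size, advancing the offset per batch."""
    offset = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        status = stream.append(batch, offset)
        # A duplicate is harmless (the rows are already there); a gap is an error.
        if status not in ("OK", "ALREADY_EXISTS"):
            raise RuntimeError(f"append failed at offset {offset}: {status}")
        offset += len(batch)
    return offset  # total number of rows accounted for
```

    In this model, re-sending a batch at an offset that was already written leaves the stream unchanged, which is the property that makes retries safe.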

    Batching

    As with the streaming approach, one stream is opened for each FlowFile and records are appended to it. However, the data is not available in BigQuery until the processor commits it at the end of FlowFile processing.
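
    The difference in visibility can be sketched with another in-memory model. ToyPendingStream and its methods are hypothetical names for illustration only and do not correspond to the BigQuery client API. The assumption is simply that appended rows are buffered and become readable in the table all at once on commit.

```python
class ToyPendingStream:
    """In-memory stand-in for a batching stream (hypothetical, not the BigQuery client)."""

    def __init__(self, table):
        self.table = table  # list standing in for the rows readable in the table
        self.buffer = []    # rows appended but not yet committed

    def append(self, batch):
        # Appended rows are held back; readers of the table cannot see them yet.
        self.buffer.extend(batch)

    def commit(self):
        # Commit makes every buffered row visible in a single step.
        committed = len(self.buffer)
        self.table.extend(self.buffer)
        self.buffer = []
        return committed


table = []
stream = ToyPendingStream(table)
stream.append(["row-1", "row-2"])
stream.append(["row-3"])
visible_before_commit = list(table)  # still empty at this point
committed_count = stream.commit()    # all three rows become visible together
```

    If processing fails before commit, nothing from the FlowFile has become visible in this model, which is the practical distinction from the streaming approach.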

    Improvement opportunities

    • The table has to exist on the BigQuery side; it is not created automatically.
    • The Write API supports multiple streams for parallel execution and transactionality across streams. This is not utilized at the moment, as it would be covered at the NiFi framework level.

    The official Google Write API documentation provides additional details.

Properties
Relationships
Name Description
failure FlowFiles are routed to this relationship if the Google BigQuery operation fails.
success FlowFiles are routed to this relationship after a successful Google BigQuery operation.
Writes Attributes
Name Description
bq.records.count Number of records successfully inserted