Streaming Versus Batching Data

PutBigQuery is record-based and relies on the gRPC-based BigQuery Storage Write API, which uses protocol buffers. The underlying write stream supports both a streaming and a batching approach.

Streaming

With streaming, data appended to the stream is immediately available for reading in BigQuery. The number of records (rows) appended at once is configurable. Only one stream is established per FlowFile: when processing of a FlowFile concludes, its stream is closed and a new one is opened for the next FlowFile. This approach supports exactly-once delivery semantics via stream offsets.
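The behaviour above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the Google client library or the processor's code: `StreamingWriteStream`, `append_once`, `write_flowfile`, `rows_per_append` and the two exception classes are hypothetical names. It shows one stream per FlowFile, records appended in groups of a configurable size, rows becoming readable as soon as they are appended, and offsets making retries safe: an append whose offset was already written is rejected, and one that would skip ahead is refused.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


class OffsetAlreadyExists(Exception):
    """Rows at this offset were already appended (e.g. by an earlier attempt)."""


class OffsetOutOfRange(Exception):
    """The append would leave a gap after the current end of the stream."""


class StreamingWriteStream:
    """Hypothetical in-memory model of a streaming write stream."""

    def __init__(self, table: List[Row]) -> None:
        self.table = table      # rows readable "in BigQuery"
        self.next_offset = 0    # offset the next append must use
        self.closed = False

    def append(self, rows: Sequence[Row], offset: int) -> None:
        if self.closed:
            raise RuntimeError("stream is closed")
        if offset < self.next_offset:
            raise OffsetAlreadyExists(offset)
        if offset > self.next_offset:
            raise OffsetOutOfRange(offset)
        self.table.extend(rows)  # readable immediately: there is no commit step
        self.next_offset += len(rows)

    def close(self) -> None:
        self.closed = True


def append_once(stream: StreamingWriteStream, rows: Sequence[Row], offset: int) -> None:
    """Retry-safe append: a duplicate offset means the rows are already written."""
    try:
        stream.append(rows, offset)
    except OffsetAlreadyExists:
        pass  # an earlier attempt succeeded, so skipping avoids duplicate rows


def write_flowfile(records: Sequence[Row], table: List[Row],
                   rows_per_append: int) -> StreamingWriteStream:
    """Write one FlowFile's records through its own stream, then close it."""
    stream = StreamingWriteStream(table)
    for start in range(0, len(records), rows_per_append):
        # The offset of each group is the number of rows appended before it.
        append_once(stream, records[start:start + rows_per_append], offset=start)
    stream.close()
    return stream
```

Because the offset of every group is derived from the record position, re-sending a group after a timeout cannot add the same rows twice, which is the property the exactly-once guarantee rests on.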

Batching

As with the streaming approach, one stream is opened for each FlowFile and records are appended to it. However, the data is not available in BigQuery until the processor commits it at the end of FlowFile processing.
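A toy in-memory model makes the difference in visibility concrete. It is a hedged sketch, not the Google client library or the processor's implementation: `PendingWriteStream`, `write_flowfile` and `rows_per_append` are hypothetical names. The finalize-then-commit sequence loosely mirrors the Write API's pending stream type, where a stream is finalized and then committed; that PutBigQuery uses this stream type for batching is our assumption.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


class PendingWriteStream:
    """Hypothetical in-memory model of a batching (pending) write stream."""

    def __init__(self, table: List[Row]) -> None:
        self.table = table               # rows readable "in BigQuery"
        self._buffer: List[Row] = []     # appended but not yet readable
        self.finalized = False

    def append(self, rows: Sequence[Row]) -> None:
        if self.finalized:
            raise RuntimeError("stream is finalized")
        self._buffer.extend(rows)        # buffered: readers do not see these yet

    def finalize(self) -> int:
        """Stop accepting appends and report how many rows are buffered."""
        self.finalized = True
        return len(self._buffer)

    def commit(self) -> None:
        if not self.finalized:
            raise RuntimeError("finalize the stream before committing")
        self.table.extend(self._buffer)  # all buffered rows become readable at once
        self._buffer.clear()


def write_flowfile(records: Sequence[Row], table: List[Row],
                   rows_per_append: int) -> None:
    """Append one FlowFile's records, then commit them when the FlowFile is done."""
    stream = PendingWriteStream(table)
    for start in range(0, len(records), rows_per_append):
        stream.append(records[start:start + rows_per_append])
    stream.finalize()
    stream.commit()
```

The trade-off this models: readers never observe a partially written FlowFile, at the cost of seeing none of its rows until processing of that FlowFile completes.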

Improvement opportunities

The official Google Write API documentation provides additional details.