Description:

Execute provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected.

Tags:

sql, select, jdbc, query, database

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

NameDefault ValueAllowable ValuesDescription
Database Connection Pooling ServiceController Service API:
DBCPService
Implementation:
DBCPConnectionPool
The Controller Service that is used to obtain a connection to the database.
Database TypeGeneric
  • Generic
  • Oracle
The type/flavor of database, used for generating database-specific code. In many cases the Generic type should suffice, but some databases (such as Oracle) require custom SQL clauses.
Table NameThe name of the database table to be queried.
Columns to ReturnA comma-separated list of column names to be used in the query. If your database requires special treatment of the names (quoting, e.g.), each name should include such treatment. If no column names are supplied, all columns in the specified table will be returned.
Maximum-value ColumnsA comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining maximum value, so columns of these types should not be listed in this property, and will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact.
Max Wait Time0 secondsThe maximum amount of time allowed for a running SQL select query , zero means there is no limit. Max time less than 1 second will be equal to zero.
Fetch Size0The number of result rows to be fetched from the result set at a time. This is a hint to the driver and may not be honored and/or exact. If the value specified is zero, then the hint is ignored.
Max Rows Per Flow File0The maximum number of result rows that will be included in a single FlowFile. This will allow you to break up very large result sets into multiple FlowFiles. If the value specified is zero, then all rows are returned in a single FlowFile.
Normalize Table/Column Namesfalse
  • true
  • false
Whether to change non-Avro-compatible characters in column names to Avro-compatible characters. For example, colons and periods will be changed to underscores in order to build a valid Avro record.

Dynamic Properties:

Dynamic Properties allow the user to specify both the name and value of a property.

NameValueDescription
Initial Max ValueAttribute Expression LanguageSpecifies an initial max value for max value columns. Properties should be added in the format `initial.maxvalue.{max_value_column}`.

Relationships:

NameDescription
successSuccessfully created FlowFile from SQL query result set.

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
querydbtable.row.countNot Specified
fragment.identifierIf 'Max Rows Per Flow File' is set then all FlowFiles from the same query result set will have the same value for the fragment.identifier attribute. This can then be used to correlate the results.
fragment.countIf 'Max Rows Per Flow File' is set then this is the total number of FlowFiles produced by a single ResultSet. This can be used in conjunction with the fragment.identifier attribute in order to know how many FlowFiles belonged to the same incoming ResultSet.
fragment.indexIf 'Max Rows Per Flow File' is set then the position of this FlowFile in the list of outgoing FlowFiles that were all derived from the same result set FlowFile. This can be used in conjunction with the fragment.identifier attribute to know which FlowFiles originated from the same query result set and in what order FlowFiles were produced
maxvalue.*Each attribute contains the observed maximum value of a specified 'Maximum-value Column'. The suffix of the attribute is the name of the column

State management:

ScopeDescription
CLUSTERAfter performing a query on the specified table, the maximum values for the specified column(s) will be retained for use in future executions of the query. This allows the Processor to fetch only those records that have max values greater than the retained values. This can be used for incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor per the State Management documentation

Restricted: