GenerateTableFetch

Description:

Generates SQL select queries that fetch "pages" of rows from a table. The partition size property, along with the table's row count, determine the size and number of pages and generated FlowFiles. In addition, incremental fetching can be achieved by setting Maximum-Value Columns, which causes the processor to track the columns' maximum values, thus only fetching rows whose columns' values exceed the observed maximums. This processor is intended to be run on the Primary Node only. This processor can accept incoming connections; the behavior of the processor is different whether incoming connections are provided: - If no incoming connection(s) are specified, the processor will generate SQL queries on the specified processor schedule. Expression Language is supported for many fields, but no flow file attributes are available. However the properties will be evaluated using the Variable Registry. - If incoming connection(s) are specified and no flow file is available to a processor task, no work will be performed. - If incoming connection(s) are specified and a flow file is available to a processor task, the flow file's attributes may be used in Expression Language for such fields as Table Name and others. However, the Max-Value Columns and Columns to Return fields must be empty or refer to columns that are available in each specified table.

Tags:

sql, select, jdbc, query, database, fetch, generate

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

NameDefault ValueAllowable ValuesDescription
Database Connection Pooling ServiceController Service API:
DBCPService
Implementations: HiveConnectionPool
DBCPConnectionPool
The Controller Service that is used to obtain a connection to the database.
Database TypeGeneric
  • Generic Generates ANSI SQL
  • Oracle Generates Oracle compliant SQL
  • MS SQL 2012+ Generates MS SQL Compatible SQL, for version 2012 or greater
  • MS SQL 2008 Generates MS SQL Compatible SQL for version 2008
The type/flavor of database, used for generating database-specific code. In many cases the Generic type should suffice, but some databases (such as Oracle) require custom SQL clauses.
Table NameThe name of the database table to be queried.
Supports Expression Language: true
Columns to ReturnA comma-separated list of column names to be used in the query. If your database requires special treatment of the names (quoting, e.g.), each name should include such treatment. If no column names are supplied, all columns in the specified table will be returned. NOTE: It is important to use consistent column names for a given table for incremental fetch to work properly.
Supports Expression Language: true
Maximum-value ColumnsA comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running. Using multiple columns implies an order to the column list, and each column's values are expected to increase more slowly than the previous columns' values. Thus, using multiple columns implies a hierarchical structure of columns, which is usually used for partitioning tables. This processor can be used to retrieve only those rows that have been added/updated since the last retrieval. Note that some JDBC types such as bit/boolean are not conducive to maintaining maximum value, so columns of these types should not be listed in this property, and will result in error(s) during processing. If no columns are provided, all rows from the table will be considered, which could have a performance impact. NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly.
Supports Expression Language: true
Max Wait Time0 secondsThe maximum amount of time allowed for a running SQL select query , zero means there is no limit. Max time less than 1 second will be equal to zero.
Supports Expression Language: true
Partition Size10000The number of result rows to be fetched by each generated SQL statement. The total number of rows in the table divided by the partition size gives the number of SQL statements (i.e. FlowFiles) generated. A value of zero indicates that a single FlowFile is to be generated whose SQL statement will fetch all rows in the table.
Supports Expression Language: true

Relationships:

NameDescription
successSuccessfully created FlowFile from SQL query result set.
failureThis relationship is only used when SQL query execution (using an incoming FlowFile) failed. The incoming FlowFile will be penalized and routed to this relationship. If no incoming connection(s) are specified, this relationship is unused.

Reads Attributes:

None specified.

Writes Attributes:

NameDescription
generatetablefetch.sql.errorIf the processor has incoming connections, and processing an incoming flow file causes a SQL Exception, the flow file is routed to failure and this attribute is set to the exception message.
generatetablefetch.tableNameThe name of the database table to be queried.
generatetablefetch.columnNamesThe comma-separated list of column names used in the query.
generatetablefetch.whereClauseWhere clause used in the query to get the expected rows.
generatetablefetch.maxColumnNamesThe comma-separated list of column names used to keep track of data that has been returned since the processor started running.
generatetablefetch.limitThe number of result rows to be fetched by the SQL statement.
generatetablefetch.offsetOffset to be used to retrieve the corresponding partition.

State management:

ScopeDescription
CLUSTERAfter performing a query on the specified table, the maximum values for the specified column(s) will be retained for use in future executions of the query. This allows the Processor to fetch only those records that have max values greater than the retained values. This can be used for incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor per the State Management documentation

Restricted:

This component is not restricted.

Input requirement:

This component allows an incoming relationship.

See Also:

QueryDatabaseTable, ExecuteSQL, ListDatabaseTables