PaginatedJsonQueryElasticsearch 2.1.0

Bundle
org.apache.nifi | nifi-elasticsearch-restapi-nar
Description
A processor that allows the user to run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. It will use the flowfile's content for the query unless the Query property is populated. Search After/Point in Time queries must include a valid "sort" field.
Tags
elasticsearch, elasticsearch5, elasticsearch6, elasticsearch7, elasticsearch8, json, page, query, read, scroll
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
Additional Details

    PaginatedJsonQueryElasticsearch

    This processor is intended for use with the Elasticsearch JSON DSL and Elasticsearch 5.X and newer. It is designed to be able to take a JSON query (e.g. from Kibana) and execute it as-is against an Elasticsearch cluster in a paginated manner. Like all processors in the “restapi” bundle, it uses the official Elastic client APIs, so it supports leader detection.

    The query JSON to execute can be provided either in the Query configuration property or in the content of the flowfile. If the Query Attribute property is configured, the executed query JSON will be placed in the attribute provided by this property.

    The query is paginated in Elasticsearch using one of the available methods: “Scroll” or “Search After” (optionally with a “Point in Time” for Elasticsearch 7.10+ with XPack enabled). The number of results per page can be controlled using the size parameter in the Query JSON. For Search After functionality, a sort parameter must be present within the Query JSON.
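As a rough illustration of how Search After paging works in general, the sketch below builds the request body for the next page from the last hit of the previous page. It is not the processor's own implementation. The function name `next_page_request` and the sample hits are hypothetical, and no Elasticsearch client is involved.

```python
import copy


def next_page_request(query, last_page_hits):
    """Return the request body for the next page, or None when paging is done."""
    if not last_page_hits:
        return None  # an empty page means there is nothing left to fetch
    body = copy.deepcopy(query)  # leave the caller's query untouched
    # Each hit carries a "sort" array when the query has a sort clause;
    # passing the last hit's values as "search_after" resumes right after it.
    body["search_after"] = last_page_hits[-1]["sort"]
    return body


# Hypothetical query and first page of hits, for illustration only.
query = {"size": 2, "sort": [{"product": "desc"}], "query": {"match_all": {}}}
hits = [
    {"_id": "1", "_source": {"product": "pizza"}, "sort": ["pizza"]},
    {"_id": "2", "_source": {"product": "pasta"}, "sort": ["pasta"]},
]
second_request = next_page_request(query, hits)
```

Without a sort clause the hits carry no sort values to resume from, which is why Search After needs one.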

    Search results and aggregation results can be split up into multiple flowfiles. Aggregation results will only be split at the top level because nested aggregations lose their context (and thus lose their value) if separated from their parent aggregation. Additionally, the results from all pages can be combined into a single flowfile (the processor still loads only one page of data into memory at a time).
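To make the top-level split concrete, here is a minimal sketch of dividing one search response into per-hit documents and per-aggregation documents. It only mimics the behaviour described above and is not the processor's code. The function name `split_response`, the `content` key and the sample response are assumptions made for illustration.

```python
def split_response(response):
    """Split one search response into per-hit and per-aggregation documents."""
    hits = response.get("hits", {}).get("hits", [])
    # Only top-level aggregations are separated; nested aggregations stay
    # inside their parent so they keep their context.
    aggregations = [
        {"aggregation.name": name, "content": body}
        for name, body in response.get("aggregations", {}).items()
    ]
    return hits, aggregations


# Hypothetical response with one top-level and one nested aggregation.
response = {
    "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]},
    "aggregations": {
        "weekly_sales": {"buckets": [{"key": 1, "items": {"buckets": []}}]}
    },
}
hit_docs, agg_docs = split_response(response)
```

Here the nested `items` aggregation travels with its parent `weekly_sales` instead of becoming a document of its own.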

    The following is an example query that would be accepted:

    {
      "size": 10000,
      "sort": {
        "product": "desc"
      },
      "query": {
        "match": {
          "restaurant.keyword": "Local Pizzaz FTW Inc"
        }
      },
      "aggs": {
        "weekly_sales": {
          "date_histogram": {
            "field": "date",
            "interval": "week"
          },
          "aggs": {
            "items": {
              "terms": {
                "field": "product",
                "size": 10
              }
            }
          }
        }
      }
    }
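Before handing a query to the processor, it can help to check how it would be paginated. The sketch below is a hypothetical helper, not part of NiFi. It reads the top-level `size` and `sort` keys of a query JSON string and falls back to a page size of 10, the Elasticsearch default, when `size` is absent.

```python
import json

DEFAULT_PAGE_SIZE = 10  # Elasticsearch's default when "size" is omitted


def describe_pagination(query_json):
    """Summarise how a query JSON string would be paginated."""
    body = json.loads(query_json)
    return {
        "page_size": body.get("size", DEFAULT_PAGE_SIZE),
        # Search After needs a top-level "sort" clause to resume from.
        "search_after_ready": "sort" in body,
    }


summary = describe_pagination(
    '{"size": 500, "sort": {"product": "desc"}, "query": {"match_all": {}}}'
)
```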
    
Properties
Dynamic Properties
System Resource Considerations
Resource | Description
MEMORY | Choose the page size with care: each response from Elasticsearch is loaded into memory in full and then converted into the resulting flowfiles.
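The memory behaviour can be pictured with a generator that yields one page at a time, so peak memory tracks the page size and not the total result size. This is a sketch under stated assumptions, not the processor's implementation. `fake_pages` stands in for successive Elasticsearch responses, and JSON Lines output is an arbitrary choice for the example.

```python
import io
import json


def write_combined(pages, out):
    """Append the hits of each page to `out` as JSON lines, one page at a time."""
    total = 0
    for page in pages:  # `pages` is consumed lazily, so only one page is held
        for hit in page:
            out.write(json.dumps(hit) + "\n")
            total += 1
    return total


def fake_pages():
    # Stand-in for successive Elasticsearch responses.
    yield [{"_id": "1"}, {"_id": "2"}]
    yield [{"_id": "3"}]


buffer = io.StringIO()  # a real sink would be a file or stream
hit_count = write_combined(fake_pages(), buffer)
```

A smaller `size` in the query lowers the peak at the cost of more requests.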
Relationships
Name | Description
aggregations | Aggregations are routed to this relationship.
failure | All flowfiles that fail for reasons unrelated to server availability go to this relationship.
original | All original flowfiles that don't cause an error to occur go to this relationship.
hits | Search hits are routed to this relationship.
Writes Attributes
Name | Description
mime.type | application/json
aggregation.name | The name of the aggregation whose results are in the output flowfile
aggregation.number | The number of the aggregation whose results are in the output flowfile
page.number | The number of the page (request), starting from 1, that returned the results in the output flowfile
hit.count | The number of hits that are in the output flowfile
elasticsearch.query.error | The error message provided by Elasticsearch if there is an error querying the index.
See Also