SearchElasticsearch

Description:

A processor that allows the user to repeatedly run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. Search After/Point in Time queries must include a valid "sort" field. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the query will restart with the first page of results being retrieved.
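The paging behavior described above can be sketched roughly as follows for the "search_after" style. This is an illustration only, not the processor's implementation: `fake_search` is a hypothetical stand-in for an Elasticsearch `_search` call, and the in-memory `DOCS` list stands in for an index. The keep-alive expiry is left out of this sketch.

```python
# Hypothetical in-memory "index"; each document has a unique, sortable id.
DOCS = [{"id": i} for i in range(1, 6)]


def fake_search(body):
    """Stand-in for an Elasticsearch _search call (an assumption, not a real client)."""
    if "sort" not in body:
        raise ValueError("search_after paging needs a 'sort' clause")
    after = body.get("search_after", [0])[0]
    matching = [d for d in DOCS if d["id"] > after]
    page = matching[: body["size"]]
    # Each hit carries the sort values that the next request passes back.
    return [{"_source": d, "sort": [d["id"]]} for d in page]


def fetch_all_pages(body):
    """Request pages until one comes back empty, then stop."""
    pages = []
    request = dict(body)
    while True:
        hits = fake_search(request)
        if not hits:
            break
        pages.append(hits)
        # Continue from the sort values of the last hit on this page.
        request = dict(body, search_after=hits[-1]["sort"])
    return pages


pages = fetch_all_pages({"size": 2, "sort": [{"id": "asc"}]})
print([len(p) for p in pages])  # [2, 2, 1]
```

Without a "sort" clause there would be no sort values to continue from, which is why sorted pagination types require one.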

Tags:

elasticsearch, elasticsearch5, elasticsearch6, elasticsearch7, elasticsearch8, query, scroll, page, search, json

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Query Definition Style | el-rest-query-definition-style | FULL_QUERY
  • FULL_QUERY Provide the full Query.
  • BUILD_QUERY Build the Query from separate JSON objects.
How the JSON Query will be defined for use by the processor.
Query | el-rest-query
A query in JSON syntax, not Lucene syntax. Ex: {"query":{"match":{"somefield":"somevalue"}}}. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Supports Expression Language: true (will be evaluated using variable registry only)

This Property is only considered if the [Query Definition Style] Property has a value of "FULL_QUERY".
Query Clause | el-rest-query-clause
A "query" clause in JSON syntax, not Lucene syntax. Ex: {"match":{"somefield":"somevalue"}}. If the query is empty, a default JSON Object will be used, which will result in a "match_all" query in Elasticsearch.
Supports Expression Language: true (will be evaluated using variable registry only)

This Property is only considered if the [Query Definition Style] Property has a value of "BUILD_QUERY".
Size | es-rest-size
The maximum number of documents to retrieve in the query. If the query is paginated, this "size" applies to each page of the query, not the "size" of the entire result set.
Supports Expression Language: true (will be evaluated using variable registry only)

This Property is only considered if the [Query Definition Style] Property has a value of "BUILD_QUERY".
Sort | es-rest-query-sort
Sort results by one or more fields, in JSON syntax. Ex: [{"price" : {"order" : "asc", "mode" : "avg"}}, {"post_date" : {"format": "strict_date_optional_time_nanos"}}]
Supports Expression Language: true (will be evaluated using variable registry only)

This Property is only considered if the [Query Definition Style] Property has a value of "BUILD_QUERY".
Aggregations | es-rest-query-aggs
One or more query aggregations (or "aggs"), in JSON syntax. Ex: {"items": {"terms": {"field": "product", "size": 10}}}
Supports Expression Language: true (will be evaluated using variable registry only)

This Property is only considered if the [Query Definition Style] Property has a value of "BUILD_QUERY".
Fields | es-rest-query-fields
Fields of indexed documents to be retrieved, in JSON syntax. Ex: ["user.id", "http.response.*", {"field": "@timestamp", "format": "epoch_millis"}]
Supports Expression Language: true (will be evaluated using variable registry only)

This Property is only considered if the [Query Definition Style] Property has a value of "BUILD_QUERY".
Script Fields | es-rest-query-script-fields
Fields to create using script evaluation at query runtime, in JSON syntax. Ex: {"test1": {"script": {"lang": "painless", "source": "doc['price'].value * 2"}}, "test2": {"script": {"lang": "painless", "source": "doc['price'].value * params.factor", "params": {"factor": 2.0}}}}
Supports Expression Language: true (will be evaluated using variable registry only)

This Property is only considered if the [Query Definition Style] Property has a value of "BUILD_QUERY".
Query Attribute | el-query-attribute
If set, the executed query will be set on each result flowfile in the specified attribute.
Supports Expression Language: true (will be evaluated using variable registry only)
Index | el-rest-fetch-index
The name of the index to use.
Supports Expression Language: true (will be evaluated using variable registry only)
Type | el-rest-type
The type of this document (used by Elasticsearch for indexing and searching).
Supports Expression Language: true (will be evaluated using variable registry only)
Client Service | el-rest-client-service
Controller Service API: ElasticSearchClientService
Implementation: ElasticSearchClientServiceImpl
An Elasticsearch client service to use for running queries.
Search Results Split | el-rest-split-up-hits | PER_RESPONSE
  • PER_HIT Flowfile per hit.
  • PER_RESPONSE Flowfile per response.
  • PER_QUERY Combine results from all query responses (one flowfile per entire paginated result set of hits). Note that aggregations cannot be paged; they are generated across the entire result set and returned as part of the first page. Results are output with one JSON object per line (allowing hits to be combined from multiple pages without loading all results into memory).
Output one flowfile containing all hits of a response, one flowfile for each individual hit, or one flowfile containing all hits from all paged responses.
Search Results Format | el-rest-format-hits | FULL
  • FULL Contains full Elasticsearch Hit, including Document Source and Metadata.
  • SOURCE_ONLY Document Source only (where present).
  • METADATA_ONLY Hit Metadata only.
Format of Hits output.
Aggregation Results Split | el-rest-split-up-aggregations | PER_RESPONSE
  • PER_HIT Flowfile per hit.
  • PER_RESPONSE Flowfile per response.
Output a flowfile containing all aggregations or one flowfile for each individual aggregation.
Aggregation Results Format | el-rest-format-aggregations | FULL
  • FULL Contains full Elasticsearch Aggregation, including Buckets and Metadata.
  • BUCKETS_ONLY Bucket Content only.
  • METADATA_ONLY Aggregation Metadata only.
Format of Aggregation output.
Output No Hits | el-rest-output-no-hits | false
  • true
  • false
Output a "hits" flowfile even if no hits are found for the query. If true, an empty "hits" flowfile will be output even if "aggregations" are output.
Pagination Type | el-rest-pagination-type | SCROLL
  • SCROLL Use Elasticsearch "_scroll" API to page results. Does not accept additional query parameters.
  • SEARCH_AFTER Use Elasticsearch "search_after" _search API to page sorted results.
  • POINT_IN_TIME Use Elasticsearch (7.10+ with XPack) "point in time" _search API to page sorted results.
Pagination method to use. Not all types are available for all Elasticsearch versions; check the Elasticsearch docs to confirm which are applicable and recommended for your service.
Pagination Keep Alive | el-rest-pagination-keep-alive | 10 mins
Pagination "keep_alive" period: how long Elasticsearch will keep the scroll/PiT cursor alive between requests. This is not the time expected for all pages to be returned, but the maximum time allowed between page retrievals.
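
As a rough illustration of the BUILD_QUERY style, the sketch below combines the separate JSON properties (Query Clause, Size, Sort, Aggregations, Fields, Script Fields) into a single `_search` request body. The function name `build_query` and its parameters are hypothetical, and the processor's own assembly logic may differ. In particular, the sketch writes the "match_all" fallback explicitly, whereas the processor is only documented as using "a default JSON Object".

```python
import json


def build_query(query_clause=None, size=None, sort=None,
                aggs=None, fields=None, script_fields=None):
    """Assemble a _search request body from optional JSON strings."""
    # An empty Query Clause is represented here as an explicit match_all
    # (an assumption; the processor's actual default object may differ).
    body = {"query": json.loads(query_clause) if query_clause else {"match_all": {}}}
    optional = {
        "size": size,
        "sort": sort,
        "aggs": aggs,
        "fields": fields,
        "script_fields": script_fields,
    }
    for key, raw in optional.items():
        if raw:  # skip properties that were left unset
            body[key] = json.loads(raw)
    return body


body = build_query(
    query_clause='{"match": {"somefield": "somevalue"}}',
    size="100",
    sort='[{"post_date": {"order": "asc"}}]',
)
print(json.dumps(body, indent=2))
```

With FULL_QUERY, the same body would instead be supplied whole in the Query property.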

Dynamic Properties:

Supports Sensitive Dynamic Properties: No

Dynamic Properties allow the user to specify both the name and value of a property.

Name | Value | Description
The name of a URL query parameter to add | The value of the URL query parameter | Adds the specified property name/value as a query parameter in the Elasticsearch URL used for processing. These parameters will override any matching parameters in the query request body. For SCROLL type queries, these parameters are only used in the initial (first page) query as the Elasticsearch Scroll API does not support the same query parameters for subsequent pages of data.
Supports Expression Language: true (will be evaluated using variable registry only)
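
A minimal sketch of how dynamic properties could end up on the request URL is shown below. The helper `search_url`, the host `http://localhost:9200`, the index name `my-index`, and the example parameters are all assumptions made for illustration; the real processor also adds its own parameters (such as the scroll keep-alive), which this sketch omits.

```python
from urllib.parse import urlencode


def search_url(base_url, index, dynamic_properties, page_number=1, scroll=False):
    """Build the request URL, adding dynamic properties as query parameters."""
    if scroll and page_number > 1:
        # Later scroll pages go to the scroll endpoint without the extra parameters.
        return f"{base_url}/_search/scroll"
    url = f"{base_url}/{index}/_search"
    if dynamic_properties:
        url += "?" + urlencode(dynamic_properties)
    return url


params = {"routing": "user1", "timeout": "30s"}
first = search_url("http://localhost:9200", "my-index", params, scroll=True)
later = search_url("http://localhost:9200", "my-index", params, page_number=2, scroll=True)
print(first)
print(later)
```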

Relationships:

Name | Description
aggregations | Aggregations are routed to this relationship.
hits | Search hits are routed to this relationship.

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
mime.type | application/json
aggregation.name | The name of the aggregation whose results are in the output flowfile
aggregation.number | The number of the aggregation whose results are in the output flowfile
page.number | The number of the page (request), starting from 1, in which the results were returned that are in the output flowfile
hit.count | The number of hits that are in the output flowfile
elasticsearch.query.error | The error message provided by Elasticsearch if there is an error querying the index.

State management:

Scope | Description
LOCAL | The pagination state (scrollId, searchAfter, pitId, hitCount, pageCount, pageExpirationTimestamp) is retained in between invocations of this processor until the Scroll/PiT has expired (when the current time is later than the last query execution plus the Pagination Keep Alive interval).
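
The expiry rule above can be sketched as follows. The state is modeled as a plain dictionary using the key names listed in the table; the helper names, the millisecond timestamps, and the way the counters are updated are assumptions for illustration and are not taken from the processor's source.

```python
KEEP_ALIVE_MS = 10 * 60 * 1000  # the default Pagination Keep Alive of "10 mins"


def record_page(state, now_ms, hits_on_page):
    """Return updated state after a page has been retrieved at now_ms."""
    new_state = dict(state)
    new_state["pageCount"] = state.get("pageCount", 0) + 1
    new_state["hitCount"] = state.get("hitCount", 0) + hits_on_page
    # Expiry is the last query execution plus the keep-alive interval.
    new_state["pageExpirationTimestamp"] = now_ms + KEEP_ALIVE_MS
    return new_state


def load_state(state, now_ms):
    """Drop the stored state once the Scroll/PiT has expired."""
    expiry = state.get("pageExpirationTimestamp")
    if expiry is not None and now_ms > expiry:
        return {}  # expired: the next query starts again at the first page
    return state


state = record_page({}, now_ms=0, hits_on_page=10)
print(load_state(state, now_ms=KEEP_ALIVE_MS))      # state is still valid
print(load_state(state, now_ms=KEEP_ALIVE_MS + 1))  # {}
```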

Restricted:

This component is not restricted.

Input requirement:

This component does not allow an incoming relationship.

System Resource Considerations:

Resource | Description
MEMORY | Care should be taken with the size of each page, because each response from Elasticsearch is loaded into memory all at once and converted into the resulting flowfiles.
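
On the consuming side, the one-JSON-object-per-line output of the PER_QUERY split can be read one hit at a time rather than parsed as a single document. The sketch below assumes FULL-format hits with a `_source.price` field; the field name and the sample content are made up for illustration.

```python
import io
import json


def iter_hits(lines):
    """Yield one parsed hit per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for the content of a PER_QUERY "hits" flowfile (hypothetical data).
content = io.StringIO(
    '{"_id": "1", "_source": {"price": 5}}\n'
    '{"_id": "2", "_source": {"price": 7}}\n'
)
total = sum(hit["_source"]["price"] for hit in iter_hits(content))
print(total)  # 12
```

Because only one line is held at a time, the memory needed by the reader depends on the largest single hit rather than on the whole result set.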

See Also:

PaginatedJsonQueryElasticsearch, ConsumeElasticsearch