The Elasticsearch Sniffer can be used to locate Elasticsearch Nodes within a Cluster to which you are connecting. This can be beneficial if your cluster dynamically changes over time, e.g. new Nodes are added to maintain performance during heavy load.
Sniffing can also be used to update the list of Hosts within the Cluster if a connection Failure is encountered during operation. In order to "Sniff on Failure", you must also enable "Sniff Cluster Nodes".
Sniffing does not make sense in every situation, for example if:
It may also be necessary to set some of the Elasticsearch Networking Advanced Settings, such as network.publish_host, to ensure that the HTTP Hosts found by the Sniffer are accessible by NiFi. For example, Elasticsearch may use a network-internal publish_host that is inaccessible to NiFi, when it should instead publish an address/IP that NiFi can reach. It may also be necessary to add this same address to Elasticsearch's network.bind_host list.
See "Elasticsearch sniffing best practices: What, when, why, how" for more details.
This Elasticsearch client relies on a RestClient using the Apache HTTP Async Client. By default, it starts one dispatcher thread and a number of worker threads used by the connection manager. There are as many worker threads as there are locally detected processors/cores on the NiFi host. Consequently, it is highly recommended to have only one instance of this controller service per remote Elasticsearch destination, and to share that instance across all of the Elasticsearch processors in the NiFi flows. A very high number of instances could lead to resource starvation and result in out-of-memory (OOM) errors.
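
To make the scaling concern concrete, the following is a rough back-of-the-envelope sketch rather than a measurement of the real client. It assumes that each controller service instance starts exactly one dispatcher thread plus one worker thread per detected core, as described above. The class and method names (ClientThreadEstimate, threadsPerInstance, totalThreads) are hypothetical and are not part of NiFi or the Elasticsearch client.

```java
// Rough estimate only. Assumes each controller service instance starts
// 1 dispatcher thread plus 1 worker thread per detected core.
// All names in this class are hypothetical and exist only for illustration.
final class ClientThreadEstimate {

    /** Threads started by one client: one dispatcher plus one worker per core. */
    static int threadsPerInstance(int cores) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1");
        }
        return 1 + cores;
    }

    /** Threads started by all instances of the controller service combined. */
    static int totalThreads(int instances, int cores) {
        if (instances < 0) {
            throw new IllegalArgumentException("instances must be >= 0");
        }
        return instances * threadsPerInstance(cores);
    }

    public static void main(String[] args) {
        // Same core count the client would detect on this host.
        int cores = Runtime.getRuntime().availableProcessors();
        for (int instances : new int[] {1, 10, 50}) {
            System.out.printf("%d instance(s) on %d core(s): ~%d client threads%n",
                    instances, cores, totalThreads(instances, cores));
        }
    }
}
```

Under these assumptions, a 16-core host with 50 separate instances would hold roughly 850 client threads, compared with 17 for a single shared instance. This is why sharing one controller service per Elasticsearch destination is preferable.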