ScanContent

Description:

Scans the content of FlowFiles for terms that are found in a user-supplied dictionary. If a term is matched, the UTF-8 encoded version of the term is added to the FlowFile as the 'matching.term' attribute.

Tags:

aho-corasick, scan, content, byte sequence, search, find, dictionary

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Display Name | API Name | Default Value | Allowable Values | Description
Dictionary File | Dictionary File | | | The filename of the terms dictionary. This property requires exactly one file to be provided.
Dictionary Encoding | Dictionary Encoding | text | text, binary | Indicates how the dictionary is encoded. If 'text', dictionary terms are new-line delimited and UTF-8 encoded; if 'binary', each dictionary term is denoted by a 4-byte integer indicating the term length, followed by the term itself.
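
The two dictionary encodings described above can be illustrated with a minimal Python sketch. It is not part of the processor; the function names are hypothetical, and the big-endian byte order of the 4-byte length prefix is an assumption (it matches how Java's DataInputStream reads an int), since the property description does not state it.

```python
import struct

def encode_text_dictionary(terms):
    """'text' encoding: UTF-8 terms, one per line."""
    return "\n".join(terms).encode("utf-8")

def encode_binary_dictionary(terms):
    """'binary' encoding: a 4-byte length followed by the raw term bytes."""
    out = bytearray()
    for term in terms:
        # ">i" = big-endian signed 32-bit integer (assumed byte order)
        out += struct.pack(">i", len(term))
        out += term
    return bytes(out)

def decode_binary_dictionary(data):
    """Read back the terms written by encode_binary_dictionary."""
    terms = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">i", data, offset)
        offset += 4
        term = data[offset:offset + length]
        if length < 0 or len(term) != length:
            raise ValueError("truncated or invalid term")
        terms.append(term)
        offset += length
    return terms
```

A binary dictionary is useful when the terms are arbitrary byte sequences that may contain new-line bytes or are not valid UTF-8, which the 'text' encoding cannot represent.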

Relationships:

Name | Description
unmatched | FlowFiles that do not match any term in the dictionary are routed to this relationship
matched | FlowFiles that match at least one term in the dictionary are routed to this relationship

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
matching.term | The term that caused the Processor to route the FlowFile to the 'matched' relationship; if the FlowFile is routed to the 'unmatched' relationship, this attribute is not added
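
The routing and attribute behavior described above can be sketched as follows. This is an illustration only, not the processor's implementation: the function name scan_content is hypothetical, a plain substring search stands in for the Aho-Corasick search named in the tags, and both the choice of the earliest-ending match when several terms occur and the replacement of bytes that are not valid UTF-8 are assumptions.

```python
def scan_content(content, terms):
    """Return (relationship, attributes_to_add) for one FlowFile's content.

    content: bytes of the FlowFile
    terms:   iterable of bytes, the dictionary terms
    """
    best = None  # (end_offset, term) of the earliest-ending match so far
    for term in terms:
        if not term:
            continue  # skip empty terms; they would match everything
        pos = content.find(term)
        if pos == -1:
            continue
        end = pos + len(term)
        if best is None or end < best[0]:
            best = (end, term)

    if best is None:
        # No term found: route to 'unmatched' and add no attribute
        return "unmatched", {}

    # A term was found: route to 'matched' and record it as UTF-8 text
    # (undecodable bytes are replaced; this is an assumption of the sketch)
    matched_term = best[1].decode("utf-8", errors="replace")
    return "matched", {"matching.term": matched_term}
```

For example, calling scan_content(b"hello world", [b"world", b"xyz"]) yields the 'matched' relationship with 'matching.term' set to "world", while content containing none of the terms yields 'unmatched' with no attributes.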

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

None specified.