ExecuteStreamCommand 2.0.0

Bundle
org.apache.nifi | nifi-standard-nar
Description
The ExecuteStreamCommand processor provides a flexible way to integrate external commands and scripts into NiFi data flows. ExecuteStreamCommand can pass the incoming FlowFile's content to the command that it executes similarly how piping works.
Tags
command, command execution, execute, stream
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
true
  • Additional Details for ExecuteStreamCommand 2.0.0

    ExecuteStreamCommand

    Description

    The ExecuteStreamCommand processor provides a flexible way to integrate external commands and scripts into NiFi data flows. ExecuteStreamCommand can pass the incoming FlowFile’s content to the command that it executes similarly how piping works.

    Configuration options

    Working Directory

    If not specified, NiFi root will be the default working directory.

    Configuring command arguments

    The ExecuteStreamCommand processor provides two ways to specify command arguments: using Dynamic Properties and the Command Arguments field.

    Command Arguments field

    This is the default. If there are multiple arguments, they need to be separated by a character specified in the Argument Delimiter field. When needed, ‘-’ and ‘–’ can be provided, but in these cases Argument Delimiter should be a different character.

    Consider that we want to list all files in a directory which is different from the working directory:

    Command Path Command Arguments
    ls -lah;/path/to/dir

    NOTE: the command should be on $PATH or it should be in the working directory, otherwise path also should be specified.

    Dynamic Properties

    Arguments can be specified with Dynamic Properties. Dynamic Properties with the pattern of ' command.argument.’ will be appended to the command in ascending order.

    The above example with dynamic properties would look like this:

    Property Name Property Value
    command.argument.0 -lah
    command.argument.1 /path/to/dir

    Configuring environment variables

    In addition to specifying command arguments using the Command Argument field or Dynamic Properties, users can also use environment variables with the ExecuteStreamCommand processor. Environment variables are a set of key-value pairs that can be accessed by processes running on the system. ExecuteStreamCommand will treat every Dynamic Property as an environment variable that doesn’t match the pattern ‘command.argument.’.

    Consider that we want to execute a Maven command with the processor. If there are multiple Java versions installed on the system, you can specify which version will be used by setting the JAVA_HOME environment variable. The output FlowFile will looke like this if we run mvn command with --version argument:

    Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63) 
    Maven home: /path/to/maven/home Java version: 11.0.18, vendor: Eclipse Adoptium, runtime: /path/to/default/java/home 
    Default locale: en_US, platform encoding: UTF-8 OS name: "mac os x", version: "13.1", arch: "x86_64", family: "mac"
    
    Property Name Property Value
    JAVA_HOME path/to/another/java/home

    After specifying the JAVA_HOME property, you can notice that maven is using a different runtime:

    Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63) 
    Maven home: /path/to/maven/home Java version: 11.0.18, vendor: Eclipse Adoptium, runtime: /path/to/another/java/home 
    Default locale: en_US, platform encoding: UTF-8 OS name: "mac os x", version: "13.1", arch: "x86_64", family: "mac"
    

    Streaming input to the command

    ExecuteStreamCommand passes the incoming FlowFile’s content to the command that it executes similarly how piping works. It is possible to disable this behavior with the Ignore STDIN property. In the above examples we didn’t use the incoming FlowFile’s content, so in this case we could leverage this property for additional performance gain.

    To utilize the streaming capability, consider that we want to use grep on the FlowFile. Let’s presume that we need to list all POST requests from an Apache HTTPD log:

    127.0.0.1 - - [03/May/2023:13:54:26 +0000] "GET /example-page HTTP/1.1" 200 4825 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0" 
    127.0.0.1 - - [03/May/2023:14:05:32 +0000] "POST /submit-form HTTP/1.1" 302 0 "http://localhost/example-page" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0" 
    127.0.0.1 - - [03/May/2023:14:10:48 +0000] "GET /image.jpg HTTP/1.1" 200 35785 "http://localhost/example-page" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0" 
    127.0.0.1 - - [03/May/2023:14:20:15 +0000] "GET /example-page HTTP/1.1" 200 4825 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0" 
    127.0.0.1 - - [03/May/2023:14:30:42 +0000] "GET /example-page HTTP/1.1" 200 4825 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
    

    Processor configuration:

    Working Directory Command Path Command Arguments Strategy Command Arguments Argument Delimiter Ignore STDIN Output Destination Attribute Max Attribute Length
    grep Command Arguments Property POST ; false 256

    With this the emitted FlowFile on the “output stream” relationship should be:

    127.0.0.1 - - [03/May/2023:14:05:32 +0000] "POST /submit-form HTTP/1.1" 302 0 "http://localhost/example-page" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
    
Properties
Dynamic Properties
Restrictions
Required Permission Explanation
execute code Provides operator the ability to execute arbitrary code assuming all permissions that NiFi has.
Relationships
Name Description
output stream The destination path for the flow file created from the command's output, if the returned status code is zero.
original The original FlowFile will be routed. It will have new attributes detailing the result of the script execution.
nonzero status The destination path for the flow file created from the command's output, if the returned status code is non-zero. All flow files routed to this relationship will be penalized.
Writes Attributes
Name Description
execution.command The name of the command executed
execution.command.args The semi-colon delimited list of arguments. Sensitive properties will be masked
execution.status The exit status code returned from executing the command
execution.error Any error messages returned from executing the command
mime.type Sets the MIME type of the output if the 'Output MIME Type' property is set and 'Output Destination Attribute' is not set