ScriptedPartitionRecord 2.0.0

Bundle
org.apache.nifi | nifi-scripting-nar
Description
Receives Record-oriented data (i.e., data that can be read by the configured Record Reader) and evaluates the user-provided script against each record in the incoming FlowFile. Each record is then grouped with the other records that share its partition, and one FlowFile is created for each group of records. Two records share the same partition if evaluating the script yields the same return value for both.
Tags
groovy, group, organize, partition, record, script, segment, split
Input Requirement
Supports Sensitive Dynamic Properties
false
  • Additional Details for ScriptedPartitionRecord 2.0.0

    ScriptedPartitionRecord

    Description

    The ScriptedPartitionRecord processor provides the ability to use a scripting language, such as Groovy, to quickly and easily partition Records based on their contents. Similar behaviour can be achieved in other ways, for example with PartitionRecord, but a user-provided script allows far more flexible decision logic for partitioning the individual records.

    The provided script is evaluated once for each Record that is encountered in the incoming FlowFile. Each invocation is expected to return either an object or a null value. For a non-null return value, its string representation is used as the record’s “partition”. A null return value is handled separately and is not converted into a string. All Records with the same partition are then written to a single FlowFile, which is routed to the success Relationship.

    This Processor maintains a Counter with the name of “Records Processed”. This represents the number of processed Records regardless of partitioning.

    Variable Bindings

    While the script provided to this Processor does not need to provide boilerplate code or implement any classes/interfaces, it does need some way to access the Records and other information that it needs in order to perform its task. This is accomplished by using Variable Bindings. Each time that the script is invoked, each of the following variables will be made available to the script:

    Variable Name | Description | Variable Class
    record | The Record that is to be processed. | Record
    recordIndex | The zero-based index of the Record in the FlowFile. | Long (64-bit signed integer)
    log | The Processor’s Logger. Anything that is logged to this logger will be written to the logs as if the Processor itself had logged it. Additionally, a bulletin will be created for any log message written to this logger (though by default, the Processor will hide any bulletins with a level below WARN). | ComponentLog
    attributes | Map of key/value pairs that are the Attributes of the FlowFile. Both the keys and the values of this Map are of type String. This Map is immutable. Any attempt to modify it will result in an UnsupportedOperationException being thrown. | java.util.Map
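
As a rough illustration of how these bindings might be used together, the sketch below keeps the partitioning logic in a closure so that it can also be tried outside NiFi. The closure name `choosePartition`, the `region` attribute, the stub classes, and the sample values are all assumptions made for this example and are not part of the processor. Inside the processor, the stubs would be unnecessary and the script would simply call the closure with the bound `record`, `recordIndex`, `log`, and `attributes` variables.

```groovy
// Minimal stand-ins for NiFi's Record and ComponentLog so the sketch runs in a
// plain Groovy shell. Inside the processor these objects are bound automatically.
class StubRecord {
    Map<String, Object> fields
    Object getValue(String name) { fields.get(name) }
}

class StubLog {
    List<String> warnings = []
    void warn(String message) { warnings << message }
}

// Hypothetical helper holding the partitioning logic. The 'region' attribute
// and the 'stellarType' field are assumptions made for this example.
def choosePartition = { rec, long index, logger, Map<String, String> attrs ->
    def type = rec.getValue('stellarType')
    if (type == null) {
        logger.warn("Record ${index} has no stellarType")
        return null
    }
    // Attributes are read-only; they are only consulted here, never modified.
    def region = attrs.get('region') ?: 'unknown'
    return "${region}-${type}".toString()
}

// Inside NiFi the script body would end with:
//   return choosePartition(record, recordIndex, log, attributes)
def stubLog = new StubLog()
def attrs = Collections.unmodifiableMap([region: 'local'])
def first = choosePartition(new StubRecord(fields: [stellarType: 'M']), 0L, stubLog, attrs)
def second = choosePartition(new StubRecord(fields: [starSystem: 'Unnamed']), 1L, stubLog, attrs)
println "first=${first}, second=${second}, warnings=${stubLog.warnings}"
```

Keeping the logic in a closure is only one possible layout; a script that uses the bound variables directly at the top level works just as well.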

    Return Value

    The script is invoked separately for each Record. It may return any Object that can be represented as a string. This string value will be used as the partition of the given Record. Additionally, the script may return null.
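
The effect of this rule can be emulated in a few lines of plain Groovy. The sketch below is not the processor's implementation; it only mimics the documented behaviour, under the assumption that non-null values are compared by their `toString()` form and that null is kept apart. The list of returned values is invented for illustration.

```groovy
// Values a script might return for five consecutive records (illustrative only).
def returned = [3, '3', null, 2.5, null]

// Partition key -> indexes of the records that fall into that partition.
def partitions = new LinkedHashMap<String, List<Integer>>()
returned.eachWithIndex { value, index ->
    // Non-null values are compared by their string form; null stays null
    // rather than becoming the string "null".
    String key = (value == null) ? null : value.toString()
    if (!partitions.containsKey(key)) {
        partitions.put(key, [])
    }
    partitions.get(key) << index
}

partitions.each { key, indexes -> println "${key} -> ${indexes}" }
```

In this emulation the Integer `3` and the String `'3'` end up in the same partition because their string forms are equal, while the two null results are grouped together without being converted.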

    Example

    The following script will partition the input on the value of the “stellarType” field.

    Example Input (CSV):

    starSystem, stellarType
    Wolf 359, M
    Epsilon Eridani, K
    Tau Ceti, G
    Groombridge 1618, K
    Gliese 1, M
    

    Example Output 1 (CSV) - for partition “M”:

    starSystem, stellarType
    Wolf 359,M
    Gliese 1,M
    

    Example Output 2 (CSV) - for partition “K”:

    starSystem, stellarType
    Epsilon Eridani,K
    Groombridge 1618,K
    

    Example Output 3 (CSV) - for partition “G”:

    starSystem, stellarType
    Tau Ceti,G
    

    Note: the order of the outgoing FlowFiles is not guaranteed.

    Example Script (Groovy):

    return record.getValue("stellarType")
    
Properties
Restrictions
Required Permission Explanation
execute code Provides the operator with the ability to execute arbitrary code, assuming all permissions that NiFi has.
Relationships
Name Description
success FlowFiles that are successfully partitioned will be routed to this relationship
failure If a FlowFile cannot be partitioned from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship
original Once all records in an incoming FlowFile have been partitioned, the original FlowFile is routed to this relationship.
Writes Attributes
Name Description
partition The partition of the outgoing flow file. If the script indicates that the partition has a null value, the attribute will be set to the literal string "<null partition>" (without quotes). Otherwise, the attribute is set to the String representation of whatever value is returned by the script.
mime.type Sets the mime.type attribute to the MIME Type specified by the Record Writer
record.count The number of records within the flow file.
record.error.message On failure, this attribute provides the error message encountered by the Reader or Writer.
fragment.index A one-up number that indicates the ordering of the partitioned FlowFiles that were created from a single parent FlowFile
fragment.count The number of partitioned FlowFiles generated from the parent FlowFile
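
The rule for the `partition` attribute can be summarised with a small Groovy sketch. The closure name `partitionAttribute` is hypothetical, and the code only mirrors the description above rather than the processor's actual source.

```groovy
// Hypothetical helper mirroring the documented rule for the `partition` attribute:
// null maps to the literal "<null partition>", anything else to its string form.
def partitionAttribute = { Object scriptResult ->
    scriptResult == null ? '<null partition>' : scriptResult.toString()
}

println partitionAttribute('M')
println partitionAttribute(42)
println partitionAttribute(null)
```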
See Also