ForkRecord 2.7.2

Bundle: org.apache.nifi | nifi-standard-nar
Description: This processor allows the user to fork a record into multiple records. The user must specify at least one Record Path, as a dynamic property, pointing to a field of type ARRAY containing RECORD objects. The processor accepts two modes: 'split' and 'extract'. In both modes, there is one record generated per element contained in the designated array. In the 'split' mode, each generated record will preserve the same schema as given in the input but the array will contain only one element. In the 'extract' mode, the element of the array must be of record type and will be the generated record. Additionally, in the 'extract' mode, it is possible to specify if each generated record should contain all the fields of the parent records from the root level to the extracted record. This assumes that the fields to add in the record are defined in the schema of the Record Writer controller service. See examples in the additional details documentation of this processor.
Tags: array, content, event, fork, record, stream
Input Requirement: REQUIRED
Supports Sensitive Dynamic Properties: false

ForkRecord

ForkRecord forks a record into multiple records. To do so, the user must specify one or more RecordPaths (as dynamic properties of the processor), each pointing to a field of type ARRAY containing RECORD elements.

The processor accepts two modes:

Split mode - the generated records have the same schema as the input. One record is generated for every element in the array, and in each generated record the array contains only that element.
Extract mode - the generated records are the elements contained in the array. Optionally, each generated record can also include all the fields of its parent records, from the root level down to the record element being forked. This only works for fields that are defined in the schema of the Record Writer controller service.

Examples

EXTRACT mode

To better understand how this Processor works, we will lay out a few examples. For the sake of these examples, let’s assume that our input data is JSON formatted and looks like this:

[
  {
    "id": 1,
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "accounts": [
      {
        "id": 42,
        "balance": 4750.89
      },
      {
        "id": 43,
        "balance": 48212.38
      }
    ]
  },
  {
    "id": 2,
    "name": "Jane Doe",
    "address": "345 My Street",
    "city": "Her City",
    "state": "NY",
    "zipCode": "22222",
    "country": "USA",
    "accounts": [
      {
        "id": 45,
        "balance": 6578.45
      },
      {
        "id": 46,
        "balance": 34567.21
      }
    ]
  }
]

Example 1 - Extracting without parent fields

For this case, we want to create one record per account and we don’t care about the other fields. We’ll add a dynamic property “path” set to /accounts. Assuming the Record Writer schema is set correctly, the resulting flow file will contain 4 records and will look like:

[
  {
    "id": 42,
    "balance": 4750.89
  },
  {
    "id": 43,
    "balance": 48212.38
  },
  {
    "id": 45,
    "balance": 6578.45
  },
  {
    "id": 46,
    "balance": 34567.21
  }
]
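
The behaviour in this example can be approximated outside NiFi with a few lines of Python. This is only an illustrative sketch, not the processor's implementation: the function name extract_elements and the use of plain dictionaries as records are assumptions, and the input is abbreviated to the fields that matter here.

```python
def extract_elements(records, array_field):
    """Return every element of `array_field`, across all records, as its own record."""
    forked = []
    for record in records:
        # In this sketch, a missing or null array simply contributes no records.
        for element in record.get(array_field) or []:
            forked.append(dict(element))
    return forked


customers = [
    {"id": 1, "name": "John Doe",
     "accounts": [{"id": 42, "balance": 4750.89}, {"id": 43, "balance": 48212.38}]},
    {"id": 2, "name": "Jane Doe",
     "accounts": [{"id": 45, "balance": 6578.45}, {"id": 46, "balance": 34567.21}]},
]

forked = extract_elements(customers, "accounts")
print(len(forked))  # 4
print(forked[0])    # {'id': 42, 'balance': 4750.89}
```

The parent fields (the customer's id and name) are dropped entirely, which matches the output shown above.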

Example 2 - Extracting with parent fields

Now, if we set the property “Include Parent Fields” to true, the parent fields are recursively included in the output records, provided the Record Writer schema allows it. If multiple fields have the same name (as id does in this example), the child field takes priority over all parent fields sharing that name. Here, the id of each element of the accounts array is the one kept in the forked records. The resulting flow file will contain 4 records and will look like:

[
  {
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "id": 42,
    "balance": 4750.89
  },
  {
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "id": 43,
    "balance": 48212.38
  },
  {
    "name": "Jane Doe",
    "address": "345 My Street",
    "city": "Her City",
    "state": "NY",
    "zipCode": "22222",
    "country": "USA",
    "id": 45,
    "balance": 6578.45
  },
  {
    "name": "Jane Doe",
    "address": "345 My Street",
    "city": "Her City",
    "state": "NY",
    "zipCode": "22222",
    "country": "USA",
    "id": 46,
    "balance": 34567.21
  }
]
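
The precedence rule described above (the child field wins over a parent field of the same name) can be sketched in Python as a dictionary merge. This is a simplified model under stated assumptions, not NiFi code: extract_with_parents is a hypothetical helper, and writer_fields is a plain list of field names standing in for the Record Writer schema.

```python
def extract_with_parents(records, array_field, writer_fields):
    """Fork each array element and copy the parent's fields into it."""
    forked = []
    for record in records:
        for element in record.get(array_field) or []:
            # Later keys win in a dict merge, so the child's fields override the parent's.
            merged = {**record, **element}
            # Keep only the fields that the (assumed) writer schema declares.
            forked.append({name: merged[name] for name in writer_fields if name in merged})
    return forked


customers = [
    {"id": 1, "name": "John Doe", "country": "USA",
     "accounts": [{"id": 42, "balance": 4750.89}, {"id": 43, "balance": 48212.38}]},
    {"id": 2, "name": "Jane Doe", "country": "USA",
     "accounts": [{"id": 45, "balance": 6578.45}, {"id": 46, "balance": 34567.21}]},
]

# Abbreviated stand-in for the writer schema; "accounts" is deliberately left out.
writer_fields = ["name", "country", "id", "balance"]

with_parents = extract_with_parents(customers, "accounts", writer_fields)
print(with_parents[0])
# {'name': 'John Doe', 'country': 'USA', 'id': 42, 'balance': 4750.89}
```

Note that the customer id (1 or 2) never appears in the result: it is shadowed by the account id, as in the output above.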

Example 3 - Multi-nested arrays

Now let’s say that the input record contains multi-nested arrays, as in the example below:

[
  {
    "id": 1,
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "accounts": [
      {
        "id": 42,
        "balance": 4750.89,
        "transactions": [
          {
            "id": 5,
            "amount": 150.31
          },
          {
            "id": 6,
            "amount": -15.31
          }
        ]
      },
      {
        "id": 43,
        "balance": 48212.38,
        "transactions": [
          {
            "id": 7,
            "amount": 36.78
          },
          {
            "id": 8,
            "amount": -21.34
          }
        ]
      }
    ]
  }
]

If we want to have one record per transaction for each account, then the Record Path should be set to /accounts[*]/transactions. If we have the following schema for our Record Reader:

{
  "type": "record",
  "name": "bank",
  "fields": [
    {
      "name": "id",
      "type": "int"
    },
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "address",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    },
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "zipCode",
      "type": "string"
    },
    {
      "name": "country",
      "type": "string"
    },
    {
      "name": "accounts",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "accounts",
          "fields": [
            {
              "name": "id",
              "type": "int"
            },
            {
              "name": "balance",
              "type": "double"
            },
            {
              "name": "transactions",
              "type": {
                "type": "array",
                "items": {
                  "type": "record",
                  "name": "transactions",
                  "fields": [
                    {
                      "name": "id",
                      "type": "int"
                    },
                    {
                      "name": "amount",
                      "type": "double"
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    }
  ]
}

And if we have the following schema for our Record Writer:

{
  "type": "record",
  "name": "bank",
  "fields": [
    {
      "name": "id",
      "type": "int"
    },
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "address",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    },
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "zipCode",
      "type": "string"
    },
    {
      "name": "country",
      "type": "string"
    },
    {
      "name": "amount",
      "type": "double"
    },
    {
      "name": "balance",
      "type": "double"
    }
  ]
}

Then, if we include the parent fields, we’ll have 4 records as below:

[
  {
    "id": 5,
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "amount": 150.31,
    "balance": 4750.89
  },
  {
    "id": 6,
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "amount": -15.31,
    "balance": 4750.89
  },
  {
    "id": 7,
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "amount": 36.78,
    "balance": 48212.38
  },
  {
    "id": 8,
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "amount": -21.34,
    "balance": 48212.38
  }
]
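
To make the nested case easier to follow, here is a small Python sketch of how fields could accumulate from the root down to each transaction. It is an approximation under stated assumptions rather than the processor's logic: fork_nested is a hypothetical function, the Record Path /accounts[*]/transactions is modelled as the list ["accounts", "transactions"], and writer_fields is an abbreviated stand-in for the Record Writer schema.

```python
def fork_nested(record, path, writer_fields):
    """Fork the innermost array named by `path`, inheriting fields from every ancestor."""

    def walk(current, remaining, inherited):
        # Fields of the current level override same-named fields of its ancestors.
        scope = {**inherited, **current}
        if not remaining:
            # Innermost element reached: project onto the (assumed) writer schema.
            return [{name: scope[name] for name in writer_fields if name in scope}]
        results = []
        for element in current.get(remaining[0]) or []:
            results.extend(walk(element, remaining[1:], scope))
        return results

    return walk(record, path, {})


customer = {
    "id": 1, "name": "John Doe",
    "accounts": [
        {"id": 42, "balance": 4750.89,
         "transactions": [{"id": 5, "amount": 150.31}, {"id": 6, "amount": -15.31}]},
        {"id": 43, "balance": 48212.38,
         "transactions": [{"id": 7, "amount": 36.78}, {"id": 8, "amount": -21.34}]},
    ],
}

writer_fields = ["id", "name", "amount", "balance"]
transactions = fork_nested(customer, ["accounts", "transactions"], writer_fields)
print(len(transactions))  # 4
print(transactions[0])    # {'id': 5, 'name': 'John Doe', 'amount': 150.31, 'balance': 4750.89}
```

Each output record takes id and amount from the transaction, balance from the enclosing account, and name from the root record, which mirrors the 4 records listed above.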

SPLIT mode

Example

Assuming we have the below data and we added a property “path” set to /accounts:

[
  {
    "id": 1,
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "accounts": [
      {
        "id": 42,
        "balance": 4750.89
      },
      {
        "id": 43,
        "balance": 48212.38
      }
    ]
  },
  {
    "id": 2,
    "name": "Jane Doe",
    "address": "345 My Street",
    "city": "Her City",
    "state": "NY",
    "zipCode": "22222",
    "country": "USA",
    "accounts": [
      {
        "id": 45,
        "balance": 6578.45
      },
      {
        "id": 46,
        "balance": 34567.21
      }
    ]
  }
]

Then we’ll get 4 records as below:

[
  {
    "id": 1,
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "accounts": [
      {
        "id": 42,
        "balance": 4750.89
      }
    ]
  },
  {
    "id": 1,
    "name": "John Doe",
    "address": "123 My Street",
    "city": "My City",
    "state": "MS",
    "zipCode": "11111",
    "country": "USA",
    "accounts": [
      {
        "id": 43,
        "balance": 48212.38
      }
    ]
  },
  {
    "id": 2,
    "name": "Jane Doe",
    "address": "345 My Street",
    "city": "Her City",
    "state": "NY",
    "zipCode": "22222",
    "country": "USA",
    "accounts": [
      {
        "id": 45,
        "balance": 6578.45
      }
    ]
  },
  {
    "id": 2,
    "name": "Jane Doe",
    "address": "345 My Street",
    "city": "Her City",
    "state": "NY",
    "zipCode": "22222",
    "country": "USA",
    "accounts": [
      {
        "id": 46,
        "balance": 34567.21
      }
    ]
  }
]
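
For comparison with extract mode, the split behaviour can be sketched in Python as follows. Again, this is only an illustration: split_records is a hypothetical name, records are modelled as plain dictionaries, and the input is abbreviated.

```python
def split_records(records, array_field):
    """Emit one copy of each record per array element, keeping only that element."""
    forked = []
    for record in records:
        for element in record.get(array_field) or []:
            # Same fields as the input record, but the array holds a single element.
            forked.append({**record, array_field: [dict(element)]})
    return forked


customers = [
    {"id": 1, "name": "John Doe",
     "accounts": [{"id": 42, "balance": 4750.89}, {"id": 43, "balance": 48212.38}]},
    {"id": 2, "name": "Jane Doe",
     "accounts": [{"id": 45, "balance": 6578.45}, {"id": 46, "balance": 34567.21}]},
]

split = split_records(customers, "accounts")
print(len(split))  # 4
print(split[1])
# {'id': 1, 'name': 'John Doe', 'accounts': [{'id': 43, 'balance': 48212.38}]}
```

Unlike extract mode, every output record keeps the shape of the input record, so the same schema can describe both input and output.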

Properties

Include Parent Fields
Display Name

Include Parent Fields

Description

This parameter is only valid with the 'extract' mode. If set to true, all the fields from the root level to the given array will be added as fields of each element of the array to fork.

API Name

Include Parent Fields

Default Value

false

Allowable Values
- true
- false
Expression Language Scope

Not Supported

Sensitive

false

Required

true
Mode
Display Name

Mode

Description

Specifies the forking mode of the processor

API Name

Mode

Default Value

split

Allowable Values
- Extract
- Split
Expression Language Scope

Not Supported

Sensitive

false

Required

true
Record Reader

Display Name

Record Reader

Description

Specifies the Controller Service to use for reading incoming data

API Name

Record Reader

Service Interface

org.apache.nifi.serialization.RecordReaderFactory

Service Implementations

org.apache.nifi.avro.AvroReader

org.apache.nifi.cef.CEFReader

org.apache.nifi.csv.CSVReader

org.apache.nifi.excel.ExcelReader

org.apache.nifi.grok.GrokReader

org.apache.nifi.json.JsonPathReader

org.apache.nifi.json.JsonTreeReader

org.apache.nifi.services.protobuf.ProtobufReader

org.apache.nifi.lookup.ReaderLookup

org.apache.nifi.record.script.ScriptedReader

org.apache.nifi.services.protobuf.StandardProtobufReader

org.apache.nifi.syslog.Syslog5424Reader

org.apache.nifi.syslog.SyslogReader

org.apache.nifi.windowsevent.WindowsEventLogReader

org.apache.nifi.xml.XMLReader

org.apache.nifi.yaml.YamlTreeReader

Expression Language Scope

Not Supported

Sensitive

false

Required

true
Record Writer

Display Name

Record Writer

Description

Specifies the Controller Service to use for writing out the records

API Name

Record Writer

Service Interface

org.apache.nifi.serialization.RecordSetWriterFactory

Service Implementations

org.apache.nifi.avro.AvroRecordSetWriter

org.apache.nifi.csv.CSVRecordSetWriter

org.apache.nifi.text.FreeFormTextRecordSetWriter

org.apache.nifi.json.JsonRecordSetWriter

org.apache.nifi.lookup.RecordSetWriterLookup

org.apache.nifi.record.script.ScriptedRecordSetWriter

org.apache.nifi.xml.XMLRecordSetWriter

Expression Language Scope

Not Supported

Sensitive

false

Required

true

Dynamic Properties

Record Path property

Name

Record Path property

Description

A Record Path value, pointing to a field of type ARRAY containing RECORD objects

Value

The Record Path value

Expression Language Scope

FLOWFILE_ATTRIBUTES

Relationships

Name	Description
failure	In case a FlowFile generates an error during the fork operation, it will be routed to this relationship
fork	The FlowFiles containing the forked records will be routed to this relationship
original	The original FlowFiles will be routed to this relationship

Writes Attributes

Name	Description
record.count	The generated FlowFile will have a 'record.count' attribute indicating the number of records that were written to the FlowFile.
mime.type	The MIME Type indicated by the Record Writer
<Attributes from Record Writer>	Any Attribute that the configured Record Writer returns will be added to the FlowFile.