RemoveRecordField 2.0.0
- Bundle
- org.apache.nifi | nifi-standard-nar
- Description
- Modifies the contents of a FlowFile that contains Record-oriented data (i.e. data that can be read via a RecordReader and written by a RecordWriter) by removing selected fields. This Processor requires that at least one user-defined Property be added. The name of the property is ignored by the processor, but can serve as a meaningful identifier for the user. The value of the property should be a RecordPath that determines the field to be removed. The processor executes the removals in the order in which these properties are added to the processor. Set the Record Writer's Schema Access Strategy to "Inherit Record Schema" in order to write records with the schema as updated by the field removals.
- Tags
- avro, csv, delete, freeform, generic, json, record, remove, schema, text, update
- Input Requirement
- REQUIRED
- Supports Sensitive Dynamic Properties
- false
-
Additional Details for RemoveRecordField 2.0.0
RemoveRecordField processor usage with examples
The RemoveRecordField processor is capable of removing fields from a NiFi record. The fields that should be removed from the record are identified by a RecordPath expression. To learn about RecordPath, please read the RecordPath Guide.
RemoveRecordField will update all Records within the FlowFile based upon the RecordPath(s) configured for removal. The Schema associated with the Record Reader configured to read the FlowFile content will be updated based upon the same RecordPath(s), taking into account the values remaining within the Records' fields after removal. This updated schema is used for output if the Record Writer has a Schema Access Strategy of Inherit Record Schema; otherwise the schema updates are lost and the Records are output using the Schema configured on the Writer.
Below are some examples that are intended to explain how to use the processor. In these examples the input data, the record schema, the output data and the output schema are all in JSON format for easy understanding. We assume that the processor is configured to use JsonTreeReader and JsonRecordSetWriter controller services, but the processor works with other Reader and Writer controller services as well. It is also assumed that the record schema is provided explicitly as a FlowFile attribute (the avro.schema attribute) and that the Reader uses this schema to read the FlowFile. The Writer's Schema Access Strategy is "Inherit Record Schema" so that all modifications made to the schema by the processor are considered by the Writer, and the Writer's Schema Write Strategy is set to "Set 'avro.schema' Attribute" so that the output FlowFile contains the schema as an attribute value.
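For reference, one way to realize this configuration (a sketch; exact option labels may differ slightly between NiFi versions) is:
- JsonTreeReader: Schema Access Strategy = "Use 'Schema Text' Property", Schema Text = ${avro.schema}
- JsonRecordSetWriter: Schema Access Strategy = "Inherit Record Schema", Schema Write Strategy = "Set 'avro.schema' Attribute"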
Example 1:
Removing a field from a simple record
Input data:
{ "id": 1, "name": "John Doe", "dateOfBirth": "1980-01-01" }
Input schema:
{ "type": "record", "name": "PersonRecord", "namespace": "org.apache.nifi", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": "string" }, { "name": "dateOfBirth", "type": "string" } ] }
Field to remove:
/dateOfBirth
In this case the dateOfBirth field is removed from the record as well as the schema.
Output data:
{ "id": 1, "name": "John Doe" }
Output schema:
{ "type": "record", "name": "PersonRecord", "namespace": "org.apache.nifi", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": "string" } ] }
Note that removing a field from a record differs from setting a field's value to null. With RemoveRecordField a field is completely removed from the record and its schema, regardless of whether the field is nullable.
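For contrast, a hedged illustration (not part of the original example set): nulling out dateOfBirth instead, e.g. with an UpdateRecord processor, would keep the field in both the data and the schema, and the output data would presumably look like this:
{ "id": 1, "name": "John Doe", "dateOfBirth": null }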
Example 2:
Removing fields from a complex record
Let's suppose we have an input record that contains a homeAddress and a mailingAddress field, both of which contain a zip field, and we want to remove the zip field from both of them.
Input data:
{ "id": 1, "name": "John Doe", "homeAddress": { "zip": 1111, "street": "Main", "number": 24 }, "mailingAddress": { "zip": 1121, "street": "Airport", "number": 12 } }
Input schema:
{ "name": "PersonRecord", "type": "record", "namespace": "org.apache.nifi", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": "string" }, { "name": "homeAddress", "type": { "name": "address", "type": "record", "fields": [ { "name": "zip", "type": "int" }, { "name": "street", "type": "string" }, { "name": "number", "type": "int" } ] } }, { "name": "mailingAddress", "type": "address" } ] }
The zip field can be removed from both addresses by setting the "Field to remove 1" property on the processor to "/homeAddress/zip" and adding a dynamic property with the value "/mailingAddress/zip". Alternatively, we can use a wildcard expression in the RecordPath in "Field To Remove 1" (no need to specify a dynamic property).
Field to remove:
/*/zip
The zip field is removed from both addresses.
Output data:
{ "id": 1, "name": "John Doe", "homeAddress": { "street": "Main", "number": 24 }, "mailingAddress": { "street": "Airport", "number": 12 } }
The zip field is removed from the schema of both the homeAddress field and the mailingAddress field. However, if only "/homeAddress/zip" were specified to be removed, the schema of mailingAddress would remain intact even though the two addresses originally shared the same schema.
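To illustrate that point, removing only "/homeAddress/zip" would presumably produce the following output, with mailingAddress left untouched:
Output data:
{ "id": 1, "name": "John Doe", "homeAddress": { "street": "Main", "number": 24 }, "mailingAddress": { "zip": 1121, "street": "Airport", "number": 12 } }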
Example 3:
Arrays
Let’s suppose we have an input record that contains an array of addresses.
Input data:
{ "id": 1, "name": "John Doe", "addresses": [ { "zip": 1111, "street": "Main", "number": 24 }, { "zip": 1121, "street": "Airport", "number": 12 } ] }
Input schema:
{ "name": "PersonRecord", "type": "record", "namespace": "org.apache.nifi", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": "string" }, { "name": "addresses", "type": { "type": "array", "items": { "name": "address", "type": "record", "fields": [ { "name": "zip", "type": "int" }, { "name": "street", "type": "string" }, { "name": "number", "type": "int" } ] } } } ] }
- Case 1: removing one element from the array
Field to remove:
/addresses[0]
Output data:
{ "id": 1, "name": "John Doe", "addresses": [ { "zip": 1121, "street": "Airport", "number": 12 } ] }
Output schema:
{ "type": "record", "name": "nifiRecord", "namespace": "org.apache.nifi", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": "string" }, { "name": "addresses", "type": { "type": "array", "items": { "type": "record", "name": "addressesType", "fields": [ { "name": "zip", "type": "int" }, { "name": "street", "type": "string" }, { "name": "number", "type": "int" } ] } } } ] }
The first element of the array is removed. The schema of the output data is structurally the same as the input schema. Note that the name "PersonRecord" of the input schema changed to "nifiRecord" and the name "address" changed to "addressesType". This is normal: NiFi generates these names for the output schema. These name changes occur regardless of whether the schema was actually modified.
- Case 2: removing all elements from the array
Field to remove:
/addresses[*]
Output data:
{ "id": 1, "name": "John Doe", "addresses": [] }
All elements of the array are removed; the result is an empty array. The output schema is the same as in Case 1, with no structural changes made to the schema.
- Case 3: removing a field from certain elements of the array
Field to remove:
/addresses[0]/zip
Output data:
{ "id": 1, "name": "John Doe", "addresses": [ { "zip": null, "street": "Main", "number": 24 }, { "zip": 1121, "street": "Airport", "number": 12 } ] }
The output schema is the same as in Case 1, with no structural changes. The zip field of the array's first element is set to null: the value had to be deleted, but the schema could not be modified because the deletion does not apply to all elements of the array. In a case like this, the value of the field is set to null regardless of whether the field is nullable.
- Case 4: removing a field from all elements of an array
Field to remove:
/addresses[*]/zip
Output data:
{ "id": 1, "name": "John Doe", "addresses": [ { "street": "Main", "number": 24 }, { "street": "Airport", "number": 12 } ] }
Output schema:
{ "type": "record", "name": "nifiRecord", "namespace": "org.apache.nifi", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": "string" }, { "name": "addresses", "type": { "type": "array", "items": { "type": "record", "name": "addressesType", "fields": [ { "name": "street", "type": "string" }, { "name": "number", "type": "int" } ] } } } ] }
The zip field is removed from all elements of the array, and the schema is modified as well: the zip field is removed from the array's element type.
The examples shown in Case 1, Case 2, Case 3 and Case 4 apply to both kinds of collections: arrays and maps. The schema of an array or a map is only modified if the field removal applies to all elements of the collection. Selecting all elements of an array can be performed with the [*] as well as the [0..-1] operator.
Important note: if there are e.g. 3 elements in the addresses array, and “/addresses[*]/zip” is removed, then the zip field is removed from the schema because it applies explicitly for all elements regardless of the actual number of elements in the array. However, if the path says “/addresses[0,1,2]/zip” then the schema is NOT modified (even though [0,1,2] means all the elements in this particular array), because it selects the first, second and third elements individually and does not express the intention to apply removal to all elements of the array regardless of the number of elements.
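As a sketch of the map analogue (the map keys "home" and "mailing" are made up for illustration), suppose addresses is a map instead of an array:
{ "id": 1, "addresses": { "home": { "zip": 1111, "street": "Main", "number": 24 }, "mailing": { "zip": 1121, "street": "Airport", "number": 12 } } }
By the array rules above, removing "/addresses[*]/zip" would remove zip from every map entry and from the map's value type in the schema, while removing "/addresses['home']/zip" would only set the home entry's zip to null and leave the schema intact.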
- Case 5: removing multiple elements from an array
Fields to remove:
/addresses[0]
/addresses[0]
In this example we want to remove the first two elements of the array. To do that we need to specify two separate path expressions, each pointing to one array element. Each removal is executed on the result of the previous removal, and removals are executed in the order in which the properties containing record paths are specified on the Processor. First, "/addresses[0]" is removed; that is the address with zip code 1111 in the example. After this removal, the addresses array has a new first element, which is the second element of the original array (the address with zip code 1121). To remove this element, we need to issue "/addresses[0]" again. Trying to remove "/addresses[0,1]", or filtering array elements with predicates when the removal targets multiple different array elements, may produce unexpected results.
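Since the example array has exactly two elements, the result after both removals would presumably be an empty array:
Output data:
{ "id": 1, "name": "John Doe", "addresses": [] }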
- Case 6: array within an array
Let’s suppose we have a complex input record that has an array within an array.
Input data:
{ "id": 1, "people": [ { "id": 11, "addresses": [ { "zip": 1111, "street": "Main", "number": 24 }, { "zip": 1121, "street": "Airport", "number": 12 } ] }, { "id": 22, "addresses": [ { "zip": 2222, "street": "Ocean", "number": 24 }, { "zip": 2232, "street": "Sunset", "number": 12 } ] }, { "id": 33, "addresses": [ { "zip": 3333, "street": "Dawn", "number": 24 }, { "zip": 3323, "street": "Spring", "number": 12 } ] } ] }
The following table summarizes what happens to the record and the schema for different RecordPaths:
Field To Remove | Is the schema modified? | What happens to the record? |
---|---|---|
/people[0]/addresses[1]/zip | No | The zip field of the first person's second address is set to null. |
/people[*]/addresses[1]/zip | No | The zip field of every person's second address is set to null. |
/people[0]/addresses[*]/zip | No | The zip field of every address of the first person is set to null. |
/people[*]/addresses[*]/zip | Yes | The zip field of every person's every address is removed (from the schema AND the data). |
The rules and examples shown for arrays apply to maps as well.
Example 4:
Choice datatype
Let’s suppose we have an input schema that contains a field of type CHOICE.
Input data:
{ "id": 12, "name": "John Doe" }
Input schema:
{ "type": "record", "name": "nameRecord", "namespace": "org.apache.nifi", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": [ "string", { "type": "record", "name": "nameType", "fields": [ { "name": "firstName", "type": "string" }, { "name": "lastName", "type": "string" } ] } ] } ] }
In this example, the schema specifies the name field as a CHOICE, but in the data it is a simple string. If we remove "/name/firstName", there are no modifications to the data, but the schema is modified: the firstName field is removed from the schema only.
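As a sketch of the expected output schema (hedged: as noted in Example 3, NiFi may regenerate record names such as nameRecord and nameType):
Output schema:
{ "type": "record", "name": "nameRecord", "namespace": "org.apache.nifi", "fields": [ { "name": "id", "type": "int" }, { "name": "name", "type": [ "string", { "type": "record", "name": "nameType", "fields": [ { "name": "lastName", "type": "string" } ] } ] } ] }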
-
Record Reader
Specifies the Controller Service to use for reading incoming data
- Display Name
- Record Reader
- Description
- Specifies the Controller Service to use for reading incoming data
- API Name
- Record Reader
- Service Interface
- org.apache.nifi.serialization.RecordReaderFactory
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Record Writer
Specifies the Controller Service to use for writing out the records
- Display Name
- Record Writer
- Description
- Specifies the Controller Service to use for writing out the records
- API Name
- Record Writer
- Service Interface
- org.apache.nifi.serialization.RecordSetWriterFactory
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
A description of the field to remove
Any field that matches the RecordPath set as the value will be removed.
- Name
- A description of the field to remove
- Description
- Any field that matches the RecordPath set as the value will be removed.
- Value
- A RecordPath to the field to be removed.
- Expression Language Scope
- FLOWFILE_ATTRIBUTES
Relationships:
Name | Description |
---|---|
failure | If a FlowFile cannot be transformed from the configured input format to the configured output format, the unchanged FlowFile will be routed to this relationship |
success | FlowFiles that are successfully transformed will be routed to this relationship |
Writes Attributes:
Name | Description |
---|---|
record.error.message | On failure, this attribute provides the error message encountered by the Reader or Writer. |
-
Remove one or more fields from a Record, where the names of the fields to remove are known.
- Description
- Remove one or more fields from a Record, where the names of the fields to remove are known.
- Keywords
- record, field, drop, remove, delete, expunge, recordpath
- Configuration
Configure the Record Reader according to the incoming data format. Configure the Record Writer according to the desired output format. For each field that you want to remove, add a single new property to the Processor. The name of the property can be anything, but it's recommended to use a brief description of the field. The value of the property is a RecordPath that matches the field to remove. For example, to remove the `name` and `email` fields, add two Properties:
- `name` = `/name`
- `email` = `/email`