The XMLReader Controller Service reads XML content and creates Record objects. The Controller Service must be configured with a schema that describes the structure of the XML data. Fields in the XML data that are not defined in the schema will be skipped. Depending on whether the property "Expect Records as Array" is set to "false" or "true", the reader either expects a single record or an array of records for each FlowFile.

Example: Single record

                <record>
                  <field1>content</field1>
                  <field2>content</field2>
                </record>
            

An array of records has to be enclosed by a root tag.

Example: Array of records

                <root>
                  <record>
                    <field1>content</field1>
                    <field2>content</field2>
                  </record>
                  <record>
                    <field1>content</field1>
                    <field2>content</field2>
                  </record>
                </root>
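
The difference between the two settings can be sketched in a few lines of Python. This is only an illustration, not the reader's implementation: `read_records` and its `expect_records_as_array` parameter are hypothetical names that mimic the effect of the "Expect Records as Array" property, and all content is kept as text.

```python
import xml.etree.ElementTree as ET


def read_records(xml_text, expect_records_as_array):
    """Return a list of dicts, one per record element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # "true": the root tag only wraps the records.
    # "false": the root element itself is the single record.
    record_elements = list(root) if expect_records_as_array else [root]
    return [{child.tag: child.text for child in rec} for rec in record_elements]


single = "<record><field1>a</field1><field2>b</field2></record>"
array = (
    "<root>"
    "<record><field1>a</field1><field2>b</field2></record>"
    "<record><field1>c</field1><field2>d</field2></record>"
    "</root>"
)

print(read_records(single, expect_records_as_array=False))
print(read_records(array, expect_records_as_array=True))
```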
            

Example: Simple Fields

The simplest elements in XML data are tags (fields) that contain only content, with no attributes and no embedded tags. They can be described in the schema by simple types (e.g. INT, STRING, ...).

                <root>
                  <record>
                    <simple_field>content</simple_field>
                  </record>
                </root>
            

This record can be described by a schema containing one field (e.g. of type string). Given this schema, the reader expects zero or one occurrence of "simple_field" in the record.

                {
                  "namespace": "nifi",
                  "name": "test",
                  "type": "record",
                  "fields": [
                    { "name": "simple_field", "type": "string" }
                  ]
                }
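
A minimal sketch of how such a schema could drive the reading of simple fields, assuming a hypothetical helper `read_simple_fields` and assuming that a missing field is represented as `None` (the text above does not say how the reader represents it). Tags that are not named in the schema are skipped.

```python
import json
import xml.etree.ElementTree as ET

SCHEMA = json.loads("""
{
  "namespace": "nifi",
  "name": "test",
  "type": "record",
  "fields": [
    { "name": "simple_field", "type": "string" }
  ]
}
""")


def read_simple_fields(record_xml, schema):
    """Keep only the tags named in the schema; undefined tags are skipped."""
    # Assumption: a field with zero occurrences is represented as None.
    record = {field["name"]: None for field in schema["fields"]}
    for child in ET.fromstring(record_xml):
        if child.tag in record:
            record[child.tag] = child.text
    return record


with_extra = "<record><simple_field>content</simple_field><unknown>x</unknown></record>"
print(read_simple_fields(with_extra, SCHEMA))   # the <unknown> tag is not in the schema
print(read_simple_fields("<record/>", SCHEMA))  # zero occurrences of simple_field
```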
            

Example: Arrays with Simple Fields

In XML data, arrays are represented by repeated tags (fields). For the following XML data, "array_field" is considered to be an array enclosing simple fields, whereas "simple_field" is considered to be a simple field not enclosed in an array.

                <record>
                  <array_field>content</array_field>
                  <array_field>content</array_field>
                  <simple_field>content</simple_field>
                </record>
            

This record can be described by the following schema:

                {
                  "namespace": "nifi",
                  "name": "test",
                  "type": "record",
                  "fields": [
                    { "name": "array_field", "type":
                      { "type": "array", "items": "string" }
                    },
                    { "name": "simple_field", "type": "string" }
                  ]
                }
            

If a field in a schema is embedded in an array, the reader expects zero, one or more occurrences of the field in a record. In principle, the field "array_field" could also be defined as a simple field, but then the second occurrence of this field would replace the first in the record object. Conversely, the field "simple_field" could also be defined as an array. In this case, the reader would put it into the record object as an array with one element.
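
The three cases described above can be reproduced with a small Python sketch. It is an illustration only: `read_record`, `array_fields` and `simple_fields` are hypothetical stand-ins for the field types a real schema would declare, and representing an array with zero occurrences as an empty list is an assumption.

```python
import xml.etree.ElementTree as ET


def read_record(record_xml, array_fields, simple_fields):
    """Collect repeated tags into lists and keep the last value of simple tags."""
    record = {name: [] for name in array_fields}
    record.update({name: None for name in simple_fields})
    for child in ET.fromstring(record_xml):
        if child.tag in array_fields:
            record[child.tag].append(child.text)  # zero, one or more occurrences
        elif child.tag in simple_fields:
            record[child.tag] = child.text        # a later occurrence replaces an earlier one
    return record


xml_text = (
    "<record>"
    "<array_field>first</array_field>"
    "<array_field>second</array_field>"
    "<simple_field>content</simple_field>"
    "</record>"
)

# Schema as in the example: array_field is an array, simple_field is a string.
print(read_record(xml_text, {"array_field"}, {"simple_field"}))
# array_field declared as a simple field: only the last occurrence survives.
print(read_record(xml_text, set(), {"array_field", "simple_field"}))
# simple_field declared as an array: it becomes a list with one element.
print(read_record(xml_text, {"array_field", "simple_field"}, set()))
```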

Example: Tags with Attributes

XML fields frequently not only contain content, but also attributes. The following record contains a field with an attribute "attr" and content:

                <record>
                  <field_with_attribute attr="attr_content">content of field</field_with_attribute>
                </record>
            

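To parse the content of the field "field_with_attribute" together with the attribute "attr", two requirements have to be fulfilled: in the schema, the field has to be defined as a record, and the property "Field Name for Content" has to be set. Optionally, the property "Attribute Prefix" can also be set.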

For the example above, the following property settings are assumed:

Property Name             Property Value
Field Name for Content    field_name_for_content
Attribute Prefix          prefix_

The schema can be defined as follows:

                {
                  "name": "test",
                  "namespace": "nifi",
                  "type": "record",
                  "fields": [
                    {
                      "name": "field_with_attribute",
                      "type": {
                        "name": "RecordForTag",
                        "type": "record",
                        "fields" : [
                          {"name": "attr", "type": "string"},
                          {"name": "field_name_for_content", "type": "string"}
                        ]
                      }
                    }
                  ]
                }
            

Note that the field "field_name_for_content" has to be defined not only in the property section but also in the schema, whereas the prefix for attributes is not part of the schema. The prefix is prepended to the attribute name when an attribute named "attr" is found at the respective position in the XML data and added to the record. The record object of the above example will be structured as follows:

                Record (
                    Record "field_with_attribute" (
                        RecordField "prefix_attr" = "attr_content",
                        RecordField "field_name_for_content" = "content of field"
                    )
                )
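
The record structure above can be reproduced with a short Python sketch. It does not consult a schema and is not the reader's implementation: the two constants simply hold the property values assumed in this example, and `read_tag_with_attributes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

FIELD_NAME_FOR_CONTENT = "field_name_for_content"  # value of "Field Name for Content"
ATTRIBUTE_PREFIX = "prefix_"                        # value of "Attribute Prefix"


def read_tag_with_attributes(element):
    """Turn one tag with attributes and text content into a nested record."""
    # Each attribute is stored under its name with the prefix put in front.
    nested = {ATTRIBUTE_PREFIX + name: value for name, value in element.attrib.items()}
    # The text content is stored under the configured content field name.
    nested[FIELD_NAME_FOR_CONTENT] = element.text
    return nested


record_xml = (
    "<record>"
    '<field_with_attribute attr="attr_content">content of field</field_with_attribute>'
    "</record>"
)
record = {child.tag: read_tag_with_attributes(child) for child in ET.fromstring(record_xml)}
print(record)
```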
            

In principle, the field "field_with_attribute" could also be defined as a simple field. In this case, the attributes would simply be ignored. Conversely, the simple field in the example "Simple Fields" above could also be defined as a record (assuming that the property "Field Name for Content" is set).

Example: Tags within tags

XML data is frequently nested. In this case, tags enclose other tags:

                <record>
                  <field_with_embedded_fields attr="attr_content">
                    <embedded_field>embedded content</embedded_field>
                    <another_embedded_field>another embedded content</another_embedded_field>
                  </field_with_embedded_fields>
                </record>
            

The enclosing fields always have to be defined as records, irrespective of whether they include attributes to be parsed or not. In this example, the tag "field_with_embedded_fields" encloses the fields "embedded_field" and "another_embedded_field", which are both simple fields. The schema can be defined as follows:

                {
                  "name": "test",
                  "namespace": "nifi",
                  "type": "record",
                  "fields": [
                    {
                      "name": "field_with_embedded_fields",
                      "type": {
                        "name": "RecordForEmbedded",
                        "type": "record",
                        "fields" : [
                          {"name": "attr", "type": "string"},
                          {"name": "embedded_field", "type": "string"},
                          {"name": "another_embedded_field", "type": "string"}
                        ]
                      }
                    }
                  ]
                }
            

Notice that this case does not require the property "Field Name for Content" to be set as this is only required for tags containing attributes and content.
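
As a rough illustration of how an enclosing tag maps to a nested record, the following Python sketch builds the record for the XML above. `read_enclosing_tag` is a hypothetical helper, and the sketch assumes that no "Attribute Prefix" is configured, so the attribute keeps the name "attr" used in the schema.

```python
import xml.etree.ElementTree as ET


def read_enclosing_tag(element, field_names):
    """Build a nested record from a tag's attributes and its child tags."""
    nested = {name: None for name in field_names}
    # Assumption: no attribute prefix is set, so attributes keep their names.
    for name, value in element.attrib.items():
        if name in nested:
            nested[name] = value
    # Embedded simple fields are read from the child tags.
    for child in element:
        if child.tag in nested:
            nested[child.tag] = child.text
    return nested


record_xml = (
    "<record>"
    '<field_with_embedded_fields attr="attr_content">'
    "<embedded_field>embedded content</embedded_field>"
    "<another_embedded_field>another embedded content</another_embedded_field>"
    "</field_with_embedded_fields>"
    "</record>"
)
enclosing = ET.fromstring(record_xml).find("field_with_embedded_fields")
record = {
    "field_with_embedded_fields": read_enclosing_tag(
        enclosing, ["attr", "embedded_field", "another_embedded_field"]
    )
}
print(record)
```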

Example: Array of records

To further illustrate the logic of this reader, the following example shows an array of records. The record contains the field "array_field", which occurs repeatedly. Each occurrence contains two embedded fields.

                <record>
                  <array_field>
                    <embedded_field>embedded content 1</embedded_field>
                    <another_embedded_field>another embedded content 1</another_embedded_field>
                  </array_field>
                  <array_field>
                    <embedded_field>embedded content 2</embedded_field>
                    <another_embedded_field>another embedded content 2</another_embedded_field>
                  </array_field>
                </record>
            

This XML data can be parsed similarly to the data in the example "Tags within tags". However, the record defined in the schema of that example has to be embedded in an array.

                {
                  "namespace": "nifi",
                  "name": "test",
                  "type": "record",
                  "fields": [
                    { "name": "array_field",
                      "type": {
                        "type": "array",
                        "items": {
                          "name": "RecordInArray",
                          "type": "record",
                          "fields" : [
                            {"name": "embedded_field", "type": "string"},
                            {"name": "another_embedded_field", "type": "string"}
                          ]
                        }
                      }
                    }
                  ]
                }
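
A minimal Python sketch of this case, in which every occurrence of the repeated tag becomes one record in a list. `read_array_of_records` is a hypothetical helper and not part of the reader; field values are kept as text.

```python
import xml.etree.ElementTree as ET


def read_array_of_records(record_element, array_name, item_fields):
    """Turn every occurrence of array_name into one embedded record."""
    items = []
    for occurrence in record_element.findall(array_name):
        # findtext returns the text of the first matching child, or None.
        items.append({name: occurrence.findtext(name) for name in item_fields})
    return {array_name: items}


record_xml = (
    "<record>"
    "<array_field>"
    "<embedded_field>embedded content 1</embedded_field>"
    "<another_embedded_field>another embedded content 1</another_embedded_field>"
    "</array_field>"
    "<array_field>"
    "<embedded_field>embedded content 2</embedded_field>"
    "<another_embedded_field>another embedded content 2</another_embedded_field>"
    "</array_field>"
    "</record>"
)
record = read_array_of_records(
    ET.fromstring(record_xml), "array_field", ["embedded_field", "another_embedded_field"]
)
print(record)
```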
            

Example: Array in record

In XML data, arrays are frequently enclosed by tags:

                <record>
                  <field_enclosing_array>
                    <element>content 1</element>
                    <element>content 2</element>
                  </field_enclosing_array>
                  <field_without_array>content 3</field_without_array>
                </record>
            

For the schema, embedded tags have to be described by records. Therefore, the field "field_enclosing_array" is a record that embeds an array with elements of type string:

                {
                  "namespace": "nifi",
                  "name": "test",
                  "type": "record",
                  "fields": [
                    { "name": "field_enclosing_array",
                      "type": {
                        "name": "EmbeddedRecord",
                        "type": "record",
                        "fields" : [
                          {
                            "name": "element",
                            "type": {
                              "type": "array",
                              "items": "string"
                            }
                          }
                        ]
                      }
                    },
                    { "name": "field_without_array", "type": "string" }
                  ]
                }
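
Because records and arrays can be nested in each other, a schema like this is naturally processed recursively. The following Python sketch walks the schema and the XML together. It is an assumption-laden illustration rather than the reader's implementation: `convert` is a hypothetical function, simple types are kept as text, and missing non-array fields are represented as `None`.

```python
import json
import xml.etree.ElementTree as ET

SCHEMA = json.loads("""
{
  "namespace": "nifi", "name": "test", "type": "record",
  "fields": [
    { "name": "field_enclosing_array",
      "type": {
        "name": "EmbeddedRecord", "type": "record",
        "fields": [
          { "name": "element", "type": { "type": "array", "items": "string" } }
        ]
      }
    },
    { "name": "field_without_array", "type": "string" }
  ]
}
""")


def convert(elements, schema_type):
    """Convert all sibling elements that share one tag name according to schema_type."""
    if isinstance(schema_type, dict) and schema_type["type"] == "array":
        # Every occurrence of the tag becomes one array item.
        return [convert([element], schema_type["items"]) for element in elements]
    if not elements:
        return None  # assumption: a missing field is represented as None
    last = elements[-1]  # for non-array fields, a later occurrence replaces an earlier one
    if isinstance(schema_type, dict) and schema_type["type"] == "record":
        # Enclosing tags are records: look up each schema field among the child tags.
        return {
            field["name"]: convert(last.findall(field["name"]), field["type"])
            for field in schema_type["fields"]
        }
    return last.text  # simple types are kept as text in this sketch


XML_TEXT = """
<record>
  <field_enclosing_array>
    <element>content 1</element>
    <element>content 2</element>
  </field_enclosing_array>
  <field_without_array>content 3</field_without_array>
</record>
"""

record = convert([ET.fromstring(XML_TEXT)], SCHEMA)
print(record)
```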
            

Example: Maps

A map is a field embedding fields with different names:

                <record>
                  <map_field>
                    <field1>content</field1>
                    <field2>content</field2>
                    ...
                  </map_field>
                  <simple_field>content</simple_field>
                </record>
            

This data can be processed using the following schema:

                {
                  "namespace": "nifi",
                  "name": "test",
                  "type": "record",
                  "fields": [
                    { "name": "map_field", "type":
                      { "type": "map", "values": "string" }
                    },
                    { "name": "simple_field", "type": "string" }
                  ]
                }
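
A map field can be pictured as a dictionary keyed by the names of the embedded tags. The Python sketch below illustrates this for two embedded fields. `read_map` is a hypothetical helper, not the reader's implementation, and values are kept as text.

```python
import xml.etree.ElementTree as ET


def read_map(element):
    """Map each embedded tag name to its text content."""
    return {child.tag: child.text for child in element}


record_xml = (
    "<record>"
    "<map_field>"
    "<field1>content</field1>"
    "<field2>content</field2>"
    "</map_field>"
    "<simple_field>content</simple_field>"
    "</record>"
)
root = ET.fromstring(record_xml)
record = {
    "map_field": read_map(root.find("map_field")),  # keys are not fixed by the schema
    "simple_field": root.findtext("simple_field"),
}
print(record)
```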