SplitXml

Description:

Splits an XML file into multiple separate FlowFiles, each comprising a child or descendant of the original root element.

Tags:

xml, split

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values.

Display Name | API Name | Default Value | Allowable Values | Description
Split Depth | Split Depth | 1 | | Indicates the XML-nesting depth to start splitting XML fragments. A depth of 1 means split the root's children, whereas a depth of 2 means split the root's children's children and so forth.
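
The depth semantics described above can be illustrated with a minimal Python sketch. It is not the processor's implementation: the function name split_at_depth is hypothetical, and details such as namespace handling are ignored. It only shows which elements a given Split Depth selects.

```python
import xml.etree.ElementTree as ET


def split_at_depth(xml_text, split_depth=1):
    """Return the serialized elements found split_depth levels below the root.

    Hypothetical helper for illustration only; not part of NiFi.
    """
    if split_depth < 1:
        raise ValueError("split_depth must be at least 1")
    level = [ET.fromstring(xml_text)]  # depth 0: the root element
    for _ in range(split_depth):
        # Descend one level: collect every child of every element at this level.
        level = [child for parent in level for child in parent]
    return [ET.tostring(element, encoding="unicode") for element in level]


doc = "<a><b><c>1</c><c>2</c></b><b><c>3</c></b></a>"
depth_1 = split_at_depth(doc, 1)  # the root's children: two <b> elements
depth_2 = split_at_depth(doc, 2)  # the root's children's children: three <c> elements
```

With this input, a depth of 1 yields two fragments (one per `<b>` element) and a depth of 2 yields three (one per `<c>` element).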

Relationships:

Name | Description
failure | If a FlowFile fails processing for any reason (for example, the FlowFile is not valid XML), it will be routed to this relationship
original | The original FlowFile that was split into segments. If the FlowFile fails processing, nothing will be sent to this relationship
split | All segments of the original FlowFile will be routed to this relationship

Reads Attributes:

None specified.

Writes Attributes:

Name | Description
fragment.identifier | All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute
fragment.index | A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile
fragment.count | The number of split FlowFiles generated from the parent FlowFile
segment.original.filename | The filename of the parent FlowFile
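
As a rough illustration of how these attributes relate to one another, the sketch below groups splits by fragment.identifier, orders each group by fragment.index, and compares the group size against fragment.count. FlowFile attributes are modeled as plain dictionaries of strings, and the function name, identifiers, and filenames are hypothetical; nothing here is NiFi API.

```python
from collections import defaultdict


def group_fragments(flowfiles):
    """Group attribute dicts by fragment.identifier, ordered by fragment.index.

    Hypothetical helper for illustration only; not part of NiFi.
    """
    groups = defaultdict(list)
    for attrs in flowfiles:
        groups[attrs["fragment.identifier"]].append(attrs)

    ordered = {}
    for identifier, members in groups.items():
        # Attribute values are strings, so convert before sorting numerically.
        members.sort(key=lambda attrs: int(attrs["fragment.index"]))
        expected = int(members[0]["fragment.count"])
        ordered[identifier] = {
            "fragments": members,
            "complete": len(members) == expected,
        }
    return ordered


# Made-up attribute values; only the relative order of fragment.index matters here.
fragments = [
    {"fragment.identifier": "id-A", "fragment.index": "2",
     "fragment.count": "3", "segment.original.filename": "orders.xml"},
    {"fragment.identifier": "id-A", "fragment.index": "1",
     "fragment.count": "3", "segment.original.filename": "orders.xml"},
    {"fragment.identifier": "id-B", "fragment.index": "1",
     "fragment.count": "1", "segment.original.filename": "other.xml"},
]
result = group_fragments(fragments)
```

Here the group for id-A is reported as incomplete because only two of its three expected splits are present, while the group for id-B is complete.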

State management:

This component does not store state.

Restricted:

This component is not restricted.

Input requirement:

This component requires an incoming relationship.

System Resource Considerations:

Resource | Description
MEMORY | The entirety of the FlowFile's content (as a Document object) is read into memory, in addition to all of the generated FlowFiles representing the split XML. A Document object can take approximately 10 times as much memory as the size of the XML. For example, a 1 MB XML document may use 10 MB of memory. If many splits are generated due to the size of the XML, a two-phase approach may be necessary to avoid excessive use of memory.
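
The rule of thumb above can be turned into a back-of-the-envelope estimate. The sketch below is only arithmetic under the stated 10x assumption; the function name and the split_bytes parameter (an allowance for the generated splits) are hypothetical, and actual memory use depends on the document and the JVM.

```python
MB = 1024 * 1024


def estimate_memory_bytes(xml_size_bytes, overhead_factor=10, split_bytes=0):
    """Rough memory estimate for parsing one XML document.

    overhead_factor: the approximate 10x Document overhead quoted above.
    split_bytes: optional allowance for the generated splits (an assumption;
    the text gives no figure for this part).
    """
    return xml_size_bytes * overhead_factor + split_bytes


document_only = estimate_memory_bytes(1 * MB)                     # 10 MB
with_splits = estimate_memory_bytes(1 * MB, split_bytes=1 * MB)   # 11 MB
```

The first call reproduces the 1 MB to 10 MB example from the table; the second adds a hypothetical 1 MB for the splits themselves.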