ValidateXml 2.0.0

Bundle
org.apache.nifi | nifi-standard-nar
Description
Validates XML contained in a FlowFile. By default, the XML is contained in the FlowFile content. If the 'XML Source Attribute' property is set, the XML to be validated is contained in the specified attribute. It is not recommended to use attributes to hold large XML documents; doing so could adversely affect system performance. Full schema validation is performed if the processor is configured with the XSD schema details. Otherwise, the only validation performed is to ensure the XML syntax is correct and well-formed, e.g. all opening tags are properly closed.
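The schema-less mode described above only checks that the document parses. The following is a minimal standalone sketch of such a well-formedness check using the JDK's SAX parser. It is not the processor's own code, and the class and method names (`WellFormedCheckSketch`, `checkWellFormed`) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: illustrates a well-formedness check without any schema.
class WellFormedCheckSketch {

    /** Returns null if the XML is well-formed, otherwise the parser's error message. */
    static String checkWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, which is how syntax problems surface.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Properly nested tags: no error message is expected.
        System.out.println(checkWellFormed("<a><b>text</b></a>"));
        // The <b> element is never closed: an error message is expected.
        System.out.println(checkWellFormed("<a><b>text</a>"));
    }
}
```

A check like this accepts any syntactically correct document, including one that would fail validation against an XSD.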
Tags
schema, validation, xml, xsd
Input Requirement
REQUIRED
Supports Sensitive Dynamic Properties
false
  • Additional Details for ValidateXml 2.0.0

    ValidateXml

    Usage Information

    Full validation requires a schema. The ValidateXml processor takes the schema from the ‘Schema File’ property. The following example illustrates how an XSD schema and XML data work together.

    Example XSD specification

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://namespace/1"
               xmlns:tns="http://namespace/1" elementFormDefault="unqualified">
        <xs:element name="bundle" type="tns:BundleType"></xs:element>
    
        <xs:complexType name="BundleType">
            <xs:sequence>
                <xs:element name="node" type="tns:NodeType" maxOccurs="unbounded" minOccurs="0"></xs:element>
            </xs:sequence>
        </xs:complexType>
        <xs:complexType name="NodeType">
            <xs:sequence>
                <xs:element name="subNode" type="tns:SubNodeType" maxOccurs="unbounded" minOccurs="0"></xs:element>
            </xs:sequence>
        </xs:complexType>
        <xs:complexType name="SubNodeType">
            <xs:sequence>
                <xs:element name="value" type="xs:string"></xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:schema>
    

    Given the schema defined in the XSD above, the following XML documents are valid.

    <ns:bundle xmlns:ns="http://namespace/1">
        <node>
            <subNode>
                <value>Hello</value>
            </subNode>
            <subNode>
                <value>World!</value>
            </subNode>
        </node>
    </ns:bundle>
    
    <ns:bundle xmlns:ns="http://namespace/1">
        <node>
            <subNode>
                <value>Hello World!</value>
            </subNode>
        </node>
    </ns:bundle>
    

    The following XML documents are invalid. Each is followed by the resulting validatexml.invalid.error attribute.

    <ns:bundle xmlns:ns="http://namespace/1">
        <node>Hello World!</node>
    </ns:bundle>
    
    validatexml.invalid.error: cvc-complex-type.2.3: Element 'node' cannot have character [children], because the type's content type is element-only.
    
    <ns:bundle xmlns:ns="http://namespace/1">
        <node>
            <value>Hello World!</value>
        </node>
    </ns:bundle>
    
    validatexml.invalid.error: cvc-complex-type.2.4.a: Invalid content was found starting with element 'value'. One of '{subNode}' is expected.
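The valid and invalid cases above can be reproduced outside NiFi. The following is a minimal sketch that assumes the JDK's built-in `javax.xml.validation` API. It is not the processor's implementation, and the class, constant, and method names are hypothetical. The schema and documents are the ones from this page, written with single-quoted attributes so they fit in Java string literals.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates the sample documents against the sample XSD.
class XsdValidationSketch {

    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='http://namespace/1'"
        + " xmlns:tns='http://namespace/1' elementFormDefault='unqualified'>"
        + "<xs:element name='bundle' type='tns:BundleType'/>"
        + "<xs:complexType name='BundleType'><xs:sequence>"
        + "<xs:element name='node' type='tns:NodeType' maxOccurs='unbounded' minOccurs='0'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:complexType name='NodeType'><xs:sequence>"
        + "<xs:element name='subNode' type='tns:SubNodeType' maxOccurs='unbounded' minOccurs='0'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:complexType name='SubNodeType'><xs:sequence>"
        + "<xs:element name='value' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:schema>";

    static final String VALID_XML =
        "<ns:bundle xmlns:ns='http://namespace/1'><node><subNode>"
        + "<value>Hello World!</value></subNode></node></ns:bundle>";

    // Character data directly inside <node>, whose type is element-only.
    static final String INVALID_TEXT_XML =
        "<ns:bundle xmlns:ns='http://namespace/1'><node>Hello World!</node></ns:bundle>";

    // <value> placed directly under <node>, skipping the <subNode> level.
    static final String INVALID_CHILD_XML =
        "<ns:bundle xmlns:ns='http://namespace/1'><node>"
        + "<value>Hello World!</value></node></ns:bundle>";

    /**
     * Returns null if the XML is valid against the XSD, otherwise the error message.
     * A schema that fails to compile is reported the same way.
     */
    static String validate(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXParseException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid document:   " + validate(XSD, VALID_XML));
        System.out.println("text in node:     " + validate(XSD, INVALID_TEXT_XML));
        System.out.println("value under node: " + validate(XSD, INVALID_CHILD_XML));
    }
}
```

The messages returned for the two invalid documents should begin with the same `cvc-complex-type` rule identifiers shown above, although the exact wording can vary with the JDK version and locale.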
    
Properties
System Resource Considerations
Resource | Description
MEMORY | While this processor supports processing XML within attributes, it is strongly discouraged to hold large amounts of data in attributes. In general, attribute values should be as small as possible and hold no more than a couple hundred characters.
Restrictions
Required Permission | Explanation
reference remote resources | Schema configuration can reference resources over HTTP
Relationships
Name | Description
invalid | FlowFiles that are not valid according to the specified schema or contain invalid XML are routed to this relationship
valid | FlowFiles that are successfully validated against the schema, if provided, or verified to be well-formed XML are routed to this relationship
Writes Attributes
Name | Description
validatexml.invalid.error | If the FlowFile is routed to the invalid relationship, this attribute contains the error message resulting from the validation failure.