StartAwsTextractJob 2.0.0

Bundle
org.apache.nifi | nifi-aws-nar
Description
Triggers an AWS Textract job. It should be followed by the GetAwsTextractJobStatus processor to monitor the job status.
Tags
AWS, Amazon, ML, Machine Learning, Textract
Input Requirement
Supports Sensitive Dynamic Properties
false
  • Additional Details for StartAwsTextractJob 2.0.0

    Amazon Textract

    StartAwsTextractJob

    Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.

    Usage

    Amazon ML processors are implemented to use ML services according to the official AWS API Reference. Example JSON payloads can be found in the Request Syntax sections of that documentation. For more details, see the official Textract API reference.

    This processor triggers an asynchronous startDocumentAnalysis, startDocumentTextDetection, or startExpenseAnalysis call, depending on the configured Textract type. The JSON payload can be defined as a property or provided as the FlowFile content; the property takes precedence. After the job is triggered, the serialized JSON response is written to the output FlowFile. The awsTaskId attribute is also populated, which makes it easier to query the job status with the corresponding get-job-status processor.
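
    The precedence rule and the choice of call can be sketched as follows. This is an illustration only, not the processor's actual code: the function names, the CALL_BY_TYPE dictionary, and the treatment of a blank property as "not set" are all assumptions made for the example.

```python
import json

# Hypothetical mapping from a Textract type setting to the async call it triggers.
# The type labels and call names come from this page; the dictionary is illustrative.
CALL_BY_TYPE = {
    "Document Analysis": "startDocumentAnalysis",
    "Text Detection": "startDocumentTextDetection",
    "Expense Analysis": "startExpenseAnalysis",
}


def resolve_payload(property_value, flowfile_content):
    """Return the request payload as a dict; the property wins when it is set."""
    # Assumption: an empty or whitespace-only property counts as "not set".
    if property_value and property_value.strip():
        raw = property_value
    else:
        raw = flowfile_content
    if raw is None or not raw.strip():
        raise ValueError("no JSON payload in the property or the FlowFile content")
    return json.loads(raw)


def plan_call(textract_type, property_value, flowfile_content):
    """Pair the async call name with the payload that would be sent."""
    return CALL_BY_TYPE[textract_type], resolve_payload(property_value, flowfile_content)
```

    With both sources present, plan_call returns the payload parsed from the property; the FlowFile content is used only when the property is empty.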

    Three types of Textract task are supported: Document Analysis, Text Detection, and Expense Analysis.

    StartDocumentAnalysis

    Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements. API Reference

    Example payload:

    {
      "ClientRequestToken": "string",
      "DocumentLocation": {
        "S3Object": {
          "Bucket": "string",
          "Name": "string",
          "Version": "string"
        }
      },
      "FeatureTypes": [
        "string"
      ],
      "JobTag": "string",
      "KMSKeyId": "string",
      "NotificationChannel": {
        "RoleArn": "string",
        "SNSTopicArn": "string"
      },
      "OutputConfig": {
        "S3Bucket": "string",
        "S3Prefix": "string"
      },
      "QueriesConfig": {
        "Queries": [
          {
            "Alias": "string",
            "Pages": [
              "string"
            ],
            "Text": "string"
          }
        ]
      }
    }
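
    The request syntax above lists every field with a "string" placeholder. A minimal sketch of a filled-in payload, built in Python so that unset fields are simply left out, might look like this. The bucket name, object key, query text, and the helper function are hypothetical; the feature type values TABLES, FORMS, and QUERIES come from the Textract API, and the current list should be checked in the API reference.

```python
import json


def document_analysis_payload(bucket, name, feature_types, queries=None):
    """Build a StartDocumentAnalysis request body with only the fields that are set."""
    payload = {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": name}},
        "FeatureTypes": list(feature_types),
    }
    if queries:
        # queries maps an alias to the question text; both keys appear in the syntax above.
        payload["QueriesConfig"] = {
            "Queries": [{"Text": text, "Alias": alias} for alias, text in queries.items()]
        }
    return payload


# Placeholder bucket and object key; replace with real values.
example = document_analysis_payload(
    "example-bucket",
    "invoices/scan-001.pdf",
    ["TABLES", "FORMS", "QUERIES"],
    queries={"TOTAL": "What is the total amount due?"},
)
print(json.dumps(example, indent=2))
```

    The printed JSON can be pasted into the payload property or supplied as FlowFile content.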
    

    StartExpenseAnalysis

    Starts the asynchronous analysis of invoices or receipts for data like contact information, items purchased, and vendor names. API Reference

    Example payload:

    {
      "ClientRequestToken": "string",
      "DocumentLocation": {
        "S3Object": {
          "Bucket": "string",
          "Name": "string",
          "Version": "string"
        }
      },
      "JobTag": "string",
      "KMSKeyId": "string",
      "NotificationChannel": {
        "RoleArn": "string",
        "SNSTopicArn": "string"
      },
      "OutputConfig": {
        "S3Bucket": "string",
        "S3Prefix": "string"
      }
    }
    

    StartDocumentTextDetection

    Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text. API Reference

    Example payload:

    {
      "ClientRequestToken": "string",
      "DocumentLocation": {
        "S3Object": {
          "Bucket": "string",
          "Name": "string",
          "Version": "string"
        }
      },
      "JobTag": "string",
      "KMSKeyId": "string",
      "NotificationChannel": {
        "RoleArn": "string",
        "SNSTopicArn": "string"
      },
      "OutputConfig": {
        "S3Bucket": "string",
        "S3Prefix": "string"
      }
    }
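
    ClientRequestToken is described in the AWS API as an idempotency token for the start request. One way to make retries of the same document reuse the same token is to derive it from the document location, as in the sketch below. The helper names and the hashing scheme are assumptions made for this example, and the token length limit noted in the comment should be verified against the API reference.

```python
import hashlib
import json


def request_token(bucket, name, version=""):
    """Derive a stable token from the document location (64 hex characters)."""
    # Assumption: a 64-character hexadecimal string is an acceptable token value.
    key = "\n".join([bucket, name, version]).encode("utf-8")
    return hashlib.sha256(key).hexdigest()


def text_detection_payload(bucket, name):
    """Build a StartDocumentTextDetection request body with a derived token."""
    return {
        "ClientRequestToken": request_token(bucket, name),
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": name}},
    }


# Placeholder bucket and object key; replace with real values.
print(json.dumps(text_detection_payload("example-bucket", "scans/contract.pdf"), indent=2))
```

    Because the token depends only on the bucket, key, and version, resubmitting the same document produces the same token, while a different document produces a different one.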
    
Properties
Relationships
Name Description
original Upon successful completion, the original FlowFile will be routed to this relationship.
success FlowFiles are routed to the success relationship.
failure FlowFiles are routed to the failure relationship.
Writes Attributes
Name Description
awsTaskId The task ID that can be used to poll for Job completion in GetAwsTextractJobStatus
awsTextractType The selected Textract type, which can be used in GetAwsTextractJobStatus
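
How the two attributes relate to the job response can be sketched as follows. This is an illustration under stated assumptions, not the processor's code: it assumes the serialized response carries the job identifier in a JobId field (the field name used by the Textract start operations) and that awsTaskId mirrors that value; the function name and the sample values are hypothetical.

```python
import json


def attributes_from_response(response_json, textract_type):
    """Map a serialized start-job response to the two attributes listed above."""
    response = json.loads(response_json)
    return {
        # Assumption: awsTaskId mirrors the JobId field of the Textract response.
        "awsTaskId": response["JobId"],
        # Assumption: the attribute holds the configured type label unchanged.
        "awsTextractType": textract_type,
    }


attrs = attributes_from_response('{"JobId": "example-job-id"}', "Document Analysis")
print(attrs)
```

A downstream GetAwsTextractJobStatus processor would then read these two attributes to decide which job to poll and which kind of result to expect.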
See Also