FetchS3Object 2.0.0
- Bundle
- org.apache.nifi | nifi-aws-nar
- Description
- Retrieves the contents of an S3 Object and writes it to the content of a FlowFile
- Tags
- AWS, Amazon, Fetch, Get, S3
- Input Requirement
- REQUIRED
- Supports Sensitive Dynamic Properties
- false
Properties
-
AWS Credentials Provider Service
The Controller Service that is used to obtain the AWS credentials provider
- Display Name
- AWS Credentials Provider Service
- Description
- The Controller Service that is used to obtain the AWS credentials provider
- API Name
- AWS Credentials Provider service
- Service Interface
- org.apache.nifi.processors.aws.credentials.provider.service.AWSCredentialsProviderService
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Bucket
The S3 Bucket to interact with
- Display Name
- Bucket
- Description
- The S3 Bucket to interact with
- API Name
- Bucket
- Default Value
- ${s3.bucket}
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- true
-
Communications Timeout
The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out.
- Display Name
- Communications Timeout
- Description
- The amount of time to wait in order to establish a connection to AWS or receive data from AWS before timing out.
- API Name
- Communications Timeout
- Default Value
- 30 secs
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Custom Signer Class Name
Fully qualified class name of the custom signer class. The signer must implement the com.amazonaws.auth.Signer interface.
- Display Name
- Custom Signer Class Name
- Description
- Fully qualified class name of the custom signer class. The signer must implement the com.amazonaws.auth.Signer interface.
- API Name
- custom-signer-class-name
- Expression Language Scope
- Environment variables defined at JVM level and system properties
- Sensitive
- false
- Required
- true
- Dependencies
-
- Signer Override is set to any of [CustomSignerType]
-
Custom Signer Module Location
Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any).
- Display Name
- Custom Signer Module Location
- Description
- Comma-separated list of paths to files and/or directories which contain the custom signer's JAR file and its dependencies (if any).
- API Name
- custom-signer-module-location
- Expression Language Scope
- Environment variables defined at JVM level and system properties
- Sensitive
- false
- Required
- false
- Dependencies
-
- Signer Override is set to any of [CustomSignerType]
-
Encryption Service
Specifies the Encryption Service Controller used to configure requests. PutS3Object: For backward compatibility, this value is ignored when 'Server Side Encryption' is set. FetchS3Object: Only needs to be configured in case of Server-side Customer Key, Client-side KMS and Client-side Customer Key encryptions.
- Display Name
- Encryption Service
- Description
- Specifies the Encryption Service Controller used to configure requests. PutS3Object: For backward compatibility, this value is ignored when 'Server Side Encryption' is set. FetchS3Object: Only needs to be configured in case of Server-side Customer Key, Client-side KMS and Client-side Customer Key encryptions.
- API Name
- encryption-service
- Service Interface
- org.apache.nifi.processors.aws.s3.AmazonS3EncryptionService
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
-
Endpoint Override URL
Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
- Display Name
- Endpoint Override URL
- Description
- Endpoint URL to use instead of the AWS default including scheme, host, port, and path. The AWS libraries select an endpoint URL based on the AWS region, but this property overrides the selected endpoint URL, allowing use with other S3-compatible endpoints.
- API Name
- Endpoint Override URL
- Expression Language Scope
- Environment variables defined at JVM level and system properties
- Sensitive
- false
- Required
- false
-
Object Key
The S3 Object Key to use. This is analogous to a filename for traditional file systems.
- Display Name
- Object Key
- Description
- The S3 Object Key to use. This is analogous to a filename for traditional file systems.
- API Name
- Object Key
- Default Value
- ${filename}
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- true
-
Proxy Configuration Service
Specifies the Proxy Configuration Controller Service to proxy network requests. Supported proxies: HTTP + AuthN
- Display Name
- Proxy Configuration Service
- Description
- Specifies the Proxy Configuration Controller Service to proxy network requests. Supported proxies: HTTP + AuthN
- API Name
- proxy-configuration-service
- Service Interface
- org.apache.nifi.proxy.ProxyConfigurationService
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
-
Range Length
The number of bytes to download from the object, starting from the Range Start. An empty value or a value that extends beyond the end of the object will read to the end of the object.
- Display Name
- Range Length
- Description
- The number of bytes to download from the object, starting from the Range Start. An empty value or a value that extends beyond the end of the object will read to the end of the object.
- API Name
- range-length
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
-
Range Start
The byte position at which to start reading from the object. An empty value or a value of zero will start reading at the beginning of the object.
- Display Name
- Range Start
- Description
- The byte position at which to start reading from the object. An empty value or a value of zero will start reading at the beginning of the object.
- API Name
- range-start
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
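The interaction between "Range Start" and "Range Length" can be sketched as follows. This is a minimal illustration of the documented semantics, not the processor's actual code: the function name `resolve_byte_range` and its parameters are hypothetical, and the sketch assumes the object size is known and that Range Start falls inside the object.

```python
from typing import Optional, Tuple


def resolve_byte_range(
    object_size: int,
    range_start: Optional[int] = None,
    range_length: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the inclusive (first_byte, last_byte) positions to read.

    Hypothetical helper mirroring the property descriptions: an empty
    Range Start means position 0, and an empty or oversized Range Length
    means "read to the end of the object".
    """
    start = range_start or 0
    last = object_size - 1
    if range_length is not None:
        # Clamp the requested window to the end of the object.
        last = min(start + range_length - 1, last)
    return start, last
```

For a 1000-byte object, leaving both properties empty selects the whole object, while a Range Start of 900 with a Range Length of 500 is clamped to the last 100 bytes.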
-
Region
The AWS Region to connect to.
- Display Name
- Region
- Description
- The AWS Region to connect to.
- API Name
- Region
- Default Value
- us-west-2
- Allowable Values
-
- AWS GovCloud (US)
- AWS GovCloud (US-East)
- US East (N. Virginia)
- US East (Ohio)
- US West (N. California)
- US West (Oregon)
- EU (Ireland)
- EU (London)
- EU (Paris)
- EU (Frankfurt)
- EU (Zurich)
- EU (Stockholm)
- EU (Milan)
- EU (Spain)
- Asia Pacific (Hong Kong)
- Asia Pacific (Mumbai)
- Asia Pacific (Hyderabad)
- Asia Pacific (Singapore)
- Asia Pacific (Sydney)
- Asia Pacific (Jakarta)
- Asia Pacific (Melbourne)
- Asia Pacific (Tokyo)
- Asia Pacific (Seoul)
- Asia Pacific (Osaka)
- South America (Sao Paulo)
- China (Beijing)
- China (Ningxia)
- Canada (Central)
- Canada West (Calgary)
- Middle East (UAE)
- Middle East (Bahrain)
- Africa (Cape Town)
- US ISO East
- US ISOB East (Ohio)
- US ISO West
- Israel (Tel Aviv)
- Use 's3.region' Attribute
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Requester Pays
If true, indicates that the requester consents to pay any charges associated with retrieving objects from the S3 bucket. This sets the 'x-amz-request-payer' header to 'requester'.
- Display Name
- Requester Pays
- Description
- If true, indicates that the requester consents to pay any charges associated with retrieving objects from the S3 bucket. This sets the 'x-amz-request-payer' header to 'requester'.
- API Name
- requester-pays
- Default Value
- false
- Allowable Values
-
- True
- False
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- true
-
Signer Override
The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation.
- Display Name
- Signer Override
- Description
- The AWS S3 library uses Signature Version 4 by default but this property allows you to specify the Version 2 signer to support older S3-compatible services or even to plug in your own custom signer implementation.
- API Name
- Signer Override
- Default Value
- Default Signature
- Allowable Values
-
- Default Signature
- Signature Version 4
- Signature Version 2
- Custom Signature
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
-
SSL Context Service
Specifies an optional SSL Context Service that, if provided, will be used to create connections
- Display Name
- SSL Context Service
- Description
- Specifies an optional SSL Context Service that, if provided, will be used to create connections
- API Name
- SSL Context Service
- Service Interface
- org.apache.nifi.ssl.SSLContextService
- Service Implementations
- Expression Language Scope
- Not Supported
- Sensitive
- false
- Required
- false
-
Version
The Version of the Object to download
- Display Name
- Version
- Description
- The Version of the Object to download
- API Name
- Version
- Expression Language Scope
- Environment variables and FlowFile Attributes
- Sensitive
- false
- Required
- false
Relationships
Name | Description |
---|---|
failure | If the Processor is unable to process a given FlowFile, it will be routed to this Relationship. |
success | FlowFiles are routed to this Relationship after they have been successfully processed. |
Writes Attributes
Name | Description |
---|---|
s3.url | The URL that can be used to access the S3 object |
s3.bucket | The name of the S3 bucket |
path | The path of the file |
absolute.path | The path of the file |
filename | The name of the file |
hash.value | The MD5 sum of the file |
hash.algorithm | MD5 |
mime.type | If S3 provides the content type/MIME type, this attribute will hold that value |
s3.etag | The ETag that can be used to see if the file has changed |
s3.exception | The class name of the exception thrown during processor execution |
s3.additionalDetails | The S3 supplied detail from the failed operation |
s3.statusCode | The HTTP error code (if available) from the failed operation |
s3.errorCode | The S3 moniker of the failed operation |
s3.errorMessage | The S3 exception message from the failed operation |
s3.expirationTime | If the file has an expiration date, this attribute will be set, containing the milliseconds since epoch in UTC time |
s3.expirationTimeRuleId | The ID of the rule that dictates this object's expiration time |
s3.sseAlgorithm | The server side encryption algorithm of the object |
s3.version | The version of the S3 object |
s3.encryptionStrategy | The name of the encryption strategy that was used to store the S3 object (if it is encrypted) |
Use Cases
-
Fetch a specific file from S3
- Description
- Fetch a specific file from S3
- Configuration
The "Bucket" property should be set to the name of the S3 bucket that contains the file. Typically this is defined as an attribute on an incoming FlowFile, so this property is set to `${s3.bucket}`. The "Object Key" property denotes the fully qualified filename of the file to fetch. Typically, the FlowFile's `filename` attribute is used, so this property is set to `${filename}`. The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_REGION}`. The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the file.
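To make the role of `${s3.bucket}` and `${filename}` concrete, the sketch below shows how such references might be filled in from a FlowFile's attributes. It is a simplified stand-in for Expression Language evaluation, not NiFi's implementation: `resolve_property`, the sample attribute values, and the choice to replace a missing attribute with an empty string are all assumptions made for illustration.

```python
import re
from typing import Dict

# Matches plain attribute references such as ${s3.bucket}. Real Expression
# Language also supports functions, which this sketch deliberately ignores.
_ATTRIBUTE_REF = re.compile(r"\$\{([^}:]+)\}")


def resolve_property(value: str, attributes: Dict[str, str]) -> str:
    """Substitute ${name} references with FlowFile attribute values."""
    # Assumption: an attribute that is not present resolves to an empty string.
    return _ATTRIBUTE_REF.sub(lambda m: attributes.get(m.group(1), ""), value)


# Hypothetical attributes carried by an incoming FlowFile.
flowfile_attributes = {"s3.bucket": "example-bucket", "filename": "reports/2024.csv"}

# The default property values described in this use case.
properties = {"Bucket": "${s3.bucket}", "Object Key": "${filename}"}

resolved = {
    name: resolve_property(value, flowfile_attributes)
    for name, value in properties.items()
}
```

With these sample attributes, the processor would be asked for the object `reports/2024.csv` in `example-bucket`.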
Use Cases Involving Other Components
-
Retrieve all files in an S3 bucket
- Description
- Retrieve all files in an S3 bucket
- Keywords
- s3, state, retrieve, fetch, all, stream
- Processor Configurations
-
org.apache.nifi.processors.aws.s3.ListS3
The "Bucket" property should be set to the name of the S3 bucket that files reside in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_SOURCE_BUCKET}`. The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_SOURCE_REGION}`. The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket. The 'success' Relationship of this Processor is then connected to FetchS3Object.
org.apache.nifi.processors.aws.s3.FetchS3Object
"Bucket" = "${s3.bucket}"
"Object Key" = "${filename}"
The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket. The "Region" property must be set to the same value as the "Region" property of the ListS3 Processor.
-
Retrieve only files from S3 that meet some specified criteria
- Description
- Retrieve only files from S3 that meet some specified criteria
- Keywords
- s3, state, retrieve, filter, select, fetch, criteria
- Processor Configurations
-
org.apache.nifi.processors.aws.s3.ListS3
The "Bucket" property should be set to the name of the S3 bucket that files reside in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_SOURCE_BUCKET}`. The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize this property by setting it to something like `#{S3_SOURCE_REGION}`. The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket. The 'success' Relationship of this Processor is then connected to RouteOnAttribute.
org.apache.nifi.processors.standard.RouteOnAttribute
If you would like to "OR" together all of the conditions (i.e., the file should be retrieved if any of the conditions are met), set "Routing Strategy" to "Route to 'matched' if any matches". If you would like to "AND" together all of the conditions (i.e., the file should only be retrieved if all of the conditions are met), set "Routing Strategy" to "Route to 'matched' if all match".
For each condition that you would like to filter on, add a new property. The name of the property should describe the condition. The value of the property should be an Expression Language expression that returns `true` if the file meets the condition or `false` if the file does not meet the condition. Some attributes that you may consider filtering on are:
- `filename` (the name of the file)
- `s3.length` (the number of bytes in the file)
- `s3.tag.<tag name>` (the value of the s3 tag with the name `tag name`)
- `s3.user.metadata.<key name>` (the value of the user metadata with the key named `key name`)
For example, to fetch only files that are at least 1 MB and have a filename ending in `.zip` we would set the following properties:
- "Routing Strategy" = "Route to 'matched' if all match"
- "At least 1 MB" = "${s3.length:ge(1000000)}"
- "Ends in .zip" = "${filename:endsWith('.zip')}"
Auto-terminate the `unmatched` Relationship. Connect the `matched` Relationship to the FetchS3Object processor.
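The routing logic of the example above can be sketched in Python. This only mirrors the two example conditions and the "all match" versus "any matches" strategies; the `conditions` mapping, the `route` function, and the sample attribute values are hypothetical and are not part of NiFi.

```python
from typing import Callable, Dict

Condition = Callable[[Dict[str, str]], bool]

# Python equivalents of the two example properties:
#   "At least 1 MB" = "${s3.length:ge(1000000)}"
#   "Ends in .zip"  = "${filename:endsWith('.zip')}"
conditions: Dict[str, Condition] = {
    "At least 1 MB": lambda attrs: int(attrs["s3.length"]) >= 1_000_000,
    "Ends in .zip": lambda attrs: attrs["filename"].endswith(".zip"),
}


def route(attrs: Dict[str, str], require_all: bool = True) -> str:
    """Return the name of the relationship a FlowFile would be routed to.

    require_all=True models "Route to 'matched' if all match";
    require_all=False models "Route to 'matched' if any matches".
    """
    results = [check(attrs) for check in conditions.values()]
    matched = all(results) if require_all else any(results)
    return "matched" if matched else "unmatched"
```

A 500-byte `.zip` file fails the size condition, so it is routed to `unmatched` under the "all match" strategy but to `matched` under "any matches".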
org.apache.nifi.processors.aws.s3.FetchS3Object
"Bucket" = "${s3.bucket}"
"Object Key" = "${filename}"
The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket. The "Region" property must be set to the same value as the "Region" property of the ListS3 Processor.
-
Retrieve new files as they arrive in an S3 bucket
- Description
- Retrieve new files as they arrive in an S3 bucket
- Notes
- This method of retrieving files from S3 is more efficient than using ListS3 and more cost effective. It is the pattern recommended by AWS. However, it does require that the S3 bucket be configured to place notifications on an SQS queue when new files arrive. For more information, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html
- Processor Configurations
-
org.apache.nifi.processors.aws.sqs.GetSQS
The "Queue URL" must be set to the appropriate URL for the SQS queue. It is recommended that this property be parameterized, using a value such as `#{SQS_QUEUE_URL}`. The "Region" property must be set to denote the AWS region that the queue resides in. It's a good idea to parameterize this property by setting it to something like `#{SQS_REGION}`. The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the queue. The 'success' relationship is connected to EvaluateJsonPath.
org.apache.nifi.processors.standard.EvaluateJsonPath
"Destination" = "flowfile-attribute"
"s3.bucket" = "$.Records[0].s3.bucket.name"
"filename" = "$.Records[0].s3.object.key"
The 'success' relationship is connected to FetchS3Object.
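What the two JSONPath expressions extract can be illustrated with a short Python sketch. The sample message below is a heavily trimmed, made-up S3 event notification containing only the fields the expressions touch, and `extract_attributes` is a hypothetical helper, not a NiFi API.

```python
import json
from typing import Dict

# Trimmed, hypothetical notification body as it might arrive from the SQS queue.
sample_notification = json.dumps(
    {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "example-bucket"},
                    "object": {"key": "incoming/data.csv"},
                }
            }
        ]
    }
)


def extract_attributes(message_body: str) -> Dict[str, str]:
    """Mimic the EvaluateJsonPath configuration above.

    $.Records[0].s3.bucket.name -> s3.bucket
    $.Records[0].s3.object.key  -> filename
    """
    record = json.loads(message_body)["Records"][0]
    return {
        "s3.bucket": record["s3"]["bucket"]["name"],
        "filename": record["s3"]["object"]["key"],
    }


attributes = extract_attributes(sample_notification)
```

The resulting `s3.bucket` and `filename` attributes are the ones FetchS3Object reads through `${s3.bucket}` and `${filename}`.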
org.apache.nifi.processors.aws.s3.FetchS3Object
"Bucket" = "${s3.bucket}"
"Object Key" = "${filename}"
The "Region" property must be set to the same value as the "Region" property of the GetSQS Processor. The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.
See Also