ReportLineageToAtlas

Description:

Reports NiFi flow data set level lineage to Apache Atlas. End-to-end lineage across NiFi environments and other systems can be reported when they are connected by various protocols and data sets, such as NiFi Site-to-Site, Kafka topics, or Hive tables. Atlas lineage reported by this reporting task helps in understanding the high-level relationships between processes and data sets, complementing NiFi provenance events, which provide detailed event-level lineage. See 'Additional Details' for further description and limitations.

Additional Details...

Tags:

atlas, lineage

Properties:

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Atlas URLs | atlas-urls | | | Comma-separated URLs of Atlas servers (e.g. http://atlas-server-hostname:21000 or https://atlas-server-hostname:21443). To access Atlas behind a Knox gateway, specify the Knox gateway URL (e.g. https://knox-hostname:8443/gateway/{topology-name}/atlas). If not specified, 'atlas.rest.address' in the Atlas Configuration File is used.
Supports Expression Language: true (will be evaluated using Environment variables only)
Atlas Configuration Directory | atlas-conf-dir | | | Directory path that contains the 'atlas-application.properties' file. If not specified and 'Create Atlas Configuration File' is disabled, the 'atlas-application.properties' file under the root classpath is used.

This property requires exactly one directory to be provided.

Supports Expression Language: true (will be evaluated using Environment variables only)
Create Atlas Configuration File | atlas-conf-create | false
  • true
  • false
If enabled, 'atlas-application.properties' file will be created in 'Atlas Configuration Directory' automatically when this Reporting Task starts. Note that the existing configuration file will be overwritten.
Atlas Default Metadata Namespace | atlas-default-cluster-name | | | Namespace for Atlas entities reported by this Reporting Task. If not specified, 'atlas.metadata.namespace' or 'atlas.cluster.name' (the former taking priority) in the Atlas Configuration File is used. Multiple mappings can be configured with user-defined properties. See 'Additional Details...' for more.
Supports Expression Language: true (will be evaluated using Environment variables only)
Lineage Strategy | nifi-lineage-strategy | Simple Path
  • Simple Path: Map NiFi provenance events and target Atlas DataSets to statically created 'nifi_flow_path' Atlas Processes. See also 'Additional Details'.
  • Complete Path: Create a separate 'nifi_flow_path' Atlas Process for each distinct combination of input and output DataSets by looking at the complete route of a given FlowFile. See also 'Additional Details'.
Specifies the granularity at which the NiFi data flow is reported to Atlas. NOTE: It is strongly recommended to keep using the same strategy once this reporting task has started, to keep Atlas data clean. Switching strategies will not delete Atlas entities created by the old strategy, and mixing entities created by different strategies makes the Atlas lineage graph noisy. For a more detailed description of each strategy and their differences, refer to the 'NiFi Lineage Strategy' section in Additional Details.
Provenance Record Start Position | provenance-start-position | Beginning of Stream
  • Beginning of Stream: Start reading provenance events from the beginning of the stream (the oldest event first).
  • End of Stream: Start reading provenance events from the end of the stream, ignoring old events.
If the Reporting Task has never been run, or if its state has been reset by a user, specifies where in the stream of provenance events the Reporting Task should start.
Provenance Record Batch Size | provenance-batch-size | 1000 | | Specifies the maximum number of records to send in a single batch.
NiFi URL for Atlas | atlas-nifi-url | | | The NiFi URL used in Atlas to represent this NiFi cluster (or standalone instance). A remotely accessible URL is recommended rather than 'localhost'.
Supports Expression Language: true (will be evaluated using Environment variables only)
Atlas Authentication Method | atlas-authentication-method | Basic
  • Basic: Use username and password.
  • Kerberos: Use Kerberos keytab file.
Specifies how this reporting task authenticates to the Atlas server.
Atlas Username | atlas-username | | | User name to communicate with Atlas.
Supports Expression Language: true (will be evaluated using Environment variables only)
Atlas Password | atlas-password | | | Password to communicate with Atlas.
Sensitive Property: true
Supports Expression Language: true (will be evaluated using Environment variables only)
Kerberos Credentials Service | kerberos-credentials-service | | Controller Service API:
KerberosCredentialsService
Implementation: KeytabCredentialsService
Specifies the Kerberos Credentials Controller Service that should be used for authenticating with Kerberos
Kerberos Principal | nifi-kerberos-principal | | | The Kerberos principal for this NiFi instance to access the Atlas API and Kafka brokers. If not set, a JAAS configuration file is expected to be set in the JVM properties defined in the bootstrap.conf file. This principal will be set in Kafka's 'sasl.jaas.config' property.
Supports Expression Language: true (will be evaluated using Environment variables only)
Kerberos Keytab | nifi-kerberos-keytab | | | The Kerberos keytab for this NiFi instance to access the Atlas API and Kafka brokers. If not set, a JAAS configuration file is expected to be set in the JVM properties defined in the bootstrap.conf file. This keytab will be set in Kafka's 'sasl.jaas.config' property.

This property requires exactly one file to be provided.

Supports Expression Language: true (will be evaluated using Environment variables only)
SSL Context Service | ssl-context-service | | Controller Service API:
SSLContextService
Implementations: StandardRestrictedSSLContextService
StandardSSLContextService
Specifies the SSL Context Service to use for communicating with Atlas and Kafka.
Kafka Bootstrap Servers | kafka-bootstrap-servers | | | Kafka bootstrap servers to which Atlas hook notification messages, based on NiFi provenance events, are sent (e.g. 'localhost:9092'). NOTE: Once this reporting task has started, NiFi must be restarted to change this property, because the Atlas library holds an unmodifiable static reference to the Kafka client.
Supports Expression Language: true (will be evaluated using Environment variables only)
Kafka Security Protocol | kafka-security-protocol | PLAINTEXT
  • PLAINTEXT
  • SSL
  • SASL_PLAINTEXT
  • SASL_SSL
Protocol used to communicate with Kafka brokers to send Atlas hook notification messages. Corresponds to Kafka's 'security.protocol' property.
Kafka Kerberos Service Name | kafka-kerberos-service-name | kafka | | The service name that matches the primary name of the Kafka server configured in the broker JAAS file. This can be defined either in Kafka's JAAS config or in Kafka's config. Corresponds to Kafka's 'sasl.kerberos.service.name' property. It is ignored unless one of the SASL options of the <Security Protocol> is selected.
Supports Expression Language: true (will be evaluated using Environment variables only)
Atlas Connect Timeout | atlas-connect-timeout | 60 sec | | Max wait time for connection to Atlas.
Atlas Read Timeout | atlas-read-timeout | 60 sec | | Max wait time for response from Atlas.
AWS S3 Model Version | aws-s3-model-version | v2
  • v1: Creates AWS S3 directory entities version 1 (aws_s3_pseudo_dir).
  • v2: Creates AWS S3 directory entities version 2 (aws_s3_v2_directory).
Specifies which type of AWS S3 directory entities will be created in Atlas for s3a:// transit URIs (e.g. PutHDFS with S3 integration). NOTE: It is strongly recommended to keep using the same AWS S3 entity model version once this reporting task has started, to keep Atlas data clean. Switching versions will not delete existing Atlas entities created by the old version, nor migrate them to the new version.
Filesystem Path Entities Level | filesystem-paths-level | File
  • File: Creates File level paths.
  • Directory: Creates Directory level paths.
Specifies how the filesystem path entities (fs_path and hdfs_path) will be logged in Atlas: File or Directory level. In case of File level, each individual file entity will be sent to Atlas as a separate entity with the full path including the filename. Directory level only logs the path of the parent directory without the filename. This setting affects processors working with files, like GetFile or PutHDFS. NOTE: Although the default value is File level for backward compatibility reasons, it is highly recommended to set it to Directory level because File level logging can generate a huge number of entities in Atlas.
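
To illustrate why Directory level produces far fewer entities than File level, the following is a minimal Python sketch. It is not the reporting task's implementation: the function name `path_entity_name` and the sample paths are hypothetical, and it only models the rule stated above (File level keeps the full path, Directory level keeps the parent directory).

```python
import posixpath


def path_entity_name(file_path: str, level: str) -> str:
    """Return the path an entity would be keyed on (illustrative only)."""
    if level == "Directory":
        # Directory level: drop the filename, keep the parent directory.
        return posixpath.dirname(file_path)
    if level == "File":
        # File level: keep the full path including the filename.
        return file_path
    raise ValueError(f"unknown level: {level}")


# Hypothetical files written by a processor such as PutHDFS.
paths = [
    "/data/in/2024-01-01/part-0001.csv",
    "/data/in/2024-01-01/part-0002.csv",
    "/data/in/2024-01-01/part-0003.csv",
]

file_entities = {path_entity_name(p, "File") for p in paths}
dir_entities = {path_entity_name(p, "Directory") for p in paths}

# Three distinct File level entities collapse into one Directory level entity.
print(len(file_entities), len(dir_entities))
```

In this sketch every new file adds a File level entity, while the Directory level set grows only when a new directory appears.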

Dynamic Properties:

Supports Sensitive Dynamic Properties: No

Dynamic Properties allow the user to specify both the name and value of a property.

Name | Value | Description
hostnamePattern.<namespace> | hostname Regex patterns | Whitespace-delimited (including newlines) regular expressions used to resolve a namespace from the hostname or IP address in the transit URI of a NiFi provenance record.
Supports Expression Language: true (will be evaluated using Environment variables only)
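
The mapping described above can be sketched in Python as follows. This is an approximation under stated assumptions, not the reporting task's code: the property values, the helper names `compile_patterns` and `resolve_namespace`, the use of a full-string match, and the fallback to a default namespace are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical dynamic properties; values are whitespace-delimited regexes.
dynamic_properties = {
    "hostnamePattern.ns1": r".*\.example\.com 192\.168\.1\.[0-9]+",
    "hostnamePattern.ns2": "ns2-node[0-9]+\n10\\.0\\.0\\.[0-9]+",
}


def compile_patterns(props):
    """Build {namespace: [compiled regex, ...]} from hostnamePattern.* entries."""
    prefix = "hostnamePattern."
    compiled = {}
    for name, value in props.items():
        if name.startswith(prefix):
            namespace = name[len(prefix):]
            # str.split() with no argument splits on any whitespace, newlines included.
            compiled[namespace] = [re.compile(p) for p in value.split()]
    return compiled


def resolve_namespace(transit_uri, patterns, default_namespace):
    """Return the first namespace whose pattern matches the URI's host."""
    host = urlparse(transit_uri).hostname
    if host is None:
        return default_namespace
    for namespace, regexes in patterns.items():
        # Assumption: the whole hostname must match the pattern.
        if any(r.fullmatch(host) for r in regexes):
            return namespace
    # Assumption: unmatched hosts fall back to the default namespace.
    return default_namespace


patterns = compile_patterns(dynamic_properties)
print(resolve_namespace("hdfs://nn1.example.com:8020/data", patterns, "default"))
print(resolve_namespace("thrift://ns2-node3:9083", patterns, "default"))
```

Keeping one property per namespace means a new cluster can be mapped by adding a single `hostnamePattern.<namespace>` entry without touching the others.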

State management:

Scope | Description
LOCAL | Stores the Reporting Task's last event ID so that on restart the task knows where it left off.
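
How the stored last event ID might interact with 'Provenance Record Start Position' and 'Provenance Record Batch Size' can be sketched as below. This is a simplified model, not NiFi's implementation: the function `next_batch`, its parameters, and the inclusive ID range it returns are hypothetical.

```python
def next_batch(last_event_id, start_position, first_available_id,
               max_event_id, batch_size=1000):
    """Return an inclusive (start, end) event ID range, or None if nothing to read."""
    if last_event_id is not None:
        # State exists: resume right after the last reported event.
        start = last_event_id + 1
    elif start_position == "Beginning of Stream":
        # No state: begin with the oldest available event.
        start = first_available_id
    elif start_position == "End of Stream":
        # No state: skip existing events and wait for new ones.
        start = max_event_id + 1
    else:
        raise ValueError(f"unknown start position: {start_position}")

    # Never read more than batch_size events, nor past the newest event.
    end = min(start + batch_size - 1, max_event_id)
    if start > end:
        return None
    return (start, end)


# First run with no stored state, reading from the oldest event.
print(next_batch(None, "Beginning of Stream", 0, 2499))
# Later run: the start position is ignored because state is present.
print(next_batch(1999, "End of Stream", 0, 2499))
```

In this model the start position only matters when no state is stored, which matches the property description: it applies if the task has never run or its state has been reset.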

Restricted:

This component is not restricted.

System Resource Considerations:

None specified.