PutGridFS

Description:

This processor puts a file, along with one or more user-defined metadata values, into GridFS in the configured bucket. It lets the user define how large each file chunk will be during ingestion, and it can enforce file uniqueness by filename or hash value at the application level instead of relying only on a database index.

GridFS File Attributes

PutGridFS adds flowfile attributes whose names start with a configured prefix to the GridFS document. These attributes are useful later as metadata about the file when working with GridFS.
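
The prefix filtering described above can be pictured with a minimal sketch. This is an illustration, not the processor's actual code: the function name gridfs_metadata, the prefix "gridfs", and the attribute names are hypothetical, and stripping the prefix from the stored keys is an assumption made for the example.

```python
def gridfs_metadata(attributes, prefix):
    """Collect flowfile attributes whose names start with `prefix`.

    Hypothetical helper: the prefix is stripped from each key, which is an
    assumption of this sketch rather than documented processor behavior.
    """
    if not prefix:
        return {}
    # Treat "gridfs" and "gridfs." the same way (an illustrative choice).
    dotted = prefix if prefix.endswith(".") else prefix + "."
    return {
        name[len(dotted):]: value
        for name, value in attributes.items()
        if name.startswith(dotted)
    }


# Example flowfile attributes (made up for illustration).
attributes = {
    "filename": "report.pdf",
    "gridfs.department": "finance",
    "gridfs.year": "2019",
}
metadata = gridfs_metadata(attributes, "gridfs")
```

In this sketch only the two prefixed attributes end up in the metadata; the filename attribute is left out because it does not carry the prefix.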

Chunk Size

GridFS splits a file into chunks, each stored as a separate MongoDB document, as the file is ingested into the database. The chunk size configuration parameter sets the maximum size of each chunk. Leave this field at its default value unless there is a specific business case for increasing or decreasing it.
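
To make the effect of the chunk size concrete, the following sketch computes how many chunks a file of a given size produces and shows the splitting itself on a small byte string. The function names and the sizes used are hypothetical and chosen only for illustration; they are not the processor's defaults.

```python
def chunk_count(file_size, chunk_size):
    """Number of chunks needed for file_size bytes (ceiling division)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return (file_size + chunk_size - 1) // chunk_size


def split_into_chunks(data, chunk_size):
    """Split data into pieces of at most chunk_size bytes, in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Illustrative sizes: a 1 MiB file with 256 KiB chunks.
exact_fit = chunk_count(1024 * 1024, 256 * 1024)
# One extra byte spills into an additional, nearly empty chunk.
one_byte_over = chunk_count(1024 * 1024 + 1, 256 * 1024)
pieces = split_into_chunks(b"abcdefghij", 4)
```

Only the last chunk can be smaller than the configured size, so a smaller chunk size means more documents per file, while a larger one means fewer but bigger documents.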

Uniqueness Enforcement

There are four operating modes:

No enforcement at the application level.
Enforce by unique file name.
Enforce by unique hash value.
Use both hash and file name.

By default, the hash value is taken from the attribute hash.value, which can be generated by configuring a HashContent processor upstream of PutGridFS. Both the hash and the file name options query the existing data to check whether a file matching those criteria already exists before attempting to write the flowfile contents.
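
The check-before-write behavior of the four modes can be sketched as below. This is a simplified illustration under stated assumptions, not the processor's implementation: the mode names, the HASH_FIELD name "md5", and both helper functions are hypothetical, an in-memory list stands in for the GridFS files collection, and the "both" mode is assumed to treat a file as a duplicate only when the name and the hash both match.

```python
HASH_FIELD = "md5"  # hypothetical field name for the stored hash


def uniqueness_filter(mode, filename=None, hash_value=None):
    """Build the lookup criteria for a mode; None means no check is made."""
    if mode == "none":
        return None
    if mode == "name":
        return {"filename": filename}
    if mode == "hash":
        return {HASH_FIELD: hash_value}
    if mode == "both":
        # Assumption: both fields must match for a file to count as a duplicate.
        return {"filename": filename, HASH_FIELD: hash_value}
    raise ValueError(f"unknown uniqueness mode: {mode}")


def already_stored(existing_files, criteria):
    """Return True if any existing file document matches every criterion."""
    if criteria is None:
        return False
    return any(
        all(doc.get(key) == value for key, value in criteria.items())
        for doc in existing_files
    )


# In-memory stand-in for the documents already in the bucket.
existing = [{"filename": "a.txt", "md5": "abc"}]
name_clash = already_stored(existing, uniqueness_filter("name", filename="a.txt"))
both_clash = already_stored(existing, uniqueness_filter("both", "a.txt", "zzz"))
```

Because the check and the write are separate steps in a design like this, a database index remains the stronger guarantee when several writers may insert the same file at once.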