ListHDFS Filter Modes

There are three filter modes available for ListHDFS that determine how the regular expression in the File Filter property will be applied to listings in HDFS.

Examples:

For the given examples, the following directory structure is used:

data
├── readme.txt
├── bin
│   ├── readme.txt
│   ├── 1.bin
│   ├── 2.bin
│   └── 3.bin
├── csv
│   ├── readme.txt
│   ├── 1.csv
│   ├── 2.csv
│   └── 3.csv
└── txt
    ├── readme.txt
    ├── 1.txt
    ├── 2.txt
    └── 3.txt


Directories and Files

Use this mode when the regular expression defined in File Filter should be matched against the names of both directories and files. When Recurse Subdirectories is true, this mode lets the user restrict the listing to files in subdirectories whose names match the regular expression defined in File Filter.

ListHDFS configuration:
Property                  Value
Directory                 /data
Recurse Subdirectories    true
File Filter               .*txt.*
Filter Mode               Directories and Files
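
The configuration above can be sketched in a few lines of Python. This is only an illustration, not ListHDFS code: the TREE dictionary and the list_directories_and_files helper are hypothetical, and the sketch assumes that a subdirectory is descended into only when its name matches the expression, that a file is listed only when its name matches, and that the expression must match the whole name.

```python
import re

# Hypothetical in-memory copy of the example tree: directory -> (subdirectories, files).
TREE = {
    "/data": (["bin", "csv", "txt"], ["readme.txt"]),
    "/data/bin": ([], ["readme.txt", "1.bin", "2.bin", "3.bin"]),
    "/data/csv": ([], ["readme.txt", "1.csv", "2.csv", "3.csv"]),
    "/data/txt": ([], ["readme.txt", "1.txt", "2.txt", "3.txt"]),
}


def list_directories_and_files(root, file_filter):
    """Assumed behavior: keep matching files, descend only into matching subdirectories."""
    pattern = re.compile(file_filter)
    subdirs, files = TREE[root]
    # Keep files in this directory whose whole name matches the expression.
    listing = [f"{root}/{name}" for name in files if pattern.fullmatch(name)]
    # Recurse only into subdirectories whose whole name matches the expression.
    for name in subdirs:
        if pattern.fullmatch(name):
            listing.extend(list_directories_and_files(f"{root}/{name}", file_filter))
    return listing


listing = list_directories_and_files("/data", r".*txt.*")
print("\n".join(listing))
```

Under these assumptions the sketch never enters bin or csv, because neither directory name contains "txt", so the readme.txt files inside them are not listed even though their names match.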

ListHDFS results:

Files Only

This mode is useful when the listing should match only the names of files with the regular expression defined in File Filter. Directory names will not be matched against the regular expression defined in File Filter. When Recurse Subdirectories is true, this mode allows the user to filter for files in the entire subdirectory tree of the directory specified in the Directory property.

ListHDFS configuration:
Property                  Value
Directory                 /data
Recurse Subdirectories    true
File Filter               [^\.].*\.txt
Filter Mode               Files Only
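
As a rough illustration of the configuration above, the following Python sketch applies the expression to file names only. It is not ListHDFS code: the PATHS list and the list_files_only helper are hypothetical, and the sketch assumes that the expression must match the whole file name. The .hidden.txt name used at the end is likewise made up, to show what the leading [^\.] is for.

```python
import re

# Hypothetical flat list of every file in the example tree.
PATHS = [
    "/data/readme.txt",
    "/data/bin/readme.txt", "/data/bin/1.bin", "/data/bin/2.bin", "/data/bin/3.bin",
    "/data/csv/readme.txt", "/data/csv/1.csv", "/data/csv/2.csv", "/data/csv/3.csv",
    "/data/txt/readme.txt", "/data/txt/1.txt", "/data/txt/2.txt", "/data/txt/3.txt",
]


def list_files_only(paths, file_filter):
    """Assumed behavior: match the expression against the last path component only."""
    pattern = re.compile(file_filter)
    return [p for p in paths if pattern.fullmatch(p.rsplit("/", 1)[-1])]


FILE_FILTER = r"[^\.].*\.txt"
listing = list_files_only(PATHS, FILE_FILTER)
print("\n".join(listing))

# A made-up dot-file name is rejected because its first character is a dot.
hidden_is_listed = re.fullmatch(FILE_FILTER, ".hidden.txt") is not None
print(hidden_is_listed)
```

Because directory names are never tested in this sketch, files ending in .txt are kept from every directory in the tree, including bin and csv.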

ListHDFS results:

Full Path

Use this mode when the regular expression defined in File Filter should be matched against the entire path of a file. When Recurse Subdirectories is true, this mode filters files across the entire subdirectory tree of the directory specified in the Directory property, based on the full path of each file.

ListHDFS configuration:
Property                  Value
Directory                 /data
Recurse Subdirectories    true
File Filter               (/.*/)*csv/.*
Filter Mode               Full Path
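
The configuration above can be approximated with the Python sketch below. It is not ListHDFS code: the PATHS list and the list_full_path helper are hypothetical, and the sketch assumes that the expression must match the whole path, written without a scheme or authority as in the directory tree shown earlier.

```python
import re

# Hypothetical flat list of every file in the example tree.
PATHS = [
    "/data/readme.txt",
    "/data/bin/readme.txt", "/data/bin/1.bin", "/data/bin/2.bin", "/data/bin/3.bin",
    "/data/csv/readme.txt", "/data/csv/1.csv", "/data/csv/2.csv", "/data/csv/3.csv",
    "/data/txt/readme.txt", "/data/txt/1.txt", "/data/txt/2.txt", "/data/txt/3.txt",
]


def list_full_path(paths, file_filter):
    """Assumed behavior: match the expression against the complete path of each file."""
    pattern = re.compile(file_filter)
    return [p for p in paths if pattern.fullmatch(p)]


listing = list_full_path(PATHS, r"(/.*/)*csv/.*")
print("\n".join(listing))
```

In this sketch the leading (/.*/)* consumes the parent directories, so a path is kept only when it contains a csv/ component followed by a file name, regardless of the file's extension.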

ListHDFS results: