HDFS Reader
Reads files from Hadoop Distributed File System (HDFS) volumes. You can create HDFSReader sources in the web UI using Source Preview.
See Supported reader-parser combinations for parsing options.
The output type is WAEvent except when using JSONParser.
HDFS Reader properties
property | type | default value | notes |
---|---|---|---|
Authentication Policy | String | | If the HDFS cluster uses Kerberos authentication, provide credentials in the format |
Compression Type | String | | Set to |
Directory | String | | optional directory from which the files specified by the Wildcard property will be read; otherwise files will be read relative to the Hadoop URL |
EOF Delay | Integer | 100 | milliseconds to wait after reaching the end of a file before starting the next read operation |
Hadoop Configuration Path | String | | If using Kerberos authentication, specify the path to Hadoop configuration files such as core-site.xml and hdfs-site.xml. If this path is incorrect or the configuration changes, authentication may fail. |
Hadoop URL | String | | The URI for the HDFS cluster NameNode. See below for an example. The default HDFS NameNode IPC port is 8020 or 9000 (depending on the distribution). Port 50070 is for the web UI and should not be specified here. For an HDFS cluster with high availability, use the value of the dfs.nameservices property from hdfs-site.xml with the syntax In MapRFSReader, you may start the URL with |
Include Subdirectories | Boolean | False | Set to True to read files in subdirectories. |
Position by EOF | Boolean | True | If set to True, reading starts at the end of the file, so only new data is acquired. If set to False, reading starts at the beginning of the file and then continues with new data. |
Rollover Style | String | Default | Do not change. |
Skip BOM | Boolean | True | If set to True, when the wildcard value specifies multiple files, Striim will read the Byte Order Mark (BOM) in the first file and skip the BOM in all other files. If set to False, it will read the BOM in every file. |
Wildcard | String | | name of the file, or a wildcard pattern to match multiple files (for example, *.xml) |
HDFS Reader example
CREATE SOURCE CSVSource USING HDFSReader ( hadoopurl:'hdfs://myserver:9000/', WildCard:'posdata.csv', positionByEOF:false )
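The following variation is a minimal sketch of how the Directory, Wildcard, Include Subdirectories, and EOF Delay properties might be combined to read every matching file under one directory tree. It assumes that TQL property names are the display names from the table above with the spaces removed (as with hadoopurl and positionByEOF in the previous example); the source name, directory path, and delay value are hypothetical. As in the previous example, the parser and output clauses are omitted.

```
-- Sketch only: directory, includesubdirectories, and eofdelay are assumed
-- to be the TQL spellings of the Directory, Include Subdirectories, and
-- EOF Delay properties. The path 'data/pos' is a hypothetical example.
CREATE SOURCE CSVDirSource USING HDFSReader (
  hadoopurl:'hdfs://myserver:9000/',
  directory:'data/pos',
  WildCard:'*.csv',
  includesubdirectories:true,
  eofdelay:500,
  positionByEOF:false
)
```

With positionByEOF set to false, reading would start at the beginning of each matching file and then continue with new data, waiting 500 milliseconds after reaching the end of a file before the next read.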