HDFS Writer
Writes to files in the Hadoop Distributed File System (HDFS).
Warning
If your version of Hadoop does not include the fix for HADOOP-10786, HDFSWriter may terminate due to Kerberos ticket expiration.
To write to MapR-FS, use MapRFSWriter. HDFSWriter and MapRFSWriter use the same properties except for the difference in hadoopurl noted below and the different names for the configuration path property.
HDFS Writer properties
property | type | default value | notes |
---|---|---|---|
authentication policy | String | | If the HDFS cluster uses Kerberos authentication, provide credentials in the format `Kerberos,Principal:<principal>,KeytabPath:<path to keytab file>` (see the sample application below). |
Directory | String | | The full path to the directory in which to write the files. See Setting output names and rollover / upload policies for advanced options. |
File Name | String | | The base name of the files to be written. See Setting output names and rollover / upload policies. |
flush policy | String | | If data is not flushed properly with the default setting, you may use this property to specify how many events Striim will accumulate before writing and/or the maximum number of seconds that will elapse between writes, for example, `interval:10,eventcount:5000`. Note that changing this setting may significantly degrade performance. |
hadoopConfigurationPath | String | | If using Kerberos authentication, specify the path to Hadoop configuration files such as core-site.xml and hdfs-site.xml. If this path is incorrect or the configuration changes, authentication may fail. |
hadoopurl | String | | The URI for the HDFS cluster NameNode (see the sample application below for an example). The default HDFS NameNode IPC port is 8020 or 9000, depending on the distribution. Port 50070 is for the web UI and should not be specified here. For an HDFS cluster with high availability, use the value of the dfs.nameservices property from hdfs-site.xml in place of the NameNode host and port. When using MapRFSWriter, you may start the URL with `maprfs://`. |
MapRDBConfigurationPath | String | | See notes for hadoopConfigurationPath. |
Rollover on DDL | Boolean | True | Has effect only when the input stream is the output stream of a CDC reader source. With the default value of True, rolls over to a new file when a DDL event is received. Set to False to keep writing to the same file. |
Rollover Policy | String | | See Setting output names and rollover / upload policies. |
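As a mental model for the flush policy described above (flush after a given number of events or after a maximum number of seconds, whichever comes first), here is a minimal Python sketch. The `interval:...,eventcount:...` string format follows the sample application's flushpolicy value; the `FlushDecider` class is purely illustrative and is not part of Striim.

```python
import time

def parse_flush_policy(policy):
    """Parse a policy string such as 'interval:10,eventcount:5000'
    into a dict like {'interval': 10, 'eventcount': 5000}."""
    out = {}
    for part in policy.split(","):
        key, _, value = part.strip().partition(":")
        out[key] = int(value)
    return out

class FlushDecider:
    """Illustrative sketch: decide whether buffered events should be
    flushed, based on event count and seconds since the last flush."""

    def __init__(self, policy):
        self.policy = parse_flush_policy(policy)
        self.count = 0
        self.last_flush = time.monotonic()

    def on_event(self):
        """Record one buffered event and report whether to flush now."""
        self.count += 1
        return self.should_flush()

    def should_flush(self):
        if "eventcount" in self.policy and self.count >= self.policy["eventcount"]:
            return True
        if "interval" in self.policy and \
                time.monotonic() - self.last_flush >= self.policy["interval"]:
            return True
        return False

    def flushed(self):
        """Reset counters after a flush has been performed."""
        self.count = 0
        self.last_flush = time.monotonic()
```

With `eventcount:5000,interval:10`, the decider reports a flush as soon as 5,000 events accumulate or 10 seconds elapse, mirroring the either/or semantics in the table.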
This adapter has a choice of formatters. See Supported writer-formatter combinations for more information.
HDFS Writer sample application
The following sample writes some of the PosApp sample data to files based on hdfstestOut.txt in the directory /user/striim/PosAppOutput of the specified HDFS instance:
```
CREATE SOURCE CSVSource USING FileReader (
  directory:'Samples/PosApp/AppData',
  WildCard:'posdata.csv',
  positionByEOF:false
)
PARSE USING DSVParser ( header:'yes' )
OUTPUT TO CsvStream;

CREATE TYPE CSVType (
  merchantId String,
  dateTime DateTime,
  hourValue Integer,
  amount Double,
  zip String
);
CREATE STREAM TypedCSVStream OF CSVType;

CREATE CQ CsvToPosData
INSERT INTO TypedCSVStream
SELECT data[1],
  TO_DATEF(data[4],'yyyyMMddHHmmss'),
  DHOURS(TO_DATEF(data[4],'yyyyMMddHHmmss')),
  TO_DOUBLE(data[7]),
  data[9]
FROM CsvStream;

CREATE TARGET hdfsOutput USING HDFSWriter(
  filename:'hdfstestOut.txt',
  hadoopurl:'hdfs://node8057.example.com:8020',
  flushpolicy:'interval:10,eventcount:5000',
  authenticationpolicy:'Kerberos,Principal:striim/node8057.example.com@STRIIM.COM,KeytabPath:/etc/security/keytabs/striim.service.keytab',
  hadoopconfigurationpath:'/etc/hadoop/conf',
  directory:'/user/striim/PosAppOutput'
)
FORMAT USING DSVFormatter ( )
INPUT FROM TypedCSVStream;
```
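The field conversions performed by the CsvToPosData CQ can be sketched in plain Python. This is a hypothetical helper, not a Striim API; it assumes DHOURS extracts the hour of day from a DateTime, and the sample row below uses made-up values in the positions the CQ reads.

```python
from datetime import datetime

def csv_to_pos_data(data):
    """Mimic the CsvToPosData CQ: pick positional fields from a raw DSV
    row and convert them to the CSVType schema (merchantId, dateTime,
    hourValue, amount, zip)."""
    # TO_DATEF(data[4],'yyyyMMddHHmmss') -> parse a timestamp string
    dt = datetime.strptime(data[4], "%Y%m%d%H%M%S")
    return {
        "merchantId": data[1],
        "dateTime": dt,
        "hourValue": dt.hour,        # DHOURS(...): hour of day, assumed
        "amount": float(data[7]),    # TO_DOUBLE(data[7])
        "zip": data[9],
    }

# Hypothetical row with only the positions the CQ reads filled in
row = ["", "M1", "", "", "20130312173210", "", "", "123.45", "", "10001"]
record = csv_to_pos_data(row)
```

Here `record` would hold merchantId "M1", the parsed 2013-03-12 17:32:10 timestamp, hourValue 17, amount 123.45, and zip "10001" — the typed event that TypedCSVStream carries to HDFSWriter.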