File Reader

Reads files from disk using a compatible parser.

You can create FileReader sources in the web UI using Source Preview.

See Supported reader-parser combinations for parsing options.

File Reader properties

property	type	default value	notes
Block Size	Integer	64	amount of data in KB for each read operation
Compression Type	String		Set to `gzip` when `wildcard` specifies a file or files in gzip format. Otherwise, leave blank.
Directory	String		Specify the path to the directory containing the file(s). The path may be relative to the Striim installation directory (for example, Samples/PosApp/appData) or absolute (starting from the root).
Include Subdirectories	Boolean	False	Set to True if the files are written to subdirectories of the `Directory` path, for example, if each day's files are in a subdirectory named by date. When this property is False, the filename metadata in the File Reader output includes only the file name. When this property is True, the filename metadata in the File Reader output includes the absolute path to the file.
Position By EOF	Boolean	True	If set to True, reading starts at the end of the file, so only new data is acquired. If set to False, reading starts at the beginning of the file and then continues with new data. When FileReader is used with a cache, this setting is ignored and reading always starts at the beginning of the file. When you create a FileReader using Source Preview, this is set to False.
Rollover Style	String	Default	Set to `log4j` if reading Log4J files created using RollingFileAppender.
Skip BOM	Boolean	True	If set to True, when the wildcard value specifies multiple files, Striim will read the Byte Order Mark (BOM) in the first file and skip the BOM in all other files. If set to False, it will read the BOM in every file.
Wildcard	String		Specify the name of the file, or a wildcard pattern to match multiple files (for example, *.xml). Do not modify this property when recovery is enabled for the application. When reading multiple files, Striim reads them in the default order for the operating system. FileReader takes lexicographic file order into account only when an application is restarted. For example, if the source checkpoint is at B.txt, A.txt will not be captured if it is introduced immediately after the restart. To work around this, add C.txt and then A.txt to bypass the lexicographic constraint. While File Reader is reading a file, it ignores any changes to the portion of the file that has already been read. If a file is modified after File Reader has finished reading it, the file is read again in its entirety, resulting in duplicate events. Changes made to files while the application is down are not captured.
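
As an illustration of how several of these properties might be combined, here is a minimal sketch of a source that reads gzip-compressed files from dated subdirectories. It assumes that the TQL property names are the property names above with the spaces removed (as with positionByEOF in the PosApp sample). The file name pattern, source name, and stream name are hypothetical.

```tql
-- Hypothetical source: gzip-compressed DSV files in subdirectories of the Directory path.
-- Assumption: compressionType and includeSubdirectories are the TQL spellings of
-- the Compression Type and Include Subdirectories properties.
-- With includeSubdirectories set to true, the filename metadata contains the absolute path.
CREATE SOURCE GzipCsvSource USING FileReader (
  directory:'Samples/PosApp/appData',
  wildcard:'posdata*.csv.gz',
  compressionType:'gzip',
  includeSubdirectories:true,
  positionByEOF:false
)
PARSE USING DSVParser (
  header:Yes
)
OUTPUT TO GzipCsvStream;
```

Because positionByEOF is false, reading starts at the beginning of each matching file. Leave compressionType out entirely when the files are not in gzip format.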

The output type is WAevent except when using Avro Parser or JSONParser.

File Reader sample code

When used with DSV Parser, the type for the output stream can be created automatically from the file header (see Creating the FileReader output stream type automatically).

Striim also provides wizards for creating applications that read from files and write to various targets. See Creating an application using a wizard for details.

An example from the PosApp sample application:

CREATE SOURCE CsvDataSource USING FileReader (
  directory:'Samples/PosApp/appData',
  wildcard:'posdata.csv',
  positionByEOF:false
)
PARSE USING DSVParser (
  header:Yes,
  trimquote:false
)
OUTPUT TO CsvStream;

See PosApp for a detailed explanation and MultiLogApp for additional examples.
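
Because the output of this source is of type WAEvent, a downstream query typically has to pull individual fields out of the event's data array and convert them to the required types. The following is a minimal sketch of such a query. The field positions, field names, and the PosDataStream name are illustrative assumptions, not taken from the file itself, and the sketch assumes the output stream can be created implicitly by the query.

```tql
-- Hypothetical query: converts selected fields of each WAEvent from CsvStream.
-- The data[] indexes and the date format are assumptions about the file layout.
CREATE CQ CsvToPosData
INSERT INTO PosDataStream
SELECT TO_STRING(data[1]) AS merchantId,
  TO_DATEF(data[4],'yyyyMMddHHmmss') AS dateTime,
  TO_DOUBLE(data[7]) AS amount,
  TO_STRING(data[9]) AS zip
FROM CsvStream;
```

Adjust the indexes and conversion functions to match the actual columns in the file.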

Creating the output stream type automatically

When FileReader is used with DSV Parser, the type for the output stream can be created automatically from the file header using OUTPUT TO <stream name> MAP(filename:'<source file name>'). For example:

CREATE SOURCE CsvDataSource USING FileReader (
  directory:'Samples/PosApp/appData',
  wildcard:'posdata*.csv',
  positionByEOF:false
)
PARSE USING DSVParser (
  header:Yes,
  trimquote:false
)
OUTPUT TO CsvStream MAP(filename:'posdata*.csv');

Notes:

The specified source file must exist when the source is created.
The header must be the first line of the file (the HeaderLineNo setting is ignored by MAP).
If multiple files are specified by the wildcard property, the header will be taken from the first one read.
All files must have the same structure as the first one read, with the header in the first line and the same number of fields.

Creating the FileReader output stream type automatically

CREATE SOURCE PosSource USING FileReader (
  wildcard:'PosDataPreview*.csv',
  directory:'Samples/PosApp/appData',
  positionByEOF:false
)
PARSE USING DSVParser (
  header:Yes,
  trimquote:false
)
OUTPUT TO PosSource_Stream,
OUTPUT TO PosSource_Mapped_Stream MAP(filename:'PosDataPreview.csv');