Skip to main content

Incremental Batch Reader

Works like DatabaseReader but has two additional properties, Check Column and Start Position, which allow you to specify that reading will begin at a user-selected position. To specify the starting point, the table(s) to be read must have a column containing either a timestamp or a sequential number. The most common use case is for populating data warehouses.

See also Spanner Batch Reader.

See the notes for Connection URL in Database Reader properties for a list of supported databases and DatabaseReader data type support and correspondence for supported data types.

Warning

For all databases, when this adapter is deployed to a Forwarding Agent, the appropriate JDBC driver must be installed as described in Installing third-party drivers in the Forwarding Agent.

Incremental Batch Reader properties

property

type

default value

notes

Check Column

String

Specify the name of the column containing the start position value. The column must have an integer or timestamp data type (such as the creation timestamp or an employee ID number) and the values must be unique and continuously increasing.

MySQL and Oracle names are case-sensitive, SQL Server names are not. Use the syntax <schema name>.<table name>=<column name> for MySQL and Oracle and <database name>.<schema name>.<table name>=<column name> for SQL Server.

If you specify multiple tables in the Tables property, you may specify different check columns for the tables separated by semicolons. In this case, you may specify the check column for the remaining tables using wildcards: for example, MYSCHEMA.TABLE1=UUID; MYSCHEMA.%=LAST_UPDATED would use UUID as the start column for TABLE1 and LAST_UPDATED as the start column for the other tables.

When rows have the same Check Column value, they will be buffered in memory until the a row with a different Check Column value is read. To avoid memory usage spikes, do not set Check Column so that many rows will have the same value. For example, selecting a single Check Column containing the day's date (rather than a timestamp) might result in thousands or millions of rows with the same value.

Connection URL

String

See Connection URL notes in Database Reader.Database Reader

Database Provider Type

String

Default

Controls which icon appears in the Flow Designer.

Excluded Tables

String

Data for any tables specified here will not be returned. For example, if Tables uses a wildcard, data from any tables specified here will be omitted. Multiple table names (separated by semicolons) and wildcards may be used exactly as for Tables.

Fetch Size

Integer

100

maximum number of records to be fetched from the database in a single JDBC method execution (see the discussion of fetchsize in the documentation for the your JDBC driver)

Password

encrypted password

The password for the specified user. See Encrypted passwords.

Polling Interval

String

120sec

This property controls how often the adapter reads from the source. By default, it checks the source for new data every two minutes (120 seconds). If there is new data, the adapter reads it and sends it to the adapter's output stream. The value may be specified in seconds (as in the default) or milliseconds (for example, 500ms).

Return DateTime As

String

Joda

Set to String to return timestamp values as strings rather than Joda timestamps. The primary purpose of this option is to avoid losing precision when microsecond timestamps are converted to Joda milliseconds. The format of the string is yyyy-mm-dd hh:mm:ss.ffffff.

SSL Config

String

If the source is Oracle and it uses SSL, specify the required SSL properties (see the notes on SSL Config in Oracle Reader properties).

Start Position

String

The value in the specified check column from which Striim will start reading. Striim will read rows in which the check column's value is the same as or greater or later than this value and skip the other rows. Since Check Column may specify multiple tables you must specify the corresponding table name or wildcard for each value. With the Check Column example above, the Start Position value could be MYSCHEMA.TABLE1=1234; MYSCHEMA.%=2018-OCT-07 18:37:55. Note that this timestamp format is just an example: your source may require a different java.sql.timestamp format, such as yyyy-mm-dd hh:mm:ss, to match the check column's data type.

Tables

String

The table(s) or view(s) to be read. MySQL, Oracle, and PostgreSQL names are case-sensitive, SQL Server names are not. Specify names as <database name>.<table name> for MySQL, <schema name>.<table name> for Oracle and PostgresQL, and <database name>.<schema name>.<table name> for SQL Server.

You may specify multiple tables and views as a list separated by semicolons or with the % wildcard. For example, HR% would read all the tables whose names start with HR. You may use the % wildcard only for tables, not for schemas or databases. The wildcard is allowed only at the end of the string: for example, mydb.prefix% is valid, but mydb.%suffix is not.

Username

String

the DBMS user name the adapter will use to log in to the server specified in ConnectionURL

For all databases, this user must have SELECT permission or privileges on the tables specified in the Tables property. For Oracle, this user must also have SELECT privileges on DBA_TAB_COLS and ALL_COLL_TYPES.

Incremental Batch Reader sample code

The following would read rows from TABLE1 with UUID column values equal to or greater than 1234: 

CREATE SOURCE OraSource USING IncrementalBatchReader  ( 
  Username: 'striim',
  Password: '********',
  ConnectionURL: '192.0.2.:1521:orcl',
  Tables: 'MYSCHEMA.TABLE1',
  CheckColumn: 'MYSCHEMA.TABLE1=UUID',
  StartPosition: 'MYSCHEMA.TABLE1=1234'
) 
OUTPUT TO OraSourceOutput;

If IncrementalBatchReader sends duplicate records to a DatabaseWriter target, by default the application will terminate. This can happen, for example, when recovery is enabled (see Recovering applications), there are multiple rows with the same CheckColumn timestamp, and only some of them were written before a system failure,  To avoid this, specify the appropriate IgnorableException in the target: for example, for CosmosDBWriter, RESOURCE_ALREADY_EXISTS.Recovering applications