Skip to main content

Incremental Batch Reader

You can use Incremental Batch Reader to get new data from a supported database or data warehouse. The most common use case for Incremental Batch Reader is to continuously update targets with new source data after initial load using Database Reader has completed, particularly when a CDC reader is not available for the source or when you choose not to use the available CDC reader.

To read incrementally from Spanner, see Spanner Batch Reader.

Incremental Batch Reader supports the same data types as Database Reader (see DatabaseReader data type support and correspondence for supported data types).

BigQuery (see Runtime considerations when using Database Reader or Incremental Batch Reader with BigQuery sources)
Db2 for z/OS
HP NonStop SQL/MX (and SQL/MP via aliases in SQL/MX) versions 3.2.1 to 3.8.1
MariaDB (including Amazon RDS for MariaDB, MariaDB Galera Cluster, MariaDB Xpand, and SkySQL) versions 10.1 and later
MySQL (including Amazon Aurora for MySQL, Amazon RDS for MySQL, Azure Database for MySQL, and Google Cloud SQL for MySQL) versions 5 and 8
Oracle (including Amazon RDS for Oracle) versions 11G, 12C, 18C, 19C, and 21C
PostgreSQL (including Amazon Aurora with PostgreSQL compatibility, Amazon RDS for PostgreSQL, Azure Database for PostgreSQL, Azure Database for PostgreSQL - Flexible Server, Crunchy Postgres, Google AlloyDB for PostgreSQL, Google Cloud SQL for PostgreSQL), and Neon, versions 8.4 to 17
Snowflake
SQL Server (including Amazon RDS for SQL Server and Azure SQL Database) versions 2012, 2014, 2016, and 2017
Sybase ASE versions 16.0 and later (Linux)
Teradata
YugabyteDB versions 2.18 and later

Three properties control incremental reading:

Check Column specifies the column Incremental Batch Reader will use to identify which data is new.
Start Position specifies the Check Column value from which reading will start.
Polling Interval controls how often Incremental Batch Reader checks for new data. The first batch will begin with the Start Position value. Each successive batch will begin with the table rows immediately after the last row included in the previous batch.

Warning

For all databases, when this adapter is deployed to a Forwarding Agent, the appropriate JDBC driver must be installed as described in Installing third-party drivers in the Forwarding Agent.

Differences between Incremental Batch Reader and CDC readers

SQL CDC readers capture INSERT, UPDATE, and DELETE operations, so if a writer supports Merge mode, all changes in the source can be replicated in the target. Incremental Batch Reader treats both INSERT and UPDATE operations as inserts and does not capture deletes, so a writer in Merge mode will not be able to replicate all changes to the source in the target, and in Append Only mode deletes in the source will not be captured in the target.

Incremental Batch Reader properties

property	type	default value	notes
Check Column	String		Specify the name of the column containing the start position value. The column must have an integer or timestamp data type (such as the creation timestamp or an employee ID number) and the values must be unique and continuously increasing. For Db2 for z/OS, specify a BIGINT, INTEGER, or TIMESTAMP column. MySQL and Oracle names are case-sensitive, SQL Server names are not. Use the syntax `<schema name>.<table name>=<column name>` for MySQL and Oracle and `<database name>.<schema name>.<table name>=<column name>` for SQL Server. If you specify multiple tables in the Tables property, you may specify different check columns for the tables separated by semicolons. In this case, you may specify the check column for the remaining tables using wildcards: for example, `MYSCHEMA.TABLE1=UUID; MYSCHEMA.%=LAST_UPDATED` would use UUID as the start column for TABLE1 and LAST_UPDATED as the start column for the other tables. When rows have the same Check Column value, they will be buffered in memory until the a row with a different Check Column value is read. To avoid memory usage spikes, do not set Check Column so that many rows will have the same value. For example, selecting a single Check Column containing the day's date (rather than a timestamp) might result in thousands or millions of rows with the same value.
Connection Pool Size	Integer	1	Specify the maximum number of connections that Striim will use. This is the maximum number of tables Striim will be able to read simultaneously, provided that ThreadPool Size is set to the same or a higher number.
Connection Profile Name	enum		Appears in Flow Designer only when Database Provider Type is BigQuery and Use Connection Profile is True. See Introducing connection profiles.
Connection URL	String		See Connection URL notes in Database Reader.Database Reader
Database Provider Type	String	Default	Controls which icon appears in the Flow Designer and whether Use Connection Profile appears in the web UI.
Excluded Tables	String		Data for any tables specified here will not be returned. For example, if `Tables` uses a wildcard, data from any tables specified here will be omitted. Multiple table names (separated by semicolons) and wildcards may be used exactly as for `Tables`.
Fetch Size	Integer	100	Sets the maximum number of records to be fetched from the database in a single JDBC method execution (see the discussion of fetchsize in the documentation for your JDBC driver).
Password	encrypted password		The password for the specified user. See Encrypted passwords.
Polling Interval	String	120sec	This property controls how often the adapter reads from the source. By default, it checks the source for new data every two minutes (120 seconds). If there is new data, the adapter reads it and sends it to the adapter's output stream. The value may be specified in seconds (as in the default) or milliseconds (for example, 500ms).
Return DateTime As	String		If no value is specified, timestamp values are re turned as Joda timestamps. Set to `String` to return timestamp values as strings. The primary purpose of this option is to avoid losing precision when microsecond timestamps are converted to Joda milliseconds. The format of the string is yyyy-mm-dd hh:mm:ss.ffffff. When the source is Db2 for z/OS, set to `Java8DateTime` or `String`. When the source is BigQuery or Snowflake, this setting is ignored and timestamp values are always returned as Java DateTime.
SSL Config	String		If the source is Oracle and it uses SSL, specify the required SSL properties (see the notes on SSL Config in Oracle Reader properties).
Start Position	String		The value in the specified check column from which Striim will start reading. Striim will read rows in which the check column's value is the same as or greater or later than this value and skip the other rows. Since Check Column may specify multiple tables you must specify the corresponding table name or wildcard for each value. With the Check Column example above, the Start Position value could be `MYSCHEMA.TABLE1=1234; MYSCHEMA.%=2018-OCT-07 18:37:55`. Note that this timestamp format is just an example: your source may require a different java.sql.timestamp format, such as `yyyy-mm-dd hh:mm:ss`, to match the check column's data type.
Tables	String		Do not modify this property when recovery is enabled for the application. The table(s) or view(s) to be read. MySQL, Oracle, and PostgreSQL names are case-sensitive, SQL Server names are not. Specify names as `<dataset name>.<table name>` for BigQuery, `<database name>.<table name>` for MySQL, `<schema name>.<table name>` for Oracle and PostgresQL, `<schema name>.<table name>` for Snowflake, and `<database name>.<schema name>.<table name>` for SQL Server. You may specify multiple tables and views as a list separated by semicolons or with the `%` wildcard. For example, `HR%` would read all the tables whose names start with HR. You may use the `%` wildcard only for tables, not for schemas or databases. The wildcard is allowed only at the end of the string: for example, `mydb.prefix%` is valid, but `mydb.%suffix` is not.
ThreadPool Size	Integer	5	Specify the maximum number of threads the adapter will use. This is the maximum number of tables Striim will be able to read simultaneously, provided that Connection Pool Size is set to the same or a higher number.
Use Connection Profile	Boolean		See Introducing connection profiles. Appears in web UI only when Database Provider Type is BigQuery or Snowflake.
Username	String		Specify the DBMS user name the adapter will use to log in to the server specified in ConnectionURL. For all databases, this user must have SELECT permission or privileges on the tables specified in the Tables property. For Db2 on z/OS, this user must also have SELECT privileges on the SYSCOLUMNS, SYSTABLES, and SYSVIEWS catalog tables. For Oracle, this user must also have SELECT privileges on DBA_TAB_COLS and ALL_COLL_TYPES.

Incremental Batch Reader sample code

The following would read rows from TABLE1 with UUID column values equal to or greater than 1234:

CREATE SOURCE OraSource USING IncrementalBatchReader  ( 
  Username: 'striim',
  Password: '********',
  ConnectionURL: '192.0.2.:1521:orcl',
  Tables: 'MYSCHEMA.TABLE1',
  CheckColumn: 'MYSCHEMA.TABLE1=UUID',
  StartPosition: 'MYSCHEMA.TABLE1=1234'
) 
OUTPUT TO OraSourceOutput;

If IncrementalBatchReader sends duplicate records to a DatabaseWriter target, by default the application will terminate. This can happen, for example, when recovery is enabled (see Recovering applications), there are multiple rows with the same CheckColumn timestamp, and only some of them were written before a system failure, To avoid this, specify the appropriate IgnorableException in the target: for example, for CosmosDBWriter, RESOURCE_ALREADY_EXISTS.Recovering applications

Would you like to provide feedback? Just click here to suggest edits.