
BigQuery Writer programmer's reference

BigQuery Writer properties

For each property, the entries below give the property name, its type, its default value, and usage notes.

Allow Quoted Newlines

Boolean

False

This property has been deprecated and has no effect. It will be removed in a future release.

Batch Policy

String

eventCount:1000000, interval:90

This property defines how much data can accumulate and how much time can pass before Striim sends a batch of data to BigQuery. As discussed below, it is important that these settings do not individually or in combination result in your application exceeding BigQuery's quotas or limits.

With the default setting, data will be written every 90 seconds or when the buffer accumulates 1,000,000 events.

The batch policy includes eventCount and interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent as a batch to the target every time either of the specified values is exceeded. When the app is stopped, any remaining data in the buffer is discarded.

To disable batching, set to eventCount:1, interval:0. This may be useful when developing and debugging an application, but would result in unacceptable performance in production.

When Streaming Upload is False, use interval:60 or higher so as not to exceed the quota of 1,500 load jobs per table per day (60 seconds = one minute, and there are 1,440 minutes in a day). Limits are subject to change by Google.

Do not exceed BigQuery's quotas or limits (see Load jobs for the load method or Query jobs for the streaming method in the "Quotas and limits" section of Google's BigQuery documentation). For example, if you exceed the quota of batches per table per day, BigQueryWriter will throw an exception such as error code 500, "An internal error occurred and the request could not be completed," and stop the application. To avoid this, reduce the number of batches by increasing the event count and/or interval. Contact Striim support if you need assistance in keeping within Google's quotas.

When Optimized Merge is True and an event includes a primary key update, the batch is sent to the target immediately, without waiting to reach the eventCount or interval.

Monitoring reports and MON output for BigQuery Writer targets include Queued Batches Size Bytes, which reports the total current size of the buffer in bytes.
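
For reference, here is a minimal TQL sketch of a target definition with an explicit batch policy. The target, stream, project, key file, and table names are hypothetical:

CREATE TARGET BigQueryDemoTarget USING BigQueryWriter (
  ServiceAccountKey: 'UploadedFiles/my-project-key.json',
  ProjectId: 'my-project',
  Tables: 'mydataset.mytable',
  BatchPolicy: 'eventCount:500000, interval:60'
)
INPUT FROM DemoStream;

With this policy, a batch is sent whenever 500,000 events accumulate or 60 seconds elapse, whichever comes first.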

CDDL Action

String

Process

See Handling schema evolution.

When the captured source DDL may include TRUNCATE commands, to avoid deletion of events in the target, place a CQ with the select statement SELECT * FROM <input stream name> x WHERE META(x, OperationName).toString() != 'Truncate'; (replacing <input stream name> with the name of the writer's input stream) between the reader and BigQuery Writer. Note that the target will have no record of the affected events having been deleted.
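
As an illustration, a minimal sketch of such a CQ (the CQ and stream names are hypothetical; note the alias x on the input stream, which the META function references):

CREATE CQ FilterTruncates
INSERT INTO FilteredStream
SELECT * FROM SourceStream x
WHERE META(x, OperationName).toString() != 'Truncate';

BigQuery Writer's input stream would then be FilteredStream rather than SourceStream.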

Column Delimiter

String

| (UTF-8 007C)

This property defines which character(s) will be used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data will contain the | character, change the default value to a sequence of characters that will not appear in the data.

Connection Retry Policy

String

totalTimeout=600, initialRetryDelay=10, retryDelayMultiplier=2.0, maxRetryDelay=60, maxAttempts=5, jittered=True, initialRpcTimeout=10, rpcTimeoutMultiplier=2.0, maxRpcTimeout=30

Altering these properties can have significant effects on performance and stability. Do not change unless instructed to by Striim support.

Data Location

String

If necessary, set this to the dataset's Data location property value, which identifies its Google Cloud region(s) (see BigQuery > Documentation > Resources > BigQuery locations).

Encoding

String

UTF-8

This property selects the encoding for the delimited text files in which BigQueryWriter accumulates batched data. Currently the only supported encoding is UTF-8 (see "Encoding" in BigQuery > Documentation > Guides > Introduction to loading data > Choosing a data ingestion method).

Excluded Tables

String

When a wildcard is specified for Tables, you may specify here any tables you wish to exclude from the query. Specify the value exactly as for Tables. For example, to include data from all tables in mydataset except the table named ignore:

Tables:'mydataset.%',
ExcludedTables:'mydataset.ignore'

Ignorable Exception Code

String

Set to TABLE_NOT_FOUND to prevent the application from terminating when Striim tries to write to a table that does not exist in the target. See Handling "table not found" errors for more information.

No other values are supported.

Striim writes ignored exceptions to the application's exception store (see CREATE EXCEPTIONSTORE).
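
This property is set like any other writer property in TQL, for example:

IgnorableExceptionCode: 'TABLE_NOT_FOUND'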

Include Insert ID

Boolean

True

When Streaming Upload is False, this setting is ignored, and is not displayed in the Flow Designer. When using the Storage Write API, this setting is ignored.

When using the legacy streaming API, the default setting of True includes an insertId with each record, which BigQuery uses for best-effort deduplication of streaming inserts. Setting it to False can increase streaming throughput at the cost of possible duplicate rows (see "Use the legacy streaming API" in Google's BigQuery documentation).

Mode

String

APPENDONLY

See Choose which writing mode to use.

Null Marker

String

NULL

When Streaming Upload is False, this property specifies a string to be inserted into fields in the delimited text files in which BigQueryWriter accumulates batched data to indicate that a field has a null value. These are converted back to nulls in the target tables. If any field might contain the string NULL, change this to a sequence of characters that will not appear in the data.

When Streaming Upload is True, this setting has no effect.

Optimized Merge

Boolean

False

When the source events will always include full records, leave this set to False.

Set to True only when Mode is MERGE, BigQuery Writer's input stream is the output of an HP NonStop reader, MySQL Reader, or Oracle Reader source, and the source events will include partial records. (For example, with Oracle Reader, when supplemental logging has not been enabled for all columns, partial records are sent for updates.)

Parallel Threads

Integer

See Creating multiple writer instances.

Private Service Connect Endpoint

String

If using Private Service Connect with BigQuery, specify the endpoint created in the target Virtual Private Cloud, as described in Private Service Connect support for Google cloud adapters.

Project Id

String

Specify the BigQuery project ID of the dataset. See Get the project ID for BigQuery.

Quote Character

String

"

This property specifies the character(s) used to escape field values in the delimited text files in which the adapter accumulates batches. If the data will contain the " (UTF-8 0022) character, change the value to a sequence of characters that will not appear in the data.

Service Account Key

String

Specify the path (from the UploadedFiles directory) and file name of the .json credentials file downloaded from Google (see Download the service account key).

Standard SQL

Boolean

True

With the default setting of True, BigQuery Writer uses GoogleSQL (formerly called standard SQL). Set to False to use legacy SQL. See BigQuery > Documentation > Reference > Migrating to GoogleSQL for more information.

Do not set to False if you have partitioned your target tables (see Create the dataset(s) and target tables in BigQuery; partitioning tables).

Streaming Configuration

String

MaxRequestSizeInMB=5, MaxParallelRequests=5, ApplicationCreatedStreamMode=None, UseLegacyStreamingApi=False

This property enables you to configure various options when using the Storage Write API or legacy streaming API. When Streaming Upload is False, this setting is ignored, and is not displayed in the Flow Designer.

For best performance, adjust the values of the sub-properties so as not to exceed Google's quotas (see BigQuery > Documentation > Resources > Quotas and limits). If you need assistance in keeping within Google's quotas, contact Striim support.

When using the Storage Write API (that is, when UseLegacyStreamingApi=False):

  • ApplicationCreatedStreamMode is active only when using the Storage Write API. See BigQuery > Documentation > Guides > Batch load and stream data with BigQuery Storage Write API > Overview of the Storage Write API > Application-created streams for more information on "Committed type" application-created streams (Striim does not support the other types).

    With the default value of None, the Storage Write API's default mode is used. In this mode, Striim guarantees at-least-once processing (A1P), which means that after connection retries or recovery, the target may have duplicate records, but none will be missing.

    If you have frequent retries due to transient connectivity issues resulting in numerous duplicates, set to ApplicationCreatedStreamMode=CommittedMode to reduce the number of duplicates.

  • MaxRequestSizeInMB: the maximum size in MB of each streaming request. The maximum value is 10. See BigQuery > Documentation > Resources > Quotas and limits > API quotas and limits > Storage Write API.

  • MaxParallelRequests applies in default mode only (it is ignored in committed mode) and sets the maximum number of concurrent connections Striim will create to write to BigQuery. A higher number decreases the time required to write each streaming request.

    When the input for BigQuery Writer is from a CDC source and the mode is Append Only, set MaxParallelRequests to 1 to preserve the sequence of events. This will degrade performance, so we do not recommend setting MaxParallelRequests=1 in other situations.

When using the legacy streaming API (that is, when UseLegacyStreamingApi=True):

  • MaxRequestSizeInMB: maximum size in MB of each streaming request. The maximum value is 10. Changing to a value higher than 5 may cause “Invalid Errors” since internal metadata will be added along with the payload. See BigQuery > Documentation > Resources > Quotas and limits > Streaming inserts.

  • MaxRecordsPerRequest: use only when UseLegacyStreamingApi is True. We recommend setting this to the "Maximum rows per request" limit (see BigQuery > Documentation > Resources > Quotas and limits > Streaming inserts). Limits are subject to change by Google.

  • MaxParallelRequests sets the maximum number of concurrent connections Striim will create to write to BigQuery. Setting a higher number will decrease the time required to write each streaming request.

    When the input for BigQuery Writer is from a CDC source and the mode is Append Only, set MaxParallelRequests to 1 to preserve the sequence of events. This will degrade performance, so we do not recommend setting MaxParallelRequests=1 in other situations.
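
As an illustration, here is a hedged TQL sketch of a target that streams CDC output in APPENDONLY mode while preserving event order (all names are hypothetical):

CREATE TARGET BigQueryStreamTarget USING BigQueryWriter (
  ServiceAccountKey: 'UploadedFiles/my-project-key.json',
  ProjectId: 'my-project',
  Tables: 'source.%,mydataset.%',
  Mode: 'APPENDONLY',
  StreamingUpload: 'true',
  StreamingConfiguration: 'MaxRequestSizeInMB=5, MaxParallelRequests=1, ApplicationCreatedStreamMode=None, UseLegacyStreamingApi=False'
)
INPUT FROM CDCStream;

Here MaxParallelRequests=1 preserves the sequence of events from the CDC source, at the cost of throughput, as described above.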

Streaming Upload

Boolean

False

With the default value of False, the writer uses the load method. Set to True to use a streaming API. See Choose which writing method to use.

Tables

String

The name(s) of the table(s) to write to, in the format <dataset>.<table>. Dataset and table names are case-sensitive. The table(s) must exist when the application is started.

When the target's input stream is a user-defined event, specify a single table.

When the input stream is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), BigQuery Writer can write to multiple tables. In this case, specify the names of both the source and target tables. Use the % wildcard only for tables, not for schemas, databases, or datasets. If the reader uses three-part names, you must use them here as well.

Note that Oracle CDB/PDB source table names must be specified in two parts when the source is Database Reader or Incremental Batch Reader (schema.%,schema.%) but in three parts when the source is Oracle Reader or OJet (database.schema.%,schema.%). SQL Server source table names must be specified in three parts when the source is Database Reader or Incremental Batch Reader (database.schema.%,schema.%) but in two parts when the source is MS SQL Reader or MS Jet (schema.%,schema.%). Examples:

source.emp,target.emp
source.db1,target.db1;source.db2,target.db2
source.%,target.%
source.mydatabase.emp%,target.mydb.%
source1.%,target1.%;source2.%,target2.%

MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify names as <schema name>.<table name> for MySQL and Oracle and as <database name>.<schema name>.<table name> for SQL Server.

When the columns in the source and target are not in the same order, the application will halt, even if the column names are the same. To avoid this, use ColumnMap to map at least one column.
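
For example, mapping a single column (the table and column names here are hypothetical) is enough to make the writer match the remaining columns by name rather than position:

Tables: 'source.emp,mydataset.emp ColumnMap(emp_id=id)'

In ColumnMap, the target column appears on the left and the source column on the right.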

See Mapping columns for additional options.

Transport Options

String

connectTimeout=300, readTimeout=120

Sets HTTP transport timeout options in seconds (see Java > Documentation > Reference > Class HttpTransportOptions) when using the legacy streaming API or the load method. With the default setting, the connect timeout is five minutes and the read timeout is two minutes. This property is ignored when using the Storage Write API.

BigQuery Writer data type support and correspondence

The following tables describe which source data types are supported by BigQuery Writer and how the source data types are mapped to BigQuery data types. See also Data type support & mapping for schema conversion & evolution.

with an Oracle source

Oracle type → BigQuery type

BFILE → unsupported
BINARY_DOUBLE → NUMERIC
BINARY_FLOAT → FLOAT64
BLOB → BYTES
CHAR → STRING
CLOB → STRING (an insert or update that contains a column of this type generates two CDC log entries: an insert or update in which the value for this column is null, followed by an update including the value)
DATE → DATE
FLOAT → FLOAT64
INTERVAL DAY TO SECOND → STRING
INTERVAL YEAR TO MONTH → STRING
LONG → unsupported
LONG RAW → unsupported
NCHAR → STRING
NCLOB → STRING
NESTED TABLE → unsupported
NUMBER → NUMERIC
NVARCHAR2 → STRING
RAW → BYTES
ROWID → unsupported
TIMESTAMP → DATETIME
TIMESTAMP WITH LOCAL TIME ZONE → TIMESTAMP
TIMESTAMP WITH TIME ZONE → TIMESTAMP
UROWID → unsupported
VARCHAR2 → STRING
VARRAY → unsupported
XMLTYPE → unsupported

with a PostgreSQL source

PostgreSQL type → BigQuery type

bigint → INT64
bigserial → INT64
bit → unsupported
bit varying → unsupported
boolean → BOOLEAN
box → unsupported
bytea → BYTES
character → STRING
character varying → STRING
cidr → unsupported
circle → unsupported
date → STRING
double precision → FLOAT64
inet → INT64
integer → INT64
int2 → INT64
int4 → INT64
int4range → STRING
int8 → INT64
int8range → STRING
interval → STRING
json → STRING
jsonb → STRING
line → unsupported
lseg → unsupported
macaddr → unsupported
money → unsupported
numeric → NUMERIC
path → unsupported
pg_lsn → unsupported
point → unsupported
polygon → unsupported
real → FLOAT64
smallint → INT64
smallserial → INT64
serial → INT64
text → STRING
time → STRING
time with time zone → TIMESTAMP
timestamp → DATETIME
timestamp with time zone → TIMESTAMP
tsquery → unsupported
tsvector → unsupported
txid_snapshot → unsupported
uuid → unsupported
xml → STRING