BigQuery Writer programmer's reference
BigQuery Writer properties
property | type | default value | notes |
---|---|---|---|
Allow Quoted Newlines | Boolean | False | This property has been deprecated and has no effect. It will be removed in a future release. |
Batch Policy | String | eventCount:1000000, interval:90 | This property defines how much data can accumulate and how much time can pass before Striim sends a batch of data to BigQuery. The batch policy includes eventCount and interval (see Setting output names and rollover / upload policies for syntax). Events are buffered locally on the Striim server and sent as a batch to the target each time either of the specified values is exceeded. With the default setting, data is written every 90 seconds or whenever the buffer accumulates 1,000,000 events. When the app is stopped, any remaining data in the buffer is discarded. To disable batching, set to eventCount:1, interval:0. When Optimized Merge is True and an event includes a primary key update, the batch is sent to the target immediately, without waiting to reach the eventCount or interval. It is important that these settings do not, individually or in combination, cause your application to exceed BigQuery's quotas or limits (see Load jobs for the load method or Query jobs for the streaming method in the "Quotas and limits" section of Google's BigQuery documentation). For example, if you exceed the quota of batches per table per day, BigQueryWriter will throw an exception such as error code 500, "An internal error occurred and the request could not be completed," and stop the application. To avoid this, reduce the number of batches by increasing the event count and/or interval. Contact Striim support if you need assistance in keeping within Google's quotas. Monitoring reports and MON output for BigQuery Writer targets include Queued Batches Size Bytes, which reports the total current size of the buffer in bytes. |
CDDL Action | String | Process | See Handling schema evolution. When the captured source DDL may include TRUNCATE commands, to avoid deletion of events in the target, place a CQ with a select statement that filters them out between the source and the target. |
Column Delimiter | String | | (UTF-8 007C) | This property defines which character(s) will be used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data might contain this character, specify a different delimiter. |
Connection Retry Policy | String | totalTimeout=600, initialRetryDelay=10, retryDelayMultiplier=2.0, maxRetryDelay=60, maxAttempts=5, jittered=True, initialRpcTimeout=10, rpcTimeoutMultiplier=2.0, maxRpcTimeout=30 | Altering these properties can have significant effects on performance and stability. Do not change them unless instructed to by Striim support. |
Data Location | String | | If necessary, specify the dataset's Data location property value, which specifies its Google Cloud region(s) (see BigQuery > Documentation > Resources > BigQuery locations). |
Encoding | String | UTF-8 | This property selects the encoding for the delimited text files in which BigQueryWriter accumulates batched data. Currently the only supported encoding is UTF-8 (see "Encoding" in BigQuery > Documentation > Guides > Introduction to loading data > Choosing a data ingestion method). |
Excluded Tables | String | | When a wildcard is specified for Tables, any tables specified here are excluded from the wildcard match. For example: Tables:'mydataset.%', ExcludedTables:'mydataset.ignore'. |
Ignorable Exception Code | String | | Set to TABLE_NOT_FOUND to prevent the application from terminating when Striim tries to write to a table that does not exist in the target. See Handling "table not found" errors for more information. No other values are supported. Striim writes ignored exceptions to the application's exception store (see CREATE EXCEPTIONSTORE). |
Include Insert ID | Boolean | True | When Streaming Upload is False, this setting is ignored and is not displayed in the Flow Designer. When using the Storage Write API, this setting is also ignored; it applies only when using the legacy streaming API. |
Mode | String | APPENDONLY | |
Null Marker | String | NULL | When Streaming Upload is False, this property specifies a string to be inserted into fields in the delimited text files in which BigQueryWriter accumulates batched data to indicate that a field has a null value. These are converted back to nulls in the target tables. If any field might contain the specified string, change this value to one that will not appear in the data. When Streaming Upload is True, this setting has no effect. |
Optimized Merge | Boolean | false | When the source events will always include full records, leave this set to False. Set to True only when Mode is MERGE, BigQuery Writer's input stream is the output of an HP NonStop reader, MySQL Reader, or Oracle Reader source, and the source events will include partial records. (For example, with Oracle Reader, when supplemental logging has not been enabled for all columns, partial records are sent for updates.) |
Parallel Threads | Integer | | See Creating multiple writer instances. |
Private Service Connect Endpoint | String | | If using Private Service Connect with BigQuery, specify the endpoint created in the target Virtual Private Cloud, as described in Private Service Connect support for Google cloud adapters. |
Project Id | String | | Specify the BigQuery project ID of the dataset. See Get the project ID for BigQuery. |
Quote Character | String | | This property specifies the character(s) used to escape field values in the delimited text files in which the adapter accumulates batches. If the data might contain this character, specify a different quote character. |
Service Account Key | String | | Specify the path (from the UploadedFiles directory) and file name of the .json credentials file downloaded from Google (see Download the service account key). |
Standard SQL | Boolean | True | With the default setting of True, BigQuery Writer uses GoogleSQL (standard SQL). Set to False to use legacy SQL (see BigQuery > Documentation > Reference > Migrating to GoogleSQL for the differences). Do not set to False if you have partitioned your target tables (see Create the dataset(s) and target tables in BigQuery; partitioning tables). |
Streaming Configuration | String | MaxRequestSizeInMB=5, MaxParallelRequests=5, ApplicationCreatedStreamMode=None, UseLegacyStreamingApi=False | This property enables you to configure various options when using the Storage Write API or the legacy streaming API. When Streaming Upload is False, this setting is ignored and is not displayed in the Flow Designer. For best performance, adjust the values of the sub-properties so as not to exceed Google's quotas (see BigQuery > Documentation > Resources > Quotas and limits); contact Striim support if you need assistance in keeping within them. The available sub-properties differ depending on whether UseLegacyStreamingApi is False (Storage Write API) or True (legacy streaming API). |
Streaming Upload | Boolean | False | With the default value of False, the writer uses the load method. Set to True to use a streaming API. See Choose which writing method to use. |
Tables | String | | The name(s) of the table(s) to write to, in the format <dataset>.<table>. When the target's input stream is a user-defined event, specify a single table. When the input stream is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), BigQuery Writer can write to multiple tables. In this case, specify the names of both the source and target tables as source,target pairs, separating multiple pairs with semicolons, for example: source.emp,target.emp or source.db1,target.db1;source.db2,target.db2. Wildcards are supported, for example: source.%,target.% or source.mydatabase.emp%,target.mydb.% or source1.%,target1.%;source2.%,target2.%. MySQL and Oracle names are case-sensitive; SQL Server names are not. Specify names as they appear in the source. When the columns in the source and target are not in the same order, the application will halt, even if the column names are the same. To avoid this, use ColumnMap to map at least one column. See Mapping columns for additional options. |
Transport Options | String | connectTimeout=300, readTimeout=120 | Sets HTTP transport timeout options in seconds (see Java > Documentation > Reference > Class HttpTransportOptions) when using the legacy streaming API or the load method. With the default setting, the connect timeout is five minutes and the read timeout is two minutes. This property is ignored when using the Storage Write API. |
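To illustrate how these properties fit together, here is a minimal sketch of a BigQuery Writer target in TQL. The stream, dataset, table, and file names are hypothetical, only a few of the properties above are shown, and the values are illustrative; verify settings against your own project and Google's quotas.

```tql
-- Hypothetical example: OracleCDCStream, my-gcp-project, SOURCEDB.EMP,
-- mydataset.EMP, and the key file path are placeholders, not values
-- from this reference.
CREATE TARGET BigQueryTarget USING BigQueryWriter (
  ServiceAccountKey: 'UploadedFiles/service-account.json', -- see Service Account Key
  ProjectId: 'my-gcp-project',                             -- see Project Id
  Tables: 'SOURCEDB.EMP,mydataset.EMP',                    -- source,target pair
  Mode: 'APPENDONLY',                                      -- default mode
  StreamingUpload: 'false',                                -- load method (default)
  BatchPolicy: 'eventCount:500000, interval:60'            -- stay within load-job quotas
)
INPUT FROM OracleCDCStream;
```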
BigQuery Writer data type support and correspondence
The following tables describe which source data types are supported by BigQuery Writer and how the source data types are mapped to BigQuery data types. See also Data type support & mapping for schema conversion & evolution.
with an Oracle source
Oracle type | BigQuery type |
---|---|
BFILE | unsupported |
BINARY_DOUBLE | NUMERIC |
BINARY_FLOAT | FLOAT64 |
BLOB | BYTES |
CHAR | STRING |
CLOB | STRING An insert or update that contains a column of this type generates two CDC log entries: an insert or update in which the value for this column is null, followed by an update including the value. |
DATE | DATE |
FLOAT | FLOAT64 |
INTERVALDAYTOSECOND | STRING |
INTERVALYEARTOMONTH | STRING |
LONG | unsupported |
LONG RAW | unsupported |
NCHAR | STRING |
NCLOB | STRING |
NESTED TABLE | unsupported |
NUMBER | NUMERIC |
NVARCHAR2 | STRING |
RAW | BYTES |
ROWID | unsupported |
TIMESTAMP | DATETIME |
TIMESTAMP WITH LOCAL TIME ZONE | TIMESTAMP |
TIMESTAMP WITH TIME ZONE | TIMESTAMP |
UROWID | unsupported |
VARCHAR2 | STRING |
VARRAY | unsupported |
XMLTYPE | unsupported |
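As an illustration of the mapping above, a hypothetical Oracle source table using several of these types could be paired with a BigQuery target table declared as follows (table and column names are invented for the example):

```sql
-- Hypothetical Oracle source:
--   CREATE TABLE emp (id NUMBER, name VARCHAR2(100), photo RAW(2000),
--                     hired DATE, updated TIMESTAMP);
-- Corresponding BigQuery target, per the mapping above (GoogleSQL DDL):
CREATE TABLE mydataset.emp (
  id      NUMERIC,   -- NUMBER -> NUMERIC
  name    STRING,    -- VARCHAR2 -> STRING
  photo   BYTES,     -- RAW -> BYTES
  hired   DATE,      -- DATE -> DATE
  updated DATETIME   -- TIMESTAMP -> DATETIME
);
```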
with a PostgreSQL source
PostgreSQL type | BigQuery type |
---|---|
bigint | INT64 |
bigserial | INT64 |
bit | unsupported |
bit varying | unsupported |
boolean | BOOLEAN |
box | unsupported |
bytea | BYTES |
character | STRING |
character varying | STRING |
cidr | unsupported |
circle | unsupported |
date | STRING |
double precision | FLOAT64 |
inet | INT64 |
integer | INT64 |
int2 | INT64 |
int4 | INT64 |
int4range | STRING |
int8 | INT64 |
int8range | STRING |
interval | STRING |
json | STRING |
jsonb | STRING |
line | unsupported |
lseg | unsupported |
macaddr | unsupported |
money | unsupported |
numeric | NUMERIC |
path | unsupported |
pg_lsn | unsupported |
point | unsupported |
polygon | unsupported |
real | FLOAT64 |
smallint | INT64 |
smallserial | INT64 |
serial | INT64 |
text | STRING |
time | STRING |
time with time zone | TIMESTAMP |
timestamp | DATETIME |
timestamp with time zone | TIMESTAMP |
tsquery | unsupported |
tsvector | unsupported |
txid_snapshot | unsupported |
uuid | unsupported |
xml | STRING |
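Similarly, a hypothetical PostgreSQL source table can be paired with a BigQuery target declared per the mapping above (table and column names are invented for the example):

```sql
-- Hypothetical PostgreSQL source:
--   CREATE TABLE orders (id bigint, payload jsonb, qty smallint,
--                        price numeric(10,2), placed_at timestamp with time zone);
-- Corresponding BigQuery target (GoogleSQL DDL):
CREATE TABLE mydataset.orders (
  id        INT64,      -- bigint -> INT64
  payload   STRING,     -- jsonb -> STRING
  qty       INT64,      -- smallint -> INT64
  price     NUMERIC,    -- numeric -> NUMERIC
  placed_at TIMESTAMP   -- timestamp with time zone -> TIMESTAMP
);
```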