
Creating multiple writer instances

Some writers have a Parallel Threads property, which in some circumstances may allow you to create multiple instances for better performance. For example:

CREATE TARGET KWSample USING KafkaWriter VERSION '0.9.0' (
  brokeraddress:'localhost:9092',
  topic:'test',
  ParallelThreads:'4',
  PartitionKey:'merchantId'
)
FORMAT USING DSVFormatter ()
INPUT FROM TypedCSVStream;

This would create four instances of KafkaWriter with identical settings. Each instance would run in its own thread, increasing throughput. If KWSample were deployed ON ALL to multiple servers, each server would run four instances, so the total number of KafkaWriter instances would be the ParallelThreads value times the number of servers. For example, deploying to three servers would result in 12 instances.

Warning

Use ParallelThreads only when the target is not able to keep up with incoming events (that is, when its input stream is backpressured). Otherwise, the overhead imposed by additional threads could reduce the application's performance.

Database Reader reads tables sequentially rather than in parallel. Consequently, creating more target instances than there are Database Reader sources will not improve performance. You may be able to improve performance by creating multiple Database Reader sources that read from different tables and all output to the same stream, and using that stream as the input for a target with multiple instances.
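
The multiple-source pattern described above might look roughly like the following. This is a minimal sketch, not a tested configuration: the source, stream, and target names, the credentials, the connection URLs, and the schema and table names are all hypothetical, and the property names other than ParallelThreads are assumptions that should be checked against the Database Reader and Database Writer property references for your release.

```sql
-- Hypothetical names, credentials, and connection URLs throughout.
-- Each Database Reader source reads one table, and both write to the same stream.
CREATE SOURCE OrdersSource USING DatabaseReader (
  Username:'striim',
  Password:'********',
  ConnectionURL:'jdbc:oracle:thin:@192.0.2.10:1521/orcl',
  Tables:'SALES.ORDERS'
)
OUTPUT TO InitialLoadStream;

CREATE SOURCE CustomersSource USING DatabaseReader (
  Username:'striim',
  Password:'********',
  ConnectionURL:'jdbc:oracle:thin:@192.0.2.10:1521/orcl',
  Tables:'SALES.CUSTOMERS'
)
OUTPUT TO InitialLoadStream;

-- One target with two writer instances, matching the two sources.
CREATE TARGET WarehouseTarget USING DatabaseWriter (
  Username:'striim',
  Password:'********',
  ConnectionURL:'jdbc:oracle:thin:@192.0.2.20:1521/whse',
  Tables:'SALES.ORDERS,WAREHOUSE.ORDERS;SALES.CUSTOMERS,WAREHOUSE.CUSTOMERS',
  ParallelThreads:'2'
)
INPUT FROM InitialLoadStream;
```

ParallelThreads is set to 2 here because there are two sources. Per the guidance above, a higher value would not be expected to improve performance.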

Parallel threads are not supported when the writer's input stream is the output of Cosmos DB Reader or Mongo Cosmos DB Reader in incremental mode.

Evaluating when to use parallel threads

Whether parallel threads can improve performance depends on the writer, because each writer distributes events among its instances differently.

Writers:

  • Cassandra Cosmos DB Writer

  • Cosmos DB Writer

  • Database Writer

  • HBase Writer

  • Kudu Writer

  • MaprDB Writer

  • Spanner Writer

Event distribution: Events are evenly distributed among the writer instances in round-robin fashion. All instances may write to all target tables.

Limitations and notes:

  • Enabling recovery for the application disables parallel threads.

  • Use only for initial load, not for continuous replication.

Writers:

  • Azure Synapse

  • BigQuery Writer

  • Databricks Writer

  • Hive Writer

  • Redshift Writer

  • Salesforce Writer

  • ServiceNow Writer

  • Snowflake Writer

Event distribution: Events are distributed among the writer instances based on the target table name. Each target table will be written to by only one of the instances.

Limitations and notes:

  • Enabling recovery for the application disables parallel threads.

  • Use only for initial load, not for continuous replication.

  • Creating more instances than there are target tables will not improve performance.

Writers:

  • GCS Writer

  • S3 Writer

Event distribution: Events are distributed among the writer instances based on the target bucket name, directory name, and file name. Each file will be written to by only one of the instances.

Limitations and notes:

  • You must use dynamic names for target buckets, directories, or files. Otherwise, parallel threads will not improve performance.

  • Creating more instances than there are target files will not improve performance.

Writers:

  • Kafka Writer

Event distribution: Events are distributed among the writer instances by the PartitionKey field value. Each target partition will be written to by only one of the instances.

Limitations and notes:

  • Creating more instances than there are target partitions will not improve performance.