Redshift Writer
Writes to one or more tables in an Amazon Redshift store via an Amazon S3 staging area.
Before running an application containing a Redshift Writer, you must install the Redshift JDBC driver (see Install the Redshift JDBC driver).
Before you create a Redshift Writer target, we suggest you first create an S3 Writer for the staging area (see S3 Writer) and verify that Striim can write to it. We recommend that the Redshift cluster be in the same region as the S3 bucket (see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html). If it is not, you must set the S3 Region property.
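As a quick write test, you can deploy a minimal S3 Writer target pointed at the staging bucket before defining the Redshift Writer. The sketch below is an assumption-laden example, not a drop-in definition: the target name S3StagingCheck, the folder stagingtest, and the object name probe.csv are hypothetical, the input stream reuses PosSource_TransformedStream from the sample application later on this page, and the property names should be checked against S3 Writer for your Striim release.

CREATE TARGET S3StagingCheck USING S3Writer (
  bucketname: 'mys3bucket',
  foldername: 'stagingtest', /* hypothetical test folder */
  objectname: 'probe.csv',   /* hypothetical object name */
  accesskeyid: '********************',
  secretaccesskey: '****************************************'
)
FORMAT USING DSVFormatter ()
INPUT FROM PosSource_TransformedStream;

If files appear in the bucket when the application runs, the credentials and bucket permissions are sufficient for the Redshift Writer's staging area as well.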
After the data has been written to Redshift, the files in the S3 bucket are moved to a subdirectory called archive. They are not deleted automatically, so you should periodically delete them. This may be automated (see https://aws.amazon.com/code/Amazon-S3/943), for example with an S3 lifecycle rule as sketched below.
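If you prefer to automate the cleanup on the AWS side, one standard approach is an S3 lifecycle rule that expires objects under the archive prefix after a retention period. A minimal sketch, assuming the staging layout from the sample application below and a hypothetical seven-day retention:

{
  "Rules": [
    {
      "ID": "expire-striim-archive",
      "Filter": { "Prefix": "RS1/PosSource_TransformedStream_Type/mytable/archive/" },
      "Status": "Enabled",
      "Expiration": { "Days": 7 }
    }
  ]
}

You could apply this with aws s3api put-bucket-lifecycle-configuration --bucket mys3bucket --lifecycle-configuration file://lifecycle.json, adjusting the prefix to match your own namespace, stream type, and table.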
Specify either the access key ID and secret access key or an IAM role (see the IAM role example after the properties table below).
Redshift Writer properties
property | type | default value | notes |
---|---|---|---|
Access Key ID | String | | an AWS access key ID (created on the AWS Security Credentials page) for a user with read permission on the S3 bucket (leave blank if using an IAM role) |
Bucket Name | String | | the S3 bucket name |
Column Delimiter | String | \| (UTF-8 007C) | The character(s) used to delimit fields in the delimited text files in which the adapter accumulates batched data. If the data will contain the \| character, change the default to a character or sequence of characters that will not appear in the data. |
Connection URL | String | | copy this from the JDBC URL field on the AWS Dashboard cluster-details page for the target cluster |
Conversion Params | String | | Optionally, specify one or more of Redshift's data conversion parameters (see Data Conversion Parameters in the Amazon Redshift COPY documentation), separated by commas: for example, ACCEPTANYDATE, TRUNCATECOLUMNS. |
Mode | String | incremental | With an input stream of a user-defined type, do not change the default. See Replicating Oracle data to Amazon Redshift for more information. |
Parallel Threads | Integer | | |
Password | encrypted password | | the password for the Redshift user |
Quote Character | String | " (UTF-8 0022) | The character(s) used to quote (escape) field values in the delimited text files in which the adapter accumulates batched data. If the data will contain the " character, change the default to a character or sequence of characters that will not appear in the data. |
S3 IAM Role | String | | an AWS IAM role with read and write permission on the bucket (leave blank if using an access key) |
S3 Region | String | | If the S3 staging area is in a different AWS region (not recommended), specify it here (see AWS Regions and Endpoints). Otherwise, leave blank. |
Secret Access Key | encrypted password | | the secret access key for the S3 staging area (leave blank if using an IAM role) |
Tables | String | | The name(s) of the table(s) to write to. The table(s) must exist in Redshift and the user specified in Username must have insert permission. When the input stream of the target is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source (that is, when replicating data from one database to another), it can write to multiple tables. In this case, specify the names of both the source and target tables, optionally using the % wildcard: for example, source.emp,target.emp or source.db1,target.db1;source.db2,target.db2 or source.%,target.% or source.mydatabase.emp%,target.mydb.% or source1.%,target1.%;source2.%,target2.%. See Replicating Oracle data to Amazon Redshift for an example. |
Upload Policy | String | eventcount:10000, interval:5m | see S3 Writer |
Username | String | | a Redshift user |
The staging area in S3 will be created at the path <bucketname> / <namespace> / <target input stream type> / <table name>.
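If you are using an IAM role instead of an access key, the target definition omits the key properties. The following is a minimal sketch, not a verified definition: it assumes the role property is written s3iamrole in TQL (matching the S3 IAM Role property above), the role ARN is hypothetical, and the connection details reuse those from the sample application below.

CREATE TARGET testRedshiftIAMTarget USING RedshiftWriter (
  ConnectionURL: 'jdbc:redshift://mys3bucket.c1ffd5l3urjx.us-west-2.redshift.amazonaws.com:5439/dev',
  Username: 'mys3user',
  Password: '******',
  bucketname: 'mys3bucket',
  s3iamrole: 'arn:aws:iam::123456789012:role/StriimRedshiftRole', /* hypothetical role ARN */
  Tables: 'mytable'
)
INPUT FROM PosSource_TransformedStream;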
Redshift Writer sample application
The following describes use of RedshiftWriter with an input stream of a user-defined type. When the input is the output of a DatabaseReader, IncrementalBatchReader, or SQL CDC source, see Replicating Oracle data to Amazon Redshift.
The following example would write to a table called MyTable:
CREATE SOURCE PosSource USING FileReader (
  wildcard: 'PosDataPreview.csv',
  directory: 'Samples/PosApp/appData',
  positionByEOF: false
)
PARSE USING DSVParser (
  header: Yes,
  trimquote: false
)
OUTPUT TO PosSource_Stream;

CREATE CQ PosSource_Stream_CQ
INSERT INTO PosSource_TransformedStream
SELECT TO_STRING(data[1]) AS MerchantId,
  TO_DATE(data[4]) AS DateTime,
  TO_DOUBLE(data[7]) AS AuthAmount,
  TO_STRING(data[9]) AS Zip
FROM PosSource_Stream;

CREATE TARGET testRedshiftTarget USING RedshiftWriter (
  ConnectionURL: 'jdbc:redshift://mys3bucket.c1ffd5l3urjx.us-west-2.redshift.amazonaws.com:5439/dev',
  Username: 'mys3user',
  Password: '******',
  bucketname: 'mys3bucket',
  /* for striimuser */
  accesskeyid: '********************',
  secretaccesskey: '****************************************',
  Tables: 'mytable'
)
INPUT FROM PosSource_TransformedStream;
If this application were deployed to the namespace RS1, the staging area in S3 would be mys3bucket / RS1 / PosSource_TransformedStream_Type / mytable.
The target table must match PosSource_TransformedStream_Type:
create table mytable(
  MerchantId char(35),
  DateTime timestamp,
  AuthAmount float,
  Zip char(5)
);
After the data is written to Redshift, the intermediate files will be moved to mys3bucket / RS1 / PosSource_TransformedStream_Type / mytable / archive.
Redshift data type correspondence
Striim data type | Redshift data type |
---|---|
java.lang.Boolean | BOOLEAN |
java.lang.Double | DOUBLE PRECISION |
java.lang.Float | REAL |
java.lang.Integer | INTEGER |
java.lang.Long | BIGINT |
java.lang.Short | SMALLINT |
java.lang.String | CHAR or VARCHAR |
org.joda.time.DateTime | TIMESTAMP |
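To illustrate these mappings, a Striim type and a Redshift table that line up column for column might look like the following sketch; the names OrderType and orders, the field list, and the varchar length are all hypothetical:

CREATE TYPE OrderType (
  OrderId Integer,    /* java.lang.Integer -> INTEGER */
  Amount Double,      /* java.lang.Double -> DOUBLE PRECISION */
  InStock Boolean,    /* java.lang.Boolean -> BOOLEAN */
  OrderTime DateTime, /* org.joda.time.DateTime -> TIMESTAMP */
  Note String         /* java.lang.String -> CHAR or VARCHAR */
);

create table orders(
  OrderId integer,
  Amount double precision,
  InStock boolean,
  OrderTime timestamp,
  Note varchar(64)
);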