Encrypting data using Shield

Note

In this release, Google Cloud Key Management is the only supported KMS. Striim does not support Google KMS endpoints in Google Cloud for Government.

You can use Shield to encrypt one or more fields in a data stream so that those fields remain encrypted as the data flows further downstream and, ultimately, to the external target. Shield can encrypt data from any supported Striim source. In the diagram below, we show examples where Shield encrypts data in the fields that you have specified.

Striim uses Tink, an open-source encryption library built and maintained by Google, and your Google Cloud KMS keys to encrypt the data. Tink is embedded in Striim, and it manages the entire encryption process in-memory in the Striim environment. Striim does not own, create or retain the data encryption keys or your Google Cloud KMS keys. You can decrypt the data at the target at any time using Tink and your Google Cloud KMS keys.

Typical use cases for Shield include:

If your data stream contains sensitive data such as social security numbers or credit card numbers in specific fields, you can use Shield to encrypt the data in those specific fields, thus ensuring that sensitive data is encrypted before it flows further downstream.
You can share datasets with external parties after you have encrypted specific data fields that you do not wish to share with these parties. The external parties will be unable to decrypt the encrypted data unless you share your Google Cloud KMS keys that were used to encrypt the data.

How does Shield work?

Shield is available as a component in Striim, and you can place it in any stream in your application. Shield uses Tink to implement envelope encryption on records that it processes. At a high level, Shield works as follows:

To get started with Shield:
- You must specify, in the Shield configuration, the numbered location of the field that you want Shield to encrypt. Shield will encrypt the data in that numbered location in every incoming record.
- You must integrate your Google Cloud KMS with Shield so that Google Cloud KMS can provide the wrapping key used in the encryption process.
Tink generates a data encryption key and uses it to encrypt the data in the fields that you have specified for encryption.
Tink uses a key generated by Google Cloud KMS, also called a wrapping key, to encrypt (or “wrap”) the data encryption key, and returns the encrypted data and the encrypted data encryption key as a single encrypted message.
If you have specified more than one field in the incoming record to be encrypted, then Shield will encrypt the data in each field separately, and each encryption process will return a separate encrypted message.
Shield will then compose the outgoing record by appropriately placing the unencrypted data and encrypted messages so that the outgoing record's structure matches the incoming record.

This process is highlighted in the figure below.

The security of your encrypted data depends in part on the wrapping key that is used to protect the data encryption key. Shield requires you to integrate with Google Cloud KMS to generate the wrapping key. Therefore, only you, and those that you share the Google Cloud KMS key with, can decrypt the encrypted data.
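The flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Shield's or Tink's implementation: `_toy_cipher` is an insecure stand-in for a real cipher, the wrapping key is a local byte string rather than a key held in Google Cloud KMS, and the message layout (length prefix, wrapped key, ciphertext, Base64 text) and all function names are hypothetical.

```python
import base64
import hashlib
import secrets


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """Stand-in for a real cipher: XOR with a SHA-256 keystream. NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(value: str, wrapping_key: bytes) -> str:
    """Encrypt one field value and return a single encrypted message."""
    dek = secrets.token_bytes(32)                   # fresh data encryption key
    ciphertext = _toy_cipher(dek, value.encode("utf-8"))
    wrapped_dek = _toy_cipher(wrapping_key, dek)    # "wrap" the data encryption key
    # Assumed layout: 4-byte length of the wrapped key, wrapped key, ciphertext.
    message = len(wrapped_dek).to_bytes(4, "big") + wrapped_dek + ciphertext
    return base64.b64encode(message).decode("ascii")


def envelope_decrypt(message: str, wrapping_key: bytes) -> str:
    """Unwrap the data encryption key, then decrypt the field value."""
    raw = base64.b64decode(message)
    dek_len = int.from_bytes(raw[:4], "big")
    wrapped_dek = raw[4:4 + dek_len]
    ciphertext = raw[4 + dek_len:]
    dek = _toy_cipher(wrapping_key, wrapped_dek)
    return _toy_cipher(dek, ciphertext).decode("utf-8")


def encrypt_record(record: list, positions: list, wrapping_key: bytes) -> list:
    """Return a record of the same shape with the chosen positions encrypted.

    Each position is encrypted separately, so each gets its own message.
    """
    out = list(record)
    for pos in positions:
        out[pos] = envelope_encrypt(str(record[pos]), wrapping_key)
    return out
```

Calling `encrypt_record(["1001", "Doe", "Jane", "123-45-6789"], [3], key)` leaves the first three values untouched and replaces the fourth with an encrypted string. Only a holder of the wrapping key can unwrap the data encryption key and recover the value.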

Benefits of using Shield

Shield is an easy, performant, and robust method for you to encrypt your data to meet your business policy requirements.

You retain control over the encryption process and encrypted data. Shield requires your KMS keys to encrypt the data that is written to the output and, ultimately, to the target. Striim does not create or retain the data encryption keys or your Google Cloud KMS keys.
Shield uses Tink as the underlying encryption engine. Tink is developed and maintained by Google, which ensures Tink's functionality, performance, and backward compatibility. You can use Tink and your Google Cloud KMS keys to decrypt the data at the target at any time.
Shield supports Google Cloud KMS key rotation for enhanced cryptographic security.
Shield is easy to use. Striim ships with Tink. You do not need to do anything to start using Shield in the Flow Designer or with TQL beyond uploading a service account key for Google Cloud Key Management to Striim.

Using Shield connection profiles

You must set up a Shield connection profile that specifies the Google Cloud Key Management key ring that you will use with the Shield component. If the Cloud KMS service is not operating normally or Shield is unable to get a new key version from Cloud KMS, then Shield will cease operations and your application will terminate.

The use of Shield connection profiles simplifies management and enhances governance because the Shield connection profile is the central place to manage the Cloud KMS integration for Shield. You can use the same Shield connection profile with multiple Shield components in one or more Striim applications, or you can choose to use a separate connection profile for every Shield component. If you use the same Shield connection profile with multiple Shield components, then each component will interact independently with Google Cloud Key Management.

Before you set up a Shield connection profile, you must first set up Google Cloud Key Management to work with Striim, and then configure Striim to use Google Cloud Key Management.

Setting up Google Cloud Key Management to work with Striim

Create a service account (see Service accounts overview) for the project containing your Google Cloud Key Management.
Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the service account (see BigQuery > Documentation > Guides > Introduction to IAM > BigQuery predefined Cloud IAM roles).
Download the service account key for the service account (see IAM > Documentation > Guides > Create and delete service account keys).

Configuring Striim to use Google Cloud Key Management

When you create a Shield connection profile, use the UI to upload the service account key file that you downloaded in the previous procedure.
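Before uploading, it can help to confirm that the downloaded file looks like a service account key. The following is a minimal sketch, assuming the key was downloaded in JSON format. The function name and the list of checked fields are illustrative choices, not something Striim requires.

```python
import json

# Fields that Google's JSON service account key files normally contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path: str) -> list:
    """Return a list of problems found in the key file (empty if none)."""
    with open(path, encoding="utf-8") as f:
        try:
            key = json.load(f)
        except json.JSONDecodeError as exc:
            return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)
    ]
    # A user credential or other key type would not work as a service account key.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

An empty list means the file has the expected shape. It does not confirm that the service account has the Cloud KMS CryptoKey Encrypter/Decrypter role.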

Creating a Shield connection profile

Select Manage Striim > Connection Profiles > Add Connection Profile.
Enter the required properties:
- Connection Profile Name: Enter a descriptive name for the profile.
- Namespace: Select the namespace in which the profile will be created. This should be a namespace on which all developers who will use the profile have SELECT and READ permissions.
- Endpoint Type: Select GoogleKMS.
- Key URL: Specify the resource ID for the key to be used by Shield components associated with this profile (see Cloud Key Management Service > Documentation > Guides > Getting a Cloud KMS resource ID > Getting the ID for a key and version), for example, projects/myproject3-412123/locations/us-west2/keyRings/MyKeyRing/cryptoKeys/MyKey.
- Service Account Key Path: Upload the file, or specify the path to and name of the service account key you uploaded previously.
Click Save.
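The Key URL follows the Cloud KMS resource ID layout shown in the example above. As a minimal sketch, the following Python helper checks a value against that layout before you paste it into the profile. The helper name and the pattern are illustrative assumptions, and the pattern accepts only key-level IDs of the form used in the example.

```python
import re

# projects/<project>/locations/<location>/keyRings/<key ring>/cryptoKeys/<key>
KEY_URL_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)"
    r"/cryptoKeys/(?P<key>[^/]+)$"
)


def parse_key_url(key_url: str) -> dict:
    """Split a key resource ID into its parts, or raise ValueError."""
    match = KEY_URL_PATTERN.match(key_url.strip())
    if match is None:
        raise ValueError(f"not a Cloud KMS key resource ID: {key_url!r}")
    return match.groupdict()
```

For the example value, `parse_key_url` returns the project `myproject3-412123`, the location `us-west2`, the key ring `MyKeyRing`, and the key `MyKey`.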

Using Shield components in applications

Adding a Shield component in Flow Designer

The following procedure works only when the input stream is of a user-defined type (see Type). For a stream of type WAEvent, use TQL.

Drag a Shield component from the Event Transformers section of the palette into the workspace.
Set the properties as follows:
- Name: Enter a descriptive name for the Shield.
- Shield Profile: Select the connection profile to use.
- Input: Select the input stream (if there are no streams of user-defined types, the drop-down will be blank).
- Field: Select a field to be encrypted. Optionally change the Alias to change the field's name in the output stream. Optionally click Add Field to add an additional field.
- Mode: Leave set to encrypt.
- Refresh Type: Select Time (Days) or Event (Count).
- Refresh Threshold: Specify the number of days (Refresh Type = Time) or encryption events (Refresh Type = Event) that will be processed before Striim calls Google Cloud Key Management for a new key (see Cloud Key Management Service > Documentation > Guides > Key rotation). By default, this is one day. When Refresh Type = Event, a low Refresh Threshold value may significantly degrade performance. (The Refresh Threshold value for one Shield component does not affect the performance of other Shield components that use the same connection profile.)
- Output to: Enter the name of the output stream to be created. Alternatively, if you have already created an appropriate stream, select Existing Output and choose it.
Click Save.

Using Shield components in TQL

To encrypt data in TQL, use the built-in shieldEncrypt function in a CQ. Its syntax is:

shieldEncrypt("<namespace>","<connection profile>",<field>,<refresh type>,<refresh threshold>)

field is the name of the field in the Shield component's input stream whose values will be encrypted.

refresh type: Specify Time or Count.

refresh threshold: Specify the number of days (Refresh Type = Time) or encryption events (Refresh Type = Count) that will be processed before Striim calls Google Cloud Key Management for a new key. When Refresh Type = Count, a low Refresh Threshold value may significantly degrade performance. (The Refresh Threshold value for one Shield component does not affect the performance of other Shield components that use the same connection profile.)
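To see why a low event-based threshold is costly, the following Python sketch estimates how many new-key requests a workload would trigger. It is a rough model, not Shield's actual logic: it assumes one request each time the threshold is reached and does not count the initial key fetch. The function name and parameters are hypothetical.

```python
import math


def key_requests(refresh_type: str, refresh_threshold: int,
                 events: int, elapsed_days: float) -> int:
    """Estimate the number of new-key requests for a workload."""
    if refresh_threshold <= 0:
        raise ValueError("refresh_threshold must be positive")
    if refresh_type == "Time":
        # One request per completed period of refresh_threshold days.
        return math.floor(elapsed_days / refresh_threshold)
    if refresh_type == "Count":
        # One request per refresh_threshold encryption events.
        return events // refresh_threshold
    raise ValueError(f"unknown refresh type: {refresh_type!r}")
```

Under this model, 10,000,000 events processed in one day trigger one request with `"Time"` and a threshold of 1, but 10,000 requests with `"Count"` and a threshold of 1,000.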

The following is an example for a stream of a user-defined type in which the social security number is encrypted:

SELECT
  customerID, 
  lastName,
  firstName,
  shieldEncrypt("test","MyShieldProfile",SSN,"Time",1) AS SSN
FROM TypedDataStream

The following is an example for a stream of type WAEvent (see Modifying and masking values in the WAEvent data array using MODIFY):

CREATE STREAM encryptedSSNstream OF Global.WAEvent;
CREATE CQ encryptSSNdata
INSERT INTO encryptedSSNstream
SELECT r FROM cleartextStream r
MODIFY(
  r.data[4] = shieldEncrypt("test","MyShieldProfile",r.data[4],"Time",1),
  r.data[5] = shieldEncrypt("test","MyShieldProfile",r.data[5],"Time",1)
);

Decrypting data using Shield

If you encrypt data outside of Striim, you can use Tink to decrypt it. In Striim, you can use Shield to decrypt data with Tink. The syntax is:

SELECT shieldDecrypt("<namespace>","<connection profile>",<field>) AS <alias name>

Here is an example of TQL to decrypt an encrypted Source_IP field:

SELECT shieldDecrypt("admin","MyShieldProfile",Source_IP) AS Source_IP

Runtime considerations when using Shield

If you set a low Refresh Threshold value in the Shield component configuration, Shield performance will degrade because Shield will ask Google Cloud Key Management to issue a new key version more frequently. It can take up to several seconds for Shield to receive a new key version from Google Cloud Key Management. While waiting for the new key, Shield temporarily suspends processing of events, which interrupts delivery of data to downstream components and the target and can create backpressure (see Understanding and managing backpressure).
The encrypted message in the output stream from Shield is a string, regardless of the data type of the field in the input stream. All downstream components and the external target must be configured to handle the string field, or your application will halt.
If a field is set to a fixed length, then encrypting the data in that field is likely to cause errors because the encryption will cause the length of the data in that field to increase.
If you decrypt data using Shield, you will see a drop in performance because Shield must make a call to Google Cloud Key Management to obtain the original wrapping key for every record that it decrypts.
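The point about fixed-length fields can be illustrated with a rough size estimate in Python. This is a sketch under loud assumptions: it supposes that the encrypted message is the plaintext plus some number of overhead bytes (wrapped key and cipher metadata), encoded as Base64 text. The actual encoding and overhead that Shield produces are not stated here, so `overhead_bytes` and both function names are hypothetical.

```python
import math


def estimated_message_length(value: str, overhead_bytes: int) -> int:
    """Estimate the length of the encrypted string for one field value."""
    raw_len = len(value.encode("utf-8")) + overhead_bytes
    # Base64 with padding emits 4 characters for every 3 input bytes.
    return 4 * math.ceil(raw_len / 3)


def fits_column(value: str, column_length: int, overhead_bytes: int) -> bool:
    """Check whether the estimated encrypted string fits a fixed-length field."""
    return estimated_message_length(value, overhead_bytes) <= column_length
```

Even with zero overhead, an 11-character social security number grows to 16 characters under this model, so a field sized for the plaintext would be too short. Measure real Shield output before you size target columns.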