Securing Data in Striim: Cryptographic Key Management in Striim, Field-Level Encryption, and Vault Integration

Table of Contents

How Striim Keeps Enterprise Grade Streaming Workloads Safe

The Streaming Data Security Challenge

At Striim our approach to security encompasses protecting data both in motion and at rest, utilizing advanced encryption techniques and integrating with powerful tools.

In this blog, I’ll walk you through 2 small pieces of this architecture on how Striim secures sensitive data & metadata (a) in motion and (b) at rest. 

The Striim Shield: Protecting Data In Motion

One of the standout features gaining traction in this landscape is field-level encryption within data pipelines. While some startups have made this their sole focus, Striim takes it a step further by offering this capability at no extra cost through our innovative practice known as encrypted streaming.

With Shield, you can easily encrypt one or more fields within a data stream with your own self-managed encryption key. This means that as your data flows downstream—eventually reaching its external target—those sensitive fields remain securely encrypted.

In the above, you can clearly see that the SSN, or social security number, is being encrypted through the Shield component.

So, how does this work?

The system architecture is centered around a Key Management Service (KMS) that maintains a master Key Encryption Key (KEK). This KEK is utilized to encrypt the Data Encryption Key (DEK), which is generated by the Striim server using the Tink cryptographic library.

The encryption process is optimized through a batching mechanism. A Tink client is batched to utilize a specific KEK for a predetermined number of events flowing through the data pipeline. This approach balances computational efficiency with security requirements.

The implementation serves two primary technical objectives:

  1. Encryption of data prior to warehouse ingestion, ensuring protection during network transit.
  2. Maintenance of data encryption within the warehouse, providing security for data at rest.

So, what about decryption? – to decrypt the data, customers must utilize Tink in conjunction with the same KMS. This ensures that decryption is only possible for authorized entities possessing the correct cryptographic keys. 

Here’s how you can get your hands dirty with it in Striim

  1. Create a connection profile which connects to the KMS
  2. Create a shield component in the flow designer which uses this connection profile
  3. Deploy & Run your application, and sit back and enjoy encrypted real-time data streaming! 

It’s important to note that since we send the data encryption key over the network to be encrypted by the KMS, there are performance implications in terms of speed while using this feature. 

For simpler encryption than Shield, Striim offers the WITH ENCRYPTION option during application creation. This secures data streams between Striim servers or from Forwarding Agents, especially useful when data sources are external to the cluster or private network. You can apply encryption at the application or flow level to balance security and performance, encrypting only sensitive data streams as needed.

The Striim Vault: Protecting Customer Metadata 

Now that we’ve discussed how data in motion is safe in Striim at rest, we’ll dive into how data is kept safe at rest. Striim maintains customer metadata in our Metadata Repository (MDR). This includes all information such as pipeline source & target metadata, user information, pipeline deployment plans and more. While this repository is crucial for storing various configuration details, we understand that certain information—such as passwords and table names—requires an extra layer of security.

When users input passwords for database connections or secure JDBC connections, Striim doesn’t simply store this sensitive information in plain text. Instead, we’ve developed a sophisticated system to protect these credentials – Striim integrates seamlessly with external vaults, providing customers with a secure way to manage their sensitive data. This integration allows customers to avoid storing sensitive information directly in the MDR, even though we do encrypt that before persisting it as well. 

Our vault integration includes several advanced security features:

  • Token Auto-renewal: Customers can configure settings for Striim to automatically renew the vault token periodically. This ensures that even if a token is leaked, it becomes invalid after a certain time.
    The Striim server operates a timer thread within the JVM that regulates the frequency at which the HashiCorp Renew Endpoint is accessed.
  • Reference-based Storage: In Striim, we maintain a reference to the connection in the vault and the key that the customer wants to use as the password for their database or any other sensitive field.
  • Runtime Connection: Striim makes the connection to the vault at runtime. This means the sensitive value is only stored in memory and never persisted to the MDR. 

One of the many vaults we integrate with is the HashiCorp Vault. Here’s how it works:

  1. Customers maintain their keys and values on their HashiCorp server.

  2. In Striim, customers create a reference to their vault. You provide only an access token to connect to the HashiCorp Server, rather than the actual credentials.

  3. Once the vault is created, customers can see their HashiCorp Vault keys, as well as the usage of the vault in Striim Application components such as Sources & Targets.

  4. Striim users can use this vault key as the password in their source/target (e.g., Oracle Reader)  – this ensures that the raw password is never stored in our MDR, and only maintained in-memory for this connection.

  5. Deploy and Run Pipeline – This is the final step which allows you to run your data pipeline and drive mission critical business applications securely, and in real-time!

Benefits of Using Vaults in Striim

  1. Enhanced Security: Sensitive data is never stored at rest in Striim’s MDR.
  2. Flexibility: Customers can use their preferred vault solution. We integrate with Google’s Secret Manager, HashiCorp Vault, Azure Key Vault, and even provide our own homegrown Striim Vault. 
  3. Reduced Risk: Sensitive credentials existing in a completely segregated system designed for data protection allows for heightened security.

For this post we focused on protecting and data in motion with Shield, and data at rest with vaults. In an era where data breaches can cost companies millions and erode customer trust, Striim’s commitment to secure data processing provides peace of mind. By addressing security at every stage of the data lifecycle, from ingestion to processing to storage, we enable businesses to harness the power of their data without compromising on safety.

This article only scratches the surface in how we keep your data safe. There are so many more amazing features in our new release, including automated PII detection, OAuth connectivity, SSO/SAML support, and more to safely activate your siloed data in real-time.

Ready to learn more? Sign up for a demo today

More Resources