Real-Time Anomaly Detection in Trading Data Using Striim and One Class SVM

3 Minute Read

In today’s fast-paced financial markets, detecting anomalies in trading data is crucial for preventing fraudulent activities and ensuring compliance with regulatory standards. By leveraging advanced machine learning (ML) techniques, businesses can enhance their anomaly detection capabilities, leading to more robust and secure operations. In this blog, we’ll explore how to use Striim’s Change Data Capture, Stream Processing, and extensibility features to integrate an anomaly detection model built with a third-party machine learning library. We’ll use Striim’s data proximity to ensure tight integration with models, avoiding data latency and accuracy issues. For this demonstration, we’ll perform analytics using simulated day trading data, store the model in a standard Python serialization format, expose it via a REST API, and use it in Tungsten Query Language (TQL), Striim’s SQL-like language for developers to build applications.  

What is Anomaly Detection?

Anomalies are events that deviate significantly from the normal behavior in the data. Anomaly detection is an unsupervised technique used to identify these events. Anomalies can be broadly classified into three categories:

  • Outliers: Short-term patterns that appear in a non-systematic way.
  • Changes: Systematic or sudden changes from the previous normal behavior.
  • Drifts: Slow, unidirectional long-term changes in the data.

Case Study with Anti-Money Laundering App

A few years ago, Striim was hired by a financial institution to implement an app to track trader activity and detect fraud via Anti-Money Laundering (AML) rules. Recently, we extended this app by integrating an anomaly detection ML model.

The app collects trading data from an operational database, uses caches to enrich incoming events, and applies Striim’s continuous queries (CQs) and in-memory time series windows to analyze the received information. The app implements several AML rules and, once the data is processed, it is stored in final EDW targets and presented in Striim’s real-time streaming dashboards.

Trading data passed to Transaction window

ML Integration and Usage Patterns with Striim Pipelines

Integrating machine learning models with Striim pipelines allows us to enhance our analytics capabilities. In this example, we’ll use the sklearn Python library to train a model with day trading activity data.

Solution

Train the model

  1. Use sklearn python library to train a model with a collected IL data based on day trading activity of a selected stock broker. The application collects and sums trading activities for a selected person for a period of time via a time window
  2. Pick One Class SVM. One Class Classification (OCC) aims to differentiate samples of one particular class by learning from single class samples during training. It is one of the most commonly used approaches to solve Anomaly Detection, a subfield of machine learning that deals with identifying anomalous cases which the model has never seen before.
  3. Train the model: OneClassSVM(gamma=’auto’).fit(X) where X is a set of training data

Use the model 

  1. Store data via JobLib dump format providing the most efficient way to save files to disk
  2. Build a function to call model in real time via predict(K) and expose it via REST API using Flask library as an example but service can be hosted on any WSGI web server app
  3. API output comes as simple JSON string:  
    • {“anomaly”:”-1″,”inputParam”:”80″} if an anomaly is detected.
    • {“anomaly”:”1″,”inputParam”:”44″} if the behavior is normal.
  4. Access the model API via Rest API Caller OP PS built module

By integrating external ML models with Striim, you can create powerful, real-time anomaly detection systems that enhance your business operations and security measures. This approach ensures that you can quickly identify and react to unusual patterns in your data, maintaining the integrity and reliability of your systems. Ready to see how Striim can transform your data analytics and anomaly detection processes? Sign up for a free trial today and experience the power of real-time data integration and machine learning firsthand.

Real-time Data

Declarative, Fully Managed Data Streaming Pipelines

Data pipelines can be tricky business —failed jobs, re-syncs, out-of-memory errors, complex jar dependencies, making them not only messy but often disastrously unreliable. Data teams