
ADLS Reader

Azure Data Lake Storage (ADLS) is a scalable data store offered by Microsoft Azure. Data is stored as blobs, which many other Azure components can access. Using ADLS Reader, Striim can read objects from a container in an ADLS storage account that has hierarchical namespace enabled.

Note

ADLS Reader and ADLS Gen2 Writer read from and write to the same Azure Data Lake Storage (ADLS) service. Microsoft dropped "Gen2" from the name of the service after it retired the ADLS Gen1 service in February 2024.

Feature summary

Feature | Supported? | Notes

Objects
   Standard Entities | - | Not applicable (reads files/blobs in containers).
   Custom Entities | - | Not applicable (reads files/blobs in containers).

Authentication
   Basic Authentication | - |
   OAuth Authentication | Yes | Microsoft Entra ID (app registration). See Security and authentication.
   Custom Authentication Methods | - |

Security and Governance
   TLS 1.2 | Yes |
   Connection Profile | Yes | Reuse credentials/settings. See Initial setup for ADLS Reader.
   Encryption | Yes | Server-side encryption (MMK/CMK/CPK). See Security and authentication.

Operations
   Initial Load | Yes | Reads existing objects on first run.
   Pull-based Incremental Load | Yes | Directory Listing / Log Analytics. See Operations and object detection modes.
   Automated mode | Yes | Initial read + continuous polling.

Building Applications / Programmability
   Wizards | Yes |
   Flow Designer | Yes |
   Striim TQL | Yes |

Schema Handling
   Initial Schema Creation | - | Parsers define event structure; target writers manage schema.
   Output as WAEvents | Yes | Publishes WAEvents.
   Schema Evolution | - |

Runtime
   Resilience | Yes | Configurable automatic retries. See Resilience and recovery.
   Recovery - A1P (at-least-once processing) & E1P | Yes | A1P recovery; duplicates possible after restart. See Resilience and recovery.
   Parallel Execution | Yes |
   Metrics & Auditing | Yes | See ADLS Reader monitoring metrics.

Security and authentication

Authentication and authorization for ADLS Reader are set up through Microsoft Entra ID (formerly Azure Active Directory). Register an application and grant it access to the storage account with a suitable role (for example, Storage Blob Data Contributor or Storage Blob Data Reader).

The ADLS Reader uses the Entra application’s Client ID, Tenant ID, and Client Secret.
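For orientation, these three values are the inputs to the OAuth 2.0 client-credentials request that Microsoft Entra ID accepts. The sketch below only builds that request and does not send it. It is not Striim's implementation; the function name `build_token_request` and the placeholder values are hypothetical.

```python
from urllib.parse import urlencode

# Scope used for Azure Storage data-plane access with client credentials.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return (url, form_body) for an Entra ID client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": STORAGE_SCOPE,
    })
    return url, body


# Placeholder values for illustration only, not real credentials.
url, body = build_token_request("my-tenant-id", "my-client-id", "my-secret")
print(url)
```

The body would be sent as an `application/x-www-form-urlencoded` POST; the access token in the response is what authorizes storage requests under the role assigned to the application.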

Encryption support: ADLS Reader works with every server-side encryption option that ADLS provides, namely Microsoft-managed keys (MMK), customer-managed keys (CMK), and customer-provided keys (CPK). If CPK is used, configure the same key in the reader.

Operations and object detection modes

ADLS Reader processes incremental changes (object creations and updates) using one of the following modes:

  • ADLS Directory Listing: Detects changes by comparing object metadata between successive polls of each directory. (Default.)

  • Log Analytics: Detects changes across the whole container by querying Azure Log Analytics for the polling interval, which reduces compute and network overhead. Requires additional configuration on the storage account and the Log Analytics workspace.
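The Directory Listing idea can be sketched as a comparison of two metadata snapshots. This is a minimal illustration, not the reader's actual logic: it assumes the compared metadata is last-modified time and size (the text does not say which fields are used), and `detect_changes` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Snapshot of one directory: object path -> (last_modified_epoch, size_bytes).
# Which metadata fields are compared is an assumption made for this sketch.
Snapshot = Dict[str, Tuple[float, int]]


def detect_changes(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return paths that are new or whose metadata changed since the last poll,
    ordered by modified time."""
    changed = [
        path for path, meta in current.items()
        if previous.get(path) != meta
    ]
    return sorted(changed, key=lambda p: current[p][0])


poll_1 = {"in/a.csv": (100.0, 10), "in/b.csv": (110.0, 20)}
poll_2 = {"in/a.csv": (100.0, 10), "in/b.csv": (150.0, 25), "in/c.csv": (140.0, 5)}

print(detect_changes({}, poll_1))      # first poll: every object is new
print(detect_changes(poll_1, poll_2))  # later poll: only b.csv and c.csv changed
```

Objects that disappear between polls are ignored here, matching the statement that the reader handles creations and updates.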

ADLS Reader overview

ADLS Reader connects to an ADLS Gen2 storage account (a storage account with hierarchical namespace enabled). ADLS Gen2 combines the capabilities of ADLS Gen1 and Azure Blob Storage, providing file system semantics, file-level security, tiered storage, and high availability and disaster recovery.

On first run, ADLS Reader processes all objects in the configured container in order of modified time. Thereafter, it polls for new or updated objects and processes changes incrementally.

Typical use case and integration

A common pattern is to read from ADLS while detecting file uploads and updates through a Log Analytics workspace. In this setup, an Entra ID application is authorized for both the storage account (to read objects) and the workspace (to query logs for changes).

Object processing modes

ADLS Reader streams objects directly from the container so parsing can begin without a full download.
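To illustrate why streaming lets parsing start early, the sketch below turns a sequence of byte chunks into complete lines as soon as each line is available. It is a generic pattern under assumed inputs (newline-delimited UTF-8 text, a hypothetical `iter_lines` helper), not the reader's internal code.

```python
from typing import Iterable, Iterator


def iter_lines(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield complete text lines as they arrive, without buffering the whole object."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the rest waits for more data.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line.decode("utf-8")
    if pending:  # final line without a trailing newline
        yield pending.decode("utf-8")


# Chunks as they might arrive from the network, split at arbitrary byte positions.
chunks = [b"id,na", b"me\n1,al", b"ice\n2,bob"]
for line in iter_lines(chunks):
    print(line.split(","))
```

Only the unfinished tail of the current line is held in memory, so the first records can be emitted long before the last chunk arrives.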

When the parser is Parquet, the file must be fully available before processing; the reader downloads files and throttles downloads to a maximum disk usage of 2048 MB and a file limit of 10.
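One way to picture this throttling is an admission check that runs before each download. The sketch assumes the limits apply to files currently staged on local disk and that 1 MB means 1024 × 1024 bytes; both are interpretations, and the class `DownloadThrottle` is hypothetical rather than part of Striim.

```python
MAX_DISK_BYTES = 2048 * 1024 * 1024  # 2048 MB disk limit from the text
MAX_FILES = 10                       # file limit from the text


class DownloadThrottle:
    """Tracks locally staged files and decides whether another download may start."""

    def __init__(self, max_bytes: int = MAX_DISK_BYTES, max_files: int = MAX_FILES):
        self.max_bytes = max_bytes
        self.max_files = max_files
        self.staged = {}  # path -> size in bytes

    def can_download(self, size: int) -> bool:
        used = sum(self.staged.values())
        return len(self.staged) < self.max_files and used + size <= self.max_bytes

    def start(self, path: str, size: int) -> bool:
        """Reserve space for a download; return False if it has to wait."""
        if not self.can_download(size):
            return False
        self.staged[path] = size
        return True

    def finish(self, path: str) -> None:
        """Release the reservation once the file has been parsed and removed."""
        self.staged.pop(path, None)


mb = 1024 * 1024
throttle = DownloadThrottle()
print(throttle.start("a.parquet", 1500 * mb))  # admitted
print(throttle.start("b.parquet", 600 * mb))   # refused: would exceed 2048 MB
throttle.finish("a.parquet")
print(throttle.start("b.parquet", 600 * mb))   # admitted after space is released
```

In this sketch a single file larger than the disk limit would never be admitted; how the reader treats that case is not stated in the text.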

Parsers

Supported parsers: AAL (Apache access log), Avro, Binary, DSV, Free Form Text, JSON, NVP (name-value pair), Parquet, XML.

Resilience and recovery

  • At-least-once processing (A1P) with checkpoints of processed file names and offsets. On restart, the reader resumes from the last checkpoint; duplicates may occur after recovery.

  • Automatic retries per the Connection Retry Policy. If connection attempts exceed the configured limit, the app halts with an error.
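The checkpoint behavior described above can be sketched with a small simulation that shows where duplicates come from. The checkpoint layout (file name plus offset), the checkpoint interval, and the ordering of files by name are all assumptions for illustration; `process` is a hypothetical function, not Striim's recovery code.

```python
from typing import Dict, List, Tuple

Checkpoint = Tuple[str, int]  # (file name, offset of the next unread record)


def process(files: Dict[str, List[str]], start: Checkpoint, emitted: List[str],
            checkpoint_every: int = 2, crash_after: int = -1) -> Checkpoint:
    """Emit records starting at `start`, saving a checkpoint every N records.
    Returns the last saved checkpoint. `crash_after` simulates a failure
    after that many records have been emitted in this run."""
    saved = start
    count = 0
    for name in sorted(files):
        if name < start[0]:
            continue  # file was fully processed before the checkpoint
        offset = start[1] if name == start[0] else 0
        for i in range(offset, len(files[name])):
            if count == crash_after:
                return saved  # crash: progress made since `saved` is forgotten
            emitted.append(files[name][i])
            count += 1
            if count % checkpoint_every == 0:
                saved = (name, i + 1)
    return saved


files = {"a.csv": ["a1", "a2", "a3"], "b.csv": ["b1", "b2"]}
out: List[str] = []

checkpoint = process(files, ("", 0), out, crash_after=3)  # fails after 3 records
process(files, checkpoint, out)                           # restart from checkpoint
print(out)
```

Because the third record was emitted after the last saved checkpoint, it is emitted again on restart. No record is lost, which is the at-least-once guarantee; downstream targets that need exactly-once results have to tolerate or deduplicate such repeats.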