ADLS Reader
Azure Data Lake Storage (ADLS) is a scalable data store offered by Microsoft Azure. Data is stored as blobs and can be accessed from many other Azure components. Using ADLS Reader, Striim can read objects from a container in an ADLS storage account that has hierarchical namespace enabled.
Note
ADLS Reader and ADLS Gen2 Writer read from and write to the same Azure Data Lake Storage (ADLS) service. Microsoft dropped "Gen2" from the name of the service after it retired the ADLS Gen1 service in February 2024.
Feature summary
| Feature | Supported? | Notes |
|---|---|---|
| Objects | | |
| Standard Entities | - | Not applicable (reads files/blobs in containers). |
| Custom Entities | - | Not applicable (reads files/blobs in containers). |
| Authentication | | |
| Basic Authentication | - | |
| OAuth Authentication | ✓ | Microsoft Entra ID (app registration). See Security and authentication. |
| Custom Authentication Methods | - | |
| Security and Governance | | |
| TLS 1.2 | ✓ | |
| Connection Profile | ✓ | Reuse credentials/settings. See Initial setup for ADLS Reader. |
| Encryption | ✓ | Server-side encryption (MMK/CMK/CPK). See Security and authentication. |
| Operations | | |
| Initial Load | ✓ | Reads existing objects on first run. |
| Pull-based Incremental Load | ✓ | Directory Listing / Log Analytics. See Operations and object detection modes. |
| Automated mode | ✓ | Initial read + continuous polling. |
| Building Applications / Programmability | | |
| Wizards | ✓ | |
| Flow Designer | ✓ | |
| Striim TQL | ✓ | |
| Schema Handling | | |
| Initial Schema Creation | - | Parsers define event structure; target writers manage schema. |
| Output as WAEvents | ✓ | Publishes WAEvents. |
| Schema Evolution | - | |
| Runtime | | |
| Resilience | ✓ | Configurable automatic retries. See Resilience and recovery. |
| Recovery - A1P (at-least once processing) & E1P | ✓ | A1P recovery; duplicates possible after restart. See Resilience and recovery. |
| Parallel Execution | | |
| Metrics & Auditing | ✓ | See ADLS Reader monitoring metrics. |
Security and authentication
The authentication and authorization required for ADLS Reader are set up using Microsoft Entra ID (formerly Azure Active Directory). Register an application and grant it access to the storage account using a suitable role (for example, Storage Blob Data Contributor or Storage Blob Data Reader).
ADLS Reader authenticates with the Entra application’s Client ID, Tenant ID, and Client Secret.
Encryption support: ADLS Reader supports all server-side encryption options provided by ADLS, namely Microsoft-managed keys (MMK), customer-managed keys (CMK), and customer-provided keys (CPK). If CPK is used, configure the same key in the reader so that it can read the encrypted objects.
Operations and object detection modes
ADLS Reader processes incremental changes (object creations and updates) using one of the following modes:
ADLS Directory Listing: Detects changes by comparing object metadata across polls per directory. (Default.)
Log Analytics: Detects changes across the whole container by querying Azure Log Analytics for each polling interval, which reduces compute and network overhead. Requires additional configuration on the storage account and the Log Analytics workspace.
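The Directory Listing mode's comparison of object metadata across polls can be illustrated with a small sketch. This is not Striim's implementation: the snapshot shape (path mapped to last-modified time and size) and the function name `detect_changes` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot of one poll: object path -> (last-modified epoch seconds, size in bytes).
Snapshot = Dict[str, Tuple[float, int]]


def detect_changes(previous: Snapshot, current: Snapshot) -> Tuple[List[str], List[str]]:
    """Compare two directory listings and return (created, updated) object paths."""
    # An object is "created" if it appears only in the newer listing.
    created = sorted(path for path in current if path not in previous)
    # An object is "updated" if it exists in both listings but its metadata differs.
    updated = sorted(
        path for path in current
        if path in previous and current[path] != previous[path]
    )
    return created, updated
```

Each poll would pass the previous and the latest listing to `detect_changes` and then keep the latest listing as the baseline for the next poll.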
ADLS Reader overview
ADLS Reader connects to an ADLS Gen2 storage account (a storage account with hierarchical namespace enabled). ADLS Gen2 combines the capabilities of ADLS Gen1 and Azure Blob Storage, providing file system semantics, file-level security, tiered storage, and high availability and disaster recovery.
On first run, ADLS Reader processes all objects in the configured container in order of modified time. Thereafter, it polls for new or updated objects and processes changes incrementally.
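As a rough illustration of the initial-load ordering described above, the sketch below sorts a listing by modified time. The tuple shape and the function name `initial_load_order` are hypothetical, and breaking ties by path is an assumption; the source does not say how objects with equal modified times are ordered.

```python
from typing import List, Tuple

# Hypothetical listing entry: (object path, last-modified time as epoch seconds).
ObjectInfo = Tuple[str, float]


def initial_load_order(objects: List[ObjectInfo]) -> List[str]:
    """Return object paths in the order an initial load would process them."""
    # Oldest modified time first; ties are broken by path (an assumption for determinism).
    ordered = sorted(objects, key=lambda obj: (obj[1], obj[0]))
    return [path for path, _ in ordered]
```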
Typical use case and integration
A common pattern is to read from ADLS while detecting file uploads and updates through a Log Analytics workspace. In this setup, the Entra ID application is authorized for both the storage account (to read objects) and the workspace (to query logs for changes).
Object processing modes
By default, ADLS Reader streams objects directly from the container, so parsing can begin without a full download.
When the parser is Parquet, the file must be fully available before processing. In that case the reader downloads files and throttles the downloads to a maximum disk usage of 2048 MB and a limit of 10 files.
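The Parquet download throttle can be pictured as an admission check against the two limits stated above. This is a minimal sketch, not Striim's code: the function name `can_start_download` and the idea of tracking in-flight file sizes in a list are assumptions. It also does not address how a single file larger than 2048 MB is handled, which the source does not describe.

```python
from typing import List

MAX_DISK_MB = 2048  # maximum disk usage stated in the documentation
MAX_FILES = 10      # file limit stated in the documentation


def can_start_download(in_flight_sizes_mb: List[float], next_size_mb: float) -> bool:
    """Return True if another file may be downloaded without exceeding either limit."""
    # Refuse when the file-count limit is already reached.
    if len(in_flight_sizes_mb) >= MAX_FILES:
        return False
    # Refuse when the new file would push total disk usage past the cap.
    return sum(in_flight_sizes_mb) + next_size_mb <= MAX_DISK_MB
```

A caller would wait until already downloaded files have been processed and removed before retrying a refused download.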
Parsers
Supported parsers: AAL (Apache access log), Avro, Binary, DSV, Free Form Text, JSON, NVP (name-value pair), Parquet, XML.
Resilience and recovery
ADLS Reader supports at-least-once processing (A1P) by checkpointing the names and offsets of processed files. On restart, the reader resumes from the last checkpoint; duplicates may occur after recovery.
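The sketch below shows why resuming from a checkpoint gives at-least-once rather than exactly-once delivery. The record shape, the checkpoint dictionary, and the function name `resume_from_checkpoint` are hypothetical and do not reflect Striim's internal checkpoint format.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (file name, offset within the file, payload).
Record = Tuple[str, int, str]


def resume_from_checkpoint(records: List[Record], checkpoint: Dict[str, int]) -> List[Record]:
    """Return the records that are re-read after a restart.

    `checkpoint` maps a file name to the last offset that was checkpointed.
    Files missing from the checkpoint are read from the beginning.
    """
    return [
        record for record in records
        if record[1] > checkpoint.get(record[0], -1)
    ]
```

If offsets 0 to 2 of a file were emitted before a crash but only offset 1 had been checkpointed, the restart emits offset 2 again, so downstream targets see it twice.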
The reader automatically retries failed connections according to the Connection Retry Policy. If connection attempts exceed the configured limit, the application halts with an error.
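The retry behavior can be sketched as a bounded retry loop. The parameter names `max_retries` and `interval_seconds`, the `RetriesExhausted` exception, and the function `connect_with_retries` are illustrative assumptions and are not the actual Connection Retry Policy property names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetriesExhausted(RuntimeError):
    """Raised when the configured retry limit is exceeded (hypothetical name)."""


def connect_with_retries(
    connect: Callable[[], T],
    max_retries: int,
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect`, retrying on ConnectionError up to `max_retries` times."""
    retries = 0
    while True:
        try:
            return connect()
        except ConnectionError as exc:
            if retries >= max_retries:
                # Limit exceeded: stop retrying and surface an error to the caller.
                raise RetriesExhausted(f"gave up after {retries} retries") from exc
            retries += 1
            sleep(interval_seconds)  # wait before the next attempt
```

The `sleep` argument is injectable so that the loop can be exercised without real delays.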