ADLS Reader runtime considerations

ADLS Reader monitoring metrics

In addition to the standard metrics, the following monitoring metrics are published for ADLS Reader.

Metric

Description

CLOUD_OBJECT_LAST_OBJECT_NAME

The name of the cloud object whose metadata was most recently fetched from the cloud.

Frequency: every cloud object in a batch.

For example:

Name of the last cloud objects metadata fetched | 0#1700554046000#normal#mt cars.parquet

CLOUD_OBJECT_LAST_OBJECT_REAL_NAME

The actual name of the object whose metadata was most recently fetched from the cloud, including its path in the ADLS container.

Frequency: every cloud object in a batch.

For example:

Actual name of the last cloud objects metadata fetched | normal/mt cars.parquet

CLOUD_OBJECT_LAST_BATCH_COUNT

The number of cloud objects whose metadata was captured in the latest fetch cycle.

Frequency: every batch fetched.

Units: count (Long)

For example:

Count of cloud objects metadata from last fetch | 160

EXTERNAL_IO_LATENCY

The latency involved in capturing the cloud metadata in the latest fetch cycle.

Frequency: every batch fetched.

Units: milliseconds (Long)

For example:

External I/O Latency | 06s:719ms

CLOUD_OBJECT_STATS

The following metrics related to the cloud objects are captured by ADLS Reader under Cloud objects statistics.

  • Count of Object metadata fetched: the total number of objects whose metadata has been fetched since the start or latest restart of the application.

    Frequency: every batch fetched.

    Units: count (Long)

  • Downloaded count: the total number of objects downloaded from ADLS to the Striim server since the start or latest restart of the application. This sub-metric is displayed only when the adapter runs with the Parquet parser.

    Frequency: every object downloaded.

    Units: count (Long)

  • Processed count: the total number of objects processed by ADLS Reader since the start or latest restart of the application.

    Frequency: every event.

    Units: count (Long)

  • Missing count: the total number of objects, since the start or latest restart of the application, that were deleted from ADLS after their metadata was fetched but before ADLS Reader could process them.

    Frequency: every event.

    Units: count (Long)

  • Total objects size in MB: the total size of the cloud objects whose metadata has been fetched since the start or latest restart of the application.

    Frequency: every batch fetched.

    Units: MB (Double)

  • Total downloaded size in MB: the total size of the cloud objects downloaded since the start or latest restart of the application. This sub-metric is captured only when Streaming is disabled and is displayed only when the adapter runs with the Parquet parser.

    Frequency: every object downloaded.

    Units: MB (Double)

  • Current Disk Utilization in MB: the storage space on the Striim server currently used by the application. This sub-metric is captured only when Streaming is disabled and is displayed only when the adapter runs with the Parquet parser.

    Frequency: every object downloaded.

    Units: MB (Double)

For example:

{
  "Count of Objects metadata fetched": 1,
  "Downloaded count": 1,
  "Processed count": 0,
  "Missing count": 0,
  "Total objects size in MB": 0.001,
  "Total downloaded size in MB": 0.001,
  "Current Disk Utilization in MB": 0.001
}
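
If the statistics are available to a monitoring script as a string in the form shown above, a few simple indicators can be derived from them. The following Python sketch is illustrative only. It assumes the keys are spelled exactly as in the example, and that objects which are neither processed nor missing are still pending. The helper name `summarize_stats` and the derived field names are hypothetical and not part of Striim.

```python
import json


def summarize_stats(payload: str) -> dict:
    """Derive simple indicators from a CLOUD_OBJECT_STATS payload (hypothetical helper)."""
    # Normalize typographic quotes in case the payload was copied from rendered documentation.
    cleaned = payload.replace("\u201c", '"').replace("\u201d", '"')
    stats = json.loads(cleaned)

    fetched = stats["Count of Objects metadata fetched"]
    processed = stats["Processed count"]
    missing = stats["Missing count"]

    return {
        # Assumption: fetched objects that are neither processed nor missing are still pending.
        "pending": fetched - processed - missing,
        # Fraction of fetched objects that were deleted before they could be processed.
        "missing_ratio": missing / fetched if fetched else 0.0,
        # Parquet-only sub-metric; None when the key is absent.
        "disk_mb": stats.get("Current Disk Utilization in MB"),
    }


example = """{
  "Count of Objects metadata fetched": 1,
  "Downloaded count": 1,
  "Processed count": 0,
  "Missing count": 0,
  "Total objects size in MB": 0.001,
  "Total downloaded size in MB": 0.001,
  "Current Disk Utilization in MB": 0.001
}"""

print(summarize_stats(example))
```

A steadily growing `pending` value or a non-zero `missing_ratio` could be used as an alert condition, though suitable thresholds depend on the workload.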

ADLS Reader limitations

The following are limitations of ADLS Reader.

  • ADLS Reader cannot capture data from objects that are deleted from ADLS while data capture is in progress.

  • When using the Parquet parser, ADLS Gen2 Reader cannot download and process objects whose unique file name is longer than the maximum length supported by the operating system.

  • ADLS Reader can read compressed objects only if they are compressed in gzip format.
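
Objects can be screened against the last two limitations before the reader is started. The Python sketch below is a local pre-check under stated assumptions, not a Striim feature. It recognizes gzip data by the two-byte magic number (0x1f 0x8b) defined for the gzip format. It also compares a file name against a byte limit that defaults to 255, which is common on many file systems but may not match your Striim server. The unique file name that the reader uses locally may be longer than the object name, so treat the length check as a rough indicator. The function names are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_like_gzip(head: bytes) -> bool:
    """Return True if the leading bytes carry the gzip magic number."""
    return head[:2] == GZIP_MAGIC


def name_fits(file_name: str, max_bytes: int = 255) -> bool:
    """Check the UTF-8 encoded length of a file name against an assumed limit."""
    return len(file_name.encode("utf-8")) <= max_bytes


# Build a small gzip payload in memory to exercise the check.
compressed = gzip.compress(b"id,name\n1,example\n")

print(looks_like_gzip(compressed))      # gzip data starts with the magic number
print(looks_like_gzip(b"PK\x03\x04"))   # a ZIP header does not
print(name_fits("mt cars.parquet"))     # short name, well under the assumed limit
```

Only the first two bytes of an object are needed for the gzip check, so it can be run on a small ranged read rather than a full download.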