GCS Reader runtime considerations

Monitoring metrics

The following monitoring metrics are published by the GCS Reader:

  • Count of cloud objects metadata fetched: The number of object metadata entries returned by the most recent fetch.

  • External I/O latency: The latency of the last metadata fetch call.

  • Name of the last cloud objects metadata fetched.

  • Cloud object statistics:

    • Count of cloud objects metadata fetched: Total number of object metadata entries fetched so far.

    • Downloaded count: Number of files downloaded.

    • Processed count: Number of files processed.

    • Missing count: Number of files deleted from the bucket after their metadata was fetched.

    • Total object size in MB: Total size in MB of all objects metadata fetched so far.

    • Total downloaded size in MB: Total size in MB of all downloaded objects. This metric is not published when the Use Streaming option is enabled.

    • Disk utilization in MB: Current disk utilization of the download directory (.striim/componentname/).

  • Current filename.

  • Last file read.

Performance optimizations

Object fetching mode: Streaming (the Use Streaming property) is expected to be faster than downloading, because the bytes are streamed directly without an additional download step. In local testing with sample data of 411 DSV files of varying sizes and 1M events in total, the download approach took 162 seconds and the streaming approach took 55 seconds, which is roughly three times faster.

Object detection mode: The GCSAuditLogNotification object detection mode provides better performance during app recovery after a crash or stop when a bucket contains a very large number of objects (on the order of millions), because the reader does not need to fetch the full metadata to locate the checkpointed object.

Limitations

The following limitations apply to the GCS Reader:

  • The GCS Reader can read Avro files with an embedded schema, but not with a separate Avro schema file.

  • The GCS Reader's download mode is not supported on Windows.

  • If an object name is longer than the maximum filename length supported by the operating system, enable the Use Streaming option. Otherwise, downloading the object to a local file will raise an exception.

  • If a bucket contains a very large number of objects, the reader may consume significant memory and CPU to fetch and process the metadata. This applies to both the GCSDirectoryListing and GCSAuditLogNotification modes.

    For the GCSDirectoryListing mode, a full metadata fetch happens when the adapter starts and for every subsequent polling fetch.

    For the GCSAuditLogNotification mode, a full metadata fetch happens when the adapter starts, and subsequent polling calls fetch only the incremental changes from the audit log.

  • In GCSDirectoryListing mode, if the bucket contains a very large number of objects (on the order of millions), app recovery after a crash or stop takes considerable time, because the full metadata has to be fetched to locate the checkpointed object. The GCSAuditLogNotification mode is recommended for better performance.

  • In GCSAuditLogNotification mode, Google Cloud applies a default limit of 60 requests per minute on reading the audit log. If you are running multiple apps, set the polling interval based on the number of apps and the audit log read limit.

  • A time offset of 5 minutes is applied to queries to avoid conflicts during high-volume data loading. To change the 5-minute default, contact Striim support.
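As a rough illustration of how the polling interval might be sized against the audit log read limit, the following Python sketch computes the smallest interval that keeps the combined request rate of all apps within the limit. It assumes that every app issues a fixed number of audit log read requests per polling cycle (one by default) and that all apps count against the same limit; the function name and parameters are hypothetical and are not part of the GCS Reader.

```python
def min_polling_interval_seconds(app_count, requests_per_poll=1,
                                 read_limit_per_minute=60):
    """Return the smallest whole-second polling interval that keeps the
    combined audit log reads of all apps within the per-minute limit.

    Assumption (not from the product documentation): each app sends
    `requests_per_poll` read requests every polling cycle, and all apps
    share a single limit.
    """
    if app_count < 1 or requests_per_poll < 1 or read_limit_per_minute < 1:
        raise ValueError("all arguments must be positive integers")

    # Requests sent by all apps together during one polling cycle.
    requests_per_cycle = app_count * requests_per_poll

    # Requests per minute = requests_per_cycle * 60 / interval, so
    # interval >= 60 * requests_per_cycle / limit. Use integer ceiling
    # division to avoid floating-point rounding.
    return (60 * requests_per_cycle + read_limit_per_minute - 1) // read_limit_per_minute


if __name__ == "__main__":
    for apps in (1, 5, 90):
        print(apps, "apps ->", min_polling_interval_seconds(apps), "seconds")
```

Under these assumptions, the result is a lower bound. A longer interval leaves headroom for other clients that read the same audit log.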

Troubleshooting

This topic describes errors you may see when using the GCS Reader, and possible resolutions.

Exception

Resolutions

GoogleCloudBucketNotFoundException

Check if the specified bucket is present.

When using Private Service Connect (PSC), verify that:

  • The PSC name provided is valid.

  • The PSC has been created.

  • The PSC is reachable from the Striim host.

GoogleCloudCredentialsException

  • Verify that the connection-related properties (ProjectID, ServiceAccountKey, or Private Service Connect) passed to the adapter are valid.

  • Ensure the provided service account key file path is correct.

  • Make sure the provided credentials are in a valid format.

GoogleCloudLocalFileSystemException

  • Make sure the system has enough space to download the object.

  • Ensure the app has permission to write the file to the specified download path.

  • Enable the Use Streaming property for this specific file.

  • The object key might be too long. Rename the object with a shorter key.
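The local file system checks above can be approximated before an object is downloaded. The following Python sketch is one possible pre-check and is not part of the GCS Reader: the function name, the 255-byte filename limit, and the use of the last "/"-separated segment of the object name as the local filename are all assumptions made for illustration.

```python
import os
import shutil

# Assumed filename limit in bytes. 255 is common on Linux and macOS
# file systems, but the real limit depends on the file system in use.
ASSUMED_MAX_FILENAME_BYTES = 255


def download_preflight(object_name, object_size_bytes, download_dir,
                       max_name_bytes=ASSUMED_MAX_FILENAME_BYTES):
    """Return a list of problems that would likely prevent a local download.

    An empty list means none of the checked problems was found.
    `download_dir` must already exist.
    """
    problems = []

    # Assumption: the last "/"-separated segment becomes the local filename.
    file_name = object_name.rsplit("/", 1)[-1]
    if len(file_name.encode("utf-8")) > max_name_bytes:
        problems.append("name_too_long")

    # Compare the object size with the free space on the target volume.
    if object_size_bytes > shutil.disk_usage(download_dir).free:
        problems.append("insufficient_space")

    # Check that the current process may write to the download directory.
    if not os.access(download_dir, os.W_OK):
        problems.append("not_writable")

    return problems
```

If the pre-check reports name_too_long, enabling the Use Streaming property or renaming the object, as listed above, are the applicable resolutions.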

CloudStorageConnectionException

  • Verify the connection properties passed to the adapter are valid.

  • Check your internet connection.

  • Make sure your VPN has permission to connect to the Google server.

  • Make sure your Private Service Connect details are valid and reachable from the Striim server.

GoogleCloudPermissionException

Ensure that the user has the following permissions on the bucket/audit log:

  • storage.objects.list

  • storage.objects.get

  • logging.logEntries.list