
Managing Smart Alerts

The following alerts are enabled by default and are sent for every server, Forwarding Agent, application, source, and target.

By default, these alerts are visible only to administrators (members of the Global.admin group), in the alerts drop-down in the top right corner of the Striim web UI and in the Message Log at the bottom of the web UI. You can change them to be sent by email, to Slack, or to Microsoft Teams.

| Alert name | Alert condition (default) | Notes |
|---|---|---|
| Server_HighCpuUsage | the server average per core CPU time used by its Java process is over 90% | By default, an alert will be sent every four hours until the condition is resolved. |
| Server_HighMemoryUsage | the server's JVM free heap size is below 10% of the maximum heap size (Xmx) | By default, an alert will be sent every four hours until the condition is resolved. |
| Server_NodeUnavailable | the server is no longer connected to the cluster | |
| Agent_HighCpuUsage | the Forwarding Agent average per core CPU time used by its Java process is over 90% | By default, an alert will be sent every four hours until the condition is resolved. |
| Agent_HighMemoryUsage | the Forwarding Agent's JVM free heap size is below 10% of the maximum heap size (Xmx) | By default, an alert will be sent every four hours until the condition is resolved. |
| Agent_NodeUnavailable | the Forwarding Agent is no longer connected to the cluster | |
| Application_AutoResumed | the application resumed automatically (see Automatically restarting an application) | |
| Application_Backpressured | one or more streams in the application have been backpressured for over ten minutes (see Understanding and managing backpressure) | By default, an alert will be sent every four hours until the condition is resolved. |
| Application_CheckpointNotProgressing | it has been over 30 minutes since the recovery checkpoint advanced and during that time at least one new event was received from a source (see Recovering applications) | By default, an alert will be sent every four hours until the condition is resolved. |
| Application_Halted | the application has halted (see Application states) | |
| Application_Rebalanced | not applicable to Striim Cloud | |
| Application_RebalanceFailed | not applicable to Striim Cloud | |
| Application_Terminated | the application has terminated (see Application states) | |
| Source_Idle | it has been over 10 minutes since the source read an event | By default, an alert will be sent every four hours until the condition is resolved. |
| Target_HighLee | one or more events received by the target had an end-to-end lag of over ten minutes (see Monitoring end-to-end lag (LEE)) | By default, an alert will be sent every four hours until the condition is resolved. |
| Target_Idle | it has been over 10 minutes since the target wrote an event | By default, an alert will be sent every four hours until the condition is resolved. |
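
To make the default thresholds concrete, the following Python sketch restates two of the conditions above (Server_HighMemoryUsage and Source_Idle) as plain predicates. It is illustrative only: the function names and inputs are hypothetical, not part of Striim, and Striim's internal evaluation may differ.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def high_memory_usage(free_heap_bytes: int, max_heap_bytes: int) -> bool:
    """Hypothetical check: free heap is below 10% of the maximum heap size (Xmx)."""
    return free_heap_bytes < 0.10 * max_heap_bytes


def source_idle(seconds_since_last_event: float, threshold_sec: int = 600) -> bool:
    """Hypothetical check: no event has been read for over 10 minutes (600 seconds)."""
    return seconds_since_last_event > threshold_sec


# With a 4 GiB maximum heap, the 10% threshold is about 410 MiB of free heap.
print(high_memory_usage(300 * MIB, 4 * GIB))  # 300 MiB free is below the threshold
print(source_idle(601))                       # just over ten minutes without events
```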

Modifying a Smart Alert

The properties (which vary depending on the alert) are:

  • alertMessage: defines the text of the alert. This can be edited in the console but not in the web UI.

    The following replacement variables can be used in alert messages. Actual values will be substituted for the variables when an alert is being issued. The values are taken from the alert definition and the monitor event being evaluated for the alert.

    • adapterName: Adapter name in alert definition (e.g. FileReader)

    • address: Address to which the alert will be sent (e.g. somebody@example.com)

    • alertName: Name of the alert (e.g. Application_CheckpointNotProgressing)

    • alertValue: Metrics value defined in the alert condition (e.g. 300)

    • comparator: Alert condition comparator (GT, LT, EQ, LIKE)

    • entityName: Actual component name in the mon event (e.g. admin.PosApp)

    • entityType: Component type (e.g. APPLICATION)

    • medium: Alerting medium (WEB, EMAIL, SLACK, TEAMS)

    • metricName: Metrics name in the alert condition (e.g. LAST_CHECKPOINT_AGE)

    • metricUnit: Unit of metrics (e.g. seconds)

    • metricValue: Actual metrics value in the mon event (e.g. 543)

    • objectName: Component name pattern in the alert definition (e.g. .*\.APPLICATION\..*)

  • alertType: EMAIL, SLACK, TEAMS, or WEB (default); except for WEB, you must also specify the toAddress

  • alertValue:

    • for integer values: the time in seconds before the alert is triggered; for example, for Source_Idle, the number of seconds with no events that need to pass before an alert is sent

    • for string values: the string to search for in the error message; for example, for Application_Terminated, Application terminated

  • comparator:

    • for integer values: EQ (equals), GT (greater than), LT (less than)

    • for string values: EQ (equals), LIKE (matches if the specified string occurs anywhere in the value)

  • intervalSec: the number of seconds between alerts (the snooze interval)

  • isEnabled: true (default) or false

  • toAddress: for email, the recipient's address; for Slack or Teams, the channel
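
The way comparator and alertValue combine can be sketched in Python as follows. This is a minimal illustration of the semantics listed above, not Striim's implementation: the function name is hypothetical, and treating LIKE as a case-sensitive substring test is an assumption.

```python
def condition_met(comparator: str, metric_value, alert_value) -> bool:
    """Hypothetical evaluation of an alert condition.

    String alert values support EQ and LIKE; integer alert values
    support EQ, GT, and LT, as described in the property list.
    """
    if isinstance(alert_value, str):
        text = str(metric_value)
        if comparator == "EQ":
            return text == alert_value
        if comparator == "LIKE":
            # Assumption: LIKE is a case-sensitive "occurs anywhere" test.
            return alert_value in text
    else:
        if comparator == "EQ":
            return metric_value == alert_value
        if comparator == "GT":
            return metric_value > alert_value
        if comparator == "LT":
            return metric_value < alert_value
    raise ValueError(f"unsupported comparator {comparator!r} for this value type")


# Integer example: a metric value of 543 against an alert value of 300.
print(condition_met("GT", 543, 300))
# String example: the alert value occurs somewhere in the error message.
print(condition_met("LIKE", "Application terminated due to an error", "Application terminated"))
```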

Some of these properties are displayed and editable in the web UI.

AlertManagerServerNodeUnavailable.png

To see all of an alert's properties, use the DESCRIBE command. For example:

DESCRIBE Application_Terminated;
Processing - describe Application_Terminated

SysAlertRule Application_Terminated 
  on .*\.APPLICATION\..*: 
  for LOG_ERROR 
  comparator LIKE 
  with value Application terminated  
  alert type WEB 
  snooze 0 SECOND 
  system-defined and enabled 
  message: Application {{entityName}}: {{metricValue}}.
-> SUCCESS
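
The message line in this output uses the replacement variables described earlier, written in double curly braces. As a rough Python sketch of how such a template could be filled in (the function is hypothetical, the sample values are made up for illustration, and leaving unknown variables untouched is an assumption rather than documented Striim behavior):

```python
import re


def render_alert_message(template: str, values: dict) -> str:
    """Replace each {{name}} in the template with values[name].

    Assumption: variables with no supplied value are left as written.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


message = render_alert_message(
    "Application {{entityName}}: {{metricValue}}.",
    {"entityName": "admin.PosApp", "metricValue": "Application terminated"},
)
print(message)  # Application admin.PosApp: Application terminated.
```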

The property names in the DESCRIBE output correspond to the following keywords in ALTER SMARTALERT commands:

| DESCRIBE output | keyword for ALTER SMARTALERT |
|---|---|
| on | can't be modified |
| for | can't be modified |
| comparator | can't be modified; the comparators are EQ (equals), GT (greater than), and LT (less than) for integer values, and EQ (equals) and LIKE (matches if the specified string occurs anywhere in the value) for string values |
| with value | alertValue |
| alert type | alertType |
| sending to | toAddress |
| snooze | intervalSec |
| message | alertMessage |
| enabled | isEnabled |

Examples of modifying Smart Alert properties

To modify a Smart Alert, go to the Alert Manager and select the alert you want to modify. Which properties are available varies depending on the alert selected.

AlertManagerSelectSmartAlert.png
  • To change the alert type for Application_Terminated from in App to Email, change the Alert Type and specify the email address of the person to receive the alert:

    AlertManagerApplicationTerminated.png
  • To change the alert interval (snooze) for Source_Idle to an hour, set Snooze After Alert. This means alerts on this condition will be sent no more often than once an hour.

    AlertManagerSourceIdle.png
  • To disable Source_Idle alerts, set Enable alert off:

    AlertManagerDisableSourceIdle.png
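
The effect of the snooze interval (intervalSec) can be illustrated with a short Python sketch. It assumes, hypothetically, that the condition is checked at fixed times and remains true throughout; the function name and the check schedule are invented for the example and do not describe Striim's internals.

```python
def alerts_sent(check_times, interval_sec):
    """Return the check times at which an alert would go out.

    Assumption: the alert condition holds at every check time, and an
    alert is suppressed until interval_sec seconds have passed since
    the previous alert.
    """
    sent = []
    last_alert = None
    for t in check_times:
        if last_alert is None or t - last_alert >= interval_sec:
            sent.append(t)
            last_alert = t
    return sent


# Condition checked every 10 minutes for two hours, snooze set to one hour.
checks = range(0, 7201, 600)
print(alerts_sent(checks, 3600))  # [0, 3600, 7200]
```

With a one-hour snooze, the thirteen checks in this example produce only three alerts, which matches the "no more often than once an hour" behavior described for Source_Idle above.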