Using regular expressions (regex)

Striim supports regular expressions (regex) in TQL applications. The Striim implementation of regex is Java-based (see java.util.regex.Pattern), so keep the following in mind as you develop your regular expressions:

  • The backslash character ( \ ) is an escape character in Java strings, so to write a regex construct such as \w, use \\w.

  • In regex, \\ matches a single literal backslash. To match a literal backslash in the Striim Java implementation of regex, you must therefore use \\\\.

  • The java.lang.String class provides you with these methods supporting regex: matches(), split(), replaceFirst(), replaceAll(). Note that the String.replace() methods do not support regex.

  • TQL supports the regex syntax and constructs from java.util.regex. Note that this has some differences from POSIX regex.
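
The escaping rules above can be checked in plain Java. The following is a minimal sketch, not Striim code: the class and method names are hypothetical, and it only shows how java.lang.String handles the escapes and methods described in the list.

```java
final class RegexEscapeDemo {
    // The Java literal "\\w+" is the regex \w+ (one or more word characters).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    // The Java literal "\\\\" is the regex \\ which matches one literal backslash.
    static String[] splitOnBackslash(String path) {
        return path.split("\\\\");
    }

    // replaceAll() interprets its first argument as a regex.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // replace() interprets its first argument as literal text, so "\\d" here
    // means the two characters backslash and d, not "any digit".
    static String replaceLiteral(String s) {
        return s.replace("\\d", "#");
    }

    public static void main(String[] args) {
        System.out.println(isWord("user_42"));
        System.out.println(String.join("|", splitOnBackslash("C:\\logs\\app.log")));
        System.out.println(maskDigits("id 45"));
        System.out.println(replaceLiteral("id 45"));
    }
}
```

In this sketch, replaceLiteral("id 45") returns its input unchanged because the string contains no literal backslash followed by d, which illustrates why String.replace() cannot be used for regex substitution.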

If you are new to using regular expressions, refer to the following resources to get started:

You may use regex in LIKE and NOT LIKE expressions. For example:

  • WHERE ProcessName NOT LIKE '%.tmp%': filter out data from temp files

  • WHERE instance_applications LIKE '%Apache%': select only applications with Apache in their names

  • WHERE MerchantID LIKE '45%': select only merchants with IDs that start with 45
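
To make the three filters above concrete, the following Java sketch evaluates the same patterns outside Striim. It rests on an assumption not stated in this section: that % behaves as the usual SQL wildcard for any run of characters and that every other character is literal. The helper names likeToRegex and like are hypothetical and are not part of TQL.

```java
import java.util.regex.Pattern;

final class LikeDemo {
    // Hypothetical helper: build a Java regex from a LIKE pattern, treating
    // '%' as "any run of characters" and quoting all other text literally.
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True when the whole value matches the LIKE pattern.
    static boolean like(String value, String pattern) {
        return likeToRegex(pattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("report.tmp.bak", "%.tmp%"));
        System.out.println(like("Apache Tomcat", "%Apache%"));
        System.out.println(like("4501", "45%"));
    }
}
```

Quoting the literal parts with Pattern.quote() keeps the dot in '%.tmp%' from acting as a regex wildcard under this assumption.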

The following entry from the MultiLogApp sample Apache access log data includes information about a REST API call in line 4:

0: 206.130.134.68
1: -
2: AWashington
3: 25/Oct/2013:11:28:36.960 -0700
4: GET http://cloud.saas.me/query?type=ChatterMessage&id=01e33d9a-34ee-ccd0-84b9-
   14109fcf2383&jsessionId=01e33d9a-34c9-1c68-84b9-14109fcf2383 HTTP/1.1
5: 200
6: 0
7: -
8: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu 
   Chromium/28.0.1500.71 Chrome/28.0.1500.71 Safari/537.36
9: 1506

Regex is also used by the MATCH function. The MATCH function in the ParseAccessLog CQ parses the information in line 4 to extract the session ID:

MATCH(data[4], ".*jsessionId=(.*) ")

The parsed output is:

sessionId: "01e33d9a-34c9-1c68-84b9-14109fcf2383"
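
The same pattern can be tried directly against java.util.regex. This is a sketch under two assumptions: that MATCH returns the first capture group of the first match (approximated here with Matcher.find() and group(1)), and that line 4 of the sample is a single unwrapped string. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SessionIdDemo {
    // Same regex as the MATCH call above, compiled once.
    private static final Pattern JSESSION = Pattern.compile(".*jsessionId=(.*) ");

    // Returns the first capture group, or null when the pattern is not found.
    // Assumption: find() plus group(1) approximates what MATCH returns.
    static String extractSessionId(String requestLine) {
        Matcher m = JSESSION.matcher(requestLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Line 4 of the sample entry, joined into one string.
        String line4 = "GET http://cloud.saas.me/query?type=ChatterMessage"
            + "&id=01e33d9a-34ee-ccd0-84b9-14109fcf2383"
            + "&jsessionId=01e33d9a-34c9-1c68-84b9-14109fcf2383 HTTP/1.1";
        System.out.println(extractSessionId(line4));
    }
}
```

Because (.*) is greedy, the capture runs up to the last space in the input. That works for this entry, where the only space after jsessionId= precedes HTTP/1.1, but a narrower group such as ([^ &]*) may be safer if jsessionId is not the final query parameter.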

The following, also from MultiLogApp, is an example of the data[2] element of a RawXMLStream WAEvent data array:

"Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] 
[user=HGonzalez] [sobject=User]","com.me.saas.SaasMultiApplication$SaasException: 
Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] 
[user=HGonzalez] [sobject=User]\n\tat com.me.saas.SaasMultiApplication.login
(SaasMultiApplication.java:1253)\n\tat 
sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)\n\tat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat 
com.me.saas.SaasMultiApplication$UserApiCall.invoke(SaasMultiApplication.java:360)\n\tat 
com.me.saas.SaasMultiApplication$Session.login(SaasMultiApplication.java:1447)\n\tat 
com.me.saas.SaasMultiApplication.main(SaasMultiApplication.java:1587)"

This is parsed by the ParseLog4J CQ as follows:

MATCH(data[2], '\\\\[api=([a-zA-Z0-9]*)\\\\]'),
MATCH(data[2], '\\\\[session=([a-zA-Z0-9\\-]*)\\\\]'),
MATCH(data[2], '\\\\[user=([a-zA-Z0-9\\-]*)\\\\]'),
MATCH(data[2], '\\\\[sobject=([a-zA-Z0-9]*)\\\\]')

The parsed output is:

api: "login"
sessionId: "01e3928f-e975-ffd4-bdc5-14109fcf2383"
userId: "HGonzalez"
sobject: "User"
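
As a rough cross-check, the four bracketed fields can be extracted with java.util.regex in plain Java. This is a sketch, not the ParseLog4J implementation: firstGroup is a hypothetical helper, and only the opening message text of the sample is used. The Java literals here write \\[ for an escaped bracket, whereas the TQL above shows \\\\[; the sketch makes no claim about how TQL processes its string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BracketFieldDemo {
    // Hypothetical helper: first capture group of regex in text, or null.
    static String firstGroup(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Opening message text of the sample data[2] element.
        String message = "Problem in API call [api=login] "
            + "[session=01e3928f-e975-ffd4-bdc5-14109fcf2383] "
            + "[user=HGonzalez] [sobject=User]";

        // "\\[" in Java source is the regex \[ which matches a literal bracket.
        System.out.println(firstGroup(message, "\\[api=([a-zA-Z0-9]*)\\]"));
        System.out.println(firstGroup(message, "\\[session=([a-zA-Z0-9\\-]*)\\]"));
        System.out.println(firstGroup(message, "\\[user=([a-zA-Z0-9\\-]*)\\]"));
        System.out.println(firstGroup(message, "\\[sobject=([a-zA-Z0-9]*)\\]"));
    }
}
```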

See Parsing sources with regular expressions, FreeFormTextParser, and MultiFileReader for additional examples.