MultiLogApp

This sample application shows how Striim could be used to monitor and correlate logs from web and application server logs from the same web application. The following is a relatively high-level explanation. For a more in-depth examination of a sample application with more detail about the components and how they interact, see PosApp.

MultiLogApp contains 12 flows that analyze the data from one or both logs and take appropriate actions:

MonitorLogs parses the log files to create two event streams (AccessStream for access log events and Log4JStream for application log events) used by the other flows. See the detailed discussion below.
ErrorsAndWarnings selects application log error and warning messages for use by the ErrorHandling and WarningHandling flows, and creates a sliding window containing the 300 most recent errors and warnings for use by the LargeRTCheck and ZeroContentCheck flows, which join it with web server data.

The following flows send alerts regarding the web server logs and populate the dashboard's Overview page world map and the Detail - UnusualActivity page:

HackerCheck sends an alert when an access log srcIp value is on a blacklist.
LargeRTCheck sends an alert when an access log responseTime value exceeds 2000 microseconds.
ProxyCheck sends an alert when an access log srcIP value is on a list of suspicious proxies.
ZeroContentCheck sends an alert when an access log entry's code value is 200 (that is, the HTTP request succeeded) but the size value is 0 (the return had no content).

The following flows send alerts regarding the application server log and populate the dashboard's Overview page pie chart and API detail pages:

ErrorHandling sends an alert when an error message appears in the application server log.
WarningHandling sends an alert once an hour with the count of warnings for each API call for which there has been at least one alert.
InfoFlow joins application log events with user information from the MLogUserLookup cache to create the ApiEnrichedStream used by ApiFlow, CompanyApiFlow, and UserApiFlow.
ApiFlow populates the Detail - ApiActivity page.
CompanyApiFlow populates the Detail - CompanyApiActivity page and the bar chart on the Overview page. It also sends an alert when an API call is used by a company more than 1500 times during the flow's one-hour jumping window.
UserApiFlow populates the dashboard's Detail - UserApiActivity page and the US map on the Overview page. It also sends an alert when an API call is used by a user more than 125 times during the flow's one-hour window.

MonitorLogs: web server log data

The web server logs are in Apache NCSA extended/combined log format plus response time:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %D"

(See apache.org for more information.) Here are four sample log entries:

216.103.201.86 - EHernandez [10/Feb/2014:12:13:51.037 -0800]  "GET http://cloud.saas.me/login&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" 200 21560 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 1606
216.103.201.86 - EHernandez [10/Feb/2014:12:13:52.487 -0800]  "GET http://cloud.saas.me/create?type=Partner&id=01e3928f-e05a-9be1-bdc5-14109fcf2383&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" 200 63523 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 1113
216.103.201.86 - EHernandez [10/Feb/2014:12:13:52.543 -0800]  "GET http://cloud.saas.me/query?type=ChatterMessage&id=01e3928f-e05a-9be2-bdc5-14109fcf2383&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" 200 46556 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 1516
216.103.201.86 - EHernandez [10/Feb/2014:12:13:52.578 -0800]  "GET http://cloud.saas.me/retrieve?type=ContractHistory&id=01e3928f-e05a-9be3-bdc5-14109fcf2383&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1" 200 44556 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" 39

In MultiLogApp, these logs are read by AccessLogSource:

CREATE SOURCE AccessLogSource USING FileReader (
  directory:'Samples/MultiLogApp/appData',
  wildcard:'access_log',
  blocksize: 10240,
  positionByEOF:false
)
PARSE USING DSVParser (
  columndelimiter:' ',
  ignoreemptycolumn:'Yes',
  quoteset:'[]~"',
  separator:'~'
)
OUTPUT TO RawAccessStream;

The log format is space-delimited, so the columndelimiter value is one space. With these quoteset and separator values, both square brackets and double quotes are recognized as delimiting strings that may contain spaces. With these settings, the first log entry above is output as a WAEvent data array with the following values:

"216.103.201.86",
"-",
"EHernandez",
"10/Feb/2014:12:13:51.037 -0800",
"GET http://cloud.saas.me/login&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1",
"200",
"21560",
"-",
"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)",
"1606"

This in turn is processed by the ParseAccessLog CQ:

CREATE CQ ParseAccessLog 
INSERT INTO AccessStream
SELECT data[0],
  data[2],
  MATCH(data[4], ".*jsessionId=(.*) "),
  TO_DATE(data[3], "dd/MMM/yyyy:HH:mm:ss.SSS Z"),
  data[4],
  TO_INT(data[5]),
  TO_INT(data[6]),
  data[7],
  data[8],
  TO_INT(data[9])
FROM RawAccessStream;

After the AccessLogEntry type is applied, the event looks like this:

srcIp: "216.103.201.86"
userId: "EHernandez"
sessionId: "01e3928f-e059-6361-bdc5-14109fcf2383"
accessTime: 1392063231037
request: "GET http://cloud.saas.me/login&jsessionId=01e3928f-e059-6361-bdc5-14109fcf2383 HTTP/1.1"
code: 200
size: 21560
referrer: "-"
userAgent: "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"
responseTime: 1606

The web server log data is now in a format that Striim can analyze. AccessStream is used by the HackerCheck, LargeRTCheck, ProxyCheck, and ZeroContentCheck flows.

MonitorLogs: application server log data

The application server logs are in Apache's Log4J format. Log4J is a standard Java logging framework used by many web-based applications. In a real-world implementation, this application could be reading many log files on many hosts. Here is a sample message:

<log4j:event logger="SaasApplication" timestamp="1392067731765" level="ERROR" thread="main">
<log4j:message><![CDATA[Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] [user=HGonzalez] [sobject=User]]]></log4j:message>
<log4j:throwable><![CDATA[com.me.saas.SaasMultiApplication$SaasException: Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] [user=HGonzalez] [sobject=User]
    at com.me.saas.SaasMultiApplication.login(SaasMultiApplication.java:1253)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.me.saas.SaasMultiApplication$UserApiCall.invoke(SaasMultiApplication.java:360)
    at com.me.saas.SaasMultiApplication$Session.login(SaasMultiApplication.java:1447)
    at com.me.saas.SaasMultiApplication.main(SaasMultiApplication.java:1587)
]]></log4j:throwable>
<log4j:locationInfo class="com.me.saas.SaasMultiApplication" method="login" file="SaasMultiApplication.java" line="1256"/>
</log4j:event>

Screen_Shot_2016-01-07_at_10.44.12_AM.png

Log4JSource retrieves data from …/Striim/Samples/MultiLOgApp/appData/log4jLog. This file contains around 1.45 million errors, warnings, and informational messages. The XMLParser portion of Log4JSource specifies the portions of the message that will be used by this application:

CREATE SOURCE Log4JSource USING FileReader (
  directory:'Samples/MultiLogApp/appData',
  wildcard:'log4jLog.xml',
  positionByEOF:false
) 
PARSE USING XMLParser(
  rootnode:'/log4j:event',
  columnlist:'log4j:event/@timestamp,
    log4j:event/@level,
    log4j:event/log4j:message,
    log4j:event/log4j:throwable,
    log4j:event/log4j:locationInfo/@class,
    log4j:event/log4j:locationInfo/@method,
    log4j:event/log4j:locationInfo/@file,
    log4j:event/log4j:locationInfo/@line'
)
OUTPUT TO RawXMLStream;

For example, for the sample log message above, log4j:event/@level returns WARN and log4j:event/log4j:locationInfo/@line returns 1133. These elements are output as a WAEvent data array with the following values:

"1392067731765",
"ERROR",
"Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] [user=HGonzalez] [sobject=User]","com.me.saas.SaasMultiApplication$SaasException: Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] [user=HGonzalez] [sobject=User]\n\tat com.me.saas.SaasMultiApplication.login(SaasMultiApplication.java:1253)\n\tat sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat com.me.saas.SaasMultiApplication$UserApiCall.invoke(SaasMultiApplication.java:360)\n\tat com.me.saas.SaasMultiApplication$Session.login(SaasMultiApplication.java:1447)\n\tat com.me.saas.SaasMultiApplication.main(SaasMultiApplication.java:1587)",
"com.me.saas.SaasMultiApplication",
"login",
"SaasMultiApplication.java",
"1256"

This array in turn is processed by the ParseLog4J CQ:

CREATE CQ ParseLog4J
INSERT INTO Log4JStream
SELECT TO_DATE(TO_LONG(data[0])),
  data[1],
  data[2], 
  MATCH(data[2], '\\\\[api=([a-zA-Z0-9]*)\\\\]'),
  MATCH(data[2], '\\\\[session=([a-zA-Z0-9\\-]*)\\\\]'),
  MATCH(data[2], '\\\\[user=([a-zA-Z0-9\\-]*)\\\\]'),
  MATCH(data[2], '\\\\[sobject=([a-zA-Z0-9]*)\\\\]'),
  data[3],
  data[4],
  data[5],
  data[6],
  data[7]
FROM RawXMLStream;

The elements in the array are numbered from zero, so data[0] returns the timestamp, data[1] returns the level, and so on. The MATCH functions use regular expressions to return the api, session, user, and sobject portions of the message string. (See Using regular expressions (regex) for discussion of the multiple escapes for [ and ] in the regular expressions.) After processing by the CQ, the event looks like this:

logTime: 1392067731765
level: "ERROR"
message: "Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] [user=HGonzalez] [sobject=User]"
api: "login"
sessionId: "01e3928f-e975-ffd4-bdc5-14109fcf2383"
userId: "HGonzalez"
sobject: "User"
xception: "com.me.saas.SaasMultiApplication$SaasException: Problem in API call [api=login] [session=01e3928f-e975-ffd4-bdc5-14109fcf2383] [user=HGonzalez] [sobject=User]\n\tat com.me.saas.SaasMultiApplication.login(SaasMultiApplication.java:1253)\n\tat sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat com.me.saas.SaasMultiApplication$UserApiCall.invoke(SaasMultiApplication.java:360)\n\tat com.me.saas.SaasMultiApplication$Session.login(SaasMultiApplication.java:1447)\n\tat com.me.saas.SaasMultiApplication.main(SaasMultiApplication.java:1587)"
className: "com.me.saas.SaasMultiApplication"
method: "login"
fileName: "SaasMultiApplication.java"
lineNum: "1256"

The application server log data is now in a format that Striim can analyze. Log4JStream is used by the ErrorsAndWarnings and InfoFlow flows.

ErrorsAndWarnings

The CQ GetLog4JErrorWarning filters Log4JStream, selecting only WARN and ERROR messages and discarding all others.

CREATE CQ GetLog4JErrorWarning
INSERT INTO Log4ErrorWarningStream
SELECT l FROM Log4JStream l
WHERE l.level = 'ERROR' OR l.level = 'WARN';

Log4ErrorWarningStream is used by the ErrorHandling and WarningHandling flows and by the Log4JErrorWarningActivity sliding window, which contains the most recent 300 events:

CREATE WINDOW Log4JErrorWarningActivity 
OVER Log4ErrorWarningStream KEEP 300 ROWS;

This window is used by the LargeRTCheck and ZeroContentCheck flows.

HackerCheck

This flow sends an alert when an access log srcIp value is on a blacklist. The BlackListLookup cache contains the blacklist:

CREATE CACHE BlackListLookup using FileReader (
  directory: 'Samples/MultiLogApp/appData',
  wildcard: 'multiLogBlackList.txt'
)
PARSE USING DSVParser ( )
QUERY (keytomap:'ip') OF IPEntry;

The CQ FindHackers selects access log events that match a blacklist entry:

CREATE CQ FindHackers
INSERT INTO HackerStream
SELECT ale 
FROM AccessStream ale, BlackListLookup bll
WHERE ale.srcIp = bll.ip;

The CQ SendHackingAlerts sends an alert for each such event:

CREATE CQ SendHackingAlerts 
INSERT INTO HackingAlertStream 
SELECT 'HackingAlert', ''+accessTime, 'warning', 'raise',
  'Possible Hacking Attempt from ' + srcIp + ' in ' + IP_COUNTRY(srcIp)
FROM HackerStream;

CREATE SUBSCRIPTION HackingAlertSub 
USING WebAlertAdapter( ) 
INPUT FROM HackingAlertStream;

This flow also creates the UnusualActivity WActionStore that populates various charts and tables on the dashboard:

CREATE TYPE UnusualContext (
    typeOfActivity String,
    accessTime DateTime,
    accessSessionId String,
    srcIp String KEY,
    userId String,
    country String,
    city String,
    lat double,
    lon double
);
CREATE WACTIONSTORE UnusualActivity 
CONTEXT OF UnusualContext ...

The CQ GenerateHackerContext populates UnusualActivity:

CREATE CQ GenerateHackerContext
INSERT INTO UnusualActivity
SELECT 'HackAttempt', accessTime, sessionId, srcIp, userId,
  IP_COUNTRY(srcIp), IP_CITY(srcIP), IP_LAT(srcIP), IP_LON(srcIP)
FROM HackerStream
LINK SOURCE EVENT;

HackAttempt is a literal string that identifies the type of activity. That will distinguish events from this flow from those from the three other flows that populate UnusualActivity.

LargeRTCheck

LargeRTCheck sends an alert whenever an access log responseTime value exceeds 2000 microseconds.

CREATE CQ FindLargeRT
INSERT INTO LargeRTStream
SELECT ale
FROM AccessStream ale
WHERE ale.responseTime > 2000;

The alert code is similar to HackerCheck's.

The typeOfActivity string for events written to the UnusualActivity WActionStore is LargeResponseTime.

ProxyCheck

ProxyCheck sends an alert when an access log srcIP value is on a list of suspicious proxies. This works exactly like HackerCheck but with a different list. The typeOfActivity string for events written to the UnusualActivity WActionStore is ProxyAccess.

ZeroContentCheck

ZeroContentCheck sends an alert when an access log entry's code value is 200 (that is, the HTTP request succeeded) but the size value is 0 (the return had no content).

CREATE CQ FindZeroContent
INSERT INTO ZeroContentStream
SELECT ale
FROM AccessStream ale
WHERE ale.code = 200 AND ale.size = 0;

The alert code is similar to HackerCheck's.

The typeOfActivity string for events written to the UnusualActivity WActionStore is ZeroContent).

ErrorHandling

This flow sends an alert immediately when an error appears in Log4ErrorWarningStream.

CREATE CQ GetErrors 
INSERT INTO ErrorStream 
SELECT log4j 
FROM Log4ErrorWarningStream log4j WHERE log4j.level = 'ERROR';

CREATE CQ SendErrorAlerts 
INSERT INTO ErrorAlertStream 
SELECT 'ErrorAlert', ''+logTime, 'error', 'raise', 'Error in log ' + message 
FROM ErrorStream;

CQ GetErrors discards all WARN messages and passes only ERROR messages. In CQ SendErrorAlerts, since the key value is logTime (which is different for every event) and the flag is raise (see Sending alerts from applications), an alert will be sent for every message in ErrorStream.

WarningHandling

This flow sends an alert once an hour with the count of warnings for each API call for which there has been at least one warning. The following code creates a one-hour jumping window of application log warning messages:

CREATE CQ GetWarnings 
INSERT INTO WarningStream 
SELECT log4j 
FROM Log4ErrorWarningStream log4j WHERE log4j.level = 'WARN';

CREATE JUMPING WINDOW WarningWindow 
OVER WarningStream KEEP WITHIN 60 MINUTE ON logTime;

The HAVING clause in the CQ SendWarningAlerts filters out API calls that have had no warnings.

CREATE CQ SendWarningAlerts 
INSERT INTO WarningAlertStream 
SELECT 'WarningAlert', ''+logTime, 'warning', 'raise', 
        COUNT(logTime) + ' Warnings in log for api ' + api 
FROM WarningWindow 
GROUP BY api 
HAVING count(logTime) > 1;

InfoFlow, APIFlow, CompanyApiFlow, and UserApiFlow

InfoFlow joins application log INFO events with user information from the MLogUserLookup cache:

CREATE CQ GetInfo 
INSERT INTO InfoStream 
SELECT log4j 
FROM Log4JStream log4j WHERE log4j.level = 'INFO';

CREATE CQ GetUserDetails 
INSERT INTO ApiEnrichedStream 
SELECT a.userId, a.api, a.sobject, a.logTime,
  u.userName, u.company, u.userZip, u.companyZip 
FROM InfoStream a, MLogUserLookup u 
WHERE a.userId = u.userId;

Otherwise this portion of the application is generally similar to PosApp. APIFlow, CompanyApiFlow, and UserAPIFlow populate dashboard charts and send alerts as described in the summary above.