
RTView

Technical Note

SL Corporation 18 January 2016 TN-712

How to Bring TIBCO Monitoring Metrics and Alerts into Splunk Dashboards Using RTView

RTView Enterprise Monitor is a mature and robust platform that collects, analyzes, and archives monitoring data from a broad range of middleware products, from TIBCO and other vendors, as well as open source solutions. This article describes in detail how any of the current or historical metrics collected by RTView, as well as the alerts it generates, can be imported and made visible in custom dashboards created using Splunk. This permits Splunk users to quickly get visibility into a broad range of middleware monitoring information, without having to develop and maintain the collection and analytics functions already contained within RTView.

RTView and Splunk: Background

RTView has become the de facto standard for monitoring complex applications built around middleware components such as the TIBCO suite of messaging, business process, and analytics products. Many TIBCO customers use RTView component monitoring products purchased through TIBCO, or the complete and comprehensive RTView Enterprise Monitor suite obtained directly from SL Corporation.

Despite strong competition from open source, much of IT has embraced Splunk for its powerful log analysis features, as well as its ability to ingest, store, visualize, monitor, and analyze virtually any type of data. Splunk provides a simple UI for building dashboards, and many API/SDK options which have spawned a variety of user-supported apps and add-ons (available for download at ).

However, you will find little support for TIBCO-centric monitoring data in Splunkbase. This is partly because it is not easy to do well. To monitor TIBCO apps, you need to use the proprietary "Hawk" protocol and the EMS Admin API, and perform additional processing on the collected metrics. SL's RTView dataserver natively supports Hawk and makes it painless to collect metrics for BusinessWorks, BusinessEvents, EMS, and other TIBCO solutions. In this technical note, we describe a simple and cost-effective way to get this important data into Splunk.

RTView Dataserver Basics

The RTView dataserver is the collection component for the RTView EM suite. Architecturally, any number of dataservers may be distributed in local or remote datacenters to collect host, network, or application metrics. A central set of servers handles alerting, configuration management, and display, but these servers "refer" to the dataservers as necessary to reference data rather than aggregating the data locally. Hence, the architecture scales as needed to support very large populations of monitored resources. For purposes of this article, however, we'll look at deploying only a single dataserver and exposing the data it collects to Splunk.

Our test dataserver is configured by specifying the appropriate "solution packages" and corresponding properties. A solution package is a bundled collection of cache definitions, cache source files (templates for data acquisition), alert definitions, and user interface displays. As an example, the following properties configure the dataserver to collect host data via Hawk.

rtvapm_package=hawkmon
rtvapm_package=hostbase
collector.sl.rtview.hawk.hawkconsole sl_qa ems sl_qa tcp://10.16.200.118:7222 admin
collector.sl.rtview.hawk.agentGroup WIN_AGENTS SLHOST16(sl_qa) SLHOST15(sl_qa)

The hostbase package contains the basic caches, alerts, and displays needed for generic host metrics, regardless of the protocol used to collect these metrics. The hawkmon add-on adds support for collection of host metrics via Hawk. Hawk messages can be carried by either of two different transports. The "ems" transport is TIBCO's Enterprise Message Service. (Alternatively, the Rendezvous messaging middleware could be specified by using "rv".) The example hawkconsole property defines a connection to an EMS server topic, and the agentGroup property specifies that we collect data for two specific hosts whose Hawk microagents are also configured to use EMS.

When a dataserver is started with this configuration, we can access the collected host metrics via the following REST query to the dataserver.



By default, the response format will be XML, as shown below, where a row of metadata describes the data columns followed by a row of data for each host (some data omitted for brevity). For use with Splunk, we'll append "?fmt=json" to this query to get our results in an easier-to-parse JSON format:

1450987153921 myHawkDomain SLHOST15(sl_qa) Win32 6.1 7525657 2 8192.0 3733.26171875 4458.73828125 45.57 5.069760901455278 -1.0 94.93023909854472 5.0697609014552825
1450987137189 myHawkDomain SLHOST16(sl_qa) Win32 6.1 6914605 4 8192.0 7304.38671875 887.61328125 89.16 9.086940514895092 -1.0 90.91305948510491 9.08694051489509
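As a minimal sketch of the "?fmt=json" step, the helper below adds the format argument to a query URL using only the Python standard library. The function name with_json_format and the example URL are hypothetical placeholders, not part of RTView; substitute the REST query for your own dataserver.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_json_format(query_url):
    """Return query_url with fmt=json added to (or replaced in) its query string."""
    parts = urlsplit(query_url)
    # Keep any existing arguments except a previous fmt value.
    args = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != "fmt"]
    args.append(("fmt", "json"))
    return urlunsplit(parts._replace(query=urlencode(args)))


# Placeholder URL for illustration only (an assumption, not a real RTView path).
EXAMPLE_QUERY = "http://dataserver.example:8080/rest/query"
print(with_json_format(EXAMPLE_QUERY))
```

Building the URL through urllib.parse rather than string concatenation keeps the result valid when the query already carries other arguments.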

Configuring Splunk to Get RTView Data

To query the RTView dataserver from Splunk, download the REST add-on from Splunkbase, which can be found at . Install the add-on by selecting "Manage Apps" from the "App" menu in the upper left corner of your browser window. In the Apps window, click the "Install app from file" button, use the "Browse..." button to set the path to the downloaded file, and click "Upload".


After the REST add-on is installed, you'll need to update a Python script to handle the specific JSON format returned by queries to the RTView dataserver. Edit "/etc/apps/rest_ta/bin/responsehandlers.py" and add the following code:

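# json and datetime must be imported at the top of responsehandlers.py.
import datetime
import json

class slRtvEventHandler:

    def __init__(self, **args):
        pass

    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        if response_type == "json":
            parsedJson = json.loads(raw_response_output)
            # Write each row of the "data" section to Splunk as a separate event.
            for rtvDataRow in parsedJson["data"]:
                # Add a readable date/time string derived from the millisecond timestamp.
                rtvDataRow["Timestamp"] = datetime.datetime.fromtimestamp(rtvDataRow["time_stamp"]/1000).strftime('%d-%b-%Y %H:%M:%S')
                print_xml_stream(json.dumps(rtvDataRow))
        else:
            # Pass any non-JSON response through unchanged.
            print_xml_stream(raw_response_output)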

We'll use this class when configuring connections to RESTful endpoints in the next sections. The slRtvEventHandler converts the JSON returned by queries to the RTView dataserver into Python objects, then extracts the "data" section containing rows of tabular data and writes each row to Splunk as a separate event. (If this is not done, Splunk treats the entire query result as a single "event" object, which is difficult to pull apart for display in Splunk views.)

Before pushing each event to Splunk, you can optionally enrich the data in various ways. Here, we reformat the integer timestamp into a date/time string. Note that Splunk can also perform simple transformations like this example, but it may sometimes be advantageous to persist these changes in the stored data in order to optimize searches.
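To try the row-splitting and enrichment logic outside Splunk, a standalone sketch such as the following may help. It assumes the response shape implied by the handler (a "data" list of rows, each with a millisecond "time_stamp"); the split_rows name, the sample payload, and the "hostname" field values are illustrative assumptions. Unlike the handler, it formats the time in UTC so that the result does not depend on the local time zone.

```python
import datetime
import json


def split_rows(raw_response_output):
    """Yield one JSON string per row of the "data" section, with a Timestamp added."""
    parsed = json.loads(raw_response_output)
    for row in parsed["data"]:
        # time_stamp is in milliseconds since the epoch; convert to seconds.
        seconds = row["time_stamp"] / 1000
        # Assumption: UTC is used here; the handler itself uses local time.
        moment = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
        row["Timestamp"] = moment.strftime("%d-%b-%Y %H:%M:%S")
        yield json.dumps(row)


# Hypothetical sample payload, reduced to two columns.
SAMPLE = json.dumps({"data": [
    {"time_stamp": 1450987153921, "hostname": "SLHOST15(sl_qa)"},
    {"time_stamp": 1450987137189, "hostname": "SLHOST16(sl_qa)"},
]})

for event in split_rows(SAMPLE):
    print(event)
```

Each printed line corresponds to what the handler would pass to print_xml_stream as one Splunk event.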

Ingest Host Metrics Collected via Hawk

Given a working dataserver and Splunk with the REST add-on, we can now configure a connection to pull data into Splunk. Click the "Settings" menu item in the Splunk browser interface and select "Data inputs", then click on the "add new" action for the REST type. Configure the connection fields with the following values:

Endpoint URL:
HTTP Method: GET
URL Arguments: fmt=json
Response Type: json
Response Handler: slRtvEventHandler
Request Timeout: 10
Backoff Time: 60
Polling Interval: 30
Set sourcetype: Manual
Source type: rtv_hostbase
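The interplay of the polling and backoff values can be pictured with a small sketch. It assumes, without reference to the add-on's actual implementation, that a successful request is followed by a wait of the polling interval and a failed or timed-out request by a wait of the backoff time; next_delay and failing_fetch are hypothetical names.

```python
def next_delay(fetch, polling_interval=30, backoff_time=60):
    """Call fetch() once; return (result, seconds to wait before the next call)."""
    try:
        return fetch(), polling_interval
    except Exception:
        # On a failed or timed-out request, wait the longer backoff period.
        return None, backoff_time


def failing_fetch():
    # Stands in for a request that exceeds the request timeout.
    raise TimeoutError("no response within the request timeout")


print(next_delay(lambda: '{"data": []}'))
print(next_delay(failing_fetch))
```

With the values above, a healthy dataserver is sampled every 30 seconds, and an unreachable one is retried once a minute.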

Save these settings, and then go to the "App: Search & Reporting" screen and click the "Data Summary" button. On the "Data Summary" pop-up, the "Sourcetypes(*)" tab should now show that data for a new source type "rtv_hostbase" is available. Click on this source type and examine the events. The search string for this display is simply "sourcetype=rtv_hostbase". We'll want to create tabular reports to visualize this data, so set the search time to a "1-minute window" in "Real time" and paste the following into the search window:

sourcetype=rtv_hostbase | dedup hostname sortby +_time | eval usedPerCentCpu=round(usedPerCentCpu,2),swapUsedPerCent=round(swapUsedPerCent,2),MemUsedPerCent=round(MemUsedPerCent,2),MemFree=round(MemFree,1),VMemTotal=round(VMemTotal,1) | sort hostname | table Timestamp,hostname,OS_Description,usedPerCentCpu,MemTotal,MemFree,MemUsedPerCent,VMemTotal,VMemUsedPerCent,swapTotal,swapUsedPerCent,agentClass

The dedup clause de-duplicates the data (as indexed by hostname) so that no matter how many samples RTView may return each minute, Splunk will display the latest sample for each host. Now save this search as a report and open the report to see the following:

Note the "Expired" column added to the tabular host data by RTView. This boolean indicates that RTView was unable to collect new data for the monitored resource during the last collection period. Normally, you would add Splunk alerts to catch the cases where a key metric (e.g., "usedPerCentCpu") is above a threshold. Adding an alert on the "Expired" status will let you know when the resource is not available. If the resource does not recover within a configurable "rowExpirationTimeForDelete", the expired data will be dropped from the RTView cache, and will then disappear from the Splunk displays.
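The report and alert logic described above can be sketched in plain Python to make the intent concrete. This is an illustration of the idea, not Splunk's implementation: latest_per_host mirrors the effect of keeping one sample per hostname, and alerts flags a host that is expired or above a CPU threshold. The function names, the threshold of 80.0, and the sample events are assumptions.

```python
def latest_per_host(events):
    """Keep only the most recent event for each hostname, sorted by hostname."""
    latest = {}
    for event in events:
        host = event["hostname"]
        if host not in latest or event["time_stamp"] > latest[host]["time_stamp"]:
            latest[host] = event
    return [latest[h] for h in sorted(latest)]


def alerts(event, cpu_threshold=80.0):
    """Return the reasons this host needs attention (threshold is hypothetical)."""
    reasons = []
    if event.get("Expired"):
        reasons.append("expired")
    if event["usedPerCentCpu"] > cpu_threshold:
        reasons.append("cpu")
    return reasons


# Hypothetical events; only the columns used by the sketch are included.
EVENTS = [
    {"hostname": "SLHOST16(sl_qa)", "time_stamp": 1000, "usedPerCentCpu": 9.09, "Expired": False},
    {"hostname": "SLHOST15(sl_qa)", "time_stamp": 2000, "usedPerCentCpu": 91.5, "Expired": False},
    {"hostname": "SLHOST16(sl_qa)", "time_stamp": 3000, "usedPerCentCpu": 12.0, "Expired": True},
]

for row in latest_per_host(EVENTS):
    print(row["hostname"], alerts(row))
```

Checking "Expired" separately from the metric threshold matters because an expired row still carries its last collected values, which may look healthy.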

This report can be used as-is, or included in dashboards like the following:
