VirtualWisdom and cross-domain correlation

By Ravi Prakash, Product Manager

[Figure: Bee and blossom cross-pollination]

In 2015 the Smithsonian featured an article on climate change and its impact on bees. According to the article, global warming is pushing the southern limits of North American and European bumble bee species north by up to 300 km, but the northern limits of their ranges remain stationary, causing an overall contraction of the insects’ habitat. Why should you care? One reason is that the world’s 30,000 bee species are major pollinators for crops such as apples, almonds, blueberries, cherries, avocados, grapefruit, oranges, pumpkins… the list goes on. Without bees, our food supply would shrink drastically, which portends chaos for the 7.6 billion people on our densely populated planet.

Scientists arrive at such conclusions by painstakingly analyzing seemingly unrelated domains of information such as human development, disease, climate change and pesticide use. In the same way, application latency in your data center can stem from any number of seemingly unrelated events occurring in separate silos (or domains): servers, the HBAs and NICs in those servers, the SAN fabric, and the ports on your SAN-attached or NAS storage arrays.

With silos such as compute, networking, storage, applications and virtualization, correlating millions of micro-flows across all of them to identify the root cause is a daunting task. To address this, Virtual Instruments created Trend Matcher, a purpose-built analytic for cross-domain correlation in our performance monitoring platform, VirtualWisdom. Let us walk through how it works in practice.
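This post does not describe Trend Matcher’s internals. One prerequisite for any cross-domain correlation, though, is that metrics from different silos are usually sampled at different moments, so they must be placed on a common time grid before they can be compared. The minimal Python sketch below assumes each metric arrives as (epoch_seconds, value) pairs and averages them into one-minute buckets. The function names `bucket_by_minute` and `align` are hypothetical and are not VirtualWisdom APIs.

```python
from collections import defaultdict
from statistics import mean


def bucket_by_minute(samples):
    """Average (epoch_seconds, value) samples into per-minute buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts // 60)].append(value)
    return {minute: mean(vals) for minute, vals in buckets.items()}


def align(series_a, series_b):
    """Return two equal-length lists covering only the minutes both series share."""
    a = bucket_by_minute(series_a)
    b = bucket_by_minute(series_b)
    common = sorted(a.keys() & b.keys())
    return [a[m] for m in common], [b[m] for m in common]


# Hypothetical samples: a host latency metric and a storage port metric.
host = [(0, 10.0), (30, 12.0), (60, 20.0), (125, 15.0)]
port = [(5, 1.0), (65, 3.0), (70, 5.0), (200, 2.0)]
print(align(host, port))  # ([11.0, 20.0], [1.0, 4.0])
```

Minutes that appear in only one series are dropped here; a real tool might interpolate instead.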

VirtualWisdom includes a variety of default report templates. When we select an Application Workload Detail report, we see details of the workload an application generates. Here we notice that one application has a health problem (the yellow box among the green ones), a host (in this case a Cisco UCS blade server) has an alarm (red rather than green), and two storage ports on the EMC VNX array are reporting problems.

[Figure: VirtualWisdom workload detail report template]

Select host acme-prd-ucs01 and its host-level “average response time” alarm, and you will notice spikes in activity, which tell us that the server hosting your tier 0 CRM application experienced a problem.

[Figure: VirtualWisdom event investigation details]

To help you get to the root cause, we’ve introduced Investigations, which follow a run-book-style methodology drawing on more than 10 years of customer-facing experience from our professional services teams. When you select the recommended Investigation at the top right of the screen, it in turn recommends that you run our purpose-built analytic, Event Advisor.

[Figure: VirtualWisdom Event Advisor]

When you run Event Advisor on the host metric “Average write completion time,” you see the number of events (the number of times the graph rises above a baseline before returning below it) and the maximum duration of those events in minutes. You set these baselines yourself, according to the alarms you want to trigger.
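The event definition above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Event Advisor’s implementation: it assumes evenly spaced samples (one per minute by default), treats a sample at or below the baseline as the end of an event, and also counts an event that is still in progress at the end of the series. The name `summarize_events` is hypothetical.

```python
def summarize_events(values, baseline, minutes_per_sample=1):
    """Count excursions above `baseline` and return (event_count, longest_minutes).

    Assumptions: samples are evenly spaced `minutes_per_sample` apart; an event
    starts when a sample rises above the baseline and ends at the first sample
    at or below it; an excursion still open at the end is counted as an event.
    """
    events = 0
    longest = 0   # longest run of above-baseline samples seen so far
    current = 0   # length of the run in progress
    for v in values:
        if v > baseline:
            if current == 0:
                events += 1  # crossing from at/below to above starts a new event
            current += 1
            longest = max(longest, current)
        else:
            current = 0      # back at or below the baseline: the event ends
    return events, longest * minutes_per_sample


# Two excursions above a baseline of 4: [5, 6] and [7, 8, 9].
print(summarize_events([1, 5, 6, 2, 7, 8, 9, 1], baseline=4))  # (2, 3)
```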

[Figure: VirtualWisdom baseline graphs]

From the available graphs, let us select the second one, since it seems to show consistent deviations from the baseline. Select the custom-built analytic “Trend Match” at the top left of that graph, and you’ll see a display of uniquely colored bubbles. Each bubble corresponds to an event, and the source of an event can be anything from a host to an HBA, a storage port or a LUN on a storage array.
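How Trend Matcher scores similarity is not documented in this post. The sketch below substitutes a common, simple measure: the Pearson correlation coefficient between the selected host metric and each candidate metric from another domain, producing one score per candidate. The candidate names and the functions `pearson` and `rank_candidates` are hypothetical. The series are assumed to be already aligned to the same time buckets.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is flat."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no trend to match
    return cov / sqrt(vx * vy)


def rank_candidates(target, candidates):
    """Rank candidate metrics (name -> series) by correlation with target, highest first."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical host metric and candidate metrics from other domains.
host_write_time = [1, 2, 3, 4, 5]
candidates = {
    "storage_port_a": [2, 4, 6, 8, 10],  # rises with the host metric
    "switch_port_b": [5, 4, 3, 2, 1],    # moves opposite to it
    "lun_c": [3, 3, 3, 3, 3],            # flat
}
for name, score in rank_candidates(host_write_time, candidates):
    print(f"{name}: {score * 100:.2f}%")
```

Ranking by signed correlation favors metrics that rise and fall together with the host metric, which matches the idea of one line mimicking another. Multiplying a score by 100 simply expresses it as a percentage for display.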

[Figure: VirtualWisdom Trend Match event graph]

Select the bubble corresponding to the event VNX9759 and you get the following chart, in which the yellow line tracks the green line with a 73.51% match.

[Figure: VirtualWisdom event chart detail]

When the host reported slow response times on its average write completion time metric, Trend Matcher helped us identify the storage port on the EMC VNX array that was reporting buffer credit starvation at the same time! Considering that most large data centers have thousands of hosts, hundreds of applications, hundreds of multi-vendor storage arrays and thousands of SAN ports, correlating this manually without a purpose-built analytic like Trend Matcher would be a daunting task indeed. With Trend Matcher, it takes just a few clicks, and you don’t have to be an expert to use it!
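To give a sense of the scale of manual correlation, here is a rough estimate. All the figures are hypothetical assumptions: 1,000 SAN ports, 5 metrics tracked per port, one minute for a person to compare each pair of graphs, and one metric of interest on each of 1,000 hosts.

```latex
\begin{align*}
1{,}000 \text{ ports} \times 5 \text{ metrics/port} &= 5{,}000 \text{ comparisons per host metric} \\
5{,}000 \times 1 \text{ min} &= 5{,}000 \text{ min} \approx 83 \text{ hours} \\
1{,}000 \text{ hosts} \times 5{,}000 \text{ comparisons} &= 5{,}000{,}000 \text{ comparisons}
\end{align*}
```

Even under these modest assumptions, checking a single host metric by hand would take more than two working weeks.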

Just as scientists traced the shrinking range of bumble bees to its root cause using data from seemingly unconnected silos or domains, Trend Matcher helps you find out what changed in your infrastructure when your application, or the host running it, noticed a problem. Get to the root cause in minutes and meet your critical application SLAs. What’s not to like? Want to learn more? Give us a call!