Cisco SAN Telemetry Streaming (STS) & VirtualWisdom – Performance Monitoring for all your infrastructure

By Ravi Prakash, Product Manager

What tourist walking across a historic city doesn’t appreciate a choice of bridges, each with its own unique view of the city? In the same way, when it comes to application-centric infrastructure performance management, we believe you should be able to choose how deeply you view the impact of shared infrastructure on application performance.

With Cisco MDS SAN switches in your switching fabric and VirtualWisdom for infrastructure performance monitoring, traditionally you had two choices:

  • Hardware performance probes to monitor the impact of infrastructure on tier 0 applications. Although hardware performance probes require installing TAPs, they give you deep visibility into over 500 metrics, including Exchange Completion Time (ECT), queue depth, SCSI metrics and so on. With hardware performance probes you get performance statistics for each individual Initiator-Target-LUN (ITL).
  • Software probes to monitor all other tiers of applications. These probes don’t require TAPs. They use SNMP to poll the Cisco SAN and give you a few dozen Fibre Channel and Fibre Channel over Ethernet (FCoE) port-level statistics about health and utilization.

How about a third option with the functionality of hardware performance probes, 100% visibility, at line rate and in real time, without the CAPEX and OPEX costs of installing hardware probes and TAPs? This option is now available because VirtualWisdom supports Cisco SAN Telemetry Streaming (STS).

4/8/16/32 G Fibre Channel module for MDS 9700

Cisco introduced a SAN telemetry offload ASIC in the MDS 9700 48-port 32-Gbps Fibre Channel switching module. When this module is installed in an MDS 9000 series multilayer edge switch, telemetry information is streamed to a subscriber such as VirtualWisdom.

The ASIC generates a statistical summary of observed workload response times. These statistics are similar in nature to those produced by Virtual Instruments performance probes. The Cisco MDS switch then streams the summary via its telemetry interface to the VirtualWisdom appliance. Every Fibre Channel SCSI header on every flow is inspected at wire speed, with no sampling involved. The diagram below shows where the 32-Gbps FC module sits: on the edge Fibre Channel switches that connect to your shared storage. For now, VirtualWisdom supports only storage edge deployment of the Cisco 32-Gbps line cards.

The second graphic shows that, in addition to TAPs and probes monitoring the impact on tier 0 applications, you now have a Cisco MDS switch with MDS 9700 FC modules sending telemetry data directly to VirtualWisdom. The Cisco STS streaming metrics help you understand workload response and fall into 4 categories:

  • I/O per sec (I/O failures, aborts, timeouts, sequential I/O per sec, outstanding exchanges at the end of the interval, peak command rate)
  • I/O size (Minimum/average/maximum)
  • Payload rate (MB/s, peak bandwidth rate)
  • Response time (Completion time & command to 1st data, minimum/average/maximum)

These metrics are reported per Initiator-Target-LUN (ITL), cover both reads and writes, and are delivered as 10-second summaries. The ASIC in the MDS 9700 switching module that enables all this is also part of the Cisco MDS 9132T.
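To make the shape of this data concrete, here is a minimal Python sketch of how a consumer might model one 10-second ITL summary and roll several intervals up into a longer window. It is an illustration only: the class name `ItlSummary`, its field names, the `op` read/write tag and the `rollup` helper are all hypothetical and do not reflect the actual Cisco STS schema or any VirtualWisdom API.

```python
from collections import defaultdict
from dataclasses import dataclass

INTERVAL_SECONDS = 10  # the summaries described above cover 10 seconds each


@dataclass(frozen=True)
class ItlSummary:
    """One hypothetical 10-second summary for a single ITL and I/O direction."""
    initiator: str     # assumed identifier of the initiator port
    target: str        # assumed identifier of the target port
    lun: int
    op: str            # "read" or "write" (assumed tagging)
    io_count: int      # I/Os completed during the interval
    bytes_moved: int   # payload bytes transferred during the interval
    ect_min_us: float  # minimum completion time, microseconds
    ect_avg_us: float  # average completion time, microseconds
    ect_max_us: float  # maximum completion time, microseconds


def rollup(summaries):
    """Combine interval summaries per (initiator, target, lun, op)."""
    groups = defaultdict(list)
    for s in summaries:
        groups[(s.initiator, s.target, s.lun, s.op)].append(s)

    result = {}
    for key, items in groups.items():
        total_io = sum(s.io_count for s in items)
        total_bytes = sum(s.bytes_moved for s in items)
        seconds = INTERVAL_SECONDS * len(items)
        # Weight each interval's average by its I/O count so that busy
        # intervals count proportionally more than quiet ones.
        weighted = sum(s.ect_avg_us * s.io_count for s in items)
        result[key] = {
            "iops": total_io / seconds,
            "mb_per_s": total_bytes / seconds / 1e6,
            "ect_min_us": min(s.ect_min_us for s in items),
            "ect_avg_us": weighted / total_io if total_io else 0.0,
            "ect_max_us": max(s.ect_max_us for s in items),
        }
    return result
```

Note that minimum and maximum can be combined exactly across intervals, while the average has to be weighted by I/O count. Per-command detail cannot be recovered from summaries like these.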

You might ask: what value does VirtualWisdom provide beyond what is already possible via the CLI on the MDS switch or via Cisco DCNM once you have installed Cisco SAN Telemetry licenses per switch? VirtualWisdom adds significant value in 3 key areas:

Application Service Assurance

  • Most monitoring tools either overwhelm operations teams with alerts or raise alerts that are not relevant to the performance of key applications, so operations teams de-activate the alerts or ignore them altogether. In contrast, VirtualWisdom can discover your applications and SLAs and automatically assign a custom monitoring policy to each infrastructure service. This means no more manual policy assignments, and fewer but more meaningful and actionable alerts. VirtualWisdom can discover applications from the AppDynamics controller to give you visibility from your application (not just from the host or VM) through your SAN fabric down to your storage LUN. What if you don’t use AppDynamics? Not a problem: we can detect your applications using any of the 3 other options listed below.
    • Using ServiceNow
    • Using SSH/WMI
    • Collecting and analyzing NetFlow from physical routers/switches and from the VMware vSphere Distributed Switch

Workload Infrastructure Balancing

  • Workloads that are initially balanced with the underlying infrastructure begin to shift across your infrastructure and drift in behavior over time. VirtualWisdom analytics assess your environment from compute to storage for workload-infrastructure imbalance conditions and recommend how to re-balance. Our Balance Finder analytic shows you what percentage of your Host Bus Adapters (HBAs) are balanced versus unbalanced. This helps improve application performance in a Fibre Channel SAN environment without having to query the host. Our Event Advisor analytic identifies unusual behavior in your workloads, so you can attend to it and avoid impact to the SLAs of your tier 0 applications. Our Trend Matcher analytic looks for causality between trends. For example, it may identify the root cause of delays in writes to your shared networked storage, which, once remediated, could help you meet your application SLAs. Trend Matcher looks for a pattern among the thousands of micro transactions that occur between entities every second, analyzes them and assigns a virtual fingerprint to the relevant patterns of traffic. It then searches every other device in your SAN for traces of the same virtual fingerprint. The benefits of such infrastructure balancing include higher application uptime, fewer business disruptions and reduced CAPEX on additional infrastructure.

Problem Resolution & Remediation

  • Your application performance can be impacted by misconfiguration or component failures anywhere in the stack, from the application through compute, network and storage. Traditionally this meant a “war room” situation in which every operations team (application, database, compute/virtualization, network, storage) and their associated vendors contributed specialized knowledge, but you then had to correlate all of it manually. In VirtualWisdom we do the correlation and provide the run book style automation, so you don’t have to. Specifically, when an alert is generated, a case is opened. This starts an Investigation, which recommends an analytic (based on 10 years of customer-facing Professional Services experience) that guides you through actual remediation, down to opening the change control ticket. Unlike other tools, we don’t tell you that there is a problem and stop there; we guide you towards remediation.

Now that Cisco STS works with VirtualWisdom, does this mean that you no longer need a hardware performance probe? No. The hardware performance probe provides over 500 metrics, per-command statistics (versus averages) and unique analytics such as the queue solver analytic, so it remains invaluable for monitoring the performance of the tier 0 applications that are critical to your business operations. However, for tier 1 and tier 2 applications, the Cisco STS & VirtualWisdom solution provides compelling wire-level monitoring with ITL statistics at a lower TCO than hardware performance probes, but with more insight than software-only probes. Like what you hear and wish to learn more? Give us a call!