
Agentless “Software only” SAN monitoring

Part 1 in a 4-part series

By Ravi Prakash, Director Product Marketing


Do your application users complain of sporadic slowness while the vendors in your IT war room claim that their infrastructure products are not at fault? As the owner of all infrastructure, you are on the hook to resolve slowness in business-critical applications.

Despite hearing good things about VirtualWisdom from your industry peers, are you reluctant to deploy our solution because you think you’ll need to install TAPs and hardware probes?

The reality is that many of our Fortune 500 customers resolve most of their issues using our agentless “software-only” monitoring solution.  Here are scenarios where a software-only solution from Virtual Instruments would provide significant value:


Database impact of disk I/O timeouts

A typical scenario: You notice sporadic slowness in your Oracle database. To get to the root cause, you deploy VirtualWisdom and our agentless SAN integration. The agentless SAN integration uses SMI-S or SNMP to gather information and link error statistics from your Brocade or Cisco SAN switches in a non-intrusive manner at 1-minute granularity. VirtualWisdom reports a number of Cyclic Redundancy Check (CRC) errors on the server’s Host Bus Adapter (HBA). CRC errors can be a symptom of code violations within a Fibre Channel data frame. Code violations in turn imply bit errors, which can be caused by a flapping SFP or a damaged fiber optic cable.

If left unchecked, CRC errors could impact transaction writes to your Oracle database. That in turn could cause a disk timeout of 30 seconds or more, which could bring your database to a grinding halt! VirtualWisdom can identify CRC errors while they are still few and far between, so you can schedule maintenance to clean and inspect your fiber optic cables and prevent the nightmare scenario of a production database coming to a halt.
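To illustrate what spotting CRC errors "while they are still few and far between" can look like, here is a minimal Python sketch. It is not VirtualWisdom code: the function names and the sample readings are hypothetical, and it assumes you already have one cumulative CRC error reading per minute for a port, exposed as a 32-bit SNMP counter.

```python
from typing import List

COUNTER32_MAX = 2**32  # SNMP Counter32 values wrap around at 2^32


def per_interval_deltas(samples: List[int]) -> List[int]:
    """Turn cumulative counter readings into per-interval increases."""
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        # Assumption: a reading lower than the previous one is a counter wrap.
        # (It could also be a reset after a reboot; this sketch ignores that.)
        delta = curr - prev if curr >= prev else curr + COUNTER32_MAX - prev
        deltas.append(delta)
    return deltas


def intervals_with_errors(samples: List[int], threshold: int = 1) -> List[int]:
    """Return the indexes of the intervals whose increase meets the threshold."""
    return [i for i, d in enumerate(per_interval_deltas(samples)) if d >= threshold]


# Hypothetical 1-minute readings of a port's cumulative CRC error counter.
crc_samples = [120, 120, 121, 121, 121, 125]

print(per_interval_deltas(crc_samples))    # [0, 1, 0, 0, 4]
print(intervals_with_errors(crc_samples))  # [1, 4]
```

Even a handful of non-zero intervals like these is a reason to schedule a cable and SFP inspection long before a disk timeout shows up.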

Application outage due to unplanned path failure

Let us assume that you upgrade the firmware on your SAN switch and, for some reason, a host that should have multiple paths to storage doesn’t recover its second path. Unknown to you, only a single path is active from the host. If this single path fails for any reason, there is no way for the host and its application to reach the SAN-attached storage.


VirtualWisdom can help you detect this issue if you schedule our Balance Finder analytic. It will tell you what percentage of all the HBAs in your SAN are balanced, unbalanced or recently unbalanced (in the last few hours or days). This allows you to take remedial action before the single remaining path from a server goes down, taking the application down with it.
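The idea of sorting hosts by how evenly traffic is spread across their HBAs can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not how Balance Finder works internally: the host names, throughput figures, state labels and the 20% tolerance are all made up for the example.

```python
from collections import Counter
from typing import Dict


def classify_host(path_mbps: Dict[str, float], tolerance: float = 0.2) -> str:
    """Label one host from the throughput seen on each of its HBA paths."""
    rates = list(path_mbps.values())
    active = sum(1 for r in rates if r > 0)
    if active == 0:
        return "idle"
    if active == 1:
        return "single-path"  # the risky case: no surviving alternate path
    mean = sum(rates) / len(rates)
    worst = max(abs(r - mean) for r in rates)
    # Hypothetical rule: balanced if every path is within tolerance of the mean.
    return "balanced" if worst <= tolerance * mean else "unbalanced"


def percent_by_state(hosts: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Percentage of hosts that fall into each state."""
    counts = Counter(classify_host(paths) for paths in hosts.values())
    return {state: 100 * n / len(hosts) for state, n in counts.items()}


# Hypothetical per-HBA throughput in MB/s for four hosts.
hosts = {
    "db01": {"hba0": 400, "hba1": 390},
    "db02": {"hba0": 700, "hba1": 0},
    "app01": {"hba0": 500, "hba1": 100},
    "app02": {"hba0": 200, "hba1": 210},
}

print(percent_by_state(hosts))
```

In this made-up data set, db02 is the host to worry about: all of its traffic is on one path, so a single failure would cut it off from storage.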

 

Impact of Fibre Channel Class 3 discard messages

The most common causes of Class 3 discard messages in Fibre Channel are a server requesting more data than it can consume, overloaded inter-switch links (ISLs), or overloaded storage arrays.

VirtualWisdom with agentless SAN monitoring can retrieve zero buffer credit counters from switches and correlate them with your application workloads to identify speed mismatches before they spiral into major issues.
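As a rough illustration of what correlating a switch counter with a workload means, the Python sketch below computes a Pearson correlation between two per-minute series. It is a simplified stand-in rather than VirtualWisdom's analytics, and both series are invented sample data: the per-minute increase in a port's zero buffer credit counter and the read throughput requested by the attached server.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-minute samples for one switch port and its attached server.
zero_credit_deltas = [0, 2, 1, 15, 40, 38, 3, 0]
server_read_mbps = [110, 150, 140, 420, 780, 760, 160, 100]

r = pearson(zero_credit_deltas, server_read_mbps)
print(f"correlation: {r:.2f}")
```

A value close to 1 suggests the credit starvation rises and falls with that server's workload, which is the pattern to investigate as a possible speed mismatch. A value near 0 points the search elsewhere.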

Stay tuned for part 2 of this 4-part series where we’ll cover software-only monitoring for FCoE and software-only line-rating monitoring in a Cisco SAN environment.

Read part 2.

Don’t forget to follow us on Twitter, LinkedIn and Facebook to stay up to date on the latest and greatest in app-centric infrastructure performance monitoring.