Meet your application SLA in Isilon environments using VirtualWisdom

By Ravi Prakash, Product Manager


What does Formula 1 racing have in common with the approach taken by VirtualWisdom? In auto racing, real-time telemetry from numerous sensors enables pit crews to make adjustments while the car is on the track. Multivariate data that factors in track conditions, the weather, and the competition is used to decide how best to tweak the car for optimal performance and avoid a crash. After all, what use is “after the fact” data if your vehicle has just hit a wall? In the same way, VirtualWisdom provides real-time insights to help you avoid hitting the proverbial wall of unmet application SLAs.


If you are an enterprise focused on cancer research, digital media, social networking, or even jet engine testing, and you needed a clustered storage solution, it’s a good bet you picked DellEMC® Isilon® scale-out NAS. For data management, you might use SmartPools for automated tiering, SmartDedupe, SmartQuotas, and InsightIQ® for performance monitoring.

If you have InsightIQ for performance monitoring, why consider VirtualWisdom? InsightIQ is designed to be an off-cluster trending and reporting tool that works by polling the Isilon cluster every 15 seconds over the OneFS API. It focuses on capacity utilization and basic performance metrics. It can report on how the filesystem changed between two points in time. It can help you diagnose scenarios in which SmartPools and FSAnalyze, running concurrently, caused CPU usage spikes in a given period. You can also use it to forecast the date when your storage capacity will reach 90% utilization.
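To make the capacity forecast idea concrete, here is a minimal sketch that fits a straight line to daily utilization samples and projects when the trend crosses 90%. It is an illustration only and not how InsightIQ computes its forecast; the function name `forecast_threshold_date` and the `(date, used_percent)` sample shape are assumptions made for this example.

```python
import math
from datetime import date, timedelta


def forecast_threshold_date(samples, threshold_pct=90.0):
    """Project the date a linear utilization trend reaches threshold_pct.

    samples: list of (date, used_percent) tuples, oldest first
    (hypothetical shape chosen for this sketch).
    Returns None when no upward trend can be fitted.
    """
    if len(samples) < 2:
        return None
    start = samples[0][0]
    xs = [(day - start).days for day, _ in samples]
    ys = [pct for _, pct in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # all samples share one day, so no slope exists
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None  # flat or shrinking usage never reaches the threshold
    intercept = mean_y - slope * mean_x
    days_to_threshold = (threshold_pct - intercept) / slope
    return start + timedelta(days=math.ceil(days_to_threshold))


samples = [
    (date(2024, 1, 1), 50.0),
    (date(2024, 1, 11), 60.0),
    (date(2024, 1, 21), 70.0),
]
print(forecast_threshold_date(samples))
```

A straight-line fit is the simplest possible model; real capacity growth is often bursty, so treat the projected date as a rough planning aid.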

VirtualWisdom complements InsightIQ from a different perspective. We monitor wire data at 10GbE line rate and correlate it across your compute, switching network, and storage. We use a hardware performance probe and a passive tap between the GbE switch and your Isilon array to gather statistics on every conversation (SMB or NFSv3) over the wire. We break these statistics down by operation: read, write, set, delete, dir, getattr, and setattr. We record every single read and write operation and do not rely on sampling. We don’t look at the payload of the packets; we apply a packet mask to ensure the security of your confidential data.
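As a rough illustration of what a per-conversation, per-operation breakdown looks like, the sketch below groups observed operations by client, server, protocol, and operation type, then reports a count and average latency for each group. The record fields (`client`, `server`, `protocol`, `op`, `latency_ms`) and the function name are hypothetical; they are not the VirtualWisdom data model.

```python
from collections import defaultdict


def summarize_by_operation(records):
    """Aggregate count and average latency per (client, server, protocol, op).

    records: iterable of dicts with the hypothetical keys
    client, server, protocol, op, latency_ms.
    """
    totals = defaultdict(lambda: {"count": 0, "total_latency_ms": 0.0})
    for rec in records:
        key = (rec["client"], rec["server"], rec["protocol"], rec["op"])
        totals[key]["count"] += 1
        totals[key]["total_latency_ms"] += rec["latency_ms"]
    # Every operation is counted; nothing is sampled or dropped.
    return {
        key: {
            "count": agg["count"],
            "avg_latency_ms": agg["total_latency_ms"] / agg["count"],
        }
        for key, agg in totals.items()
    }


records = [
    {"client": "10.0.0.5", "server": "isilon-1", "protocol": "NFSv3", "op": "read", "latency_ms": 2.0},
    {"client": "10.0.0.5", "server": "isilon-1", "protocol": "NFSv3", "op": "read", "latency_ms": 4.0},
    {"client": "10.0.0.5", "server": "isilon-1", "protocol": "NFSv3", "op": "getattr", "latency_ms": 1.0},
]
summary = summarize_by_operation(records)
for key, stats in summary.items():
    print(key, stats)
```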

Since we monitor your compute (vCenter, Hyper-V, AIX), we can add application context to the information we collect over the wire. We rely on analytics such as our seasonal trend deviation to identify deviations from a baseline and generate alarms. We don’t just identify the causes of problems in your environment; we also recommend remediation using a guided approach called “Investigations”. With VirtualWisdom you can set alarms on NAS average performance, NAS flow control, NAS histogram performance, NAS link errors, and NAS packet errors. When an alarm fires, it is tied to a case, which is tied to an investigation, which in turn recommends the right analytics to run. This run-book style method of solving problems is based on more than 10 years of field-facing professional services expertise from VI that is now coded into our product.
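The general idea behind alarming on deviation from a seasonal baseline can be sketched as follows: build a mean and standard deviation for each slot in a repeating period (here, each hour of the day), then flag a new sample that falls too far from the baseline for its slot. This is a generic illustration, not the VirtualWisdom seasonal trend deviation analytic; the function names, the 24-slot period, and the 3-sigma threshold are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history, period=24):
    """Return {slot: (mean, std)} from (sample_index, value) pairs.

    Samples are grouped by sample_index % period, so with hourly
    samples and period=24 each hour of the day gets its own baseline.
    """
    buckets = defaultdict(list)
    for index, value in history:
        buckets[index % period].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in buckets.items()}


def is_deviation(baseline, index, value, period=24, k=3.0, min_std=1e-9):
    """True when value is more than k standard deviations from its slot mean."""
    slot = index % period
    if slot not in baseline:
        return False  # no history for this slot, so nothing to compare against
    avg, std = baseline[slot]
    # min_std keeps a perfectly flat history from flagging an identical value.
    return abs(value - avg) > k * max(std, min_std)


# Three days of hourly latency: flat at 5.0 except for a busier hour 3.
history = [(h, 5.0) for h in range(72) if h % 24 != 3]
history += [(3, 9.0), (27, 10.0), (51, 11.0)]
baseline = build_baseline(history)
print(is_deviation(baseline, 75, 20.0))  # hour 3 on day four, far above its norm
print(is_deviation(baseline, 75, 10.5))  # hour 3 on day four, within its norm
```

Comparing each sample with its own time slot, not with a single global average, is what keeps a predictable nightly backup from raising an alarm every night.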

We include out-of-the-box dashboards for NAS Client investigations, NAS server distribution, SMB Top Talkers, and NFS Top Talkers. Typical use cases for our product include:

  • Identifying the root cause of sporadic latency in a tier 0 application, perhaps caused by a noisy neighbor.
  • Identifying the root cause of metadata storms, possibly caused by a file-level replication application that was designed for a SAN and has since been migrated to NAS.

We also detect your applications using a choice of methods (ServiceNow CMDB, NetFlow, SSH/WMI) and make recommendations that reduce the onerous task of manually correlating half a dozen components (such as the web tier, database tier, and app tier) for a few hundred applications running in VMs across your datacenter. In conclusion, we fill the critical gaps between your device monitor (InsightIQ) and your APM tool. We identify the impact of infrastructure and workload behavior changes on application end users, so you can determine which tier 2 or tier 3 application is impacting the SLA of your tier 0 application. Like to learn more? Give us a call.