Achieving application SLAs in ScaleIO environments using VirtualWisdom

By Ravi Prakash, Product Manager

[Image: teal classic car parked in front of palm trees]

A two-dimensional view is always better than a one-dimensional one, whether you want to fully appreciate a classic Chevy or understand the impact of DellEMC ScaleIO infrastructure on your application SLA.  Today's applications have at a minimum three tiers of servers: web servers, app servers, and database servers.  To ensure that your application can scale, you may end up with multiple web, app, and database servers.

If you are a ScaleIO user in a VMware shop and run your application components in VMs on ESXi, you probably run the ScaleIO Data Server (SDS) and Meta Data Manager (MDM) in a ScaleIO virtual machine (SVM), while your ScaleIO Data Client (SDC) runs on the ESX server or in a guest operating system.  Your virtualized apps run in VMs on the same ESX server.  If you have many such ESX servers sharing a vSphere Distributed Switch (VDS), Virtual Instruments (VI) can provide that multi-dimensional view by layering networking information over the monitoring information we retrieve from the ScaleIO cluster.  This is not to imply that VI supports ScaleIO only on VMware; we also support ScaleIO on bare metal.

VirtualWisdom talks to your ScaleIO gateway via API calls to retrieve over 400 metrics: bandwidth metrics collected every 5 seconds and capacity metrics (such as RAM cache size and mapped volumes) collected every minute.  If you asked most monitoring solutions "How many MBs were transferred in the last 5 minutes?", they would rely on the samples taken at the start and end of the 5-minute window and would be unable to show you how traffic behaved in between.  In contrast, VirtualWisdom collects metrics every 5 seconds from the ScaleIO gateway and aggregates them to give you a true 5-minute trend.  Seeing the trend inside the window, rather than only its endpoints, helps you catch short-lived spikes before they turn into application performance issues.
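To make the difference concrete, here is a minimal Python sketch of rolling 5-second bandwidth samples up into a 5-minute window and comparing that with an estimate built only from the first and last sample. It is an illustration, not VirtualWisdom's implementation: the function names, the `(timestamp, MB/s)` sample format, and the example numbers are all assumptions made for this sketch.

```python
from collections import defaultdict

SAMPLE_INTERVAL_S = 5   # bandwidth sampling interval cited in the article
WINDOW_S = 300          # 5-minute roll-up window


def rollup(samples, window_s=WINDOW_S, interval_s=SAMPLE_INTERVAL_S):
    """Aggregate (epoch_seconds, mb_per_second) samples into fixed windows.

    Hypothetical helper: returns {window_start: {total_mb, avg_mb_s, peak_mb_s}}.
    """
    buckets = defaultdict(list)
    for ts, rate in samples:
        # Align each sample to the start of the window it falls in.
        buckets[ts - ts % window_s].append(rate)

    result = {}
    for start in sorted(buckets):
        rates = buckets[start]
        result[start] = {
            # Each sample is assumed to represent interval_s seconds of traffic.
            "total_mb": sum(rates) * interval_s,
            "avg_mb_s": sum(rates) / len(rates),
            "peak_mb_s": max(rates),
        }
    return result


def endpoint_estimate(samples, window_s=WINDOW_S):
    """Estimate MB transferred from only the first and last sample."""
    ordered = sorted(samples)
    return (ordered[0][1] + ordered[-1][1]) / 2 * window_s


# Made-up data: a steady 10 MB/s with a one-minute burst at 100 MB/s.
samples = [(t, 100.0 if 120 <= t < 180 else 10.0) for t in range(0, 300, 5)]
window = rollup(samples)[0]
print(window["total_mb"], window["peak_mb_s"], endpoint_estimate(samples))
```

With this made-up data the endpoint-only estimate misses the burst entirely, while the roll-up keeps both the total and the peak for the window.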

We give you the option to generate alarms on ScaleIO cache usage, SDS capacity, and SDS latency.  We generate service reports on health (by protection domain, pool, data server, and device), utilization (by protection domain, storage pool, data client, server, and network), performance (average read/write latency and read/write throughput), and capacity (by storage system and protection domain).  This is only the first of the two dimensions I mentioned.

[Image: Virtual Instruments ScaleIO screenshots]

Where is the second dimension, you may ask?  We assist in application performance monitoring by discovering and mapping how your applications use the ScaleIO infrastructure.  Consider one of the many scenarios in which we support ScaleIO.  If you have application components running in guest operating systems on ESX and a vSphere Distributed Switch (VDS) shared across ESX servers, you can configure the VDS to send NetFlow to VirtualWisdom, which acts as a passive NetFlow collector.  From this flow information, VirtualWisdom can work out which two or more VMs talk to each other frequently.  That suggests these VMs may be part of the same application, and we surface the suggested grouping in our GUI.  This takes away the need to keep exhaustive manual spreadsheets of application components and their locations.  Does this imply that if you don't use VDS we can't give you that second dimension?  No.  If NetFlow isn't in your plans, we can auto-detect applications using two other methods: SSH/WMI access into the virtual hosts to query process tables, or the ServiceNow CMDB if you are a ServiceNow customer.
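The idea of inferring application groupings from flow data can be sketched in a few lines of Python. This is an assumption-laden illustration rather than VirtualWisdom's actual algorithm: the function name `candidate_applications`, the simplified `(src_vm, dst_vm)` flow records, and the `min_flows` threshold are hypothetical. The sketch counts flows per VM pair, keeps the pairs that talk often, and treats each connected group of VMs as a candidate application.

```python
from collections import Counter, defaultdict


def candidate_applications(flows, min_flows=3):
    """Group VMs that exchange traffic frequently.

    flows: iterable of (src_vm, dst_vm) tuples, one per observed flow record
           (a simplified stand-in for real NetFlow records).
    min_flows: hypothetical threshold for what counts as "frequent".
    Returns a list of sets of VM names, largest group first.
    """
    # Count flows per unordered VM pair, ignoring direction and self-traffic.
    pair_counts = Counter(frozenset((s, d)) for s, d in flows if s != d)

    # Build an undirected graph from the pairs that talk often enough.
    graph = defaultdict(set)
    for pair, count in pair_counts.items():
        if count >= min_flows:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)

    # Each connected component is one candidate application.
    seen, groups = set(), []
    for vm in graph:
        if vm in seen:
            continue
        stack, group = [vm], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(graph[current] - group)
        seen |= group
        groups.append(group)
    return sorted(groups, key=len, reverse=True)


# Made-up flow records: a web/app/db chain plus one stray backup connection.
flows = (
    [("web1", "app1")] * 5
    + [("app1", "db1")] * 4
    + [("db1", "app1")]
    + [("web1", "backup")]
)
print(candidate_applications(flows))
```

In this made-up example the web, app, and database VMs land in one group, while the single backup connection falls below the threshold and is left out. A real collector would also weigh ports, byte counts, and time windows; the threshold here is only a placeholder.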

In addition, we offer you "Investigations", a guided, runbook-style approach to problem resolution based on 10+ years of field-facing professional services and support experience.  Typical investigations include: Did a log file suddenly fill up the storage space?  Are the evictions excessive?  Is there a rebuild/rebalance process occurring?  Is there a resource hog consuming too many resources?

Besides ScaleIO, you likely have legacy networked storage from DellEMC, NetApp, HDS, and others.  Isn't it good to know that you can also leverage our Probes for Fibre Channel and NAS protocols to get a single dashboard view of your entire infrastructure?  Wish to learn more?  Give us a call!