Meet application SLAs in VMware vSAN using VirtualWisdom

By Ravi Prakash, Product Manager


Why roll out vSAN when you already have networked storage? As a VMware customer comfortable with tools like Distributed Resource Scheduler (DRS), vMotion, Storage vMotion, and vSphere Replication, perhaps you were looking for a more efficient way to roll out new applications. Traditionally, a new application required a VM with its own datastore, backed by a LUN or a file system on networked storage. If applications had unique requirements of the back-end storage (RAID 5 versus RAID 6, for instance), you needed new datastores, which meant pre-configuring storage services on your networked storage to support them. All this adds to the storage administrator's workload and introduces extra steps and delays before application performance can be guaranteed. vSAN lets you avoid waiting for a storage administrator by assigning the task of allocating storage services to vSphere. A single vSAN datastore provides storage services to VMs on different ESXi servers.

Like 10,000 other vSAN customers worldwide, you may have decided to deploy VMware vSAN and started migrating workloads to it. Now that your tier 1 and tier 2 apps use vSAN (while mission-critical tier 0 apps continue to use SAN-attached networked storage), how do you ensure that SLAs are met for the applications that rely on vSAN?

To address this hybrid world, VI added support for vSAN monitoring to VirtualWisdom.


Naturally, we collect vSAN metrics from VMware vCenter 6.5, covering clusters, disks, disk groups, and VMs. What we do next with these metrics is where things get interesting.

We’ve introduced out-of-the-box dashboards for vSAN Top Talkers and vSAN client investigations. We’ve also added a guided, runbook-style methodology for getting to root cause, in which alerts are tied to cases, which are in turn tied to investigations. Typical investigations address questions like: Are the vSAN cache evictions excessive? Is vSAN congestion a sustained problem? Investigations are built on the premise that our product should guide you through root-cause analysis, drawing on 10 years of field experience gained by our own Professional Services teams.
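To make the "sustained problem" question concrete, here is a minimal sketch in Python of one way to tell a brief spike from a sustained condition: require several consecutive samples above a threshold. This is purely illustrative and is not how VirtualWisdom implements its investigations; the function name, the threshold, the window length, and the sample values are all assumptions made up for the example.

```python
from typing import Sequence


def is_sustained(samples: Sequence[float], threshold: float, min_consecutive: int) -> bool:
    """Return True if at least `min_consecutive` back-to-back samples exceed `threshold`."""
    run = 0
    for value in samples:
        if value > threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            # Any sample at or below the threshold breaks the run.
            run = 0
    return False


# Hypothetical congestion readings, one per polling interval (made-up values).
spike = [0, 5, 80, 3, 0, 2]
plateau = [0, 70, 75, 90, 85, 10]

spike_sustained = is_sustained(spike, threshold=60, min_consecutive=3)
plateau_sustained = is_sustained(plateau, threshold=60, min_consecutive=3)
print(spike_sustained, plateau_sustained)
```

With these assumed inputs, the single high reading in `spike` is not flagged, while the run of high readings in `plateau` is.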

We correlate these metrics with applications, which we can discover in a variety of ways:

  • Using the ServiceNow CMDB if you are a customer of ServiceNow
  • Using NetFlow from the vSphere Distributed Switch (to correlate application components spread across different VMs and across ESX servers)
  • Using SSH/WMI with read-only access to query the process tables on hosts
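As a rough illustration of the flow-based idea in the list above, the following Python sketch groups VMs into candidate applications by treating each observed flow as a link between two VMs and collecting the connected groups. It is a simplified assumption about how such correlation could work, not a description of VirtualWisdom's discovery logic; the `(source VM, destination VM)` record format and the VM names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def group_vms_by_flows(flows: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group VMs that exchange traffic, directly or transitively, into one candidate application."""
    neighbors = defaultdict(set)
    for src, dst in flows:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    groups = []
    for vm in sorted(neighbors):
        if vm in seen:
            continue
        # Depth-first walk over everything reachable from this VM.
        stack = [vm]
        group = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(sorted(group))
    return groups


# Hypothetical flow records: (source VM, destination VM).
flows = [("web-01", "app-01"), ("app-01", "db-01"), ("web-02", "app-02")]
groups = group_vms_by_flows(flows)
print(groups)
```

In practice, shared services such as DNS or backup would link nearly every VM together, so a real implementation would need to filter or weight flows before grouping.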

By discovering your applications and correlating them with how vSAN resources are being consumed, we give you a unique view into how your applications are affected by the shared underlying infrastructure, and help you reduce application performance issues.

Did I mention that we have taken a novel approach to licensing our support for vSAN? While other monitoring solutions may penalize you by counting the VMs in your vSAN or the terabytes of physical disk capacity in your vSAN cluster, we simplify your accounting by licensing our vSAN support based only on the number of nodes in the vSAN cluster. We leave it to you to decide how many VMs you’ll have on each ESX node and what size disks you hang off those nodes. Our goal is to provide a single pane of glass for monitoring application SLAs when they are affected by the underlying infrastructure, whether that is SAN, NAS, or hyperconverged infrastructure (HCI) like vSAN. Contact us if you’d like to learn more.