The unsung heroes of DRS in vSphere 6.5

VMware vSphere 6.5 has been out for a while now. Some features didn't get as much publicity as they deserved, at least in my mind. A few of the really cool features introduced in vSphere 6.5 relate to DRS, opening up additional configuration and control of your cluster and virtual machines. They're found under the DRS settings, in a section called "Additional Options", where you'll see three new settings:

  • VM distribution

Under normal circumstances DRS load balances VMs so that each virtual machine has the resources it requires, taking the cost of vMotioning workloads to new hosts into account as one decision-making parameter (as a side note, CPU and RAM have been used since the inception of DRS, but it is now also network aware). This results in a cluster where you normally won't see an exact split/distribution/load balancing of the virtual machines across the hosts; some hosts may be running more VMs than others. That's not a problem in itself, since the VMs have the resources they need and are not starved, but the common user perception is that DRS will split VMs equally across all hosts. One concern that might require rethinking that approach is availability: what if the host running the majority of the VMs crashes? HA will certainly restart the VMs, but a lot of services in the environment will be impacted. Enter VM distribution!

The purpose of VM distribution is to get a fair distribution of VMs across your hosts. Or, as one of the official VMware blogs puts it: "This will cause DRS to spread the count of the VMs evenly across the hosts."

However, DRS will always prioritize load balancing over the VM spread, so even distribution of VMs is done on a best-effort basis.
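If you'd rather check this from a script than from the UI, here's a minimal sketch using pyVmomi (VMware's Python SDK) that connects to vCenter and prints the cluster's DRS settings and any advanced options already applied. The vCenter address, credentials and cluster name are placeholders; as far as I know the VM distribution checkbox is backed by the TryBalanceVmsPerHost advanced option, but verify that in your own environment.

```python
# Minimal pyVmomi sketch: inspect a cluster's DRS config and advanced options.
# Hostname, credentials and cluster name below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Cluster01")

    drs = cluster.configurationEx.drsConfig
    print("DRS enabled:", drs.enabled, "| behavior:", drs.defaultVmBehavior)
    # The "Additional Options" ticked in the UI surface as DRS advanced options.
    for opt in drs.option or []:
        print(f"  {opt.key} = {opt.value}")
finally:
    Disconnect(si)
```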

 

  • Memory Metric for Load Balancing

DRS mainly considers active memory when load balancing, as opposed to the memory consumed by or granted to the VMs. DRS also takes some overhead memory into consideration, so it uses active memory + a 25% overhead as its primary metric. Active memory comes from a "mathematical model that was created in which the hypervisor statistically estimates what Active memory is for a virtual machine by use of a random sampling and some very smart math. Due diligence was done at the time this was designed to prove the estimation model did represent real life. So, while it is an estimate, we should not be concerned about its accuracy."

A really good read on the subject, and where the quote above came from, is “Understanding vSphere Active Memory”.
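Just to illustrate the idea of a sampling-based estimate (this is a toy simulation of mine, not VMware's actual algorithm): touch a small random sample of a VM's pages and extrapolate how many were recently used.

```python
# Toy illustration of estimating "active" memory by random sampling
# (a simplification, not VMware's actual implementation).
import random

TOTAL_PAGES = 1_048_576          # e.g. a 4 GB VM with 4 KiB pages
SAMPLE_SIZE = 100                # small random sample per estimation period

# Pretend the guest really touched 15% of its pages recently.
truly_active = set(random.sample(range(TOTAL_PAGES), int(TOTAL_PAGES * 0.15)))

# Sample pages at random and check how many of them were touched.
sample = random.sample(range(TOTAL_PAGES), SAMPLE_SIZE)
hits = sum(1 for page in sample if page in truly_active)

estimated_active_pct = hits / SAMPLE_SIZE * 100
print(f"Estimated active memory: {estimated_active_pct:.1f}% (actual: 15.0%)")
```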

Now it's possible to change which memory metric is used when load balancing: instead of active memory we can use consumed memory. And by the way, the definition of consumed memory: "Amount of guest physical memory consumed by the virtual machine for guest memory. Consumed memory does not include overhead memory. It includes shared memory and memory that might be reserved, but not actually used.

Virtual machine consumed memory = memory granted – memory saved due to memory sharing” – Quote from “Memory Performance Counters – An Evolved Look at Memory Management”.

This option is equivalent to setting the existing cluster advanced option PercentIdleMBInMemDemand with a value of 100.
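Reusing the connection and cluster lookup from the earlier sketch, setting that advanced option through pyVmomi could look roughly like this. The option key and value come from the equivalence above; the rest is my own sketch, so test it in a lab first.

```python
# Rough sketch: apply the DRS advanced option PercentIdleMBInMemDemand=100
# on the cluster, i.e. load balance on consumed rather than active memory.
# Assumes "cluster" was located as in the earlier sketch.
from pyVmomi import vim

spec = vim.cluster.ConfigSpecEx()
spec.drsConfig = vim.cluster.DrsConfigInfo()
spec.drsConfig.option = [
    vim.option.OptionValue(key="PercentIdleMBInMemDemand", value="100")
]

# modify=True applies an incremental change; check existing advanced options
# first (see the earlier sketch) so you don't clobber anything.
task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
# Wait for completion with your preferred helper, e.g. pyVim.task.WaitForTask(task).
```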

 

  • CPU Over-Commitment

It's now possible to enforce a specific level of vCPU-to-pCPU over-commitment in the cluster. If you only want to allow, say, 4 virtual CPUs per physical core, you can now have that configured and enforced by the cluster: once the cluster runs out of pCPU resources, you won't be able to power on any more virtual machines. You define the over-commitment as a percentage for the entire cluster, from 0% (no over-commitment allowed, or put another way, 1 vCPU = 1 pCPU) up to 500%.
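To make the enforcement concrete, here's a toy sketch of the admission check using the 4:1 example above. This is my own illustration of the logic, not VMware's code; the real check is done by the cluster itself at power-on time.

```python
# Toy illustration of a vCPU:pCPU over-commitment cap (4 vCPUs per core).
def can_power_on(new_vm_vcpus: int, powered_on_vcpus: int,
                 physical_cores: int, vcpus_per_core: int = 4) -> bool:
    """Return True if powering on the VM stays within the allowed ratio."""
    allowed_vcpus = physical_cores * vcpus_per_core
    return powered_on_vcpus + new_vm_vcpus <= allowed_vcpus

# Example: a cluster with 64 physical cores and 240 vCPUs already powered on.
print(can_power_on(new_vm_vcpus=8, powered_on_vcpus=240, physical_cores=64))   # True  (248 <= 256)
print(can_power_on(new_vm_vcpus=24, powered_on_vcpus=240, physical_cores=64))  # False (264 > 256)
```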