Wednesday, August 27, 2008

Top Tips for Deploying VI

The following “top tips” highlight some issues that can arise in a VI deployment. They cover problems that are sometimes hard to diagnose, or that might surface weeks or months after some seemingly innocuous action. They are meant to shed some light on “latent” issues, that is, those that don’t produce immediate warnings or errors when the root-cause event occurs. These tips were collected by the VI team from customer experience over time, and will be posted in two parts. We welcome your comments on these and on any other “gotchas” you might have encountered.

  1. Make sure DNS is fully configured. This includes ensuring proper, consistent configuration for all of the following: short name, fully-qualified name, forward lookup, and reverse lookup. Otherwise, you’ll see ESX hosts intermittently disconnect from VirtualCenter, and HA might not work properly.
  2. Don’t use Virtual SMP for applications that don’t need it. Most applications are single-threaded and therefore cannot benefit from more than one virtual CPU. Assigning a single virtual CPU to VMs maximizes the utilization of ALL of your physical cores and avoids leaving cores underutilized. If your applications were converted from running on 2-processor physical servers, don’t assume they need two virtual CPUs; they might simply have been running on the smallest practical server configuration available. Start with a single VCPU, then monitor performance to see whether increasing the number of virtual CPUs actually makes a difference.
  3. Make sure you monitor the “% ready” metric. There’s one key metric for managing virtualized environments that doesn’t exist in physical environments: “% ready”. For a given VM, “% ready” measures the amount of time the VM is ready to run but cannot, because ESX has not scheduled it onto a physical CPU. In a well-running VMware environment, “% ready” should always be near 0. When it climbs above zero, your applications are not getting the CPU time they want. This typically points to one of several underlying issues: maybe you don’t have enough physical memory, and ESX is swapping out VMs… Or maybe you’re simply running too many VMs on the box… Or maybe you’ve got the overuse-of-virtual-SMP issue described above, and other VMs are getting starved.
  4. Watch your snapshot space growth. Because snapshots live on your datastore and grow over time, make sure you have enough spare capacity there. Every snapshot consists of a “REDO” file; for the most recent snapshot, all new disk writes associated with the VM are recorded to this file. In the extreme, a REDO file can grow to the size of the original disk, and the REDO file of every snapshot you keep continues to occupy disk space. Make sure you have enough “headroom” on your datastore to handle such growth over time. Operations that can dramatically increase the size of your snapshots include an OS service pack update, an application reinstall, or a disk defrag inside the VM.
  5. Make sure the SQL Server Agent is up and running on the VirtualCenter DB server. VirtualCenter depends on the Microsoft SQL Server Agent to perform stats rollups, but it cannot ensure that this service is running on the DB server. If the service is disabled, or is stopped at some point, the VI Client will not show the expected stats (weekly, monthly…). In addition, because daily data is not rolled up, it accumulates in the database, degrading performance and consuming more and more space.
  6. Team your management NICs if you use VMware High Availability (VMware HA). This helps you avoid false alarms (i.e., unnecessary VMware HA failovers of VMs) when your ESX hosts temporarily lose connectivity with each other (e.g., during a momentary network outage, or even during network switch maintenance).
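Tip 1 can be spot-checked from any machine with Python. The sketch below is a minimal illustration, not a VMware tool: it uses the standard `socket` module’s `gethostbyname` (forward lookup) and `gethostbyaddr` (reverse lookup) to confirm that a host’s short name matches its fully-qualified name, that both resolve to the same IPv4 address, and that the address resolves back to the FQDN. The function name `check_dns` and the host names are hypothetical, and the short-name lookup relies on your resolver’s search-domain settings. The resolver functions are parameters so the logic can be exercised without a live DNS server.

```python
import socket
from typing import Callable, List, Tuple

def check_dns(
    short_name: str,
    fqdn: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, List[str], List[str]]] = socket.gethostbyaddr,
) -> List[str]:
    """Return a list of DNS problems for one host; an empty list means all checks passed."""
    problems: List[str] = []
    # The short name should be the first label of the FQDN.
    if fqdn.split(".")[0].lower() != short_name.lower():
        problems.append(f"short name {short_name!r} does not match FQDN {fqdn!r}")
    try:
        ip_short = forward(short_name)  # forward lookup of the short name
        ip_fqdn = forward(fqdn)         # forward lookup of the FQDN
    except OSError as exc:              # socket.gaierror is a subclass of OSError
        return problems + [f"forward lookup failed: {exc}"]
    if ip_short != ip_fqdn:
        problems.append(f"{short_name} -> {ip_short} but {fqdn} -> {ip_fqdn}")
    try:
        ptr_name, aliases, _ = reverse(ip_fqdn)  # reverse (PTR) lookup
    except OSError as exc:                       # socket.herror is also an OSError
        return problems + [f"reverse lookup of {ip_fqdn} failed: {exc}"]
    names = {n.lower().rstrip(".") for n in [ptr_name, *aliases]}
    if fqdn.lower().rstrip(".") not in names:
        problems.append(f"{ip_fqdn} reverse-resolves to {ptr_name}, not {fqdn}")
    return problems
```

Against real DNS you would call, for example, `check_dns("esx01", "esx01.example.com")` for each ESX host and for the VirtualCenter server, and investigate any non-empty result.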
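For tip 3, the VI Client’s performance charts can report CPU ready time as a millisecond total per sampling interval rather than as a percentage. Assuming that is the form you have (real-time charts sample every 20 seconds), converting it to “% ready” is simple arithmetic. The sketch below also divides by the number of virtual CPUs to get a per-vCPU average, assuming the total is summed across the VM’s vCPUs; the function name and defaults are illustrative, not part of any VMware API.

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0, num_vcpus: int = 1) -> float:
    """Convert a CPU ready total (milliseconds accumulated over one sampling
    interval) into an average '% ready' per virtual CPU.

    Assumes ready_ms is summed across all of the VM's vCPUs, so it is divided
    by num_vcpus to give a per-vCPU figure.
    """
    if interval_s <= 0 or num_vcpus <= 0:
        raise ValueError("interval_s and num_vcpus must be positive")
    return 100.0 * ready_ms / (interval_s * 1000.0 * num_vcpus)
```

For example, 1,000 ms of ready time in a 20-second sample gives `ready_percent(1000) == 5.0`: the VM spent 5% of that interval waiting for a physical CPU, well away from the near-zero value you want to see.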
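The worst case in tip 4 can be turned into a quick headroom check. Only the active (most recent) REDO file receives new writes, and in the extreme it can grow to the size of its base disk, so the remaining worst-case growth per disk is the base size minus what the active REDO file already occupies. This is a rough sketch under those assumptions: it ignores VMFS metadata and any per-file overhead, and the `VirtualDisk` fields are hypothetical values you would gather yourself (e.g., from the datastore browser).

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class VirtualDisk:
    name: str
    base_gb: float          # size of the original (base) virtual disk
    active_redo_gb: float   # current size of the most recent REDO file

def worst_case_growth_gb(disks: Iterable[VirtualDisk]) -> float:
    """Space the active REDO files could still consume if each grew to the
    size of its base disk. Older REDO files no longer receive writes, so their
    space is already counted as used on the datastore."""
    return sum(max(d.base_gb - d.active_redo_gb, 0.0) for d in disks)

def has_headroom(disks: List[VirtualDisk], free_gb: float) -> bool:
    """True if the datastore's free space covers the worst-case growth."""
    return worst_case_growth_gb(disks) <= free_gb
```

With two hypothetical disks, `[VirtualDisk("web01", 20, 5), VirtualDisk("db01", 100, 10)]`, the worst-case growth is 15 + 90 = 105 GB, so a datastore with 100 GB free would not have enough headroom.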
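Because VirtualCenter can’t watch the SQL Server Agent for you (tip 5), a scheduled check on the database server can. The sketch below shells out to Windows’ built-in `sc query` command and parses the STATE line of its output. It assumes a default SQL Server instance, whose Agent service is named `SQLSERVERAGENT` (a named instance uses `SQLAgent$<instance>`), and it only works when run on Windows; the function names are hypothetical, and how you alert on the result is up to you.

```python
import re
import subprocess

def parse_sc_state(sc_output: str) -> str:
    """Return the state name (e.g. 'RUNNING', 'STOPPED') from `sc query` output."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else "UNKNOWN"

def agent_state(service: str = "SQLSERVERAGENT") -> str:
    """Query a service on the local Windows machine (run on the VirtualCenter DB server)."""
    result = subprocess.run(["sc", "query", service], capture_output=True, text=True)
    return parse_sc_state(result.stdout)
```

A scheduled task could then raise an alert whenever `agent_state()` returns anything other than `"RUNNING"`.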

1. If you have an active/passive FC storage array (most mid-range arrays fall into this category), be careful about setup. First, be sure to have redundant paths from your FC switches to your array’s storage processors. Second, be sure to use “MRU” (the default) for the path-selection policy, not “fixed”.

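The first issue is easiest to explain with an example. Consider a configuration in which each of two ESX hosts has two HBAs, each cabled to a different FC switch, but each FC switch is connected to only one of the array’s two storage processors. What’s wrong with this configuration?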


Although you might believe that you have full redundancy between the hosts and the switches, and specifically that you can survive one HBA failure on each host, in reality you don’t have enough redundancy. Here’s one failure scenario that won’t be handled properly: one HBA fails on each host, leaving host 1 with a path only to storage processor 1 and host 2 with a path only to storage processor 2.


The reason is that, with active/passive storage arrays, a given LUN can be presented on only one storage processor at a time. The LUN can shift from one storage processor to the other, but such a shift takes many seconds (potentially up to 30). With one HBA failed on each host, the two ESX hosts can no longer reach the same LUN through the same storage processor. Host 1 accesses the LUN on storage processor 1; host 2 accesses the same LUN on storage processor 2; and the active/passive array ends up shifting the LUN back and forth between the two storage processors, a ping-pong or “path thrashing” effect. Performance of VMs on both hosts will be erratic and degraded.

The solution is simple: create redundant connections from the FC switches to the array’s storage processors, so that each switch is connected to both storage processors.


There is a second noteworthy issue with active/passive arrays related to the same path thrashing effect: make sure that you use the “MRU” path selection policy (the default) rather than the “fixed” policy. With “fixed”, you may mistakenly force one host to use one storage processor and another host to use the other, and thus end up in the same LUN ping-pong, or path thrashing, situation.

For more details about path thrashing, see the SAN Configuration Guide.
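The ping-pong effect in both scenarios above can be illustrated with a toy model. This is a hypothetical simulation for intuition only, not how ESX or any particular array implements failover: hosts issue I/O round-robin, any I/O that arrives at the storage processor not owning the LUN forces the LUN to shift (“trespass”) to that processor, “fixed” always uses a host’s preferred path, and “MRU” keeps using the processor that already owns the LUN whenever the host can reach it.

```python
from typing import Dict, List, Optional

def count_trespasses(host_paths: Dict[str, List[str]], policy: str, rounds: int = 10) -> int:
    """Count how often the LUN shifts between storage processors (SPs).

    host_paths maps each host to the SPs it can still reach; the first entry
    is the host's preferred SP under the 'fixed' policy.
    """
    if policy not in ("fixed", "mru"):
        raise ValueError("policy must be 'fixed' or 'mru'")
    owner: Optional[str] = None  # SP that currently presents the LUN
    trespasses = 0
    for _ in range(rounds):
        for sps in host_paths.values():
            if policy == "mru" and owner in sps:
                sp = owner   # MRU stays on the SP that already owns the LUN
            else:
                sp = sps[0]  # fixed, or MRU with no path to the owner
            if owner is not None and sp != owner:
                trespasses += 1  # the array must move the LUN to this SP
            owner = sp
    return trespasses

# Redundant switch-to-SP cabling, but "fixed" preferred paths differ per host:
both = {"host1": ["SP-A", "SP-B"], "host2": ["SP-B", "SP-A"]}
print(count_trespasses(both, "fixed"), count_trespasses(both, "mru"))  # 19 0

# One HBA failed on each host, no redundant switch-to-SP links:
failed = {"host1": ["SP-A"], "host2": ["SP-B"]}
print(count_trespasses(failed, "mru"))  # 19
```

In this model, MRU settles on one storage processor when every host can reach it, while mismatched “fixed” paths, or missing switch-to-SP links, make the LUN move on almost every I/O.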

2. When configuring your VI environment for VMotion, make sure that your physical network switches are configured properly; in particular, make sure that each port has the right network (e.g. VLAN) visibility.

VMotion requires that the destination ESX host have network connectivity similar to that of the source ESX host (so that, for example, the VM can continue to access its assigned VLAN after the VMotion). VirtualCenter checks for correct virtual switch configuration on the source and destination ESX hosts; however, it does not check for correct configuration of the physical network switches. In a larger VI deployment where many switch ports are involved, a single misconfigured physical switch port can be hard to detect. The symptom is as follows: when a VM that relies on a particular VLAN ID is migrated with VMotion to the ESX host attached to the misconfigured switch port, the VM loses all network connectivity. Solution: when adding new ESX hosts to a network, take the time to double-check your switch port configurations to make absolutely sure that all the VLANs are correctly configured.
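Since VirtualCenter won’t check the physical switches, one low-tech safeguard is to compare the VLAN IDs your VM port groups need against the VLANs allowed on each physical switch port that every host’s uplinks connect to. The sketch below assumes you have already collected both lists yourself (for example, from port group settings in the VI Client and from your switch configurations); the function name, host names, port names, and VLAN IDs are all hypothetical. Each uplink port is checked individually, because VMs may use any uplink in a NIC team.

```python
from typing import Dict, Set

def missing_vlans(
    required: Set[int],
    uplink_ports: Dict[str, Dict[str, Set[int]]],
) -> Dict[str, Dict[str, Set[int]]]:
    """Report VLANs that VMs may need but a host's physical switch ports don't carry.

    required:     VLAN IDs used by VM port groups on hosts that take part in VMotion
    uplink_ports: host -> {physical switch port -> VLAN IDs allowed on that port}
    """
    report: Dict[str, Dict[str, Set[int]]] = {}
    for host, ports in uplink_ports.items():
        # Keep only ports that are missing at least one required VLAN.
        gaps = {port: required - allowed for port, allowed in ports.items() if required - allowed}
        if gaps:
            report[host] = gaps
    return report

required = {10, 20, 30}
ports = {
    "esx01": {"sw1/0/1": {10, 20, 30}, "sw2/0/1": {10, 20, 30, 40}},
    "esx02": {"sw1/0/2": {10, 20, 30}, "sw2/0/2": {10, 20}},
}
print(missing_vlans(required, ports))  # {'esx02': {'sw2/0/2': {30}}}
```

Here a VM on VLAN 30 would lose connectivity after a VMotion to esx02 whenever its traffic uses the uplink cabled to port sw2/0/2.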

3. When using VMware HA, take note of how memory reservations are specified and used to reserve cluster failover capacity. Using more consistent reservations or disabling admission control are both appropriate workarounds if the calculations are overly conservative in your environment.

How VMware HA works: If a VMware ESX host fails, VMware HA will restart the VMs affected by that failure on alternate hosts in the cluster. In order to do so, HA must reserve failover capacity within the cluster. HA currently achieves this by implementing an “admission control” policy that prevents (or warns against) the powering on of VMs that would encroach upon the failover capacity being reserved. In some cases, however, the admission control calculations may be too conservative.

Example scenario: Suppose you have 19 VMs, each with a 300 MB memory reservation. To power on all of these VMs, you need 5.7 GB of RAM (= 19 × 0.3 GB) in total within the cluster, after allocating space for potential host failures and not accounting for memory sharing in ESX. Since all reservations are identical, HA defines an average VM to require 300 MB of memory.

Now suppose you power on a 20th VM with a 2 GB memory reservation. Instead of calculating the memory requirement as 7.7 GB (= 19 × 0.3 + 1 × 2), HA takes a more conservative approach and redefines the average VM to be the biggest reservation observed. With the higher reservation specified, HA cautiously assumes that every VM needs 2 GB of memory, and asks for 40 GB (= 20 × 2) of RAM to be set aside for total runtime and failover capacity within the cluster. These calculations are intended to be conservative, ensuring that sufficient spare capacity is available without fragmentation across the hosts in a cluster.
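The arithmetic in this example can be reproduced in a few lines. This is a simplified model of the calculation as described in this post, not VMware HA’s actual admission control code: it ignores per-host capacity, the number of host failures being tolerated, and virtualization overhead, and the function names are hypothetical. Reservations are in MB, with 2 GB written as 2000 MB to match the decimal arithmetic above.

```python
from typing import Sequence

def exact_requirement_mb(reservations_mb: Sequence[int]) -> int:
    """Sum of the actual per-VM memory reservations."""
    return sum(reservations_mb)

def conservative_requirement_mb(reservations_mb: Sequence[int]) -> int:
    """Treat every VM as if it had the largest reservation observed."""
    if not reservations_mb:
        return 0
    return len(reservations_mb) * max(reservations_mb)

nineteen = [300] * 19
twenty = nineteen + [2000]
print(exact_requirement_mb(nineteen), conservative_requirement_mb(nineteen))  # 5700 5700
print(exact_requirement_mb(twenty), conservative_requirement_mb(twenty))      # 7700 40000
```

A single outlier reservation raises the conservative figure to more than five times the actual total in this example.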

In many cases (such as clusters with widely varying sizes of hosts and VMs), however, these calculations can be more conservative than desirable, and can lead to “insufficient failover capacity” warnings when powering on more VMs.

Two potential approaches are recommended if you are observing these warnings, or want to avoid them within a heterogeneous cluster configuration:

Approach 1: Either lower the reservations on your most demanding VMs, or remove the reservations that skew the calculations and rely upon “shares” instead. See the Resource Management Guide for the differences between reservations and shares.

Approach 2: Alternatively, configure HA to disable strict admission control. Host failures will still be detected and acted upon, but VMware HA will not prevent the starting of new VMs due to insufficient failover capacity.

4. When sizing your LUNs, a medium-sized LUN (~500GB) seems best for most situations.

Small LUNs (and VMFS volumes) can add SAN management complexity (too many LUNs to manage). Very large LUNs can cause performance issues (due to VMFS lock contention during certain operations), and they give too coarse a granularity for troubleshooting, performance tuning, and failure/error isolation. The table below summarizes some of the considerations. Details are provided on page 72 of the VI 3 SAN Design Guide.

| Consideration | Smaller LUNs | Medium-sized LUNs | Larger LUNs |
|---|---|---|---|
| VMFS: metadata overhead | Some overhead (0.5%) | Negligible overhead (<0.1%) | Negligible overhead (<0.1%) |
| VMFS: lock contention during VM creation operations or VCB-based backup operations (*) | Near-zero contention | Some contention | Much contention |
| Impact of a failure or error; difficulty of troubleshooting | Affects a few VMs | Affects 20-30 VMs | Affects many VMs |
| Ease of SAN management | Hard (many LUNs to manage) | Medium | Easy (just 1 LUN to manage) |
| Ease of tuning performance (**) | High (tunable per the few VMs on a LUN) | Medium (tunable for 20-30 VMs at a time) | Low (one setting for many, many VMs) |
| Flexibility in specifying value-added services (***) | High (different LUNs can have different policies or settings) | Medium (tunable for 20-30 VMs at a time) | Low (many VMs share the same policies or settings) |

(*) File creation in VMFS acquires a SCSI lock on the LUN. Since each LUN has a limited number of SCSI locks, excessive concurrent file creation in VMFS can cause lock contention, which can hurt performance. This can become apparent when multiple users are concurrently creating VMs (and therefore VMFS files), or when a VCB-based backup process is concurrently backing up multiple VMs (and is therefore concurrently creating multiple VMFS REDO files).
(**) e.g., RAID level, array caches, queue depths, path selection/path dedication
(***) e.g., backup; other data protection features such as replication and mirroring; capacity optimization features such as de-dupe and thin provisioning; security and encryption features
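To see where a given LUN size lands on the table’s “few / 20-30 / many VMs” scale, a back-of-the-envelope estimate is enough. The sketch below is illustrative only: the function name is hypothetical, and the average VM footprint and the percentage of the LUN held back as headroom (for example, for snapshot growth) are assumptions you would replace with figures from your own environment.

```python
def vms_per_lun(lun_gb: float, avg_vm_gb: float, headroom_pct: int) -> int:
    """Estimate how many VMs fit on one LUN after reserving headroom_pct
    percent of its capacity (e.g., for snapshot growth)."""
    if avg_vm_gb <= 0 or not 0 <= headroom_pct < 100:
        raise ValueError("avg_vm_gb must be positive and headroom_pct in [0, 100)")
    usable_gb = lun_gb * (100 - headroom_pct) / 100
    return int(usable_gb // avg_vm_gb)  # whole VMs only
```

With a hypothetical 16 GB average footprint and 20% headroom, `vms_per_lun(500, 16, 20)` gives 25, inside the table’s 20-30 range for a medium-sized LUN; the same assumptions on a 2,000 GB LUN give 100 VMs sharing one failure domain.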

