vSphere 6: Multi-Processor Fault Tolerance (SMP-FT)

At VMworld 2008, VMware announced Fault Tolerance (FT), a new feature in ESX 4 that provides continuous availability for selected virtual machines (VMs). FT protects those VMs with literally zero downtime and zero data loss, even surviving server failures, while staying completely transparent to the guest software stack.

While it was a great new feature, FT-enabled VMs were not a very common sight in datacenter environments.

Legacy FT

FT was uncommon in datacenters mostly because of the restriction of only 1 vCPU per FT virtual machine, which severely limited its usability. Most business critical VMs, the ones that could benefit from FT the most, needed multiple vCPUs to meet their performance requirements. A further challenge was the limited set of options for backing up FT-enabled VMs, as creating VMware snapshots was not possible.

Other cluster and host requirements for legacy FT, or UP-FT, were:

  • An HA-enabled cluster is required.
  • Shared storage is required.
  • VMDKs must be eager zeroed thick provisioned.
  • Host CPUs must be VMware FT capable and belong to the same processor model family.
  • All ESX hosts in the VMware HA cluster must have identical ESX versions and patch levels.

SMP-FT

With the release of VMware vSphere 6, some of the legacy FT restrictions are taken care of! The most important one is the vCPU restriction: you can now run up to 4 vCPUs and use up to 64 GB of memory per FT VM. The feature now officially goes by the name SMP-FT, short for “Support for Multi-Processor Fault Tolerance”.
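To make those limits concrete, here is a minimal Python sketch that checks a VM's size against the two SMP-FT maximums mentioned above (4 vCPUs, 64 GB memory). The `VmSpec` class, the function name and the example VM names are all hypothetical and only serve as illustration; this is not a vSphere API call, and the limits are simply the numbers quoted in this post.

```python
from dataclasses import dataclass
from typing import List

# Limits as quoted in this post for vSphere 6 SMP-FT. Verify them against
# the configuration maximums of the exact release you are running.
MAX_FT_VCPUS = 4
MAX_FT_MEMORY_GB = 64


@dataclass
class VmSpec:
    """Hypothetical, simplified VM description (not a vSphere API type)."""
    name: str
    vcpus: int
    memory_gb: int


def smp_ft_violations(vm: VmSpec) -> List[str]:
    """Return the reasons a VM exceeds the SMP-FT limits (empty if none)."""
    problems = []
    if vm.vcpus > MAX_FT_VCPUS:
        problems.append(
            f"{vm.vcpus} vCPUs exceeds the limit of {MAX_FT_VCPUS}"
        )
    if vm.memory_gb > MAX_FT_MEMORY_GB:
        problems.append(
            f"{vm.memory_gb} GB memory exceeds the limit of {MAX_FT_MEMORY_GB} GB"
        )
    return problems


if __name__ == "__main__":
    # Example inventory; names and sizes are made up.
    for vm in (VmSpec("vcenter01", 4, 16), VmSpec("sql01", 8, 128)):
        issues = smp_ft_violations(vm)
        print(vm.name, "fits SMP-FT" if not issues else "; ".join(issues))
```

A check like this is handy as a first filter when going through a list of candidate VMs, before looking at the other requirements.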

A logical overview of an SMP-FT-enabled VM is shown in this diagram (as seen in the VMworld 2014 session BCO2701):

[Figure: vsphere6-FTbasics]

The method of keeping the vCPU of the primary VM in lockstep with the secondary VM (as used in legacy FT) didn’t cut it anymore, as the lockstep method proved hard to scale. SMP-FT now uses an algorithm that provides ‘incredibly fast check-pointing’. This new technique leverages a modified X-vMotion process for continuous availability. Basically a never-ending vMotion, as William Lam says! 🙂

An important difference with legacy FT is that SMP-FT now uses separate storage/VMDKs for the primary VM and the secondary VM. Storage for your FT VMs is no longer shared, which should mean that write IOs are replicated to the VMDK of the secondary VM. Note that eager zeroed thick disks are no longer a requirement: thin, lazy zeroed thick and eager zeroed thick provisioned disks are all supported in SMP-FT!

A good improvement is that SMP-FT-enabled VMs now support VADP (vStorage APIs for Data Protection). VADP is used to create non-disruptive VMware snapshots! It would be great to be able to use the secondary VM as the backup source, as that would take the pressure off the primary VM. I haven’t tested yet whether this is possible, or whether it may even be the default behavior.

Design considerations

As SMP-FT brings continuous availability protection for VMs with up to 4 vCPUs, it is a good way to further protect your business critical workloads next to VMware HA. There are, however, some design considerations to keep in mind:

  • The resources needed for a VM double when you enable SMP-FT. A secondary VM is created that consumes the same resources as the primary VM.
  • Keep in mind that an SMP-FT protected workload cannot exceed the maximum of 4 vCPUs.
  • A private network for FT logging is strongly recommended, as FT logging traffic is unencrypted. It contains guest network data, IO data and memory contents of the guest OS.
  • As SMP-FT can generate significant traffic on the FT logging enabled VMkernel adapter, 10GbE connectivity should be the only way to go.
  • Although the overhead is minimal, SMP-FT-enabled VMs still face higher latency than standard VMs. So make sure your business critical applications can cope with the performance penalty.
  • Remove any unnecessary devices from your SMP-FT protected VMs to make optimal use of the FT logging bandwidth.
  • Leverage the newest Linux/Windows OS versions, which have TCP auto-tuning and better TCP stacks.
  • Keep an eye on the vSphere features that are not supported when using SMP-FT!
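The resource doubling from the first consideration is easy to put into numbers. Below is a minimal Python sketch, assuming the secondary VM mirrors the primary exactly. The `Footprint` class and function names are hypothetical. Doubling the storage is my own assumption, based on the primary and secondary VM each having their own VMDKs; with thin provisioning the space actually consumed may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Footprint:
    """Hypothetical resource footprint of one VM (illustration only)."""
    vcpus: int
    memory_gb: int
    storage_gb: int


def ft_footprint(primary: Footprint) -> Footprint:
    """Footprint of primary plus secondary VM once SMP-FT is enabled.

    Assumes the secondary VM consumes exactly the same resources as the
    primary, including its own copy of the VMDKs.
    """
    return Footprint(
        vcpus=primary.vcpus * 2,
        memory_gb=primary.memory_gb * 2,
        storage_gb=primary.storage_gb * 2,
    )


def total_ft_footprint(vms: Iterable[Footprint]) -> Footprint:
    """Sum the doubled footprints of all VMs you plan to protect."""
    doubled = [ft_footprint(vm) for vm in vms]
    return Footprint(
        vcpus=sum(f.vcpus for f in doubled),
        memory_gb=sum(f.memory_gb for f in doubled),
        storage_gb=sum(f.storage_gb for f in doubled),
    )


if __name__ == "__main__":
    # Two made-up candidate VMs.
    candidates = [Footprint(4, 32, 200), Footprint(2, 16, 100)]
    print(total_ft_footprint(candidates))
```

Running the total against the spare capacity of your cluster gives a quick feel for how many VMs you can realistically protect.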

Demo

I did not have the time to record a full video demo, so I went searching and found a demo that is hard to beat. 🙂
Although it seems like an early development demo, the session BCO5065: vSphere Fault Tolerance for Multiprocessor VMs is a pretty good one. It was shown at VMworld 2013! SMP-FT has gone through some changes since then, but the basics as shown in this video remain the same.

 

Conclusion

With the arrival of one of our most anticipated features within vSphere, we can now re-design/configure some of the VMs for which SMP-FT would be a very good complement. Using it to protect your vCenter VMs seems like a natural fit! We’re curious to see if the maximum number of vCPUs is raised further in future vSphere 6 releases…

 

Source: William Lam’s Twitter feed provided some good details which I used in this post!

About the author (Niels Hagoort): I am a virtualization enthusiast with a love for virtual datacenters! About 15 years of experience in IT. VMware VCDX #212. Working at HIC (Hagoort ICT Consultancy) as a fully independent consultant/architect!