Stretched vSAN Cluster on Ravello

Stretched clustering is something I have wanted to set up in my home lab for a while, but it is not feasible with my current hardware.  Recently I was selected for the vExpert program for the third year, and one of the perks is the use of the Ravello cloud.  Ravello has recently made a lot of advancements that have greatly increased performance, and they have also added a bare-metal option that makes performance even better.  I am skipping most of the steps to set up vSAN and trying to only cover what is different for a stretched cluster.

The high-level architecture of a stretched vSAN cluster is simple:


  • Two physically separated clusters.  This is accomplished using Ravello Availability Groups.
  • A vCenter Server to manage it all.
  • An external witness.  This is needed for quorum, which allows an entire site to fail and the VMs to fail over.
  • Less than 5 ms of latency between the two sites.  This is needed because all writes must be acknowledged at the second site.
  • 200 ms maximum RTT latency between the clusters and the witness.
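
The requirements above boil down to two latency budgets. As a purely illustrative sketch (the `latency_ok` helper and its usage are mine, not part of any VMware tooling), the check looks like this:

```python
# Illustrative only: a tiny check of the stretched-cluster latency limits.
SITE_TO_SITE_MAX_MS = 5    # all writes are acknowledged at the second site
WITNESS_MAX_MS = 200       # the witness link tolerates much higher RTT

def latency_ok(site_to_site_ms, site_to_witness_ms):
    """Return True when both links meet the stretched-cluster limits."""
    return (site_to_site_ms <= SITE_TO_SITE_MAX_MS
            and site_to_witness_ms <= WITNESS_MAX_MS)

print(latency_ok(3.2, 120))  # True  - both links within limits
print(latency_ok(8.0, 120))  # False - inter-site link too slow
```

The asymmetry is the point: the inter-site link is in the write path, while the witness only participates in quorum, so it can live much further away.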

If this were a production setup, there are a few things to keep in mind.

  • All writes must be acknowledged at the second site, which can add up to 5 ms of latency to every write.
  • You can use layer 2 or layer 3 networks between the clusters.  You would want at least 10 Gb for the inter-site connection.
  • You can use layer 2 or layer 3 networks with at least 100 Mbps for the witness.

Deploying on Ravello


The architecture of this deployment needs three groups of hosts, plus the networks:

  • Management
  • Cluster Group 1 (an Availability Group simulates a separate data center)
  • Cluster Group 2 (an Availability Group simulates a separate data center)
  • vSAN network and Management/Data Network


There needs to be a DNS server and a vCenter.  I used Windows Server 2016 for both the DNS server and the Domain Controller.  I used the vCenter Server Appliance 6.5, which I deployed to a separate management ESXi host.

Cluster Groups

These consist of two ESXi 6.5 hosts each.  They use Availability Groups to keep them physically separated and simulate the stretched cluster: Group 1 uses AG1 and Group 2 uses AG2.




I manually set up the DNS entries on the Server 2016 DNS server, and the two networks consist of the following:

  • Data/Management
  • vSAN


The witness is an easy-to-deploy OVF.  It creates a nested ESXi host that runs on top of a physical host.  Its networking consists of the following:

  • vmk0 – Management traffic
  • vmk1 – vSAN traffic

Once the OVF is deployed, add the new witness host to vCenter.  It will appear in vCenter as a blue ESXi host.


Creating the Cluster

Now that everything is set up and online, it is time to create the cluster.  All four hosts need to be in one cluster in vCenter.  Go to the cluster settings, start the vSAN setup, and choose Configure stretched cluster.

stretched cluster

Now break out the two fault domains to correspond to the Availability Groups set up in Ravello.


After the disks are claimed, you now have a stretched vSAN cluster that provides high availability across two data centers.  One node, or even an entire site, can go down, and your VMs can fail over and keep on running.


vExpert 2018

Last week I was honored to be chosen as part of the VMware vExpert program, an elite group of individuals who are active in the virtualization community.  This is the third year I have been chosen.  What makes it so great is being part of the community and the networking it brings.  Through the vExpert Slack channel I have learned a lot by talking to my peers; any time I have had a question, someone was there to help.  I have met many people, and have become close friends with some of them.

Thank you, everyone, for making this community so great.  I hope to see everyone at VMworld this year and at Nutanix .NEXT!


vSphere 6.5 Update 1 is the Update You’ve Been Looking For!

With this update release, VMware builds upon the already robust industry-leading virtualization platform and further improves the IT experience for its customers. vSphere 6.5 has now been running in production environments for over 8 months and many of the discovered issues have been fixed in patches and subsequently rolled into this release.

VMware Social Media Advocacy

ESXi 6.0 to 6.5 Upgrade Failed

The Problem

I am currently running vCenter 6.5 with a mix of 6.0 and 6.5 clusters.  I uploaded the latest Dell customized ESXi 6.5 image to Update Manager and had no issues updating my first cluster from 6.0 to 6.5.  In the past I have had some weird issues with Update Manager, but since 6.5 integrated it into vCenter it has been a lot more stable.  I then proceeded to upgrade the next cluster to 6.5 and received this weird error.


I then tried to mount the ISO to the host and install it that way, but this time I got a much more detailed error.


The Solution

  1. SSH into the host and run the following command to see the list of installed VIBs:

esxcli software vib list

  2. Remove the conflicting VIB:

esxcli software vib remove --vibname=scsi-mpt3sas

  3. Reboot!

Now that the conflicting VIB has been removed you can proceed with installing the updates.







How To Setup A Nutanix Storage Container

Nutanix storage uses Storage Pools and Storage Containers.  The Storage Pool is the aggregated disks of all or some of the nodes.  You can create multiple Storage Pools depending on business needs, but Nutanix recommends one Storage Pool.  Within the Storage Pool are Storage Containers, which have different data reduction settings that can be tuned to get the optimal data reduction and performance that is needed.

Creating The Container


Once the cluster is set up with a Storage Pool created, we are ready to create a Storage Container.

  1. Name the Container
  2. Select Storage Pool
  3. Choose which hosts to add.

That all looks really simple until the Advanced button is clicked.  This is where the Geek Knobs can be tweaked.


Advanced Settings

There are quite a few options to choose from, and each setting depends on the different use cases.

  1. Replication Factor – keep 2 copies of data in the cluster, or 3, depending on the use case.
  2. Reserved Capacity – how much guaranteed storage needs to be reserved for this container.  All Containers share storage within the Storage Pool, so this is used to guarantee the capacity is always available.
  3. Advertised Capacity – how much storage the connected hosts will see.  This can be used to control actual usage on the Container side.
  4. Compression – a setting of 0 results in inline compression.  This can be set to a higher number to delay compression (post-process) for the desired performance.
  5. Deduplication – cache deduplication can be used to optimize performance and use less storage.  Capacity deduplication deduplicates all data globally across the cluster.  Deduplication is post-process only, and if enabled after a Container is created, only new writes will be deduplicated.
  6. Erasure Coding – requires at least 4 nodes.  It is more efficient than the simple replication factor: instead of full copies of data, it uses parity to be able to rebuild anything that is lost.  Enabling this setting results in some performance impact.
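
To make the capacity trade-off between Replication Factor and Erasure Coding concrete, here is a rough sketch of the math (the `raw_needed` helper and the 2/1 strip size are my illustrative assumptions, not Nutanix's exact accounting):

```python
# Simplified model of raw capacity consumed per GB of logical data.
def raw_needed(logical_gb, rf=2, erasure_coding=False, ec_data=2, ec_parity=1):
    """Raw capacity used by `logical_gb` of data.

    Without erasure coding, RF full copies are kept (rf=2 or rf=3).
    With erasure coding, overhead drops to (data + parity) / data
    instead of full copies.
    """
    if erasure_coding:
        return float(logical_gb) * (ec_data + ec_parity) / ec_data
    return float(logical_gb) * rf

print(raw_needed(100, rf=2))                       # 200.0 GB - two full copies
print(raw_needed(100, rf=2, erasure_coding=True))  # 150.0 GB - a 2/1 EC strip
```

This is why Erasure Coding is attractive on clusters with enough nodes: the same failure tolerance for noticeably less raw capacity, at the cost of some performance.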


As you can see, the settings you choose can have a lot of impact on performance.  As always, architecture matters: you have to evaluate the needs of your workload, and a better understanding of how everything works results in a better-performing system.


vSAN Storage Policies

I get a lot of questions about vSAN and its storage policies.  “What exactly does FTT mean?”  “What should I set the stripe to?”  The default storage policy with vSAN is FTT=1 and Stripe=1.  FTT stands for Failures To Tolerate; stripe is how many drives an object is written across.

FTT=1 in a 2-node configuration results in a mirror of all data: you can lose one drive or one node, at the cost of 200% storage usage.  In a 4-node or larger configuration it can instead use RAID 5, with data distributed across nodes plus a single parity.

FTT=2 requires 6 nodes, and you can lose 2 drives or 2 nodes.  This is accomplished using RAID 6, which has two parity blocks and results in 150% storage usage.
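
The storage-usage percentages above follow directly from the layouts. A small sketch of that arithmetic (the `vsan_usage_pct` helper is mine, for illustration only; real vSAN adds witness components and other small overheads on top):

```python
# Approximate raw capacity used, as a percent of logical data,
# for the vSAN policy layouts discussed above.
def vsan_usage_pct(ftt, raid):
    if raid == "RAID1":
        return 100.0 * (ftt + 1)   # FTT+1 full mirrors
    if raid == "RAID5":
        return 100.0 * 4 / 3       # 3 data + 1 parity
    if raid == "RAID6":
        return 100.0 * 6 / 4       # 4 data + 2 parity
    raise ValueError(raid)

print(vsan_usage_pct(1, "RAID1"))  # 200.0 - mirror, as in the 2-node case
print(vsan_usage_pct(2, "RAID6"))  # 150.0 - matches the FTT=2 figure
```

Note how RAID 5/6 trade rebuild work for capacity: the same FTT costs far less raw storage than mirroring.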

If you want to check the status, go to Cluster > Monitor > vSAN > Virtual Objects.  From here you can see the FTT and which disks are involved.  From the picture you can see that with the 2-node vSAN cluster the objects are on both nodes, resulting in RAID 1 (mirroring).


Now let’s break down what each setting is.


Striping breaks an object apart so it is written across multiple disks.  In an all-flash environment there is still one cache drive per disk group, but it is used only to cache writes; the rest of the drives serve reads.  In a hybrid configuration reads are cached on the SSD, but if the data is not in cache it is retrieved from the slower disks.  That results in slower performance, but breaking the object apart and writing it across multiple disks can increase read performance.  I would recommend leaving the stripe at 1 unless you encounter performance issues.  The largest size a component can be is 255 GB; if an object grows beyond that, it is broken up into multiple components across multiple disks.
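
A quick sketch of how the 255 GB limit and the stripe width interact (the `component_count` helper is my illustration, counting only capacity components per replica and ignoring witnesses):

```python
import math

VSAN_MAX_COMPONENT_GB = 255  # objects larger than this are split automatically

def component_count(object_gb, stripe_width=1):
    """Minimum number of capacity components an object is split into."""
    return max(1, math.ceil(object_gb / VSAN_MAX_COMPONENT_GB)) * stripe_width

print(component_count(100))     # 1 - fits in a single component
print(component_count(600))     # 3 - split by the 255 GB limit alone
print(component_count(600, 2))  # 6 - splitting and striping multiply
```

So even with Stripe=1, a large VMDK already ends up spread across multiple disks, which is one more reason the default stripe is usually fine.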

Force provisioning allows an object to be provisioned on a datastore even if the datastore is not capable of meeting the storage policy; for example, the policy is set to FTT=2, but the cluster has only 4 nodes, so it is only capable of FTT=1.

Object Space Reservation controls how much of an object is thick provisioned.  By default, all storage is thin provisioned with vSAN.  You can change this by increasing the percentage, anywhere from 0% to 100%; if you set it to 100%, the object is thick provisioned.  The only caveat is that with deduplication and compression it must be either 0% or 100%.  By default the VM swap file is 100% reserved, but there is a command-line setting you can change if you need to save that space.

Flash Read Cache reservation reserves an amount of read cache for objects.  The maximum amount of storage the cache drive can use is 800 GB.  If you have 80 VMs, each with 100 GB of storage reserving cache, the entire cache drive’s capacity can be consumed; when you power on the 81st VM, the cache drive will not be able to give that VM any read cache.  That is why it is best not to change the default unless you have a technical reason to.
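
Here is the arithmetic behind that 80-VM example as I read it, assuming a 10% read-cache reservation per VM to make the numbers work out (the percentage and the `cache_reserved_gb` helper are my assumptions, not a vSAN default):

```python
CACHE_DRIVE_GB = 800  # maximum usable capacity of the cache drive

def cache_reserved_gb(vm_count, vm_size_gb, reservation_pct=10.0):
    """Total read cache reserved by vm_count VMs of vm_size_gb each."""
    return vm_count * vm_size_gb * reservation_pct / 100.0

print(cache_reserved_gb(80, 100))  # 800.0 - the whole cache drive is reserved
print(cache_reserved_gb(81, 100))  # 810.0 - exceeds the drive; VM 81 gets none
```

Reservations are hard allocations, so a seemingly small percentage multiplied across many VMs exhausts the cache drive quickly.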


How To Install vSAN Updates

VMware is making a lot of great progress with vSAN.  One of the biggest pain points with the technology is the daunting HCL.  VMware spends a lot of time working with hardware vendors to validate the various hardware and firmware versions with vSAN.  In the past this meant manually checking that you were running the right firmware version.  Now with vSAN 6.6 it will automatically check whether you are running the correct firmware version, and if not, you can download and install the firmware automatically.  I found one simple issue with this: the buttons are not very clear about what they do.  As you can see from the image below, it looks like those buttons would refresh the page.  The arrow points to the button that “updates all”.  Selecting it applies the update to all your hosts, either all at once or through a rolling update.


Storage Resiliency in Nutanix. (Think about the Architecture!)

Hyperconverged is a great technology, but it does have its caveats: you have to understand the architecture and design your environment appropriately.  Recently I had a Nutanix cluster that lost Storage Resiliency.  Storage Resiliency is lost when there is not enough storage available to absorb the loss of a node.  When storage is written, it is written locally and on a remote node.  This provides Data Resiliency, but at the cost of increased storage usage; it is essentially the same idea as RAID in traditional storage.

I had 3 nodes that were getting close to 80% usage on the storage container.  80% is fairly full: if one node went down, the VMs running on that host would not be able to fail over, because the remaining nodes would not have enough storage for them to HA to.  Essentially, whatever was running on that host would be lost, including what was on its drives.  I really wish they would add a feature to prevent you from using more storage than resiliency requires.

I had two options to remedy this: add more storage, which would also require the purchase of another node, or turn off replication.  Each cluster was replicating to the other, resulting in double the storage usage.  With replication the RPO was 1 hour, but there were also backups, which gave an RPO of 24 hours.  A 24-hour RPO was deemed acceptable, so replication was disabled.  The freed space was not available instantly; Curator still needed to run background jobs to make the new storage available.


A lot of the time users will just look at the CPU overcommitment ratio or the memory usage and forget about the storage.  They are still thinking in the traditional 3-tier world.  Like any technology, you need to understand how everything works underneath.  At the end of the day, architecture is what matters.
