Stretched vSAN Cluster on Ravello

Stretched clustering is something I have wanted to set up in my home lab for a while, but it was not feasible with my current hardware.  Recently I was selected for the vExpert program for the third year, and one of the perks is access to Ravello cloud.  Ravello has recently made a lot of advancements that have greatly increased performance, and they have now added a bare metal option that improves performance even further.  I am skipping most of the steps to set up vSAN and only including what is different for a stretched cluster.

The high level architecture of a stretched vSAN cluster is simple.


  • Two physically separated groups of hosts (sites).  In Ravello this is accomplished using Availability Groups.
  • A vCenter to manage it all.
  • An external witness.  This is needed for quorum, which allows an entire site to fail and its VMs to fail over to the other site.
  • Less than 5ms round-trip latency between the two sites.  This is needed because all writes must be acknowledged at the second site.
  • A maximum of 200ms round-trip latency between the sites and the witness.
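The two latency limits above are easy to turn into a quick pre-flight check. This is a minimal sketch, assuming you have already measured round-trip times yourself (for example with vmkping between vmkernel ports); the link names and numbers below are hypothetical, not from my lab.

```python
from dataclasses import dataclass

# Round-trip latency limits (milliseconds) from the requirements list.
MAX_INTER_SITE_RTT_MS = 5.0   # data site <-> data site: must be below this
MAX_WITNESS_RTT_MS = 200.0    # data site <-> witness: must not exceed this


@dataclass
class Link:
    name: str
    rtt_ms: float
    to_witness: bool  # True for a data-site-to-witness link


def link_ok(link: Link) -> bool:
    """Check one measured link against the stretched-cluster limits."""
    if link.to_witness:
        return link.rtt_ms <= MAX_WITNESS_RTT_MS
    return link.rtt_ms < MAX_INTER_SITE_RTT_MS


def failing_links(links: list[Link]) -> list[str]:
    """Return the names of links that violate the latency requirements."""
    return [link.name for link in links if not link_ok(link)]


if __name__ == "__main__":
    # Hypothetical measurements.
    measured = [
        Link("site1-site2", 1.2, to_witness=False),
        Link("site1-witness", 250.0, to_witness=True),
        Link("site2-witness", 40.0, to_witness=True),
    ]
    print(failing_links(measured))  # ['site1-witness']
```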

If this were a production setup, there would be a few things to keep in mind.

  • All writes need to be acknowledged at the second site, which could add up to 5ms of latency to every write.
  • You can use Layer 2 or Layer 3 networks between the sites.  You would want at least 10Gbps for the inter-site link.
  • You can use Layer 2 or Layer 3 networks for the witness, with at least 100Mbps of bandwidth.
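The production notes can be turned into a rough planning check. This sketch assumes a pessimistic model where the inter-site round trip is simply added to the local write time (in reality the two copies are written in parallel, so this overstates the penalty); the bandwidth minimums are the ones listed above, and the example numbers are hypothetical.

```python
# Minimum link speeds from the production notes above.
MIN_INTER_SITE_GBPS = 10.0
MIN_WITNESS_MBPS = 100.0


def estimated_write_latency_ms(local_write_ms: float, inter_site_rtt_ms: float) -> float:
    """Pessimistic model: a write completes only after the remote site acknowledges
    it, so the full inter-site round trip is added to the local write time."""
    return local_write_ms + inter_site_rtt_ms


def links_sufficient(inter_site_gbps: float, witness_mbps: float) -> bool:
    """True if both links meet the minimum bandwidths listed above."""
    return inter_site_gbps >= MIN_INTER_SITE_GBPS and witness_mbps >= MIN_WITNESS_MBPS


if __name__ == "__main__":
    # Hypothetical numbers: 0.5ms local write, 5ms inter-site RTT.
    print(estimated_write_latency_ms(0.5, 5.0))  # 5.5
    print(links_sufficient(10.0, 100.0))         # True
    print(links_sufficient(1.0, 100.0))          # False
```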

Deploying on Ravello

[Image: Ravello application blueprint]

For the architecture of this deployment we will need four sections:

  • Management
  • Cluster Group 1 (an Availability Group simulates a separate data center)
  • Cluster Group 2 (an Availability Group simulates a separate data center)
  • vSAN network and Management/Data Network

Management

There needs to be a DNS server and a vCenter.  I used Windows Server 2016 to set up both the DNS server and the domain controller.  I used the vCenter Server Appliance 6.5, which I deployed to a separate management ESXi host.

Cluster Groups

Each group consists of two ESXi 6.5 hosts.  The groups use Availability Groups to keep them physically separated and simulate the stretched cluster: Group 1 uses AG1 and Group 2 uses AG2.

[Image: Ravello Availability Group setting]

Network

 

I manually set up the DNS entries on the Server 2016 DNS server.  The two networks consist of the following:

  • 10.0.0.0/16 Data/Management
  • 10.10.0.0/16 vSAN
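Since the DNS entries and IPs were set up by hand, a small check with Python's standard `ipaddress` module can catch a vmkernel address typed into the wrong subnet. A minimal sketch: the two subnets come from the list above, while the host addresses are hypothetical.

```python
import ipaddress

# The two networks from the list above.
DATA_MGMT = ipaddress.ip_network("10.0.0.0/16")
VSAN = ipaddress.ip_network("10.10.0.0/16")


def check_host(mgmt_ip: str, vsan_ip: str) -> list[str]:
    """Return problems with one host's addressing (empty list means OK)."""
    problems = []
    if ipaddress.ip_address(mgmt_ip) not in DATA_MGMT:
        problems.append(f"{mgmt_ip} is not in {DATA_MGMT}")
    if ipaddress.ip_address(vsan_ip) not in VSAN:
        problems.append(f"{vsan_ip} is not in {VSAN}")
    return problems


if __name__ == "__main__":
    # The two subnets should not overlap.
    print(DATA_MGMT.overlaps(VSAN))  # False
    # Hypothetical host addresses.
    print(check_host("10.0.1.11", "10.10.1.11"))  # []
    print(check_host("10.0.1.12", "10.0.2.12"))   # ['10.0.2.12 is not in 10.10.0.0/16']
```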

Witness

The witness is an easy-to-deploy OVF appliance.  It creates a nested ESXi host that runs on top of a physical host.  Its networking consists of the following:

  • vmk0 Management Traffic
  • vmk1 vSAN Traffic

Once the OVF is deployed, add the new witness host to vCenter.  It shows up in vCenter as a blue ESXi host.


Creating the Cluster

Now that everything is set up and online, it is time to create the cluster.  All four data hosts need to be in one cluster in vCenter (the witness host stays outside the cluster).  Go to the cluster settings, start the vSAN setup, and choose Configure stretched cluster.

[Image: Configure stretched cluster option]

Next, split the hosts into two fault domains that correspond to the Availability Groups set up on Ravello.


After the disks are claimed, you have a stretched vSAN cluster that provides high availability across two data centers.  One site or one node can go down, and your VMs can fail over and keep running.
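Before clicking through the wizard, it can help to sanity-check the intended fault domain layout. This is a plain-Python sketch, not a vCenter API call: the fault domain names follow the wizard's Preferred/Secondary wording, and the host names are hypothetical (one fault domain per Ravello Availability Group).

```python
def validate_layout(fault_domains: dict[str, list[str]], witness: str) -> list[str]:
    """Check a stretched-cluster fault domain layout; return a list of problems."""
    problems = []
    if len(fault_domains) != 2:
        problems.append("a stretched cluster needs exactly two data fault domains")
    seen: set[str] = set()
    for name, hosts in fault_domains.items():
        if not hosts:
            problems.append(f"fault domain {name} has no hosts")
        duplicates = seen.intersection(hosts)
        if duplicates:
            problems.append(f"hosts in more than one fault domain: {sorted(duplicates)}")
        seen.update(hosts)
    if witness in seen:
        problems.append("the witness must not be a member of a data fault domain")
    return problems


if __name__ == "__main__":
    # Hypothetical host names: one fault domain per Ravello Availability Group.
    layout = {
        "Preferred": ["esxi-ag1-01", "esxi-ag1-02"],
        "Secondary": ["esxi-ag2-01", "esxi-ag2-02"],
    }
    print(validate_layout(layout, "vsan-witness"))  # []
```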

 

vSAN Storage Policies

I get a lot of questions about vSAN and its storage policies, such as “What exactly does FTT mean?” and “What should I set the stripe to?”  The default vSAN storage policy is FTT=1 and Stripe=1.  FTT means Failures To Tolerate.  Stripe is the number of drives an object is written across.

FTT=1 in a 2 node configuration results in a mirror of all data: you can lose one drive or one node, and it uses 200% of the storage.  In a 4 node or larger all-flash configuration you can instead set the failure tolerance method to RAID 5 (erasure coding), which distributes data across nodes with a parity of 1 and uses about 133% of the storage.

FTT=2 with RAID 6 requires at least 6 nodes, and you can lose 2 drives or 2 nodes.  RAID 6 uses a parity of 2 and results in 150% storage usage.
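The capacity overheads are easier to compare side by side. A minimal sketch, assuming the standard layouts (RAID 1 keeps FTT+1 full copies, RAID 5 is 3 data + 1 parity, RAID 6 is 4 data + 2 parity) and counting a 2-node cluster's witness as the third fault domain; `Fraction` keeps the 4/3 multiplier exact.

```python
from fractions import Fraction

# (failure tolerance method, FTT) -> (raw-capacity multiplier, minimum hosts/fault domains)
POLICIES = {
    ("RAID-1", 1): (Fraction(2), 3),     # 2 copies + witness component
    ("RAID-1", 2): (Fraction(3), 5),     # 3 copies, 2*FTT+1 hosts
    ("RAID-5", 1): (Fraction(4, 3), 4),  # 3 data + 1 parity
    ("RAID-6", 2): (Fraction(3, 2), 6),  # 4 data + 2 parity
}


def raw_capacity_gb(usable_gb: int, method: str, ftt: int) -> Fraction:
    """Raw vSAN capacity consumed by usable_gb of VM data under a policy."""
    multiplier, _ = POLICIES[(method, ftt)]
    return usable_gb * multiplier


def min_hosts(method: str, ftt: int) -> int:
    """Minimum hosts (or fault domains) the policy needs."""
    return POLICIES[(method, ftt)][1]


if __name__ == "__main__":
    for method, ftt in POLICIES:
        raw = float(raw_capacity_gb(100, method, ftt))
        print(f"{method} FTT={ftt}: {raw:.1f}GB raw per 100GB, min hosts {min_hosts(method, ftt)}")
```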

If you want to check the status, go to Cluster > Monitor > vSAN > Virtual Objects.  From here you can see the FTT and which disks each object uses.  The picture shows that in the 2 node vSAN cluster the objects are on both nodes, which is RAID 1, or mirroring.

[Image: vSphere Web Client, Virtual Objects view]

Now let's break down what each setting does.

[Image: vSphere Web Client, storage policy settings]

Striping breaks an object apart so that it is written across multiple disks.  In an all-flash environment there is still one cache drive per disk group, but it is used only to cache writes; reads are served from the capacity drives.  In a hybrid configuration reads are cached on the SSD, but if the data is not in the cache it has to be retrieved from the slower magnetic disks.  That results in slower performance, but breaking the object apart and writing it across multiple disks can increase read performance.  I would recommend leaving the stripe at 1 unless you encounter performance issues.  The largest a single component of an object can be is 255GB; if an object grows beyond that size, it is broken up into multiple components across multiple disks.
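To see how stripe width and the 255GB limit interact, here is a simplified estimate of how many data components an object gets. This is an approximation I am assuming for illustration (at least one component per stripe, and enough components that none exceeds 255GB); vSAN's real placement logic has more rules, and witness components are not counted.

```python
import math

MAX_COMPONENT_GB = 255  # per-component size limit mentioned above


def components_per_replica(object_gb: float, stripe_width: int = 1) -> int:
    """Simplified estimate of data components in one replica of an object.

    At least one component per stripe, and enough components that none
    exceeds 255GB. vSAN's real placement logic has more rules than this."""
    return max(stripe_width, math.ceil(object_gb / MAX_COMPONENT_GB))


def data_components_raid1(object_gb: float, stripe_width: int = 1, ftt: int = 1) -> int:
    """RAID-1 keeps FTT+1 replicas; witness components are not counted."""
    return (ftt + 1) * components_per_replica(object_gb, stripe_width)


if __name__ == "__main__":
    print(components_per_replica(100))        # 1
    print(components_per_replica(100, 2))     # 2
    print(components_per_replica(600))        # 3
    print(data_components_raid1(600, ftt=1))  # 6
```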

Force provisioning allows an object to be provisioned on a datastore even if the datastore cannot meet the storage policy.  For example, the policy might be set to FTT=2 while the cluster has only 4 nodes, so it can only provide FTT=1.
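The decision force provisioning changes can be sketched in a few lines. This models only the outcome for the 4-node example in the text; the host minimums assume FTT=2 needs 5 hosts with RAID-1 mirroring (2*FTT+1) or 6 with RAID-6, and the outcome strings are my own labels, not vSphere messages.

```python
# FTT=2 needs 5 hosts with RAID-1 mirroring (2*FTT+1) or 6 hosts with RAID-6.
MIN_HOSTS_FTT2 = {"RAID-1": 5, "RAID-6": 6}


def provision(required_hosts: int, available_hosts: int, force_provisioning: bool = False) -> str:
    """Decide what happens when a VM with a given policy is created."""
    if available_hosts >= required_hosts:
        return "compliant"
    if force_provisioning:
        # The object is created anyway but reported as not compliant with its policy.
        return "provisioned, non-compliant"
    return "rejected"


if __name__ == "__main__":
    hosts = 4
    print(provision(MIN_HOSTS_FTT2["RAID-1"], hosts))                           # rejected
    print(provision(MIN_HOSTS_FTT2["RAID-1"], hosts, force_provisioning=True))  # provisioned, non-compliant
```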

Object Space Reservation controls how much of an object is thick provisioned.  By default, all storage in vSAN is thin provisioned.  You can change this by increasing the percentage: at 100% the object is thick provisioned, and you can set it anywhere between 0% and 100%.  The only caveat is that with deduplication and compression enabled it must be either 0% or 100%.  By default the VM swap file is reserved at 100%, but there is a command line setting you can change if you need to save this space.
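The reservation math and the deduplication caveat fit in one small function. A minimal sketch: it computes the thick-reserved capacity for one copy of an object (replicas from FTT would multiply this), and enforces the 0%/100% rule when deduplication and compression are on.

```python
def reserved_gb(object_gb: float, osr_percent: int, dedup_compression: bool = False) -> float:
    """Capacity thick-reserved for one copy of an object by Object Space Reservation."""
    if not 0 <= osr_percent <= 100:
        raise ValueError("Object Space Reservation must be between 0% and 100%")
    if dedup_compression and osr_percent not in (0, 100):
        raise ValueError("with deduplication and compression, use 0% or 100%")
    return object_gb * osr_percent / 100


if __name__ == "__main__":
    print(reserved_gb(100, 0))    # 0.0 (thin provisioned, the default)
    print(reserved_gb(100, 50))   # 50.0
    print(reserved_gb(100, 100))  # 100.0 (fully thick provisioned)
```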

Flash Read Cache Reservation reserves a portion of the cache drive for an object, expressed as a percentage of the object's size; it only applies to hybrid configurations.  In this example the cache drive provides 800GB of read cache.  If you have 80 VMs, each with 100GB of storage and a 10% reservation, the reservations consume the entire 800GB (80 × 100GB × 10%).  When you power on the 81st VM, the cache drive will not be able to give that VM any reserved read cache.  That is why it's best not to change the default of 0% unless you have a technical reason to.
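The 80-VM example is just division. This sketch assumes each VM reserves 10% of its 100GB (the percentage needed for 80 VMs to fill 800GB of cache); change the numbers to match your own cache drive and policy.

```python
def vms_with_full_reservation(cache_gb: float, vm_size_gb: float, reservation_percent: float) -> int:
    """How many identical VMs can receive their full Flash Read Cache reservation."""
    per_vm_gb = vm_size_gb * reservation_percent / 100
    if per_vm_gb <= 0:
        raise ValueError("reservation must be greater than 0%")
    return int(cache_gb // per_vm_gb)


if __name__ == "__main__":
    # 800GB of cache, 100GB VMs, assumed 10% reservation each.
    fits = vms_with_full_reservation(800, 100, 10)
    print(fits)  # 80, so an 81st VM gets no reserved read cache
```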

 

How To Install vSAN Updates

VMware is making a lot of great progress with vSAN.  One of the biggest pain points with the technology is the daunting HCL (Hardware Compatibility List).  VMware spends a lot of time working with hardware vendors to validate various hardware and firmware versions with vSAN.  In the past this meant manually verifying that you were running the right firmware version.  Now vSAN 6.6 automatically checks whether you are running the correct firmware version, and if not, you can download and install the firmware automatically.  I found one simple issue with this: the buttons are not very clear about what they do.  As you can see from the image below, the buttons look like they would refresh the page.  The arrow points to the button that “updates all”; selecting it applies the update to all your hosts.  You can update all hosts at once or through a rolling update.

[Image: vSAN update buttons, with an arrow pointing to the “updates all” button]

2 Node vSAN Design for a Remote Site

I was recently asked to design a solution for a remote site.  The requirements were that it had to be cheap, run a few virtual machines, have failover capability, and have shared storage.  The workloads are going to be very light, so there is no need for powerful servers.  I had a few options.  Technically one server could run the entire workload, but that does not allow for any failure, so I needed at least two servers.  That provides a failover capacity of only one server: the bare minimum, but acceptable for this use case.  These two servers would need some kind of shared storage.  One option would be a small storage array such as the Dell EMC VNXe.  I have used these previously, and they were a great solution for their time, but times are changing and I think hyperconvergence is the future.  vSAN 6.5 introduced a lot of new features that make it a perfect solution.

Previously, any hyperconverged solution needed 3 nodes.  Three nodes let the cluster verify that everything is online: if 1 of the 3 nodes goes down, the other two nodes can check with each other to confirm that the node actually went down.  To get away with using 2 nodes, you use an external witness.  This external witness could run on a separate server at the site or in the main data center.
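The reason a witness makes 2 nodes workable is majority voting. This sketch simplifies by giving each node and the witness one vote (vSAN actually assigns votes per component of each object); the member names are hypothetical.

```python
def has_quorum(online: set[str], votes: dict[str, int]) -> bool:
    """True if strictly more than half of all votes are still online."""
    total = sum(votes.values())
    online_votes = sum(v for member, v in votes.items() if member in online)
    return online_votes * 2 > total


if __name__ == "__main__":
    # Hypothetical names; one vote each is a simplification.
    two_node_with_witness = {"node-a": 1, "node-b": 1, "witness": 1}
    two_node_only = {"node-a": 1, "node-b": 1}

    # node-b fails: the survivor plus the witness still hold 2 of 3 votes.
    print(has_quorum({"node-a", "witness"}, two_node_with_witness))  # True
    # Without a witness, the survivor holds only 1 of 2 votes.
    print(has_quorum({"node-a"}, two_node_only))                     # False
```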

With vSAN you have one SSD per Disk Group (DG) that is used for cache.  Since this had to be a cheap solution, my main constraint was cost, and everything had to be a minimal design that gets the job done.  Each server would have 1 DG with an 800GB SSD and four 4TB 7.2K RPM HDDs.  This allows FTT=1, meaning only 1 host can be lost.  There is some risk with this design: during maintenance, one of the hosts would be in maintenance mode, leaving a single point of failure because only 1 DG would be available on the remaining host.  This is an acceptable risk given the cost constraint.
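The usable capacity of this design can be estimated from the disk counts. A minimal sketch: only the capacity-tier HDDs count (the cache SSD adds no capacity), FTT=1 mirroring halves the raw total, and the 30% slack space held back for rebuilds is an assumed planning figure, not a requirement from the design.

```python
def usable_tb(hosts: int, disks_per_host: int, disk_tb: float,
              ftt: int = 1, slack_fraction: float = 0.30) -> float:
    """Approximate usable capacity with RAID-1 mirroring.

    Only capacity-tier disks count; the cache SSD adds no capacity.
    slack_fraction is free space held back for rebuilds (assumed 30%)."""
    raw_tb = hosts * disks_per_host * disk_tb
    mirrored_tb = raw_tb / (ftt + 1)  # FTT=1 keeps two full copies
    return mirrored_tb * (1 - slack_fraction)


if __name__ == "__main__":
    print(usable_tb(2, 4, 4, slack_fraction=0.0))  # 16.0 (32TB raw, mirrored)
    print(round(usable_tb(2, 4, 4), 1))            # 11.2 with 30% slack
```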

One of my favorite new features in 6.5 is direct connect, which lets you connect two hosts directly to each other instead of running through a switch.  Each of these servers has two 1Gb ports and two 10Gb ports, but the remote site's switch infrastructure is only 1Gb.  1Gb can be a serious limitation for storage, and I wanted to avoid that.  With direct connect the two hosts are cabled directly to each other over 10Gb, and all storage traffic goes across that link, leaving the 1Gb ports for VM traffic.
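To put the 1Gb versus 10Gb difference in perspective, here is an idealized transfer-time calculation, for example for resynchronizing data after a host comes back from maintenance. It assumes decimal terabytes and full line rate (`efficiency=1.0`); real throughput will be lower.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time to move data_tb terabytes (decimal) over a link."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Example: resynchronizing 1TB between the two hosts.
    for gbps in (1, 10):
        print(f"{gbps}Gb link: {transfer_hours(1, gbps):.2f} hours")
```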

As you can tell, this is a bare-minimum design for vSAN and hyperconvergence, but it meets all the requirements: cost, availability, and shared storage.  If a host goes down, HA can restart all the VMs on the second node with minimal downtime.  This makes it the optimal solution for the requirements of this design.

 

Blog at WordPress.com.
