vSAN Storage Policies

I get a lot of questions about vSAN and its storage policies: "What exactly does FTT mean?" "What should I set the stripe to?" The default storage policy with vSAN is FTT=1 and Stripe=1. FTT stands for Failures To Tolerate, the number of host or disk failures an object can survive. Stripe is the number of capacity drives an object is written across.

FTT=1 in a 2-node configuration results in a mirror (RAID 1) of all data: you can lose one drive or one node, at the cost of 200% storage consumption. On an all-flash cluster of 4 or more nodes you can instead use RAID 5, which distributes the data across nodes with a single parity block.

FTT=2 requires 6 nodes, and you can lose any 2 drives or 2 nodes. This is accomplished using RAID 6, which has double parity and results in 150% storage consumption.
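
To make the capacity overhead concrete, here is some back-of-the-napkin PowerShell math for a hypothetical 6-node cluster with 10TB of raw capacity per node:

# Hypothetical cluster: 6 nodes x 10TB raw each
$raw = 6 * 10TB                  # PowerShell expands the TB suffix to bytes
$usableMirror = $raw / 2         # FTT=1, RAID 1: two full copies (200% consumed)
$usableRaid6  = $raw / 1.5       # FTT=2, RAID 6: 4 data + 2 parity (150% consumed)
"{0:N1} TB usable with RAID 1, {1:N1} TB usable with RAID 6" -f ($usableMirror / 1TB), ($usableRaid6 / 1TB)

Notice that RAID 6 tolerates an extra failure and still nets you more usable space than the FTT=1 mirror; the trade-off is the extra parity work on every write.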

If you want to check the status of your objects, go to Cluster > Monitor > vSAN > Virtual Objects. From there you can see each object's FTT and which disks it lives on. In the screenshot below you can see that with the 2-node vSAN cluster the objects sit on both nodes, resulting in RAID 1 (mirroring).

[Screenshot: vSphere Web Client, Virtual Objects view showing RAID 1 component placement]
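
If you prefer the command line, PowerCLI exposes the same compliance information through the SPBM cmdlets. A minimal sketch, assuming an already connected vCenter session:

# Show each VM's assigned storage policy and whether it is compliant
Get-SpbmEntityConfiguration -VM (Get-VM) | Select-Object Entity, StoragePolicy, ComplianceStatus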

Now let's break down what each setting does.

[Screenshot: vSphere Web Client, vSAN storage policy rule settings]

Striping breaks an object apart so it is written across multiple capacity disks. In an all-flash environment there is still one cache drive per disk group, but it is used only to cache writes; reads are served from the capacity drives. In a hybrid configuration reads are cached on the SSD, and if the data is not in cache it is retrieved from the slower spinning disks, which hurts performance. Breaking an object apart and writing it across multiple disks can therefore increase read performance. I would recommend leaving the stripe at 1 unless you encounter performance issues. Also note that the largest a single component can be is 255GB; if an object grows beyond that size it is automatically split into multiple components across multiple disks.

Force provisioning allows an object to be provisioned on the datastore even when the cluster cannot meet the storage policy. For example, you set FTT=2, but the cluster only has 4 nodes and so is only capable of FTT=1.

Object Space Reservation controls how much of an object is thick provisioned. By default all storage is thin provisioned with vSAN; you can change this by raising the percentage anywhere from 0% to 100%, where 100% means the object is fully thick provisioned. The one caveat is that with deduplication and compression enabled it must be either 0% or 100%. The VM swap object is 100% reserved by default, but there is a command-line advanced setting you can change if you need to reclaim that space.

Flash Read Cache Reservation sets how much flash cache is reserved for an object, as a percentage of its logical size. The maximum amount of cache the cache drive can serve is 800GB. If you have 80 VMs, each with 100GB of storage and, say, a 10% reservation, the entire cache drive is spoken for; when you power on the 81st VM there is no read cache left to give it. That is why it's best not to change the default unless you have a specific technical reason to.
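
All of these settings are just rules inside a storage policy, so you can also build a policy from PowerCLI instead of the web client. A minimal sketch, assuming the SPBM cmdlets and a connected vCenter; the VSAN.* capability names below are the standard vSAN ones, but verify what your build exposes with Get-SpbmCapability:

# Recreate the defaults discussed above as an explicit policy
$ruleSet = New-SpbmRuleSet -AllOfRules @(
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1),   # FTT=1
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.stripeWidth") -Value 1),              # stripe of 1
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.forceProvisioning") -Value $false),   # fail provisioning rather than violate the policy
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.proportionalCapacity") -Value 0)      # 0% object space reservation (thin)
)
New-SpbmStoragePolicy -Name "FTT1-Stripe1" -Description "vSAN defaults made explicit" -AnyOfRuleSets $ruleSet

From there you can attach the policy to a VM or an individual disk with Set-SpbmEntityConfiguration.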


X-IO Technologies Axellio At SFD13

This is part of a series of posts from my time at Storage Field Day 13. You can find all the related content here.

Background

X-IO has been around for a while, and like the storage industry as a whole it has recently been through some troubled times. The company had gone dark, and I had not seen much from them since. Now, like a phoenix rising, they are ready to show off their new product at Storage Field Day 13.

They were founded in 2002 as the Seagate Advanced Storage Group with the goal of building a scalable storage array with zero trade-offs. The team included engineers from Digital Equipment Corporation, Hewlett-Packard, and StorageWorks. That work eventually led, in 2006, to the X-IO Intelligent Storage Element (ISE). In 2007 they were purchased by Xiotech, based out of Minneapolis, and in 2008 the ISE-1 product was introduced. In 2012 they moved to Colorado Springs, which is where they gave the SFD presentation. Current products include the iglu and ISE series.


Axellio is not for your general data center workloads. It is being built to solve specific problems, problems I did not even know existed. One example use case came from the oil industry. A survey ship travels across the ocean mapping the ocean floor, creating petabytes of data that is stored on the ship. Not all of that data can be processed locally, so it has to be migrated to a data center somewhere else for processing. That is just one use case Axellio can help solve.

The platform itself is what I would call a form of converged system. Normally a converged system includes storage, compute, and software. Axellio includes the compute and storage but not the software; it is up to the customer or partner to bring their own software stack to run on the hardware. Maybe at some point in the future they will include the software as well.

Hardware

The Axellio hardware is a 2U appliance that incorporates two server nodes. Each node has two Intel Xeon E5-26xx CPUs and 1TB of RAM or NVRAM, so a single appliance gives you four CPUs and 2TB of RAM or NVRAM; with the current CPUs that is up to 88 cores and 176 threads. Each appliance can hold up to 72 dual-port 2.5″ NVMe SSDs, for up to 460TB of storage, arranged in six trays of 12 drives each. Offload modules can also be added, such as Intel Xeon Phi for parallel compute or an NVIDIA K2 GPU for VDI. The appliances attach to Ethernet switches ranging from 4x 10GbE to 4x 40GbE and up to 4x 100GbE.


Architecture (What Really Matters)

FabricExpress is a PCIe interconnect that allows the two nodes to connect directly to the 72 NVMe drives. Because the drives are NVMe, they attach to the CPUs directly over PCIe lanes, creating very fast local storage for both nodes. Normally in a converged system storage traffic has to cross an external switch, which always adds latency.

Axellio Performance

When it comes to performance Axellio does not disappoint: 12 million IOPS at 35 microseconds of latency, with 60GB/s of sustained bandwidth at 4KB writes. That's a lot of power in a little box. Below is an image of further testing that used only 48 of the drives.

[Image: additional performance test results using only 48 drives]

Brandon’s Take

One of the most exciting parts of the event was being able to go into the back and see where they were building the product. I saw all kinds of machinery that I had no idea how to use or what it was for. You can open up a product once it's shipped to you, but it's an entirely different thing to see the stages it goes through as it is built. It really makes you appreciate everything that goes into creating these products.

Axellio is being built to solve a problem I did not know much about before the event. The problems it can solve could be very lucrative for X-IO and for the businesses that use it. They are one of the few companies doing something different at a time when most storage products are starting to look the same.

Fellow Delegate Posts

SFD13 PRIMER – X-IO AXELLIO EDGE COMPUTING PLATFORM by Max Mortillaro

Axellio, next gen, IO intensive server for RT analytics by X-IO Technologies by Ray Lucchesi

Full Disclosure

X-IO provided us the ability to attend the session. This included a shirt, a USB drive, and some Thai food.


Weathervane, a benchmarking tool for virtualized infrastructure and clouds – now open source!

Weathervane is a performance benchmarking tool developed at VMware. It lets you assess the performance of your virtualized or cloud environment by driving a load against a realistic application and capturing relevant performance metrics. You might use it to compare the performance characteristics of two different environments, or to understand the performance impact of some change in an existing environment.


VMware Social Media Advocacy

ECC Errors On Nutanix

When logging into a Nutanix cluster I saw that I had two critical alerts.

[Screenshot: the two critical alerts]

With a quick search I found KB 3357. I SSHed into one of the CVMs running on my cluster and ran the following command as one line:

ncc health_checks hardware_checks ipmi_checks ipmi_sel_correctable_ecc_errors_check

Looking over the output I quickly found this line.

[Screenshot: the NCC output line flagging the correctable ECC errors]

I forwarded all the information to support, and will replace the faulty memory module when it arrives. Luckily I have not seen any issues from the failing memory so far, and I really liked how quick and easy it was to track down using Nutanix.

vCenter Fails after Time Zone Change

We recently changed our NTP server, and I needed to update all of our hosts and vCenters. I have a handy PowerShell script to update the ESXi hosts, but that script does not work on the vCenter servers. I logged into the vCenter appliance on port 5480 to reach the vCenter management interface, logged in as root, and noticed the time zone was UTC. I am in the Central time zone, so I wanted to change it from UTC. It turns out that if you do that, it breaks everything. I had to learn this the hard way: once I changed the time zone I could no longer log into vCenter, and I had to go back and change the time zone back to UTC to regain access.
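
For reference, here is a rough sketch of that kind of host-update script (not my exact one); vcenter.example.local and ntp1.example.local are placeholders:

# Point every ESXi host at the new NTP server and restart ntpd
Connect-VIServer -Server vcenter.example.local
foreach ($esx in Get-VMHost) {
    $old = Get-VMHostNtpServer -VMHost $esx
    if ($old) { Remove-VMHostNtpServer -VMHost $esx -NtpServer $old -Confirm:$false }
    Add-VMHostNtpServer -VMHost $esx -NtpServer "ntp1.example.local"
    Get-VMHostService -VMHost $esx | Where-Object { $_.Key -eq "ntpd" } | Restart-VMHostService -Confirm:$false
}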

VMworld 2017 Call for Papers is now open!

Inspire the VMware community, VMworld 2017 Call for Papers is now open!


We’re looking for speakers who will inspire the VMware community. Have you integrated VMware solutions and technologies in an innovative way? Do you have a best practice or individual technical tips and tricks to recommend? Can you tell us about an amazing app that leverages VMware solutions to improve your business? If so, please join us at VMworld 2017.


VMware Social Media Advocacy

After SQL Installation Drives Inaccessible

Recently I set up a VM using standard best practices, such as using the PVSCSI adapter for the database drives and formatting them GPT. Everything checked out, so I passed it on to the DBA to finish the SQL installation. Soon after, I received an email from him saying the drives were inaccessible. I had no idea why, since I had set up plenty of VMs the same way without any issues. A quick Google search for "drive inaccessible after SQL installation" showed that a lot of people have hit the same issue. It is the result of Microsoft locking down the security on the data drives: SQL Server setup restricts the permissions so that only SQL has access to the files, to prevent certain security issues. You can read more about it here.
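
If you just need access back while you sort out permissions with your DBA, re-adding an ACL on the drive root is quick in PowerShell. A minimal sketch, assuming the locked-down drive is D: and you want the local Administrators group back on it:

# Grant BUILTIN\Administrators full control on D:\, inherited by folders and files
$acl  = Get-Acl "D:\"
$rule = New-Object System.Security.AccessControl.FileSystemAccessRule("BUILTIN\Administrators", "FullControl", "ContainerInherit,ObjectInherit", "None", "Allow")
$acl.AddAccessRule($rule)
Set-Acl -Path "D:\" -AclObject $acl

Keep in mind SQL setup restricted those permissions on purpose, so loosen them deliberately.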
