Nasuni at SFD 16

 


Nasuni was a new company to me, but they had a great presentation and I really liked what they presented.  They are providing a solution to a real problem that a lot of companies are running into.  The cloud is a great answer to so many problems that IT departments are encountering, but going to the cloud is not always as easy as it looks.  Nasuni provides a solution that simplifies distributed NAS.

The first line from the Nasuni website says it best: “Nasuni is primary and archive file storage capacity, backup, DR, global file access, and multi-site collaboration in one massively scalable hybrid cloud solution.”  It does this by providing a “unified NAS” on top of public clouds.  It is essentially an overlay that is controlled by an on-premises appliance, either a VM on your current infrastructure or a Nasuni physical appliance, which keeps the data locally cached.  In the event that internet access goes down, users can continue to access storage, and when the internet is restored the data is synced back up.

There are no limits on the number of files or on file sizes.  The files in a solution like this can be accessed by multiple users and be changing all the time.  To prevent issues, a file is locked while a user is accessing it, and it remains locked until it has been synced up with the cloud.  Through continuous versioning, in the event of malware or similar incidents files can be rolled back to a version from before the incident occurred.  All the data is deduplicated and compressed for effective storage utilization, and files can also be encrypted to prevent data theft.  Managing a solution like this with multiple devices across many sites could be very complex and time consuming, but with Nasuni everything is managed from a single pane of glass, limiting operational costs.


 

Nasuni looks like a great product that really simplifies the migration to the cloud.  By supporting the big players such as AWS, Azure and GCP they give customers plenty of options on which cloud they wish to utilize.  With the caching device they ensure that data is always accessible even if there are issues preventing access to the cloud, while also limiting the amount of data that has to be transferred.

You can see this presentation from Nasuni and all the other presentations from Storage Field Day 16 at http://techfieldday.com/event/sfd16/

 

NetApp OnCommand Insight

NetApp presented on its product OnCommand Insight at Storage Field Day 16 this year.  What made the presentation unique was that it was about an analytics and monitoring tool, the only such presentation at the event.  OnCommand Insight is an on-premises appliance that can be set up as a VM in your environment.  Once it is fully deployed it will start reporting information about your environment.  Unlike other similar products, it only reports on what it sees in your data center rather than comparing your environment to other environments.


OnCommand Insight is always watching your environment, and if an issue arises it can be set up to automatically generate a ticket and alert the proper team.  It supports a RESTful API, so whatever needs to be done can be scripted out, and licensing is based on raw capacity.
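Since the product exposes a REST API, even something as simple as a capacity check feeding a ticketing system can be scripted.  Here is a minimal sketch of what that might look like; the OnCommand Insight endpoint path, response fields, and ticketing URL are assumptions for illustration only, not taken from NetApp documentation.

```python
import requests

# Hypothetical example of polling a monitoring tool's REST API and raising a
# ticket when a threshold is crossed. The endpoint path, response shape, and
# ticketing API below are assumptions for illustration only.
OCI_URL = "https://oci.example.com/rest/v1/assets/storages"   # assumed path
TICKET_URL = "https://helpdesk.example.com/api/tickets"        # assumed ticketing API

session = requests.Session()
session.auth = ("admin", "password")
session.verify = False  # lab only; use proper certificates in production

for storage in session.get(OCI_URL, timeout=30).json():
    # 'capacity' keys are placeholders; check the API documentation for real field names
    used = storage.get("capacity", {}).get("used", 0)
    total = storage.get("capacity", {}).get("total", 1)
    if total and used / total > 0.85:
        session.post(TICKET_URL, json={
            "summary": f"{storage.get('name')} is over 85% full",
            "priority": "high",
        }, timeout=30)
```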

They also spoke about the product Cloud Insights.  It is not a direct replacement for OnCommand Insight, but it takes many of its features and builds on top of them.  Cloud Insights is designed for the modern hybrid data center: it can monitor both what is on premises and what is running in the cloud.  As more and more companies go hybrid, it is imperative to have a tool that can monitor both and give recommendations on where to run a workload.

One of my favorite features is that it is agnostic about what it monitors.  Monitoring is done via plugins, and there is a large repository where you can download more.  It reminds me a lot of EMC ViPR SRM, which could monitor more than just EMC products, but NetApp has really gone a step further with its capabilities.

Take a look at the presentation from NetApp and the rest of the Storage Field Day 16 presentations here.

Storage Field Day 16

The past six months have brought a lot of changes in my life.  I have been busy changing jobs and relocating to another state.  With the recent addition of my second child, I can easily say I have been really busy.  All my time has been taken up with the relocation and the kids, which has not given me much free time to do anything, especially writing.

Thankfully all of that is starting to change.  Now that I am getting settled into my new house and the kids are getting a little older, I am beginning to have a little more free time.  With this newfound free time I plan on getting back into writing.  Through writing I have met a lot of great people and been given some great opportunities.


Thankfully I have been given great inspiration to kick-start my writing.  I have been selected as a delegate for Storage Field Day 16 in Boston, a city I have never been to and am excited to finally visit.  It is truly an honor to be a part of Storage Field Day.  Not only do I get to see a lot of great presentations from companies that I am already familiar with, such as Dell EMC, Infinidat and Zerto, I will also be introduced to some new ones such as Nasuni and StorONE.  The best part of the whole experience is being able to meet the fellow delegates, who are some of the smartest people in IT.

You can catch all the action on June 27-28, and you may even catch me on camera.  Watch for updates on this site and live tweets from me the day of the event; there should be a lot of interesting content coming out.  For up-to-date information on companies and delegates take a look at http://techfieldday.com/event/sfd16/.

Stretched vSAN Cluster on Ravello

Stretched clustering is something I have wanted to set up in my home lab for a while, but it was not feasible with my current hardware.  Recently I was selected to be a part of the vExpert program for the third year, and one of the perks is the use of Ravello's cloud.  They have recently made a lot of advancements that have greatly increased performance, and they have also added a bare metal option which makes the performance even greater.  I am skipping most of the steps to set up vSAN and trying to only include what is different for a stretched cluster.

The high level architecture of a stretched vSAN cluster is simple.


  • Two physically separated clusters.  This is accomplished using Ravello Availability grouping.
  • A vCenter to manage it all.
  • An external witness.  This is needed for quorum, which allows an entire site to fail and the VMs to fail over.
  • Less than 5ms latency between the two sites.  This is needed because all writes need to be acknowledged at the second site.
  • 200ms RTT maximum latency between the clusters and the witness.

If this were a production setup there would be a few things to keep in mind (a quick latency sanity check follows the list).

  • All writes will need to be acknowledged at the second site, so that could be an added 5ms of latency for all writes.
  • You can use layer 2 and layer 3 networks between the clusters.  You would want at least 10Gb for the connection between sites.
  • You can use layer 2 and layer 3 networks with at least 100Mbps for the witness.
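Before building the lab (or a real deployment), it is worth sanity-checking those latency budgets.  Below is a rough sketch that times a TCP handshake to each remote endpoint as an RTT estimate; the hostnames are placeholders, and for real validation you would use vmkping or a proper network tool.

```python
import socket
import time

# Rough RTT check by timing a TCP handshake; hostnames/ports are placeholders.
# This is only a sanity check -- use vmkping or a real network tool for validation.
ENDPOINTS = {
    "site2-esxi": ("esxi-site2.lab.local", 443),
    "witness":    ("vsan-witness.lab.local", 443),
}

for name, (host, port) in ENDPOINTS.items():
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        rtt_ms = (time.perf_counter() - start) * 1000
    budget = 5 if name.startswith("site") else 200  # 5 ms site-to-site, 200 ms to witness
    status = "OK" if rtt_ms <= budget else "OVER BUDGET"
    print(f"{name}: {rtt_ms:.1f} ms (budget {budget} ms) {status}")
```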

Deploying on Ravello


For the architecture of this deployment we will need the following:

  • Management
  • Cluster Group 1 (an availability group simulates a separate data center)
  • Cluster Group 2 (an availability group simulates a separate data center)
  • vSAN network and Management/Data Network

Management

There needs to be a DNS server and a vCenter.  I used Server 2016 to set up both the DNS server and Domain Controller.  I used the vCenter appliance 6.5, which I then deployed to a separate management ESXi host.

Cluster Groups

These consist of two ESXi 6.5 hosts each.  They use Availability Groups to keep them physically separated to simulate the stretched cluster: Group 1 uses AG1 and Group 2 uses AG2.


Network

 

I manually set up the DNS entries on the Server 2016 DNS server, and the two networks consist of the following:

  • 10.0.0.0/16 Data/Management
  • 10.10.0.0/16 vSAN

Witness

The witness is an easy-to-deploy OVF.  It creates a nested ESXi host that runs on top of a physical host.  The networking consists of the following:

  • vmk0 Management Traffic
  • vmk1 vSAN Traffic

Once the OVF is deployed, add the new witness host into vCenter.  You will see it in vCenter as a blue ESXi host.
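If you would rather script this step, a pyVmomi sketch along the lines below should add the witness as a standalone host.  The hostnames and credentials are placeholders, and in a real environment you would supply the host's SSL thumbprint instead of forcing the connection.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Sketch: add the deployed vSAN witness appliance to vCenter as a standalone host.
# Hostnames and credentials are placeholders for a lab environment.
ctx = ssl._create_unverified_context()          # lab only
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    datacenter = si.RetrieveContent().rootFolder.childEntity[0]
    spec = vim.host.ConnectSpec(
        hostName="vsan-witness.lab.local",
        userName="root",
        password="VMware1!",
        force=True,                             # in production supply sslThumbprint instead
    )
    task = datacenter.hostFolder.AddStandaloneHost_Task(spec=spec, addConnected=True)
    # Wait on the task with your preferred helper, e.g. pyVim.task.WaitForTask(task)
finally:
    Disconnect(si)
```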


Creating the Cluster

Now that everything is set up and online it is time to create the cluster.  All four hosts need to be in one cluster in vCenter.  Go to the cluster settings, start the setup of vSAN, and choose to configure a stretched cluster.


Now break out the two fault domains to correspond to the availability groups set up on Ravello.


After the disks are claimed you have a stretched vSAN cluster that provides high availability across two data centers.  One cluster or one node can go down, and your VMs can fail over and keep on running.

 

How To Setup A Nutanix Storage Container

Nutanix storage uses Storage Pools and Storage Containers.  A Storage Pool is the aggregated disks of all or some of the nodes.  You can create multiple Storage Pools depending on business needs, but Nutanix recommends one Storage Pool.  Within the Storage Pool are Storage Containers, and on these containers there are different data reduction settings that can be configured to get the optimal balance of data reduction and performance.

Creating The Container


Once the cluster is set up with a Storage Pool created, we are ready to create a Storage Container.

  1. Name the Container.
  2. Select the Storage Pool.
  3. Choose which hosts to add.

That all looks really simple until the Advanced button is clicked.  This is where the geek knobs can be tweaked.


Advanced Settings

There are quite a few options to choose from, and the right settings depend on the use case; a scripted example follows the list.

  1. Replication Factor – Whether to keep 2 or 3 copies of the data in the cluster, depending on the use case.
  2. Reserved Capacity – How much guaranteed storage needs to be reserved for this container.  All the containers share storage from the Storage Pool, so this is used to guarantee the capacity is always available.
  3. Advertised Capacity – How much storage the connected hosts will see.  This can be used to control actual usage on the container side.
  4. Compression – A delay of 0 results in inline compression.  This can be set to a higher number (post-process compression) for better write performance.
  5. Deduplication – Cache deduplication can be used to optimize performance and use less storage.  Capacity deduplication will deduplicate all data globally across the cluster.  Deduplication is only post-process, and if it is enabled after a container is created then only new writes will be deduplicated.
  6. Erasure Coding – Requires at least 4 nodes.  It is more efficient than the simple replication factor: instead of full copies of data it uses parity to be able to rebuild anything.  Enabling this setting will result in some performance impact.
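For anyone who prefers automation over clicking through Prism, here is a minimal sketch of creating a container with some of these options through the Prism Element v2.0 REST API.  The endpoint and field names are written from memory and may vary by AOS version, so treat them as assumptions and verify them in the REST API Explorer first.

```python
import requests

# Sketch: create a Storage Container with some of the advanced options via the
# Prism Element v2.0 REST API. The endpoint and field names below are from memory
# and may differ by AOS version -- verify them in the REST API Explorer first.
PRISM = "https://prism.lab.local:9440"
AUTH = ("admin", "password")

container = {
    "name": "prod-container",
    "storage_pool_uuid": "<storage-pool-uuid>",   # look this up via /storage_pools
    "replication_factor": 2,
    "compression_enabled": True,
    "compression_delay_in_secs": 0,               # 0 = inline compression
    "finger_print_on_write": "off",               # cache deduplication toggle (assumed name)
    "erasure_code": "off",                        # requires at least 4 nodes if enabled
}

resp = requests.post(
    f"{PRISM}/api/nutanix/v2.0/storage_containers",
    json=container,
    auth=AUTH,
    verify=False,   # lab only; use proper certificates in production
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```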

Summary

As you can see, the settings you choose can have a big impact on performance.  As always, architecture matters: you have to evaluate the needs of your workload, and a better understanding of how everything works underneath results in a better-performing system.

 

vSAN Storage Policies

I get a lot of questions about vSAN and its storage policies: “What exactly does FTT mean?”, “What should I set the stripe to?”.  The default storage policy with vSAN is FTT=1 and Stripe=1.  FTT means Failures To Tolerate, and Stripe is how many drives an object is written across.

FTT=1 in a 2-node configuration results in a mirror of all data: you can lose one drive or one node, at the cost of 200% storage usage.  In a 4-node or larger all-flash configuration you can instead use RAID 5 erasure coding, which distributes data across nodes with a single parity.

FTT=2 with erasure coding requires 6 nodes and lets you lose 2 drives or 2 nodes.  This is accomplished using RAID 6, which has double parity and results in 150% storage usage.
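The capacity math is easy to work out directly.  The sketch below uses the standard multipliers for each protection scheme to show how much raw capacity 1TB of VM data consumes.

```python
# Raw capacity consumed by 1 TB of VM data under common vSAN protection schemes.
OVERHEAD = {
    "RAID 1, FTT=1 (mirror)": 2.0,             # two full copies -> 200% usage
    "RAID 5, FTT=1 (erasure coding)": 4 / 3,   # 3 data + 1 parity -> ~133% usage
    "RAID 1, FTT=2 (three-way mirror)": 3.0,   # 300% usage
    "RAID 6, FTT=2 (erasure coding)": 1.5,     # 4 data + 2 parity -> 150% usage
}

vm_data_tb = 1.0
for scheme, factor in OVERHEAD.items():
    print(f"{scheme}: {vm_data_tb * factor:.2f} TB raw consumed")
```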

If you want to check the status, go to Cluster > Monitor > vSAN > Virtual Objects.  From here you can see the FTT and which disks are involved.  With a 2-node vSAN cluster you can see the objects are on both nodes, resulting in RAID 1, or mirroring.


Now let's break down what each setting does.


Striping breaks an object apart so that it is written across multiple disks.  In an all-flash environment there is still one cache drive per disk group, but it is used only to cache writes; reads are served from the capacity drives.  In a hybrid configuration reads are cached on the SSD, but if the data is not in the cache it has to be retrieved from the slower spinning disks, which results in slower performance.  By having the object broken apart and written across multiple disks, striping can increase read performance.  I would recommend leaving the stripe at 1 unless you encounter performance issues.  The largest size a component can be is 255GB, so if an object grows beyond that size it will be broken up into multiple components across multiple disks.
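As a rough illustration of how an object gets carved up, the sketch below estimates the minimum number of data components for a mirrored object given the 255GB component limit and a stripe width; it ignores witness components and other internals.

```python
import math

def min_components(vmdk_gb: float, stripe_width: int = 1, mirrors: int = 2) -> int:
    """Rough minimum data-component count for a mirrored vSAN object.

    Ignores witness components and other internals -- illustration only.
    """
    per_replica = max(stripe_width, math.ceil(vmdk_gb / 255))
    return per_replica * mirrors

print(min_components(100))                   # 100 GB, stripe 1, RAID 1 -> 2 components
print(min_components(600))                   # 600 GB splits into 3 components per replica -> 6
print(min_components(100, stripe_width=2))   # stripe 2 forces 2 components per replica -> 4
```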

Force provisioning allows an object to be provisioned on a datastore even if the datastore is not capable of meeting the storage policy, for example when the policy is set to FTT=2 but the cluster only has 4 nodes and is only capable of FTT=1.

Object Space Reservation controls how much of an object is thick provisioned.  By default all storage is thin provisioned with vSAN.  You can change this by increasing the percentage, anywhere between 0% and 100%; setting it to 100% makes the object fully thick provisioned.  The only caveat is that with deduplication and compression it is either 0% or 100%.  By default the VM swap object is 100% reserved, but there is an advanced setting you can change if you need to save that space.

Flash Read Cache reservation sets the amount of cache you want reserved for objects.  The maximum amount of cache the cache drive can provide is 800GB.  If, for example, you reserve 10% and have 80 VMs each with 100GB of storage, the entire cache is reserved; when you power on the 81st VM the cache drive will not be able to give that VM any read cache.  That is why it is best not to change the default unless you have a technical reason to.
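The math behind that example is straightforward.  The sketch below assumes a 10% read cache reservation, which is the figure that makes the 800GB example work out; it is an assumed value, not a recommendation.

```python
# How a Flash Read Cache reservation consumes the cache tier.
# The 10% reservation is an assumed value used to illustrate the 800 GB example.
cache_capacity_gb = 800
vm_disk_gb = 100
reservation_pct = 0.10

reserved_per_vm = vm_disk_gb * reservation_pct
vms_until_full = cache_capacity_gb / reserved_per_vm
print(f"Each VM reserves {reserved_per_vm:.0f} GB of cache")
print(f"The cache is fully reserved after {vms_until_full:.0f} VMs")  # -> 80 VMs
```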

 

Storage Resiliency in Nutanix. (Think about the Architecture!)

Hyperconverged is a great technology, but it does have its caveats: you have to understand the architecture and design your environment appropriately.  Recently I had a Nutanix cluster that had lost storage resiliency.  Storage resiliency is lost when there is not enough storage available to rebuild data in the event of the loss of a node.  When storage is written, it is written locally and on a remote node.  This provides data resiliency, but at the cost of increased storage usage; it is essentially the same idea as RAID with traditional storage.

I had 3 nodes that were getting close to 80% usage on the storage container.  That is fairly full, and if one node went down the VMs running on that host would not be able to fail over, because the remaining nodes would not have enough free storage for the VMs on the failed node to HA to.  Essentially whatever was running on that host would be lost, including what is on its drives.  I really wish there were a feature that would not let you use more storage than resiliency requires.
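A back-of-the-napkin check makes the problem obvious: with two copies of the data, everything on one node has to fit into the free space on the remaining nodes, so the safe utilization ceiling is roughly (N-1)/N.  The sketch below works that out for a few cluster sizes; it assumes equally sized nodes and ignores CVM and metadata overhead.

```python
# Rough ceiling on cluster utilization if you want to survive a node failure (RF2).
# Assumes equally sized nodes and ignores CVM/metadata overhead -- illustration only.
def max_safe_utilization(nodes: int) -> float:
    return (nodes - 1) / nodes

for n in (3, 4, 5):
    print(f"{n} nodes: keep usage under ~{max_safe_utilization(n):.0%}")

# 3 nodes -> ~67%, which is why sitting at 80% meant a node loss could not rebuild.
```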

I had two options to remedy this: I could add more storage, which would also require the purchase of another node, or I could turn off replication.  The clusters were replicating to each other, resulting in double the storage usage.  With replication the RPO was 1 hour, but there were also backups that gave an RPO of 24 hours.  An RPO of 24 hours was deemed acceptable, so replication was disabled.  The freed space was not available instantly; Curator still needed to run background jobs to make the new storage available.


A lot of the time users will just look at the CPU overcommitment ratio or the memory usage and forget about the storage; they are still thinking in the traditional three-tier world.  Like any technology, you need to understand how everything works underneath.  At the end of the day, architecture is what matters.

X-IO Technologies Axellio At SFD13

This is part of a series of posts from my time at Storage Field Day 13.  You can find all the related content about it here.

Background

X-IO has been around for a while.  They have recently been going through some troubling times, along with the storage industry as a whole.  They had gone dark and I had not seen much about them since.  Now, like a phoenix rising, they are ready to show off their new product at Storage Field Day 13.

They were founded in 2002 as the Seagate Advanced Storage Group with the goal of building a scalable storage array with zero trade-offs.  The team included engineers from Digital Equipment Corporation, Hewlett-Packard and StorageWorks.  This eventually led, in 2006, to the X-IO Intelligent Storage Element (ISE).  Then in 2007 they were purchased by Xiotech, based out of Minneapolis, and in 2008 the ISE-1 product was introduced.  In 2012 they moved to Colorado Springs, which is where they held the SFD presentation.  Current products include the iglu and ISE series.


Axellio is not for your general data center workloads.  It is being built to solve specific problems, problems that I did not even know existed.  One of the example use cases was the oil industry: a survey ship will travel across the ocean surveying the ocean floor, which creates petabytes of data that gets stored on the ship.  Not all of that data can be processed locally, so it is later migrated to a data center somewhere else to be processed.  This is just one use case that Axellio can help solve.

The platform itself I would call a form of converged system.  Normally a converged system includes the storage, compute, and software.  Axellio includes the compute and storage, but not the software; it is up to the customer or partner to implement their own software stack to run on the hardware.  Maybe sometime in the future they will also include the software.

Hardware

The Axellio hardware is a 2U appliance that incorporates two servers, or nodes.  Each node has two Intel Xeon E5-26xx CPUs and 1TB of RAM or NVRAM, which gives one appliance 4 CPUs and 2TB of RAM or NVRAM.  With the current CPUs that is up to 88 cores and 176 threads.  Each appliance can hold up to 72 dual-port 2.5″ NVMe SSDs, which gives up to 460TB of storage, achieved using 6 trays with 12 drives each.  Offload modules can also be added, such as Intel Phi for CPU extension for parallel compute or an Nvidia K2 GPU for VDI.  The appliances can be attached to Ethernet switches ranging from 4x 10GbE and 4x 40GbE up to 4x 100GbE.


Architecture (What Really Matters)

FabricExpress is a PCIe interconnect that allows the two nodes to connect directly to the 72 NVMe drives.  By using NVMe drives they are able to connect to the CPU directly over PCIe lanes, which creates super fast local storage for the nodes.  Normally in a converged system there would be an external switch that storage traffic has to cross, which always adds latency.

Axellio Performance

When it comes to performance Axellio does not disappoint: 12 million IOPS at 35 microseconds of latency, with 60 GB/s of sustained bandwidth at 4KB writes.  That is a lot of power in a little box.  They also shared additional test results that utilized only 48 drives.


Brandon’s Take

One of the most exciting parts of the event was being able to go into the back and see where they were building this product.  I saw all kinds of machinery that I had no idea how to use or what it was used for.  You can open up a product once it is shipped to you, but it is an entirely different thing to see the different stages of it being built.  It really makes you appreciate all that goes into creating these products.

Axellio is being built to solve a problem that I did not know much about before the event.  The problems it can solve could be very lucrative for X-IO and for the businesses that use it.  They are one of the few companies doing something different at a time when all the storage products are starting to look the same.

Fellow Delegate Posts

SFD13 PRIMER – X-IO AXELLIO EDGE COMPUTING PLATFORM  by Max Mortillaro

Axellio, next gen, IO intensive server for RT analytics by X-IO Technologies by Ray Lucchesi

Full Disclosure

X-IO provided us the ability to attend the session.  This included a shirt, USB drive and some Thai food.

 

 

 

Storage Field Day Is Almost Here


I am getting really excited about the upcoming Storage Field Day.  This is going to be my first Field Day, and it is going to bring many brand new experiences for me.  I am ready to see what the vendors have to present, and ready to learn many new things.


I have been on vacation for the last week, so when I got back I was really surprised by some recent changes.  Seagate is no longer going to be presenting, and in their place will be NetApp.  NetApp is a company that I really did not know much about until its acquisition of SolidFire.  I had been following SolidFire for some time before NetApp acquired them, and they have recently announced a hyper-converged product.  The HCI market is becoming much more competitive with an increasing number of vendors in it.  All the major storage vendors have an HCI offering, such as Dell EMC with VxRail, HPE with SimpliVity, and Cisco with HyperFlex.  It only makes sense that NetApp would get into the market.  I am curious to see how their product will differentiate itself from its competitors.

Check back next week for an update on NetApp and all the other vendors presenting at Storage Field Day 13.  I plan to have many more posts to cover everything that will be presented at Storage Field Day.

Nutanix Node Running Low On Storage

I manage a few Nutanix clusters that are all-flash, so the normal tiering of data does not apply.  In a hybrid model, which has both spinning and solid state drives, the SSDs are used for read and write cache, and “cold” data is only moved down to the slower spinning drives as needed.  The other day one of the nodes' local drives was running out of free space, and it made me wonder what happens if they do fill up.

Nutanix tries to keep everything local to the node.  This provides low-latency reads since there is no network for the data to cross, but writes still have to go across the network.  The reason for this is that you want at least two copies of the data, one local and one remote, so when writes happen they are written synchronously to the local node and to a remote node.  Replica writes are distributed across all nodes in the cluster, so in the event of a lost node all remaining nodes can be used to rebuild that data.

When the drives do fill up, nothing really happens.  Everything keeps working and there is no downtime.  The local drives become read-only, and writes will then be written to at least two different nodes, ensuring data redundancy.

To check the current utilization of your drives, go to Hardware > Table > Disk.
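If you would rather script the check than click through Prism, a sketch against the v2.0 REST API might look like the following; the endpoint and stat key names are assumptions to verify in the API Explorer.

```python
import requests

# Sketch: list per-disk utilization from Prism Element. The endpoint and the
# usage_stats key names are assumptions -- confirm them in the v2.0 API Explorer.
PRISM = "https://prism.lab.local:9440"
AUTH = ("admin", "password")

disks = requests.get(
    f"{PRISM}/api/nutanix/v2.0/disks",
    auth=AUTH,
    verify=False,   # lab only; use proper certificates in production
    timeout=30,
).json().get("entities", [])

for disk in disks:
    stats = disk.get("usage_stats", {})
    used = int(stats.get("storage.usage_bytes", 0))
    capacity = int(stats.get("storage.capacity_bytes", 0)) or 1
    print(f"{disk.get('id', 'unknown')}: {used / capacity:.0%} used")
```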


So it is a best practice to try to “right size” your workloads and make sure that the VMs will have their storage needs met by the local drives.  HCI is a great technology; it just has a few different caveats to consider when designing for your workloads.

If you want a deeper dive about it check out Josh Odgers post about it.
