Nutanix .NEXT Europe

Nutanix .NEXT is coming up soon on November 27-29 in London.  I was lucky enough to make it to .NEXT in New Orleans this year and it was a great experience.  It was great meeting up with Paul Woodward and Ken Nalbone.  Maybe one of these years I will also be able to attend the one in Europe.

The sessions at .NEXT are top notch and cover a wide variety of subjects, so there is something there for everyone.  If you are planning on getting your NPX anytime soon, there is usually an NPX boot camp the week before the conference.  It does mean a lot of time away from home, but it is well worth it.  The number one reason to attend any conference is the networking with your peers.  Over the years I have met many great people who have helped grow my career and become new friends.  So if you get the chance, do what you can to attend.  You will not regret it.

How To Setup A Nutanix Storage Container

Nutanix storage is organized into Storage Pools and Storage Containers.  A Storage Pool is the aggregated disks of all or some of the nodes.  You can create multiple Storage Pools depending on business needs, but Nutanix recommends a single Storage Pool.  Within the Storage Pool are Storage Containers.  Each Container has its own data reduction settings that can be configured to strike the right balance between data reduction and performance.

Creating The Container

Once the cluster is set up and a Storage Pool has been created, we are ready to create a Storage Container.

  1. Name the Container.
  2. Select the Storage Pool.
  3. Choose which hosts to add.

That all looks really simple until the Advanced button is clicked.  This is where the geek knobs can be tweaked.

Advanced Settings

There are quite a few options to choose from, and each setting depends on the different use cases.

  1. Replication Factor – keep 2 or 3 copies of data in the cluster, depending on the use case.
  2. Reserved Capacity – how much storage is guaranteed to this Container.  All Containers share the Storage Pool's capacity, so this setting is used to guarantee that the reserved amount is always available.
  3. Advertised Capacity – how much storage the connected hosts will see.  This can be used to control actual usage on the Container side.
  4. Compression – a delay of 0 results in inline compression.  The delay can be set to a higher number to defer compression if write performance is a concern.
  5. Deduplication – cache deduplication can be used to optimize performance and use less storage.  Capacity deduplication will deduplicate all data globally across the cluster.  Capacity deduplication is post-process only, and if it is enabled after a Container is created, only new writes will be deduplicated.
  6. Erasure Coding – requires at least 4 nodes.  It is more space efficient than simple replication: instead of full copies of data, it uses parity to rebuild anything that is lost.  Enabling this setting will have some performance impact.
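The trade-off between Replication Factor and Erasure Coding comes down to overhead: RF stores every logical byte 2 or 3 times, while EC stores one parity block per strip of data blocks.  A minimal sketch of that arithmetic (the node and strip sizes here are hypothetical examples, not Nutanix defaults):

```python
def usable_capacity_tb(raw_tb, rf=2, ec_strip=None):
    """Estimate usable capacity from raw capacity.

    rf: replication factor (2 or 3) -- each logical byte is stored rf times.
    ec_strip: (data, parity) blocks per erasure-coding strip; when set,
    the overhead is (data + parity) / data instead of rf.
    """
    if ec_strip:
        data, parity = ec_strip
        overhead = (data + parity) / data
    else:
        overhead = rf
    return raw_tb / overhead

raw = 4 * 10.0  # e.g. four nodes with 10 TB of raw disk each
print(usable_capacity_tb(raw, rf=2))              # 20.0
print(usable_capacity_tb(raw, rf=3))              # ~13.3
print(usable_capacity_tb(raw, ec_strip=(2, 1)))   # ~26.7
```

With the same raw capacity, a 2+1 erasure-coding strip yields roughly a third more usable space than RF2, which is why EC is attractive once the cluster is large enough to support it.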

Summary

As you can see, the settings you choose can have a big impact on performance.  As always, architecture matters: you will have to evaluate the needs of your workload, and a better understanding of how everything works underneath results in a better performing system.

 

Storage Resiliency in Nutanix. (Think about the Architecture!)

Hyperconverged is a great technology, but it does have its caveats.  You have to understand the architecture and design your environment appropriately.  Recently I had a Nutanix cluster that had lost Storage Resiliency.  Losing Storage Resiliency means there is not enough free storage available to re-protect data in the event of the loss of a node.  When data is written, it is written both locally and to a remote node.  This provides Data Resiliency, but at the cost of increased storage usage.  It is conceptually similar to RAID in traditional storage.

I had 3 nodes that were getting close to 80% usage on the storage container.  80% is fairly full, and if one node went down, the VMs running on that host would not be able to fail over.  They could not fail over because the loss of one node would not leave enough free storage on the survivors for those VMs to HA to.  Essentially, whatever was running on that host would be lost, including the data on its drives.  I really wish Nutanix would add a feature that prevents you from using more storage than resiliency requires.
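The math behind that 80% alarm can be sketched as a simple check: can the surviving nodes absorb the data of the most-loaded failed node?  This is an illustration under the assumption of homogeneous nodes, not the exact algorithm Nutanix uses:

```python
def survives_node_failure(node_used_tb, node_capacity_tb):
    """Can the cluster re-protect data after losing its most-loaded node?

    Assumes homogeneous nodes, and that the failed node's data must be
    rebuilt into the free space of the survivors (illustrative sketch).
    """
    worst = max(node_used_tb)
    total_free = sum(node_capacity_tb - used for used in node_used_tb)
    survivors_free = total_free - (node_capacity_tb - worst)
    return survivors_free >= worst

# Three 10 TB nodes at 80% used: the two survivors have only 4 TB free
# between them, not enough to re-protect the failed node's 8 TB.
print(survives_node_failure([8.0, 8.0, 8.0], 10.0))  # False
print(survives_node_failure([4.0, 4.0, 4.0], 10.0))  # True
```

On a 3-node cluster, keeping each node under roughly two-thirds full is what leaves enough headroom for an N-1 rebuild; 80% is well past that line.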

I had two options to remedy this.  I could either add more storage, which would also require the purchase of another node, or I could turn off replication.  Each cluster was replicating to the other, resulting in double the storage usage.  With replication the RPO was 1 hour, but there were also backups, which gave an RPO of 24 hours.  An RPO of 24 hours was deemed acceptable, so replication was disabled.  The freed space was not available instantly; Curator still needed to run background jobs to make the new storage available.

A lot of the time, users will just look at the CPU overcommitment ratio or the memory usage and forget about the storage.  They are still thinking in the traditional three-tier world.  Like any technology, you need to understand how everything works underneath.  At the end of the day, architecture is what matters.

Nutanix Node Running Low On Storage

I manage a few Nutanix clusters, and they are all-flash, so the normal tiering of data does not apply.  In a hybrid node, which has both spinning and solid state drives, the SSDs are used for read and write cache, with "cold" data moved down to the slower spinning drives as needed.  The other day the local drives on one of the nodes were running out of free space, and it made me wonder: what happens if they fill up?

Nutanix tries to keep everything local to the node.  This provides low-latency reads, since reads do not have to cross the network, but writes still do, because you want at least two copies of the data: one local and one remote.  So when a write happens, it is written synchronously to the local node and a remote node.  Replicas are distributed across all nodes in the cluster, so in the event of a lost node, all remaining nodes can participate in rebuilding that data.

When the drives do fill up, nothing dramatic happens.  Everything keeps working and there is no downtime.  The local drives become read-only, and new writes are sent to at least two other nodes, still ensuring data redundancy.
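That fallback behavior can be pictured as replica placement that simply skips nodes without enough free space.  This is an illustrative sketch of the idea, with made-up node names and threshold, not Stargate's actual placement logic:

```python
def pick_replica_nodes(local, free_tb, rf=2, min_free_tb=0.5):
    """Pick rf nodes to hold a write, preferring the local node.

    Nodes whose free space is below the threshold are treated as
    read-only and skipped, so writes land on other nodes instead
    (illustrative sketch only).
    """
    writable = [node for node, free in free_tb.items() if free >= min_free_tb]
    targets = []
    if local in writable:
        targets.append(local)  # data locality: keep one copy local if we can
    remote = [node for node in writable if node != local]
    targets += remote[: rf - len(targets)]
    if len(targets) < rf:
        raise RuntimeError("not enough writable nodes for the replication factor")
    return targets

free = {"A": 0.1, "B": 3.0, "C": 2.0, "D": 4.0}  # node A is nearly full
print(pick_replica_nodes("A", free))  # ['B', 'C'] -- A is skipped entirely
print(pick_replica_nodes("B", free))  # ['B', 'C'] -- one local, one remote
```

The point is that a full node loses data locality for new writes but the cluster as a whole keeps accepting them, which matches the behavior described above.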

To check the current utilization of your drives, go to Hardware > Table > Disk.

So it is best practice to "right size" your workloads.  Try to make sure that the VMs' storage needs will be met by the local drives.  HCI is a great technology; it just has a few different caveats to consider when designing for your workloads.

If you want a deeper dive, check out Josh Odgers' post about it.

ECC Errors On Nutanix

When logging into a Nutanix cluster, I saw that I had 2 critical alerts.

With a quick search I found KB 3357.  I SSHed into one of the CVMs running on my cluster and ran the following command as one line.

ncc health_checks hardware_checks ipmi_checks ipmi_sel_correctable_ecc_errors_check

Looking over the output, I quickly found the line reporting the correctable ECC errors and the affected DIMM.

I forwarded all the information to support and will replace the faulty memory module when it arrives.  Luckily I have not seen any issues from the failing DIMM so far, and I really liked how quick and easy Nutanix made it to diagnose.

Nutanix .NEXT

I was honored this year to be chosen to be a part of the Nutanix Technology Champions.  I have only recently started using Nutanix, but I could clearly see what made it different.  In an age with the Public Cloud all the rage, I could see something like Nutanix really keeping the private data center alive.  The Public Cloud is so popular because it is such an easy way to consume resources as you need them, and I see how Nutanix is really converging everything.  Not only are storage and compute converged, but so is the underlying software.  One software package to rule them all, instead of a separate piece of software for every feature you need.

That is why I am so excited about being a part of the NTC, and about the free pass I received to go to the Nutanix .NEXT conference.  Conferences can be really expensive, and I would like to thank Nutanix for investing in its community and showing its appreciation.  I know I will meet a lot of new people there and learn a lot of new things.  Conferences are really about networking with your peers and sharing knowledge, and for that reason I see them as valuable as any training.  Hopefully I will meet everyone who reads this post, and if you have not registered yet, you can at .NEXT.

Updating Prism Central to 5.0

Nutanix recently released the 5.0 code, and it has a lot of new really nice features.  In a future post I plan on going over some of the features, and detail why they are so important.

Before you start upgrading all of your Nutanix hosts, you should first upgrade Prism Central to 5.0.  This server gives you an overview of all your Nutanix clusters and some management capabilities.  I am still fairly new to Nutanix, so I was not sure how to upgrade Prism Central.  Usually you can upgrade from within the Nutanix console, but with this being a brand new release, you have to download it directly from the website.  Sometime soon it will be part of the automatic upgrade within the console.

At first I was a little confused about how to upgrade.  You would think there would be a separate upgrade for Prism Central, since it was originally a separate install.  Instead, the update is included in the AOS 5.0 download.  Download the complete AOS 5.0 bundle and also download the Upgrade Metadata file.

Once you have everything downloaded, log in to Prism Central.  Next, click the gear icon and then click Upgrade Prism Central.  Point it to the AOS 5.0 download and the metadata file, click Upgrade, and soon you will be running Prism Central 5.0.
