Hyperconverged infrastructure is a great technology, but it does have its caveats. You have to understand the architecture and design your environment appropriately. Recently I had a Nutanix cluster that had lost Storage Resiliency. Storage Resiliency is lost when there is not enough free storage to rebuild data in the event of the loss of a node. When data is written, it is written locally and to a remote node. This provides Data Resiliency, but at the cost of doubled storage usage. It is essentially the hyperconverged equivalent of RAID mirroring on traditional storage.
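To make that overhead concrete, here is a rough sketch of how writing every piece of data twice halves usable capacity. The numbers and the function are illustrative only; real clusters also reserve space for CVMs, metadata, and so on, and this is not Nutanix's actual accounting:

```python
# Rough sketch of usable capacity under replication factor 2 (RF2).
# Every write lands locally plus on one remote node, so usable space
# is roughly raw space divided by the replication factor.
# (Hypothetical numbers; not Nutanix's real capacity accounting.)

def usable_capacity_tb(raw_per_node_tb, nodes, replication_factor=2):
    return raw_per_node_tb * nodes / replication_factor

# 3 nodes with 20 TB raw each under RF2:
print(usable_capacity_tb(20, 3))  # 30.0 TB usable out of 60 TB raw
```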
I had 3 nodes that were getting close to 80% usage on the storage container. 80% is fairly full, and if one node went down the VMs running on that host would not be able to fail over, because the loss of one node would leave too little free storage for those VMs to HA to. Essentially, whatever was running on that host would be lost, including the data on its drives. I really wish they would add a feature that stops you from using more storage than resiliency allows.
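The failure math can be sketched as a simple check: after one node dies, the survivors need enough free space to re-create the replicas that lived on it. This is an illustrative model under the RF2 assumption above, not Nutanix's actual Curator logic:

```python
# Illustrative N-1 resiliency check (not Nutanix's real algorithm).
# When a node fails, its data must be re-replicated onto the
# surviving nodes, so they need enough free space to absorb it.

def can_rebuild(node_capacity_tb, nodes, used_fraction):
    used_per_node = node_capacity_tb * used_fraction
    free_on_survivors = (nodes - 1) * node_capacity_tb * (1 - used_fraction)
    return free_on_survivors >= used_per_node

# 3 nodes at 80% full: the 2 survivors have 0.2*C free each (0.4*C
# total), but the failed node held 0.8*C of data -- rebuild fails.
print(can_rebuild(20, 3, 0.80))  # False
# At 50% full the survivors can absorb the lost node's data.
print(can_rebuild(20, 3, 0.50))  # True
```

By this toy model, a 3-node cluster stops being resilient somewhere above roughly two-thirds utilization, which is why 80% full was already past the safe point.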
I had two options to remedy this. I could either add more storage, which would also require the purchase of another node, or I could turn off replication. Each cluster was replicating to the other, resulting in double the storage usage. With replication the RPO was 1 hour, but there were also backups, which gave an RPO of 24 hours. An RPO of 24 hours was deemed acceptable, so replication was disabled. The space freed up was not available instantly; Curator still needed to run background jobs to make the reclaimed storage available.
A lot of the time users will just look at the CPU overcommitment ratio or the memory usage and forget about the storage. They are still thinking in the traditional three-tier world. Like any technology, you need to understand how everything works underneath. At the end of the day, architecture is what matters.