Strange Issues With Microsoft Clustering and ESXi

I have some legacy applications that require Microsoft Clustering, and they run on ESXi 6.0.  Running Microsoft Clustering on top of VMware does not give you many benefits.  Things like HA and moving workloads between nodes are already available through virtualization.  What clustering does do is create more places for things to break and cause downtime.  Really the only benefit I see with clustering in a virtualized environment is the ability to restart a server for system updates.

RDMs are required for Microsoft Clustering.  An RDM (“Raw Device Mapping”) gives the VM control of the LUN as if it were directly connected to it.  To set this up you need to add a second SCSI controller and set its bus sharing to physical.  Each shared disk must then use the same SCSI controller settings on every VM in the cluster.  The downside is that you lose features such as snapshots and vMotion.  When using RDMs in physical mode you should treat those VMs as if they were physical hosts.

The problem occurred when one of the clustered nodes was rebooted.  The node never came back online, and when I checked the console it looked like the Windows OS was gone.  I powered off the VM and removed the mapped RDMs.  When I powered the VM back on, Windows booted up fine.  I found that very strange, so I powered it off again and added the drives back.  That is when I got the error “invalid device backing”.  A VMware KB article references the issue and basically says it is caused by inconsistent LUNs.  The only problem was that I did have consistent LUNs.

I put in a ticket with GSS, and first-level support was not able to help.  They had to get a storage expert involved.  He quickly found the issue: the LUN ID had changed.  I am not sure how that occurred, but it was not anything I could change.  When you add the drives to a VM, its config stores a mapping from the VM to the LUN.  When the LUN ID changed, the mapping did not.  The only fix was to remove the RDMs from all VMs in that cluster and then add them back.
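The mismatch the storage engineer found can be pictured with a small Python sketch.  This is only a simplified model of the idea, not how ESXi stores RDM mappings: the function name, the VM names, the device names and the LUN IDs are all made up for illustration.  It compares the LUN ID each VM recorded when the RDM was added against the ID the array presents now, and lists every mapping that has gone stale.

```python
from typing import Dict, List, NamedTuple


class StaleMapping(NamedTuple):
    vm: str
    device: str
    recorded_lun_id: int
    current_lun_id: int


def find_stale_rdm_mappings(
    recorded: Dict[str, Dict[str, int]],
    current: Dict[str, int],
) -> List[StaleMapping]:
    """Return every RDM whose recorded LUN ID differs from the current one."""
    stale = []
    for vm, disks in sorted(recorded.items()):
        for device, recorded_id in sorted(disks.items()):
            current_id = current.get(device)
            # Devices missing from `current` are skipped; only changed IDs are reported.
            if current_id is not None and current_id != recorded_id:
                stale.append(StaleMapping(vm, device, recorded_id, current_id))
    return stale


# Hypothetical inventory: two cluster nodes sharing the same two RDMs.
recorded = {
    "sql-node-a": {"naa.example0001": 10, "naa.example0002": 11},
    "sql-node-b": {"naa.example0001": 10, "naa.example0002": 11},
}
# Hypothetical current view: the second LUN is now presented with a new ID.
current = {"naa.example0001": 10, "naa.example0002": 21}

for item in find_stale_rdm_mappings(recorded, current):
    print(
        f"{item.vm}: {item.device} recorded LUN {item.recorded_lun_id}, "
        f"now {item.current_lun_id}"
    )
```

In this made-up example both nodes report the same stale disk, which matches what I saw: every VM in the cluster needed the RDM removed and added back.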

Weathervane, a benchmarking tool for virtualized infrastructure and clouds – now open source!

Weathervane is a performance benchmarking tool developed at VMware. It lets you assess the performance of your virtualized or cloud environment by driving a load against a realistic application and capturing relevant performance metrics. You might use it to compare the performance characteristics of two different environments, or to understand the performance impact of some change in an existing environment.


VMware Social Media Advocacy

ECC Errors On Nutanix

When I logged into a Nutanix cluster, I saw that I had two critical alerts.

With a quick search I found KB 3357.  I SSHed into one of the CVMs running on my cluster and ran the following command as one line.

ncc health_checks hardware_checks ipmi_checks ipmi_sel_correctable_ecc_errors_check

Looking over the output, I quickly found the line that flagged the faulty memory module.

I forwarded all the information to support, and will replace the faulty memory module when the replacement arrives.  Luckily, so far I have not seen any problems caused by this memory fault, and I really liked how quick and easy it was to track down using Nutanix.

vCenter Fails after Time Zone Change

We recently changed our NTP server, and I needed to update all our hosts and vCenters.  I have a handy PowerShell script to update the ESXi hosts, but that script does not work on the vCenter servers.  I logged into the server on port 5480 to reach the vCenter management interface.  I logged in as root and noticed that the time zone was UTC.  I am in the Central time zone, so I wanted to change it from UTC.  It turns out that if you do that, it breaks everything.  I had to learn this the hard way: once I changed the time zone I was not able to log into vCenter.  I then had to go back and change the time zone back to UTC to regain access.
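Since the appliance has to stay on UTC, one way to still read timestamps in Central time is to convert them on the client side.  Here is a minimal Python sketch of that idea.  It is not a VMware tool, the helper name `utc_to_central` and the sample timestamps are made up, and it assumes Python 3.9 or later with time zone data available on the machine running it.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

CENTRAL = ZoneInfo("America/Chicago")


def utc_to_central(stamp: str) -> str:
    """Convert an ISO 8601 timestamp that is in UTC to US Central time."""
    # The input carries no offset, so mark it explicitly as UTC first.
    utc_time = datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(CENTRAL).strftime("%Y-%m-%d %H:%M %Z")


# Hypothetical timestamps, one in winter and one in summer,
# to show that daylight saving time is handled by the conversion.
print(utc_to_central("2017-01-15T18:00:00"))
print(utc_to_central("2017-07-15T18:00:00"))
```

This leaves the vCenter appliance untouched and only changes how the times are displayed.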

New Fling: VMware vSphere Compatibility Predictor

This Fling scans all PSCs connected to a single PSC. It will detect the versions of all the vCenter Servers connected to PSCs and all the Solutions connected to vCenter Servers. It will then depict the connectivity in pictorial form.


VMworld 2017 Call for Papers is now open!

Inspire the VMware community, VMworld 2017 Call for Papers is now open!

We’re looking for speakers who will inspire the VMware community. Have you integrated VMware solutions and technologies in an innovative way? Do you have a best practice or individual technical tips and tricks to recommend? Can you tell us about an amazing app that leverages VMware solutions to improve your business? If so, please join us at VMworld 2017.


After SQL Installation Drives Inaccessible

Recently I set up a VM using standard best practices, such as using PVSCSI for the database drives and formatting them GPT.  Everything checked out, so I passed it on to the DBA to finish up with the SQL installation.  Soon after, I received an email from him saying the drives were inaccessible.  I had no idea why, because I had set up a lot of VMs the same way without any issues.  I did a quick Google search for “drive inaccessible after SQL installation”.  It turns out a lot of people have run into the same issue.  It is the result of Microsoft locking down the security for the data on the drives.  The installer sets the permissions so that only SQL has access to the files, to prevent certain security issues.  You can read more about it here.