ESXi 6.0 to 6.5 Upgrade Failed

The Problem

I am currently running vCenter 6.5 with a mix of 6.0 and 6.5 clusters. I uploaded the latest Dell customized ESXi 6.5 image to Update Manager and had no issues upgrading my first cluster from 6.0 to 6.5. In the past I have had some weird issues with Update Manager, but since it was integrated into vCenter in 6.5 it has been a lot more stable. I then proceeded to upgrade the next cluster to 6.5 and received a weird error.

I then tried mounting the ISO on the host and upgrading that way, which produced a much more detailed error about a conflicting VIB.

The Solution

1. SSH into the host and run the following command to see the list of installed VIBs.

esxcli software vib list

2. Remove the conflicting VIB.

esxcli software vib remove --vibname=scsi-mpt3sas

3. Reboot!

Now that the conflicting VIB has been removed you can proceed with installing the updates.
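
If the VIB list is long, it can help to filter it down to the suspect driver before removing anything. The following is only a rough sketch: the helper name `match_vibs` is made up for illustration, and it assumes the VIB name is the first column of the `esxcli software vib list` output, after two header lines.

```shell
# Hypothetical helper: print the names of installed VIBs whose name contains
# the given pattern. Reads `esxcli software vib list` style output on stdin.
# Assumption: the VIB name is the first whitespace-separated column and the
# first two lines are the column titles and a row of dashes.
match_vibs() {
    pattern="$1"
    # Drop the two header lines, keep only the name column, then filter.
    awk 'NR > 2 { print $1 }' | grep -i -- "$pattern"
}

# On the host itself you would pipe the real list into it:
#   esxcli software vib list | match_vibs mpt3sas
```

Anything it prints is a candidate for the `--vibname` argument in step 2. If it prints nothing, the conflict is probably with a different VIB than the one I hit.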


Strange Issues With Microsoft Clustering and ESXi

I have some legacy applications that require Microsoft Clustering, and they run on ESXi 6.0. Using Microsoft Clustering on top of VMware does not give you many benefits. Things like HA and moving workloads across nodes are already available through virtualization. What clustering does do is create more places for things to break and cause downtime. Really, the only benefit I see with clustering in a virtualized environment is the ability to restart a server for system updates.

RDMs are required for Microsoft Clustering. An RDM (Raw Device Mapping) gives the VM control of the LUN as if it were directly connected to it. To set this up you need to add a second SCSI controller and set it to physical mode. Each disk must then use the same SCSI controller settings on every VM in the cluster. The downside is that you lose features such as snapshots and vMotion. When using RDMs in physical mode, you should treat those VMs as if they were physical hosts.


The problem occurred when one of the clustered nodes was rebooted. The node never came back online, and when I checked the console it looked like the Windows OS was gone. I powered off the VM and removed the mapped RDMs. When I powered the VM back on, Windows booted up fine. I found that very strange, so I powered it off again and added the drives back. That is when I got the error “invalid device backing”. A VMware KB references the issue, and it basically says the cause is inconsistent LUNs. The only problem was that I did have consistent LUNs.

I put in a ticket with GSS, and first-level support was not able to help. They had to bring in a storage expert, who quickly found the issue: the LUN ID had changed. I am not sure how that occurred, but it was not anything I could change. When you add the drives to the VMs, the config creates a mapping from the VM to the LUN. When the LUN ID changed, the mapping did not. The only fix was to remove the RDMs from all VMs in that cluster and then add them back.

ECC Errors On Nutanix

When logging into a Nutanix cluster I see that I have 2 critical alerts.


With a quick search I found KB 3357. I SSHed into one of the CVMs running on my cluster and ran the following command as one line.

ncc health_checks hardware_checks ipmi_checks ipmi_sel_correctable_ecc_errors_check

Looking over the output, I quickly found the line that flagged the faulty memory module.

I forwarded all the information to support and will replace the faulty memory module when the new one arrives. Luckily, so far I have not seen any problems caused by the bad memory, and I really liked how quick and easy it was to track this down on Nutanix.

vCenter Fails after Time Zone Change

We recently changed our NTP server, and I needed to update all our hosts and vCenters. I have a handy PowerShell script to update the ESXi hosts, but that script does not work on the vCenter servers. I logged into the server on port 5480 to get to the vCenter management interface. I logged in as root and noticed that the time zone was UTC. I am in the Central time zone, so I wanted to change it from UTC. It turns out that if you do that, it breaks everything. I had to learn this the hard way: once I changed the time zone I was not able to log into vCenter. I then had to go back and change the time zone back to UTC to regain access.

After SQL Installation Drives Inaccessible

Recently I set up a VM using standard best practices, such as using PVSCSI for the database drives and formatting them as GPT. Everything checked out, so I passed it on to the DBA to finish up with the SQL installation. Soon afterwards I received an email from him saying the drives were inaccessible. I had no idea why, because I had set up a lot of VMs the same way without any issues. I did a quick Google search for “drive inaccessible after SQL installation”. It turns out a lot of people have hit the same issue. It is the result of Microsoft locking down the security for the data on the drives. They set it up so that only SQL has access to the files, to prevent certain security issues. You can read more about it here.

Only Default Printer Mapping Over With View 6.2

I recently had an issue with only the default printer being mapped over from the local Windows 7 PC to the View Client session. It did not make any sense to me. I had 10 printers mapped, yet only 1 was showing up. It turns out it’s a limitation of Windows 7. If all the printers use the same driver and port, you will only see the default printer on the Devices and Printers page. If you right-click that printer, it will list all the printers you have mapped. When you try to print something, it will also list all your printers.

VMAX3 and Locked Devices

I have an application that writes to its device, and inside the array that device is mirrored to a BCV. When it is time for a backup, a script kicks off and breaks the mirror. The backup application then reads the data from that mirrored BCV. Sometimes things go wrong and the script never resyncs the mirror, which results in broken backups. To fix this I had to go into the VMAX3 SYMCLI and run these two commands.

1. symdev -lock list
2. symdev release (then confirm yes for each lock)

Looks really simple doesn’t it?  It took me a while to figure out.  It is not often I have to touch the VMAX and when I do it is usually with the GUI.  I also have a thick book of all the long complicated SYMCLI commands.  Thankfully through trial and error it ended up just being two simple commands.
