NetApp OnCommand Insight

NetApp presented its OnCommand Insight product at Storage Field Day 16 this year. What set it apart from the other presentations was its subject: an analytics and monitoring tool, the only such presentation at the event. OnCommand Insight is an on-premises appliance that can be set up as a VM in your environment. Once it is fully deployed, it starts reporting information about your environment. Unlike other similar products, it reports only on what it sees in your data center, as opposed to comparing your environment to other environments.

OnCommand Insight is always watching your environment, and if an issue arises it can be set up to automatically generate a ticket and alert the proper team. It supports a RESTful API, so whatever needs to be done can be scripted, and licensing is based on raw capacity.

They also spoke about Cloud Insights. It is not a direct replacement for OnCommand Insight, but it takes many of its features and builds on top of them. Cloud Insights is designed for the modern hybrid data center: it can monitor both what is on premises and what is running in the cloud. As more and more companies go hybrid, it is imperative to have a tool that can monitor both and give recommendations on where to run a workload.

One of my favorite features is that it is agnostic about what it monitors. Monitoring is done via plugins, and there is a large repository where you can download more. It reminds me a lot of EMC ViPR SRM, which could also monitor more than just EMC products, but NetApp has really gone a step further with its capabilities.

Take a look at the presentation from NetApp and the rest of the Storage Field Day 16 presentations here.

Etherchannel, LACP and VMware

Recently I have had some discussions about using LACP and static EtherChannel with VMware. The conversations have mainly revolved around how to get it set up and what the different use cases are. The biggest question was what exactly the difference is between the two. Are they the same thing with different names, or are they actually different things?

Static EtherChannel and LACP accomplish the same thing, but each does it in a slightly different way. Both form a link aggregation group (LAG) out of multiple physical links to connect networking devices together. Bundling the links lets the switches treat them as one logical link, so the redundant links do not create a loop that the Spanning Tree Protocol would otherwise have to block. So what is the real difference between the two? LACP is a negotiation protocol with two modes, active and passive. If one or both sides are set to active, they form a channel; if both sides are passive, no channel forms. Static EtherChannel does no negotiation at all: both sides must be set to "on" with matching settings, otherwise no working channel will form. Seems fairly simple but…
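To make the LACP mode rule concrete, here is a minimal sketch in Python. It only models the negotiation rule described above (a channel needs at least one active side); the function name is my own and it is not tied to any switch or VMware API.

```python
from itertools import product

LACP_MODES = ("active", "passive")


def lacp_channel_forms(side_a, side_b):
    """Return True if two LACP endpoints would negotiate a channel.

    An active port starts the negotiation on its own, while a passive port
    only answers. With two passive ports nobody starts, so nothing forms.
    """
    for mode in (side_a, side_b):
        if mode not in LACP_MODES:
            raise ValueError(f"unknown LACP mode: {mode!r}")
    return "active" in (side_a, side_b)


# Print the full truth table for the two sides of the link.
for a, b in product(LACP_MODES, repeat=2):
    print(f"{a:8} <-> {b:8} channel forms: {lacp_channel_forms(a, b)}")
```

Only the passive/passive combination comes back False, which is the case to watch for when the host and the physical switch are configured by different teams.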

The reason all of this matters is that virtual switches in VMware cannot form a loop. So by setting up LACP or EtherChannel you are just increasing your operational cost and the complexity of the network. It requires greater coordination with the networking team to ensure that LACP or EtherChannel is set up with exactly the same settings on both ends. LACP and EtherChannel offer different forms of load balancing, all of which work by hashing values such as source IP or source MAC. There are quite a few options to choose from. Once the hash is computed, the packet is sent down the link that the hash selects. This creates a constraint, because every packet that produces the same hash is sent down that same link, and traffic will keep using it until the link fails and it is forced onto another one. So it is possible that if two VMs are communicating over a LAG, all of their traffic goes across just one link, leaving the other links underutilized. The distributed switch and the physical switch must be set up with the same settings or a link will not be established. LACP is only available on the distributed switch, which in turn is only available with Enterprise Plus licensing.
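Here is a rough sketch of why hash-based balancing pins a conversation to one link. CRC32 is only a stand-in for whatever hash the switch really uses, and the IP addresses and function name are made up for illustration.

```python
import zlib


def pick_uplink(src_ip, dst_ip, num_links):
    """Map a source/destination IP pair to an uplink index.

    CRC32 stands in for the real hash. The only property that matters
    here is that it is deterministic: same inputs, same link.
    """
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % num_links


# Two VMs talking to each other: every packet carries the same addresses,
# so every packet of the conversation is placed on the same uplink.
links_used = {pick_uplink("10.0.0.11", "10.0.0.12", 2) for _ in range(1000)}
print(links_used)  # a set containing a single link index
```

A thousand packets, one link. The second uplink only sees traffic when a different address pair happens to hash to it.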

If you are able to use the distributed switch, it also supports Load Based Teaming (LBT). LBT is the only true load balancing method: it sends traffic across all links based on the actual utilization of each link. This is a far superior load balancing feature, and if you are already paying for it you should be using it. There is also the myth that bonding two 10Gb links will give you 20Gb of throughput. As I discussed earlier, the limitation is that a vNIC can only utilize one link at a time. It cannot break up a stream across two links for increased throughput. You only really gain the throughput advantage when multiple VMs are utilizing the links.
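The 20Gb myth comes down to simple arithmetic. The sketch below assumes the best case, where the vNICs happen to spread perfectly evenly across the links; the function name is my own.

```python
def best_case_throughput_gbps(num_vnics, num_links, link_gbps):
    """Upper bound on aggregate throughput across a team of uplinks.

    Each vNIC is pinned to one link at a time, so a single vNIC can never
    exceed link_gbps no matter how many links are bundled together.
    """
    if num_vnics < 0 or num_links < 1:
        raise ValueError("need at least one link and a non-negative vNIC count")
    return min(num_vnics, num_links) * link_gbps


print(best_case_throughput_gbps(1, 2, 10.0))  # one VM, two 10Gb links
print(best_case_throughput_gbps(4, 2, 10.0))  # four VMs, two 10Gb links
```

One VM tops out at 10Gb even on a two-link bond. The full 20Gb only shows up once at least two VMs are busy and land on different links.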

As a best practice you should always use trunk ports down to your hypervisor hosts. This allows the host to utilize multiple VLANs, as opposed to placing the switch ports into access mode and allowing only one VLAN. Customers who do the latter often end up reconfiguring their network later on, and it is always a pain. I generally recommend setting up each port on the physical switch in standard trunk mode with all the VLANs that you need. Then build out all of your portgroups on the virtual switch and have the traffic tagged there with the VLAN needed for each portgroup. By doing this and using LBT you have a simple yet efficient design.
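One way to sanity check this kind of design is to compare the VLAN tag on each portgroup with the VLANs allowed on the physical trunk. This is only a sketch: the portgroup names and VLAN IDs are hypothetical, and in practice you would pull them from your own switch and vCenter configuration.

```python
# Hypothetical VLAN IDs allowed on the physical trunk ports.
trunk_allowed_vlans = {10, 20, 30}

# Hypothetical portgroups on the virtual switch and the VLAN each one tags.
portgroups = {
    "Management": 10,
    "vMotion": 20,
    "VM-Production": 30,
}


def untrunked_portgroups(portgroups, allowed_vlans):
    """Return the portgroups whose VLAN tag is not carried on the trunk."""
    return sorted(
        name for name, vlan in portgroups.items() if vlan not in allowed_vlans
    )


print(untrunked_portgroups(portgroups, trunk_allowed_vlans))  # []
```

An empty list means every portgroup can actually reach the physical network. Any name that shows up is a portgroup tagging a VLAN the trunk will drop.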

Now there is one caveat to all of this: vSAN does not support LBT, but it does support LACP, and if you have vSAN you are licensed for the distributed switch. LACP has one advantage over LBT, and that is the failover time. This is the time it takes for a dead link to be detected and traffic to be sent to another link. LACP failover is faster than that of LBT, and with vSAN that failover time could mean the difference between a successful write and a failed one. That can limit downtime, although in a production environment hopefully there will not be many links going offline.

VMworld 2018!!!

It is finally that time of year. The greatest time of year. It is time for VMworld!!! August 26-30 is when everyone packs up and spends a week in Las Vegas with some of the greatest minds in virtualization.

VMworld is a great opportunity to learn about some of the latest technology in the industry. The show floor will be packed with tons of vendors, some you have heard of and some you haven’t. You may find the vendor that has just the solution you have been looking for. All the vendors will have lots of information about the various products and solutions that they offer, so it is a great idea to talk to as many as you can. It is always a great opportunity to learn something new, and they usually have some great prizes and swag!

The sessions will be excellent as always, presented by some of the smartest people you have ever met. You can take a look at all the sessions here. If you can’t make it to VMworld, they will post most of the sessions on YouTube shortly after.

They will also be offering training sessions on the various VMware products, and if you are ready for it you can take one of the certification tests. Maybe you will finally get that VCP or VCAP that you have been working on.

The best part of all of this is the networking and the lifelong friends you will make. Through VMworld and various other social events I have met many great people and friends. It is a great community to be a part of, and I hope this year I will be able to meet up with as many people as I can at the various events.

Storage Field Day 16

The past six months have brought a lot of changes in my life. I have been busy changing jobs and relocating to another state. With the recent addition of my second child, I can easily say I have been really busy. All my time has been taken up by the relocation and the kids, which has not left me much free time to do anything, especially writing.

Thankfully all that is starting to change. Now that I am getting settled into my new house and the kids are getting a little older, I am beginning to have a little more free time. With this newfound free time I plan on getting back into writing. Through writing I have met a lot of great people and been given some great opportunities.

Thankfully I have been given great inspiration to kick-start my writing. I have been selected as a delegate for Storage Field Day 16 in Boston, a city I have never been to and am excited to finally visit. It is truly an honor to be a part of Storage Field Day. Not only do I get to see a lot of great presentations from companies that I am already familiar with, such as Dell EMC, Infinidat and Zerto, I will also be introduced to some new ones such as Nasuni and StorONE. The best part of the whole experience is being able to meet the fellow delegates, who are some of the smartest people in IT.

You can catch all the action on June 27-28, and you may even catch me on camera.  Watch for updates on this site, and live tweets from me the day of the event.  Should be a lot of interesting content coming out.  For up to date information on companies and delegates take a look at http://techfieldday.com/event/sfd16/.

Stretched vSAN Cluster on Ravello

Stretched clustering is something that I have wanted to set up in my home lab for a while, but it would not be feasible with my current hardware. Recently I was selected to be a part of the vExpert program for the third year. One of the perks is the use of Ravello cloud. They have recently made a lot of advancements that have greatly increased performance, and they have now added a bare metal option, which makes the performance even better. I am skipping most of the steps to set up vSAN and trying to include only what is different for a stretched cluster.

The high level architecture of a stretched vSAN cluster is simple.

  • Two physically separated clusters. This is accomplished using Ravello Availability Groups.
  • A vCenter to manage it all.
  • An external witness. This is needed for quorum, which allows an entire site to fail and the VMs to fail over.
  • Less than 5ms latency between the two sites. This is needed because all writes need to be acknowledged at the second site.
  • 200ms RTT max latency between the clusters and the witness.

If this was a production setup there would be a few things to keep in mind.

  • All writes will need to be acknowledged at the second site, so that could add up to 5ms of latency to every write.
  • You can use layer 2 or layer 3 networks between the clusters. You would want at least 10Gb for the connection between sites.
  • You can use layer 2 or layer 3 networks with at least 100Mb for the witness.
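The numbers above can be turned into a quick pre-flight check. This is just a sketch using the limits quoted in this post; the class, the function and the sample measurements are hypothetical, and you would feed in values you measured yourself.

```python
from dataclasses import dataclass

# Limits as quoted in this post.
MAX_SITE_LATENCY_MS = 5      # latency between the two sites must stay below this
MAX_WITNESS_RTT_MS = 200     # maximum round trip between clusters and witness
MIN_SITE_LINK_GB = 10        # bandwidth wanted between the two sites
MIN_WITNESS_LINK_MB = 100    # bandwidth wanted to the witness


@dataclass
class LinkMeasurements:
    """Hypothetical container for numbers measured in your own environment."""
    site_latency_ms: float
    witness_rtt_ms: float
    site_link_gb: float
    witness_link_mb: float


def requirement_problems(m):
    """Return a list of readable problems; an empty list means all checks pass."""
    problems = []
    if m.site_latency_ms >= MAX_SITE_LATENCY_MS:
        problems.append(f"site latency {m.site_latency_ms}ms is not below {MAX_SITE_LATENCY_MS}ms")
    if m.witness_rtt_ms > MAX_WITNESS_RTT_MS:
        problems.append(f"witness RTT {m.witness_rtt_ms}ms exceeds {MAX_WITNESS_RTT_MS}ms")
    if m.site_link_gb < MIN_SITE_LINK_GB:
        problems.append(f"site link {m.site_link_gb}Gb is below {MIN_SITE_LINK_GB}Gb")
    if m.witness_link_mb < MIN_WITNESS_LINK_MB:
        problems.append(f"witness link {m.witness_link_mb}Mb is below {MIN_WITNESS_LINK_MB}Mb")
    return problems


print(requirement_problems(LinkMeasurements(2.0, 80.0, 10.0, 100.0)))  # []
```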

Deploying on Ravello

For the architecture of this deployment we will need the following sections:

  • Management
  • Cluster Group 1 (Availability groups simulate separate data center)
  • Cluster Group 2 (Availability groups simulate separate data center)
  • vSAN network and Management/Data Network

Management

There needs to be a DNS server and a vCenter. I used Server 2016 to set up both the DNS server and the domain controller. I used the vCenter appliance 6.5, which I deployed to a separate management ESXi host.

Cluster Groups

These consist of 2 ESXi 6.5 hosts each. They use Availability Groups to keep them physically separated to simulate the stretched cluster. Group 1 used AG1 and Group 2 used AG2.

Network

 

I manually set up the DNS entries on the Server 2016 DNS server, and the two networks consist of the following.

  • 10.0.0.0/16 Data/Management
  • 10.10.0.0/16 vSAN

Witness

The witness is an easy-to-deploy OVF. It creates a nested ESXi host that runs on top of a physical host. The networking consists of the following:

  • vmk0 Management Traffic
  • vmk1 vSAN Traffic

Once the OVF is deployed add the new witness host into vCenter.  You will see it in vCenter as a blue ESXi host.

Creating the Cluster

Now that everything is set up and online, it is time to create the cluster. All four hosts need to be in one cluster in vCenter. Go to the cluster settings and start the setup of vSAN. Choose configure stretched cluster.

Now break out the two fault domains to correspond to the availability groups set up on Ravello.

After the disks are claimed you have a stretched vSAN cluster that provides high availability across two data centers. An entire site or a single node can go down, and your VMs can fail over and keep on running.
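The reason the witness matters is quorum. Here is a deliberately simplified sketch of the idea: it gives each of the three fault domains one vote, which is my own simplification rather than how vSAN really counts votes per component, and the fault domain names are made up.

```python
# Simplified model: the two sites and the witness each get one vote.
FAULT_DOMAINS = ("site-1", "site-2", "witness")


def has_quorum(online):
    """Data stays available while more than half of the votes are reachable."""
    votes = sum(1 for fd in FAULT_DOMAINS if fd in online)
    return votes > len(FAULT_DOMAINS) / 2


print(has_quorum({"site-1", "site-2", "witness"}))  # True, everything healthy
print(has_quorum({"site-2", "witness"}))            # True, site-1 is down
print(has_quorum({"site-1"}))                       # False, site-1 is isolated
```

Losing a whole site still leaves two of three votes, so the surviving site keeps running. A site that loses contact with both the other site and the witness stops, which is what prevents a split brain.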

 

vExpert 2018

Last week I was honored to be chosen as part of the elite VMware vExpert program. This group is made up of individuals who are active in the virtualization community. This makes it the third year I have been chosen. What makes this so great is being a part of the community and the networking that it brings. From the vExpert Slack channel I have learned a lot by talking to my peers. Anytime I have had a question, there was someone there to help out. I have met many people and become close friends with some of them.

Thank you everyone for making this community so great, and I hope to see everyone at VMworld this year and Nutanix .NEXT!

vSphere 6.5 Update 1 is the Update You’ve Been Looking For!

With this update release, VMware builds upon the already robust industry-leading virtualization platform and further improves the IT experience for its customers. vSphere 6.5 has now been running in production environments for over 8 months and many of the discovered issues have been fixed in patches and subsequently rolled into this release.


VMware Social Media Advocacy

ESXi 6.0 to 6.5 Upgrade Failed

The Problem

I am currently running vCenter 6.5 with a mix of 6.0 and 6.5 clusters. I uploaded the latest Dell customized ESXi 6.5 image to Update Manager and had no issues updating my first cluster from 6.0 to 6.5. In the past I have had some weird issues with Update Manager, but since it was integrated into vCenter in 6.5 it has been a lot more stable. I then proceeded to migrate the next cluster to 6.5 and received this weird error.

I then tried to mount the ISO to the host and install it that way, but now I get a much more detailed error.

The Solution

1. SSH into the host and run the following command to see the list of installed VIBs.

esxcli software vib list

2. Remove the conflicting VIB.

esxcli software vib remove --vibname=scsi-mpt3sas

3. Reboot!
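If you have more than a couple of hosts to check, the VIB list can be scanned with a small script. The sketch below works on captured text rather than a live host, and the sample listing is made up for illustration (the layout follows the usual column format, but the rows and versions are not real output). The function names are my own. Note that the vibname option takes two plain hyphens.

```python
# Made-up sample in the shape of `esxcli software vib list` output.
SAMPLE_VIB_LIST = """\
Name          Version        Vendor   Acceptance Level  Install Date
------------  -------------  -------  ----------------  ------------
example-vib   1.0.0-example  Example  VMwareCertified   2017-01-01
scsi-mpt3sas  0.0.0-example  Example  VMwareCertified   2017-01-01
"""


def installed_vib_names(vib_list_output):
    """Return the first column (the VIB name) of every data row."""
    names = []
    for line in vib_list_output.splitlines()[2:]:  # skip header and separator
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


def removal_command(vib_name):
    """Build the esxcli command line that removes one VIB by name."""
    return f"esxcli software vib remove --vibname={vib_name}"


if "scsi-mpt3sas" in installed_vib_names(SAMPLE_VIB_LIST):
    print(removal_command("scsi-mpt3sas"))
```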

Now that the conflicting VIB has been removed you can proceed with installing the updates.

How To Setup A Nutanix Storage Container

Nutanix storage uses Storage Pools and Storage Containers. The Storage Pool is the aggregated disks of all or some of the nodes. You can create multiple Storage Pools depending on business needs, but Nutanix recommends one Storage Pool. Within the Storage Pool are Storage Containers. These containers have different data reduction settings that can be set up to get the optimal data reduction and performance that is needed.

Creating The Container

Once the cluster is set up and a Storage Pool has been created, we are ready to create a Storage Container.

  1. Name the Container
  2. Select Storage Pool
  3. Choose which hosts to add.

That all looks really simple until the Advanced button is clicked. This is where the geek knobs can be tweaked.

Advanced Settings

There are quite a few options to choose from, and each setting depends on the different use cases.

  1. Replication Factor – 2 or 3 copies of data in the cluster, depending on the use case.
  2. Reserved Capacity – How much storage is guaranteed to be reserved for this container. All the Containers share the storage in the Storage Pool, so this is used to guarantee that the capacity is always available.
  3. Advertised Capacity – How much storage the connected hosts will see. This can be used to control actual usage on the Container side.  To allow
  4. Compression – A delay setting of 0 results in inline compression. It can be set to a higher number for the desired performance.
  5. Deduplication – Cache deduplication can be used to optimize performance and use less storage. Capacity deduplication will deduplicate all data globally across the cluster. Deduplication is post-process only, and if it is enabled after a Container is created, only new writes will be deduplicated.
  6. Erasure Coding – Requires at least 4 nodes. It is more space efficient than simple replication: instead of storing full copies of the data, it uses parity to be able to rebuild anything. Enabling this setting will result in some performance impact.
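To get a feel for the capacity trade-off between replication and erasure coding, here is a back-of-the-envelope sketch. It ignores every other overhead and saving (metadata, compression, deduplication), and the 4 data plus 1 parity layout is just an example I picked, not a statement of what Nutanix uses on a given cluster.

```python
def usable_with_replication(raw_tb, replication_factor):
    """Usable capacity when every block is stored as 2 or 3 full copies."""
    if replication_factor not in (2, 3):
        raise ValueError("replication factor must be 2 or 3")
    return raw_tb / replication_factor


def usable_with_parity(raw_tb, data_blocks, parity_blocks):
    """Usable capacity when data is protected by parity instead of copies."""
    if data_blocks < 1 or parity_blocks < 1:
        raise ValueError("need at least one data block and one parity block")
    return raw_tb * data_blocks / (data_blocks + parity_blocks)


print(usable_with_replication(100, 2))  # 50.0
print(usable_with_replication(100, 3))  # about 33.3
print(usable_with_parity(100, 4, 1))    # 80.0
```

With 100TB raw, two full copies leave 50TB usable, while the example parity layout leaves 80TB. That extra space is what you are buying with the performance impact mentioned above.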

Summary

As you can see, the settings that you choose can have a big impact on performance. As always, architecture matters: you will have to evaluate the needs of your workload, and a better understanding of how everything works results in a better performing system.

 
