Recently I have had some discussions about using LACP and static EtherChannel with VMware. The conversations have mainly revolved around how to get it set up and what the different use cases for it are. The biggest question was about what exactly the difference between the two is. Are they the same thing with different names, or are they actually different things?
EtherChannel and LACP accomplish the same thing, but they do it in slightly different ways. Both are used to form a link aggregation group (LAG): multiple physical links bundled together to connect networking devices. Without aggregation, redundant links would create a loop in the network, which Spanning Tree Protocol would normally handle by blocking all but one of them; a LAG lets all of the links be used. So what is the real difference between the two? LACP negotiates the bundle. It has two modes, active and passive, and a channel forms as long as at least one side is set to active (a passive side only responds to LACPDUs, so two passive sides never form a channel). Static EtherChannel does no negotiation at all: both sides are simply hard-coded to "on", and the configuration must match on both ends or no working channel will form. Seems fairly simple but…
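The negotiation difference can be boiled down to a small truth table. This is an illustrative sketch, not a protocol implementation; the mode names match the usual switch configuration keywords:

```python
# Sketch of which mode combinations bring a channel up.
# LACP: "active" sends LACPDUs unsolicited, "passive" only replies to
# them, so at least one side must be active.
# Static EtherChannel: no negotiation, both sides must be set to "on".

def lacp_channel_forms(side_a: str, side_b: str) -> bool:
    """A channel forms if at least one side actively sends LACPDUs."""
    return "active" in (side_a, side_b)

def static_channel_forms(side_a: str, side_b: str) -> bool:
    """Static EtherChannel bundles only when both ends are hard-coded on."""
    return side_a == "on" and side_b == "on"

assert lacp_channel_forms("active", "passive")   # channel forms
assert not lacp_channel_forms("passive", "passive")  # never negotiates
assert static_channel_forms("on", "on")          # both sides must match
```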
The reason all of this matters is that VMware virtual switches cannot form a loop, so by setting up LACP or EtherChannel you are just increasing your operational cost and the complexity of the network. It requires closer coordination with the networking team to ensure that LACP or EtherChannel is set up with exactly the same settings on both sides. LACP and EtherChannel offer different forms of load balancing, accomplished by hashing on fields such as the source IP or source MAC; there are quite a few options to choose from. Once the hash is computed, the packet is sent down the link the hash maps to. This creates a constraint, because every packet in that flow is now sent down that same link, and will keep using it until the link fails and traffic is forced onto another one. So if two VMs are communicating over a LAG, all of their traffic could be going across just one link, leaving the other links underutilized. The distributed switch and the physical switch must be set up with the same settings or a link will not be established, and LACP is only available on the Distributed Switch, which requires Enterprise Plus licensing.
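The flow-pinning behavior is easy to see in a toy model. This is a sketch only; real switches use vendor-specific hash algorithms and can combine several header fields, but the mapping from hash to member link works the same way:

```python
# Toy model of hash-based LAG load balancing: a source-MAC hash pins
# every frame from one VM to one physical uplink. The hash never
# changes for a given MAC, so neither does the chosen link.
import zlib

UPLINKS = ["vmnic0", "vmnic1"]  # two physical NICs in the LAG

def pick_uplink(src_mac: str) -> str:
    """Hash the source MAC and map it onto one member link."""
    h = zlib.crc32(src_mac.encode())
    return UPLINKS[h % len(UPLINKS)]

# Every frame from the same VM lands on the same uplink, every time:
first = pick_uplink("00:50:56:aa:bb:01")
assert all(pick_uplink("00:50:56:aa:bb:01") == first for _ in range(100))
# Two VMs can easily hash to the same uplink, leaving the other idle.
```

The key point is that the selection is deterministic per flow, so aggregate balance depends entirely on how many flows there are and how their hashes happen to fall.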
If you are able to use the Distributed Switch, it also supports Load-Based Teaming (LBT). LBT is the only true load-balancing method: it distributes traffic across all links based on the actual utilization of each link. This is a far superior load-balancing feature, and if you are already paying for it you should be using it. There is also the myth that bonding two 10Gb links will give you 20Gb of throughput. As I discussed earlier, the limitation is that a vNIC can only utilize one link at a time; it cannot split a stream across two links for increased throughput. You only really gain a throughput advantage when multiple VMs are utilizing the links.
As a best practice you should always use trunk ports down to your hypervisor hosts. This allows the host to utilize multiple VLANs, as opposed to placing the switch ports into access mode and allowing only one VLAN; customers who do that often end up re-configuring their network later on, and it's always a pain. I generally recommend setting up each port on the physical switch as a standard trunk carrying all the VLANs you need. Then, on the virtual switch, build out all of your port groups and tag the traffic there with the VLAN needed for each port group. By doing this and using LBT you have a simple yet efficient design.
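On the ESXi side, the per-port-group VLAN tag can be set from the host shell. This is a sketch with hypothetical port group names and VLAN IDs (shown against a standard vSwitch; on a Distributed Switch you would set the VLAN on the dvPortgroup in vCenter instead), with the matching physical-switch trunk config shown as a Cisco IOS example in the comments:

```shell
# On the physical switch (Cisco IOS example, VLAN IDs are assumptions):
#   interface GigabitEthernet1/0/1
#     switchport mode trunk
#     switchport trunk allowed vlan 10,20,30

# On the ESXi host: tag each port group with its VLAN.
esxcli network vswitch standard portgroup set --portgroup-name "VM-Prod" --vlan-id 10
esxcli network vswitch standard portgroup set --portgroup-name "VM-Dev" --vlan-id 20
esxcli network vswitch standard portgroup set --portgroup-name "vMotion" --vlan-id 30
```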
Now there is one caveat to all of this: vSAN does not support LBT, but it does support LACP, and if you have vSAN you are licensed for the Distributed Switch. LACP has one advantage over LBT, and that is failover time: the time it takes for a dead link to be detected and traffic moved to another link. LACP failover is faster than LBT's, and that difference could mean the difference between a completed and a failed write with vSAN. That can limit downtime, although in a production environment hopefully there will not be many links going offline.