vSwitch NIC Teaming and Network Failure Detection Policies

What is NIC Teaming and why do you need it?

Uplinks are what provide connectivity between a vSwitch and a physical switch. These uplinks carry all the traffic generated by virtual machines or VMkernel adapters.

But what happens when that physical network adapter fails, when the cable connecting that uplink to the physical network fails, or the upstream physical switch to which that uplink is connected fails? With a single uplink, network connectivity to the entire vSwitch and all of its ports or port groups is lost. This is where NIC teaming comes in.

NIC teaming means that we are taking multiple physical NICs on a given ESXi host and combining them into a single logical link that provides bandwidth aggregation and redundancy to a vSwitch. NIC teaming can be used to distribute load among the available uplinks of the team.

The diagram below illustrates vSwitch connectivity to the physical world using 2 uplinks.

nt-1.PNG

The NIC teaming configuration for this vSwitch looks like this:

nt-2.PNG

Building a functional NIC team requires that all uplinks be connected to physical switches in the same broadcast domain. If VLANs are used, all the switches should be configured for VLAN trunking, and the appropriate subset of VLANs must be allowed across the VLAN trunk.

After a NIC team is established for a vSwitch, ESXi can then perform load balancing for that vSwitch. The load-balancing feature of NIC teams on a vSwitch applies only to the outbound traffic. The inbound traffic arrives on whatever uplink the upstream switch decided to put it on, and the vSwitch is only responsible for making sure it reaches its destination.

NIC teams on a vSwitch can be configured with one of the following four load-balancing policies:

1: Route Based on Originating virtual Port-ID: This is the default load balancing policy for a vSS or vDS. This policy doesn’t require any special configuration to be done at virtual switch level or physical switch level.

In this policy, when a NIC is added to a VM, or a new VM is provisioned with a NIC and comes online, the VMkernel assigns a port ID to the virtual NIC of the VM. The vSwitch then determines which uplink of the team carries the outgoing traffic from that VM NIC. Each port on the vSwitch is “hard wired” to a particular pNIC. When a VM is initially powered on, the VM’s vNIC is dynamically connected to the next available vSwitch port.

At any given time, a VM NIC can use only one uplink to send out its traffic. If that uplink fails, the traffic of that VM NIC is rerouted (failed over) to one of the remaining available uplinks of the team. The uplink selected for a VM NIC can change if the VM changes its power state or is migrated using vMotion.

Although this is one of the simplest load balancing techniques to configure, it has a downside, explained below:

This technique focuses on an approximately equal distribution of vNICs to pNICs. It does not attempt to distribute vNICs based on utilization. One side effect of this is only “rough balancing of vNICs across all available pNICs in the vSwitch”, and it is entirely possible to end up with all the heavily utilized VMs on one pNIC and the less utilized VMs on another.
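To make the “rough balancing” downside concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each virtual port ID is pinned to an uplink by a simple port_id mod number-of-uplinks rule (ESXi’s exact internal assignment may differ), and the port IDs and per-port traffic figures are hypothetical.

```python
from collections import defaultdict


def assign_by_port_id(port_ids, uplinks):
    """Pin each virtual port to one uplink (assumed rule: port_id mod len(uplinks))."""
    return {port: uplinks[port % len(uplinks)] for port in port_ids}


def uplink_load(assignment, traffic_mbps):
    """Sum the (hypothetical) traffic of every port pinned to each uplink."""
    load = defaultdict(int)
    for port, uplink in assignment.items():
        load[uplink] += traffic_mbps[port]
    return dict(load)


uplinks = ["vmnic0", "vmnic1"]
# Hypothetical port IDs and their traffic in Mbps: the two busy VMs happen
# to get even port IDs, the two quiet VMs odd ones.
traffic = {10: 900, 11: 20, 12: 850, 13: 15}

assignment = assign_by_port_id(traffic.keys(), uplinks)
print(assignment)
print(uplink_load(assignment, traffic))
```

Each uplink gets exactly two vNICs, so the distribution looks even by count, yet vmnic0 carries 1750 Mbps while vmnic1 carries 35 Mbps, because the policy never looks at utilization.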

2: Route Based on Source MAC Hash: This policy is similar to Route Based on Originating Port ID, with the difference that the vSwitch uses the MAC address of the VM NIC to determine the uplink responsible for carrying that VM NIC’s outgoing traffic.

To determine which vNIC will be mapped to which pNIC, this LB technique uses the least significant byte (LSB) of the source MAC address of the vNIC modulo the number of active pNICs in the vSwitch to derive an index into the pNIC array. 

vmnic used = LSB(VM's vNIC MAC address, converted from hex to decimal) mod (number of active vmnics)

Let's understand this with an example (taken from Myles' article).

A host has 4 pNICs, and there are 6 vNICs coming out of 3 VMs (2 per VM). Suppose the vNICs have the following MAC addresses generated by the VMkernel:

nt-3

Convert the hexadecimal values to Base10:

nt-4.PNG

Take each value modulo the number of pNICs:

nt-5

In the end, this is how the vNICs are mapped to pNICs:

nt-6
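The steps above can be sketched in Python. The MAC addresses below are hypothetical (they are not the ones from the example images), and the function follows the LSB-modulo description given earlier. With 4 active uplinks, taking the last byte mod 4 gives the same answer as converting the whole MAC to decimal first, because 256 is a multiple of 4; the loop checks this.

```python
def mac_to_int(mac):
    """Convert a colon-separated MAC address to its base-10 integer value."""
    return int(mac.replace(":", ""), 16)


def mac_hash_uplink(mac, num_uplinks):
    """Uplink index = least significant byte of the MAC mod number of active uplinks."""
    lsb = int(mac.split(":")[-1], 16)
    return lsb % num_uplinks


# Hypothetical vNIC MACs: 3 VMs with 2 vNICs each.
macs = ["00:50:56:a1:00:1b", "00:50:56:a1:00:2c", "00:50:56:a1:00:3d",
        "00:50:56:a1:00:4e", "00:50:56:a1:00:5f", "00:50:56:a1:00:60"]

for mac in macs:
    idx = mac_hash_uplink(mac, 4)
    # With 4 uplinks, full-value mod 4 equals LSB mod 4 because 256 % 4 == 0.
    assert idx == mac_to_int(mac) % 4
    print(mac, "-> vmnic%d" % idx)
```

Note that nothing in this calculation depends on how busy each vNIC is, which is exactly the downside described next.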

In this policy, a VM NIC is assigned only one uplink to send traffic out at a given time, but failover is supported if that uplink fails. This policy is available in both vSS and vDS.

The downside of this policy is that vNICs are mapped to pNICs based on their MAC addresses, so traffic is not balanced according to the actual load on the pNICs.

3: Route Based on IP Hash: This is the only load balancing policy in which a VM NIC can send out traffic through more than one uplink at a given time. This policy requires special configuration, i.e. an EtherChannel or port channel configured on the physical switch.

IP Hash assigns uplinks to vNICs per IP “conversation” by creating a hash of the source and destination IP addresses in each IP packet. The formula used by this LB technique is:

(LSB(SrcIP) xor LSB(DestIP)) mod (# pNICs)

It takes an exclusive OR of the least significant bytes (LSB) of the source and destination IP addresses and then computes the result modulo the number of pNICs.
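The formula can be sketched in Python with the standard ipaddress module. The IP addresses are hypothetical, and the function implements only the formula as written above; it is not code taken from ESXi.

```python
import ipaddress


def ip_hash_uplink(src_ip, dst_ip, num_uplinks):
    """(LSB(SrcIP) xor LSB(DestIP)) mod number of uplinks, per the formula above."""
    src_lsb = int(ipaddress.IPv4Address(src_ip)) & 0xFF  # last octet of source
    dst_lsb = int(ipaddress.IPv4Address(dst_ip)) & 0xFF  # last octet of destination
    return (src_lsb ^ dst_lsb) % num_uplinks


vm_ip = "192.168.1.10"
for dst in ["192.168.1.20", "192.168.1.21", "192.168.1.22", "192.168.1.23"]:
    print(vm_ip, "->", dst, "uses uplink", ip_hash_uplink(vm_ip, dst, 2))
```

With 2 uplinks, the four destinations alternate between uplink 0 and uplink 1 (for example, 10 xor 20 = 30, and 30 mod 2 = 0).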

When selecting the IP Hash technique, the configuration should be made on the vSwitch and should not be overridden at the port group level.

For this technique to work, your physical switch must support 802.3ad static link aggregation. The standard vSwitch (vSS) does not support dynamic link aggregation protocols such as LACP. Additionally, you’ll want to disable Spanning Tree Protocol and enable PortFast and trunkfast on the physical switch ports.

nt-7.png

Thomas Low has written an excellent article demonstrating how this load balancing technique works.

There is one caveat with this policy. A VM NIC can use more than one uplink for outgoing traffic only when it is communicating with more than one destination IP. If a VM is doing one-to-one communication, i.e. communicating with only one destination IP, its traffic will not be shared among the uplinks, and only one uplink will be used to send it out.
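To see the caveat in numbers, the sketch below repeats the same formula (so the block stands on its own) and counts which uplink each hypothetical packet would use, first for one destination and then for 200 different destinations. The addresses are made up for illustration.

```python
import ipaddress
from collections import Counter


def ip_hash_uplink(src_ip, dst_ip, num_uplinks):
    """(LSB(SrcIP) xor LSB(DestIP)) mod number of uplinks."""
    src_lsb = int(ipaddress.IPv4Address(src_ip)) & 0xFF
    dst_lsb = int(ipaddress.IPv4Address(dst_ip)) & 0xFF
    return (src_lsb ^ dst_lsb) % num_uplinks


vm_ip = "10.0.0.5"
# One-to-one: 1000 packets to a single destination always hash the same way.
single = Counter(ip_hash_uplink(vm_ip, "10.0.0.100", 2) for _ in range(1000))
# One-to-many: one packet to each of 200 destinations 10.0.1.0 .. 10.0.1.199.
many = Counter(ip_hash_uplink(vm_ip, f"10.0.1.{i}", 2) for i in range(200))
print("one destination:  ", dict(single))
print("many destinations:", dict(many))
```

All 1000 packets of the one-to-one conversation land on the same uplink, while the 200 distinct destinations split evenly across both uplinks.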

4: Route Based on Physical NIC Load: This load balancing policy is only available with vDS and is by far the most intelligent policy for distributing load among the uplinks in a teamed environment.

The assignment of uplinks to VM NICs is based on the originating port ID, but before assigning any uplink, the vDS looks at the load on the physical adapters. The least loaded adapter is assigned to the VM NIC for sending out traffic. If a previously lightly utilized adapter suddenly becomes busy due to heavy network activity from a VM NIC, that VM NIC is moved to a different physical adapter to keep the uplinks as balanced as possible.

This load balancing policy uses an algorithm that inspects the load on the physical NICs every 30 seconds. When the utilization of a particular physical uplink exceeds 75% over 30 seconds, the hypervisor moves VM traffic to another uplink adapter. This policy doesn’t require any additional configuration at the physical switch level.
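A toy model of one such 30-second check is sketched below. Only the 75% threshold and the 30-second window come from the text; which vNIC gets moved (here, the busiest one on the overloaded uplink, sent to the least-loaded uplink) is an assumption for illustration, not the vDS’s actual internal algorithm, and the traffic numbers are hypothetical.

```python
THRESHOLD = 0.75       # utilization that triggers a move (value from the text)
CHECK_INTERVAL_S = 30  # evaluation window in seconds (value from the text)


def rebalance(placement, vnic_mbps, capacity_mbps):
    """Run one LBT-style check and return the moves made as (vnic, from, to).

    placement:     dict vNIC -> uplink it is currently pinned to (mutated in place)
    vnic_mbps:     dict vNIC -> average Mbps over the last CHECK_INTERVAL_S seconds
    capacity_mbps: dict uplink -> link speed in Mbps
    """
    def utilization(uplink):
        used = sum(vnic_mbps[v] for v, u in placement.items() if u == uplink)
        return used / capacity_mbps[uplink]

    moves = []
    for uplink in capacity_mbps:
        if utilization(uplink) <= THRESHOLD:
            continue  # at or below 75%: leave this uplink alone
        target = min(capacity_mbps, key=utilization)  # least-loaded uplink
        if target == uplink:
            continue
        # Assumption: move the busiest vNIC off the overloaded uplink.
        on_uplink = [v for v, u in placement.items() if u == uplink]
        vnic = max(on_uplink, key=lambda v: vnic_mbps[v])
        placement[vnic] = target
        moves.append((vnic, uplink, target))
    return moves


# Hypothetical 30-second averages on two 1 Gbps uplinks.
placement = {"vm-a": "vmnic0", "vm-b": "vmnic0", "vm-c": "vmnic1"}
vnic_mbps = {"vm-a": 600, "vm-b": 300, "vm-c": 100}
capacity = {"vmnic0": 1000, "vmnic1": 1000}

moves = rebalance(placement, vnic_mbps, capacity)
print(moves)
```

Here vmnic0 runs at 90%, above the 75% threshold, so vm-a is moved to vmnic1, leaving the uplinks at 30% and 70%. A second check with the same numbers makes no further moves.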

load-based-on-physical-nic-load-1.jpg

Graphic thanks to VMwareArena.Com

Magnus Anderson has demonstrated this LB technique in action in this article. I also loved this blog post from Adam Wisowaty on this topic.

Use explicit failover order: This policy doesn’t really do any sort of load balancing. Instead, the first Active NIC on the list is used to route the outgoing traffic for all VMs. If that one fails, the next Active NIC on the list is used, and so on, until you reach the Standby NICs.

Note: With the explicit failover option, if you have a vSwitch with many uplinks, only one of the uplinks will be actively used at any given time.
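The selection rule just described fits in a few lines of Python. The NIC names and link states below are hypothetical; the function simply walks the Active list and then the Standby list and returns the first NIC whose link is up.

```python
def select_uplink(active, standby, link_up):
    """Explicit failover order: first healthy Active NIC, else first healthy Standby NIC."""
    for nic in active + standby:
        if link_up.get(nic, False):
            return nic
    return None  # no healthy uplink left: the vSwitch loses connectivity


active = ["vmnic0", "vmnic1"]
standby = ["vmnic2"]

print(select_uplink(active, standby, {"vmnic0": True, "vmnic1": True, "vmnic2": True}))    # vmnic0
print(select_uplink(active, standby, {"vmnic0": False, "vmnic1": True, "vmnic2": True}))   # vmnic1
print(select_uplink(active, standby, {"vmnic0": False, "vmnic1": False, "vmnic2": True}))  # vmnic2
```

Even with three healthy uplinks, only vmnic0 carries traffic in the first case, which is why this option provides redundancy but no load balancing.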

Network Failure Detection

When an uplink, or the upstream physical switch to which it is connected, fails, the vSwitch can detect the failure using one of 2 methods:

1: Link status: The link status failover-detection method works just as the name suggests. The link status of the physical network adapter identifies the failure of an uplink. In this case, failure is identified for events like removed cables or power failures on a physical switch.

The downside of the link status failover-detection setting is its inability to identify misconfigurations or pulled cables that connect the switch to other networking devices (for example, a cable connecting one switch to an upstream switch).

2: Beacon probing: The beacon-probing failover-detection setting sends Ethernet broadcast frames across all physical network adapters in the NIC team. These broadcast frames allow the vSwitch to detect upstream network connection failures and force failover when failures have occurred in the network.

When a beacon is not returned on a physical network adapter, the vSwitch triggers the failover notice and reroutes the traffic from the failed network adapter through another available network adapter based on the failover policy.

Let's understand this with the help of the diagram below.

Here we have a vSwitch with 3 uplinks. Uplink 1 sends out a beacon that uplink 2 receives but uplink 3 does not, because upstream aggregation switch 2 is down and traffic is therefore unable to reach uplink 3.

nt-8.PNG

For beacon probing to work correctly, you need at least 3 uplinks. Why? If you had only 2 uplinks and the beacon sent out by one was not heard by the other, you could not determine which uplink is at fault: the sending uplink or the receiving one.
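The reasoning about why 2 uplinks are not enough can be illustrated with a small sketch. The rule used here (an uplink counts as healthy if it exchanges beacons in both directions with at least one other uplink) is an illustrative assumption, not ESXi’s actual beacon-probing logic, and the uplink names are hypothetical.

```python
from itertools import permutations


def isolate_failures(uplinks, heard):
    """Return the set of uplinks judged faulty, or None if the fault can't be pinned down.

    heard[(sender, receiver)] is True when the receiver got the sender's beacon.
    Assumed rule (illustrative only): an uplink is healthy if it exchanges
    beacons in both directions with at least one other uplink.
    """
    healthy = {a for a, b in permutations(uplinks, 2)
               if heard.get((a, b)) and heard.get((b, a))}
    faulty = set(uplinks) - healthy
    if not healthy and faulty:
        return None  # nobody hears anybody: sender or receiver at fault? Can't tell.
    return faulty


# 3 uplinks, aggregation switch behind uplink 3 is down (as in the diagram).
three = ["uplink1", "uplink2", "uplink3"]
heard3 = {(a, b): "uplink3" not in (a, b) for a, b in permutations(three, 2)}
print(isolate_failures(three, heard3))  # {'uplink3'}

# 2 uplinks that cannot hear each other: the culprit is ambiguous.
two = ["uplink1", "uplink2"]
heard2 = {(a, b): False for a, b in permutations(two, 2)}
print(isolate_failures(two, heard2))  # None
```

With 3 uplinks, uplinks 1 and 2 still vouch for each other, so uplink 3 stands out as the odd one. With 2 uplinks, both sides look identical and no decision can be made.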

Notify Switches and Failback

By default, when a virtual adapter is reconnected to a new path due to a path failure, the vSwitch notifies the physical switch. One case where this setting should be changed is when Microsoft Network Load Balancing (NLB) is used in unicast mode. But why would you need to notify the physical switch at all? The answer is explained below:

Physical switches maintain MAC address tables that map MAC addresses to ports. This avoids the need to flood their ports, which means sending frames to all ports except the port they arrived on (the required action when a frame’s destination MAC address doesn’t appear in the switch’s MAC address table).

When one of the uplinks in a vSwitch fails and all of the VMs begin using a new uplink, the upstream physical switch has no idea which port the VMs are now using and would have to resort to flooding its ports or waiting for the VMs to send some traffic so it can re-learn the new port.

The Notify Switches option speeds things along by sending Reverse Address Resolution Protocol (RARP) frames to the upstream physical switch on behalf of the VM or VMs so that the upstream switch updates its MAC address table. This is all done before frames start arriving from the newly vMotioned VM, the newly powered-on VM, or the VMs that are behind the uplink port that failed and was replaced.
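The behaviour described in the last three paragraphs can be modelled with a toy learning switch. This is an illustrative sketch, not a real switch implementation: it assumes the switch flushes MAC entries learned on a port when that port loses link, and the port names and MAC address are hypothetical.

```python
class LearningSwitch:
    """Toy model of a physical switch's MAC address table (illustrative only)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> switch port

    def learn(self, src_mac, in_port):
        """Learn (or re-learn) which port a source MAC lives behind."""
        self.mac_table[src_mac] = in_port

    def link_down(self, port):
        """Assumption: a port losing link is removed and its learned MACs are flushed."""
        self.ports.remove(port)
        self.mac_table = {m: p for m, p in self.mac_table.items() if p != port}

    def egress_ports(self, dst_mac, in_port):
        """Known MAC: forward to one port. Unknown MAC: flood all other ports."""
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return [p for p in self.ports if p != in_port]


# port1/port2 face ESXi uplinks 1 and 2, port3 another host, port4 a router.
sw = LearningSwitch(["port1", "port2", "port3", "port4"])
vm_mac = "00:50:56:aa:bb:cc"

sw.learn(vm_mac, "port1")                # VM traffic first seen via uplink 1
print(sw.egress_ports(vm_mac, "port4"))  # ['port1']
sw.link_down("port1")                    # uplink 1 fails
print(sw.egress_ports(vm_mac, "port4"))  # ['port2', 'port3'] (flooding)
sw.learn(vm_mac, "port2")                # RARP sent on the VM's behalf via uplink 2
print(sw.egress_ports(vm_mac, "port4"))  # ['port2']
```

Without the RARP frame, the switch would keep flooding frames for the VM until the VM itself sent traffic through uplink 2.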

These RARP announcements are just a fancy way of saying that the ESXi host is shouting to the upstream physical switch, “Hey! This VM is over here now!”
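For the curious, here is a sketch of what such an announcement frame could look like on the wire, built with Python’s struct module. The field values (broadcast destination, RARP opcode 3, the VM’s MAC as both sender and target hardware address, zero IP addresses) are an assumption about what a typical announcement contains, not something taken from this post; the VM MAC is hypothetical. The key point is that the Ethernet source address is the VM’s MAC, which is what the physical switch learns from.

```python
import struct

ETH_P_RARP = 0x8035          # EtherType for RARP
RARP_REQUEST_REVERSE = 3     # RARP "request reverse" opcode


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_rarp_announcement(vm_mac):
    """Build a broadcast RARP frame whose source MAC is the VM's MAC (assumed layout)."""
    src = mac_bytes(vm_mac)
    eth_header = mac_bytes("ff:ff:ff:ff:ff:ff") + src + struct.pack("!H", ETH_P_RARP)
    rarp = struct.pack("!HHBBH6s4s6s4s",
                       1,               # hardware type: Ethernet
                       0x0800,          # protocol type: IPv4
                       6, 4,            # hardware / protocol address lengths
                       RARP_REQUEST_REVERSE,
                       src, bytes(4),   # sender MAC = VM MAC, sender IP unknown (0.0.0.0)
                       src, bytes(4))   # target MAC = VM MAC, target IP unknown
    frame = eth_header + rarp
    return frame.ljust(60, b"\x00")     # pad to the 60-byte Ethernet minimum (FCS excluded)


frame = build_rarp_announcement("00:50:56:aa:bb:cc")
print(len(frame), frame[:42].hex())
```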

Failback

If you have a Standby NIC in your NIC Team, it will become Active if there are no more Active NICs in the team. When the problem with the failed Active NIC is fixed, the failback setting determines if the previously failed Active NIC should now be returned to Active duty.

If you set this value to Yes, the now-operational NIC will immediately go back to being Active again, and the Standby NIC returns to being Standby. Things are returned to the way they were before the failure.

If you choose No, the recovered NIC will simply remain inactive until either another NIC fails or you return it to Active status.
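The two failback settings can be compared with a small sketch. It assumes a simple model in which, with failback set to No, the NIC that took over keeps carrying traffic as long as it stays healthy; the NIC names are hypothetical.

```python
def choose_uplink(order, healthy, current, failback):
    """Pick the uplink for traffic.

    order:   Active NICs followed by Standby NICs, as configured.
    healthy: set of NICs with a working link.
    current: NIC carrying traffic right now (or None).
    """
    if not failback and current in healthy:
        return current  # Failback = No: stay on the NIC that took over
    for nic in order:
        if nic in healthy:
            return nic  # first healthy NIC in configured order
    return None


order = ["vmnic0", "vmnic1"]  # vmnic0 Active, vmnic1 Standby

for failback in (True, False):
    # vmnic0 fails, so the Standby NIC takes over.
    current = choose_uplink(order, {"vmnic1"}, "vmnic0", failback)
    # vmnic0 is repaired.
    current = choose_uplink(order, {"vmnic0", "vmnic1"}, current, failback)
    print("failback =", failback, "->", current)
```

With failback set to Yes, traffic returns to vmnic0 as soon as it recovers; with No, it stays on vmnic1 and the repaired vmnic0 sits idle.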

We have touched on almost all the concepts used in NIC teaming. To explore this topic further, I suggest reading the articles below, written by some awesome people.

Some excellent reads on Load Balancing

The Great vSwitch Debate

Load Balancing and Teaming, the Math

MAC hash based LB Deep Dive

Load Balancing Test: Route based on IP hash

Load Based Teaming In Action

I hope you find this post informative. Feel free to share it on social media if it is worth sharing. Be sociable 🙂