Last week I ran into a serious issue in my home lab: my ESXi host kept disconnecting from my vCenter Server at random. Whenever I made a configuration change, such as enabling SSH or creating a new vSwitch, the host disconnected immediately. It was very frustrating because it made the lab almost impossible to work in, so I went looking for a solution.
I started troubleshooting by going through my vCenter log files and found the following:
2015-03-16T23:06:11.270+05:30 [06304 info 'vpxdvpxdMoHost' opID=BADE9DBF-0000007B-b1] [HostMo] host connection state changed to [CONNECTED] for host-35
2015-03-16T23:06:11.273+05:30 [06304 info 'vpxdvpxdMoHost' opID=BADE9DBF-0000007B-b1] [HostMo::SetComputeCompatibilityDirty] Marked host-35 as dirty.
2015-03-16T23:02:09.628+05:30 [04380 info 'vpxdvpxdHostCnx' opID=SWI-7e4a49e9] [VpxdHostCnx] No heartbeats received from host 5294adb1-584a-2f13-8987-7b52ed31c84b within 120665000 microseconds
2015-03-16T23:02:09.628+05:30 [09928 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost] Got lost connection callback for host-35
2015-03-16T23:02:09.629+05:30 [05548 info 'commonvpxLro'] [VpxLRO] -- BEGIN task-internal-46 -- host-35 -- VpxdInvtHostSyncHostLRO.Synchronize --
2015-03-16T23:02:09.629+05:30 [05548 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-35
2015-03-16T23:02:09.629+05:30 [05548 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost::FixNotRespondingHost] Attempting to fix not responding host host-35
2015-03-16T23:02:10.052+05:30 [05548 info 'vpxdvpxdHostAccess'] Got VpxaCnxInfo over SOAP version vpxapi.version.version9 for host megatron.alex.local
2015-03-16T23:06:32.368+05:30 [07760 warning 'Default'] Failed to connect socket; <io_obj p:0x000000000a6fa038, h:3300, <TCP '0.0.0.0:0'>, <TCP '[::1]:32010'>>, e: system:10061(No connection could be made because the target machine actively refused it)
2015-03-16T23:06:33.369+05:30 [07760 warning 'Proxy Req 00047'] Connection to localhost:32010 failed with error class Vmacore::SystemException(No connection could be made because the target machine actively refused it).
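To pull only the heartbeat and connection-state lines out of a large vpxd log, a small filter along these lines may help. This is a rough sketch: the function name `heartbeat_events` is hypothetical, and the search patterns are simply copied from the log lines above, so they may not match other vCenter versions.

```shell
# Hypothetical helper: print heartbeat-loss and connection-state events
# from a vpxd log file. The patterns are taken from the log excerpt above.
# Usage: heartbeat_events /path/to/vpxd.log
heartbeat_events() {
    grep -E 'No heartbeats received|host connection state changed|lost connection callback' "$1"
}
```

In the excerpt above, the "No heartbeats received" line reports 120665000 microseconds, which is roughly 120 seconds without a heartbeat.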
So it looked like something was going wrong with the heartbeat exchange between my host and the vCenter Server. I worked through the following steps:
1: Checked whether the ESXi host could reach my vCenter Server by pinging it and then testing connectivity from the ESXi host to the vCenter Server on port 902.
Note: The telnet command is not available on ESXi, so you have to use the “nc -z” command instead.
The test showed that I was able to reach my vCenter Server from the ESXi host successfully.
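A minimal sketch of this connectivity test, wrapped in a function for reuse. The name `check_tcp_port` is hypothetical, and the host name or IP you pass in is whatever your vCenter Server uses. Note that `nc -z` as used here only proves that a TCP connection can be opened; it says nothing about UDP traffic.

```shell
# Hypothetical helper: succeed (exit status 0) if a TCP connection
# to host:port can be opened, fail otherwise.
# Assumed usage on the ESXi shell: check_tcp_port <vcenter-ip-or-name> 902
check_tcp_port() {
    nc -z "$1" "$2"
}
```

You can follow the call with `echo $?` to see the result: 0 means the port was reachable.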
2: Next I checked whether my ESXi host was listening on port 902 (the heartbeat port).
The command confirmed that my host was listening on port 902.
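One way to run such a check, assuming `esxcli network ip connection list` is available on your ESXi version, is to filter its output for the port. The helper name `lines_for_port` is hypothetical; it simply keeps the lines that mention the given port number after a colon.

```shell
# Hypothetical helper: read a connection listing on stdin and keep only
# the lines that mention ":<port>" (so 902 does not also match 9020).
# Assumed usage on the ESXi shell:
#   esxcli network ip connection list | lines_for_port 902
lines_for_port() {
    grep -E ":$1([^0-9]|\$)"
}
```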
3: I added the host disconnection timeout setting in the Advanced Settings of vCenter and increased the value to 120.
I then verified that the value had been added.
4: Next I checked the “Managed IP” setting on my vCenter Server. If the vCenter IP address is not listed there, you can also run into this issue.
In my case I manually entered the IP under Runtime Settings, as shown in the above image.
5: I checked the same setting on my ESXi host.
The above image makes it clear that my ESXi host is configured to be managed by the correct vCenter Server.
6: Next I checked the heartbeat port value on my ESXi host by running the command:
# grep -i serverport /etc/vmware/vpxa/vpxa.cfg
The output was strange: my ESXi host was using port 922 for the heartbeat exchange instead of the default port 902.
According to VMware KB article 2040630:
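If you want just the port number rather than the whole matching line, something like the following could work. It is a sketch that assumes the value is stored in vpxa.cfg as a `<serverPort>NNN</serverPort>` element on a single line; the function name `get_server_port` is hypothetical.

```shell
# Hypothetical helper: print the heartbeat port stored in a vpxa.cfg-style file.
# Assumes the value appears as <serverPort>NNN</serverPort> on one line.
# Assumed usage on the host: get_server_port /etc/vmware/vpxa/vpxa.cfg
get_server_port() {
    sed -n 's:.*<serverPort>\([0-9][0-9]*\)</serverPort>.*:\1:p' "$1"
}
```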
This issue is caused by dropped, blocked, or lost heartbeat packets between the vCenter Server and the ESXi/ESX host. If there is an incorrect configuration of the vCenter Server managed IP address, the host receives the heartbeat from vCenter Server but cannot return it.
It is important to remember that the default heartbeat port is UDP 902, and these packets must be sent between vCenter Server and the ESXi/ESX host for the host to stay connected and remain in the vCenter Server inventory.
I changed the port to 902 by editing the vpxa.cfg file, then removed my ESXi host from vCenter Server and added it back, hoping that the issue was now resolved. Surprisingly, I was still getting the disconnection problem. I connected to the ESXi host over SSH once again, checked the vpxa.cfg file, and found that the port had been changed back to 922. This was strange.
On digging further, I found that this was happening because the heartbeat port was set to 922 in a registry key on the vCenter Server. I got this clue from thread 2437489 in the VMware Community forums.
The full registry key is:
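For reference, the manual edit of vpxa.cfg described above could be scripted roughly like this. It is only a sketch under the assumption that the port is stored as a `<serverPort>NNN</serverPort>` element; the function name `set_server_port` is hypothetical. As this post shows, vCenter can write its own value back into the file, so an edit like this is not a fix by itself.

```shell
# Hypothetical helper: rewrite the <serverPort> value in a vpxa.cfg-style file.
# Keeps a backup copy next to the original as <file>.bak.
# Assumed usage on the host: set_server_port /etc/vmware/vpxa/vpxa.cfg 902
set_server_port() {
    cfg=$1
    port=$2
    # Save the current file, then regenerate it from the backup with the new port.
    cp "$cfg" "$cfg.bak" &&
        sed "s:<serverPort>[0-9]*</serverPort>:<serverPort>$port</serverPort>:" "$cfg.bak" > "$cfg"
}
```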
HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware VirtualCenter
As you can see in the above image, the heartbeat port is set to 922, which was causing all the trouble. I changed it to 902 and restarted my vCenter Server, and bingo, my issue was resolved.