Learning vSphere 6.5 - Part 10 - VCHA Failover Testing

The last two posts in this series revolved around vCenter High Availability (VCHA), a feature introduced in vSphere 6.5: we discussed the VCHA architecture and learned how to configure it.

In this post we will test the HA feature and see what happens when the Active node of the VCHA cluster goes down.

If you missed the earlier posts in this series, you can read them at the links below:

1: Installing and Configuring ESXi

2: VCSA Overview

3: vCenter Server and PSC Deployment Types

4: System Requirements for Installing vCenter Server

5: Installing vCenter Server on Windows

6: Deploying vCSA with embedded PSC

7: Deploying External PSC for vCSA

8: Understanding vCenter Server High Availability (VCHA)

9: Configuring vCenter Server High Availability (VCHA)

Let's jump into the lab and test this awesome feature.

We will test failover using two methods:

  • Automated failover (let the system do the magic)
  • Manual failover (the user intentionally brings down the Active node of the VCHA cluster)

1: Automated Failover Testing

1: To test failover, log in to the vSphere Web Client and navigate to Configuration > vCenter HA. Before performing a failover, look at the Active/Passive node information and note which IP is currently active.

To start the failover, click the Initiate Failover button in the top-right corner.

haf-1

2: The system asks you to confirm that you want to initiate the failover and whether you want to start it immediately. If you check this box, the recent DB changes on the Active node will not be replicated to the Passive node before the switch.

I think it is best to let the database replicate the in-flight transactions to the Passive node first, so I left the immediate failover option unchecked.

haf-2
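To make the trade-off in step 2 concrete, here is a toy Python model, not VMware code: the class names, node names and IPs (from the 192.0.2.0/24 documentation range) are hypothetical. It assumes changes are written on the Active node and queued for replication, and that a failover without the immediate option drains that queue before the roles swap, while an immediate failover drops whatever is still queued.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    ip: str
    role: str  # "active" or "passive" (the Witness is not modeled here)
    db: list = field(default_factory=list)  # transactions this node holds


@dataclass
class VchaCluster:
    active: Node
    passive: Node
    pending: list = field(default_factory=list)  # written on Active, not yet replicated

    def write(self, txn):
        # Every change lands on the Active node first and waits in the replication queue.
        self.active.db.append(txn)
        self.pending.append(txn)

    def initiate_failover(self, force_immediately=False):
        if not force_immediately:
            # Normal failover: push the queued changes to the Passive node before switching.
            self.passive.db.extend(self.pending)
        # Immediate failover: the queued changes never reach the Passive node.
        self.pending.clear()
        # Swap roles; the old Passive node now serves vCenter.
        self.active.role, self.passive.role = "passive", "active"
        self.active, self.passive = self.passive, self.active
        return self.active.ip  # the IP you will now see as Active in the Web Client
```

The sketch ignores timing; in practice the immediate option saves the wait for synchronization at the cost of the queued changes.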


3: Once the failover is initiated, if you try to access the IP/FQDN of the Active node, you will see the screen below.

haf-3
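While this "Failover in Progress" page is shown, a script that talks to vCenter has to wait before retrying. Below is a minimal polling sketch, assuming you supply a `probe` callable that returns True once vCenter is usable again (for example, a login attempt with whatever client you already use). The function name and default timings are hypothetical, and the clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable


def wait_for_failover(probe: Callable[[], bool],
                      timeout: float = 600.0,
                      interval: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Call probe() every `interval` seconds until it returns True.

    Returns True when vCenter answers again, or False once `timeout`
    seconds have passed without a successful probe.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True  # vCenter is back; the new Active node is serving
        if clock() >= deadline:
            return False  # give up and let the caller decide what to do
        sleep(interval)
```

Using a monotonic clock keeps the deadline correct even if the system time changes while you wait.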

4: Once the failover completes, you will notice that the previously Passive node has become Active. Compare this screenshot with the one from step 1 and you will see that the IP of the Active node has changed.

haf-4

5: If you click the HA monitoring tab, the system reports that all nodes are up and running, the overall health of the VCHA cluster is good, and application state, DB replication, etc. are all in place and working fine.

So in an automated failover test, the system forces the Active node to fail so that the Passive node becomes Active; the old Active node becomes Passive once it recovers from the failed state (the system performs this recovery itself).

haf-5
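The sequence described in step 5 can be sketched as a small change to a role map. This is a hypothetical simulation (the function name and node names are made up for illustration, and it is not a vSphere API); it only records the order in which the roles change during an automated failover.

```python
def automated_failover(roles):
    """Simulate an automated failover on a hypothetical role map.

    `roles` maps a node name to "active", "passive" or "witness" and is
    updated in place. Returns the ordered list of simulated events.
    """
    old_active = next(n for n, r in roles.items() if r == "active")
    old_passive = next(n for n, r in roles.items() if r == "passive")
    events = []
    # The system deliberately takes the Active node out of service.
    events.append(f"{old_active}: failed (forced by system)")
    # The Passive node is promoted and starts serving vCenter.
    roles[old_passive] = "active"
    events.append(f"{old_passive}: promoted to active")
    # The system recovers the failed node itself; it rejoins as the new Passive node.
    roles[old_active] = "passive"
    events.append(f"{old_active}: recovered, rejoined as passive")
    return events
```

The Witness keeps its role throughout; only the Active and Passive roles trade places.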

2: Manual Failover Testing

A: To perform a manual failover test, let's intentionally power off the Active node.

haf-6

B: A few seconds after powering off the Active node, if you try to access the vCSA IP, you will see a “Failover in Progress” message.

haf-3

C: Once the failover completes and the vSphere Web Client lets you log in again, you will observe that the health status of the VCHA cluster has degraded: there is a new Active node, and the previous Active node has become Passive and is currently down (because we have not powered it back on yet).

haf-7

D: If you go to the VCHA Monitoring tab, you will see that the system reports that the VCHA cluster has lost one of its nodes and that DB replication between the Active and Passive nodes is not happening. You will also notice that the application state is out of sync.

haf-8
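The health report in steps C and D follows from which nodes are reachable. Here is a rough, hypothetical model of that reasoning (the function name, keys and values paraphrase the Web Client wording and are not read from any API). It assumes replication needs both the Active and Passive nodes up, while the cluster counts as healthy only when all three nodes, including the Witness, are up.

```python
def vcha_health(node_up):
    """Summarize VCHA health from a map of role -> reachable (True/False).

    Expected keys: "active", "passive", "witness". Hypothetical model only.
    """
    # Replication and app-state sync need both data nodes reachable.
    replication_ok = node_up["active"] and node_up["passive"]
    # Any missing node (including the Witness) degrades the cluster.
    all_up = all(node_up.values())
    return {
        "overall": "healthy" if all_up else "degraded",
        "db_replication": "in sync" if replication_ok else "not replicating",
        "app_state": "in sync" if replication_ok else "out of sync",
    }
```

Calling it with the Passive node marked down reproduces what steps C and D show: a degraded cluster, no replication, and an out-of-sync application state.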

At this point I hope you understand the difference between the automated and manual failover tests and what happens during a failover.

I hope this post is informative. Feel free to share it on social media if you found it worth sharing. Be sociable 🙂
