Overview
I have run into this problem since ESX 3.5 and vCenter 2.5, and it is still present today in vSphere 4.1 HA configurations. Several of my clusters initiated HA events today because of bad weather in our region and rolling blackouts. 90% of the clusters recovered without errors, but some of them reported an HA failure with "cmd addnode failed for primary node". All VMs came back, so this is a minor issue, but it is still not a clean recovery.
I believe a race condition between HA recovery and cluster monitoring may be the cause.
Solution
I tried the "Reconfigure HA" option in vCenter, but it failed every time. What fixed the issue was simply removing HA from the cluster and then adding it back. I will be submitting this to VMware again for resolution.
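For anyone who would rather script the remove-and-add step than click through it on each affected cluster, here is a rough sketch of how it might look in Python. This is not something I ran during the outage. It assumes pyVmomi is installed and that you have already connected to vCenter and looked up the cluster object yourself. The helper names (`build_ha_spec`, `wait_for_task`, `reset_ha`) are my own and purely illustrative.

```python
import time


def build_ha_spec(enabled):
    """Build a reconfigure spec that only sets the HA enabled flag.

    Assumption: pyVmomi is installed. It is imported lazily so the rest
    of this sketch can be exercised without it.
    """
    from pyVmomi import vim

    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo()
    spec.dasConfig.enabled = enabled
    return spec


def wait_for_task(task, poll_seconds=2):
    """Poll a vCenter task until it is no longer queued or running."""
    while task.info.state in ("queued", "running"):
        time.sleep(poll_seconds)
    if task.info.state != "success":
        raise RuntimeError("HA reconfigure failed: %s" % task.info.error)


def reset_ha(cluster, build_spec=build_ha_spec, wait=wait_for_task):
    """Turn HA off and then back on for one cluster.

    `cluster` is assumed to be a ClusterComputeResource object that was
    looked up elsewhere. Each step waits for the previous one to finish.
    """
    for enabled in (False, True):
        # Second argument True means "modify": apply only the fields set
        # in the spec and leave the rest of the cluster config alone.
        task = cluster.ReconfigureComputeResource_Task(build_spec(enabled), True)
        wait(task)
```

Calling `reset_ha(cluster)` on each cluster that shows the error would disable HA, wait for that task to complete, and then enable it again. Waiting between the two steps is deliberate, since firing both reconfigure calls back to back could run into the same kind of timing problem I suspect is behind the original failure.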