[Solved] VMware HA Fails after Upgrade to ESXi 4.1
Last Updated: 8/11/10
If VMware HA won't enable properly on your vSphere ESXi 4.1 host after a recent upgrade, you are not alone. I am having the same problem.
More Info: (skip to the bottom if you just want the solution!)
Error Message: Configuration Issues - HA agent on [esx hostname] in cluster [cluster name] has an error : Error while running health check script.
The /var/log/messages log, viewed from the console, shows this additional information:
:Invoke]Command to invoke is /opt/vmware/aam/bin/aamPerl /opt/vmware/aam/ha/aam_config_util.pl -z -shortname=[esx hostname] -uname=VMkernel -cmd=monitornodes -domain=vmware
error 'App'] [VpxaVMAP::Invoke] Command /opt/vmware/aam/bin/aamPerl /opt/vmware/aam/ha/aam_config_util.pl -z -shortname=[esx hostname] -uname=VMkernel -cmd=monitornodes -domain=vmware failed with error 3
Failure location: 08/11/10 22:08:26 [myexit] function main::myexit called from line 360
Aug 11 22:08:27 08/11/10 22:08:26 [myexit] VMwareresult=failure
Total time for script to complete: 0 minute(s) and 0 second(s)
Note: I removed some text from the logs to make it easier to read.
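To spot the failing health check in a longer, unedited log, the two message fragments shown above can serve as grep filters. This is a minimal sketch that assumes the log lines look like the excerpt; find_ha_failures and ha_error_codes are hypothetical helper names, not VMware tools.

```shell
#!/bin/sh
# Sketch: pull HA health-check failures out of a syslog file.
# The match strings come from the log excerpt; the helper names are hypothetical.

# Print lines where vpxa reports the aam_config_util.pl health check as failed.
find_ha_failures() {
    log_file="${1:-/var/log/messages}"
    grep 'aam_config_util.pl' "$log_file" | grep 'failed with error'
}

# Print only the numeric error code from each failure line (e.g. 3).
ha_error_codes() {
    find_ha_failures "$@" | sed -n 's/.*failed with error \([0-9][0-9]*\).*/\1/p'
}
```

On the affected host, running find_ha_failures with no argument reads /var/log/messages. In this case ha_error_codes would be expected to print 3 for each failed health-check run.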
Hopefully, I will have a solution after contacting VMware support later this week.
UPDATE:
I figured out a little more. The logs above show that the HA health check script, which runs every 30 seconds, is /opt/vmware/aam/ha/aam_config_util.pl. I compared that script on two servers, one with HA working and one with HA broken. On the working ESX host the script has a date stamp of 4/13/2010; on the host with broken HA it has a date stamp of 4/22/2009. All the other files in /opt/vmware/aam/ha on the broken host are also about a year older than those on the working host. It appears that the 4.1 upgrade did not install the updated HA files.
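The same date comparison can be made without eyeballing ls -l output on each host. This is a sketch under stated assumptions: stale_ha_files is a hypothetical helper, and it relies on a reference file whose timestamp you choose somewhere between the broken host's 4/22/2009 files and the working host's 4/13/2010 files.

```shell
#!/bin/sh
# Sketch: list HA agent files whose modification time is not after a
# reference file. stale_ha_files is hypothetical; the default directory
# is the one compared in the article.
stale_ha_files() {
    ha_dir="${1:-/opt/vmware/aam/ha}"
    ref_file="$2"
    if [ ! -e "$ref_file" ]; then
        echo "usage: stale_ha_files DIR REFERENCE_FILE" >&2
        return 1
    fi
    # POSIX find: regular files that are NOT newer than the reference file.
    find "$ha_dir" -type f ! -newer "$ref_file"
}
```

For example, touch -t 201001010000 /tmp/ha_ref followed by stale_ha_files /opt/vmware/aam/ha /tmp/ha_ref should list the outdated files on a broken host and print nothing on a host whose HA files were updated.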
The next interesting thing is that I found a script called VMware-aam-ha-uninstall.sh in /opt/vmware/aam. It made me wonder: if I run the uninstall script and reconfigure HA, will my problems go away? Am I brave enough to find out?
UPDATED AGAIN: (Solution)
Yep, it does fix the problem. Turn off HA for the cluster, run the uninstall script on the affected host, and restart the services. Here are the commands to fix the issue.
From the ESXi 4.1 SSH console (you can enable SSH under Configuration > Security Profile > Properties):
/opt/vmware/aam/VMware-aam-ha-uninstall.sh
services.sh stop
services.sh start
Then re-enable HA for the cluster and click "Reconfigure for VMware HA" on the host if that doesn't happen automatically.
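The host-side commands can also be wrapped in a small function so they run in order and stop at the first failure. This is a sketch, not a VMware-supplied tool. reset_ha_agent and the UNINSTALL/SERVICES overrides are hypothetical. The defaults are the commands from the procedure. It also assumes the uninstall script runs without prompting. HA still has to be disabled for the cluster beforehand and re-enabled in vCenter afterwards.

```shell
#!/bin/sh
# Sketch of the host-side part of the fix. Disable HA for the cluster first;
# re-enabling HA afterwards is still a manual step in vCenter.
# reset_ha_agent is hypothetical. UNINSTALL and SERVICES default to the
# commands from the article and exist only so the steps can be dry-run.
reset_ha_agent() {
    uninstall="${UNINSTALL:-/opt/vmware/aam/VMware-aam-ha-uninstall.sh}"
    services="${SERVICES:-services.sh}"

    if [ ! -x "$uninstall" ]; then
        echo "HA uninstall script not found or not executable: $uninstall" >&2
        return 1
    fi

    # 1. Remove the HA agent files left behind by the upgrade.
    "$uninstall" || return 1
    # 2. Restart the services, as in the manual procedure.
    "$services" stop || return 1
    "$services" start
}
```

Keeping the steps in a function with early returns means a missing uninstall script stops everything before any services are touched.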
I figured this out by applying the fix from VMware KB article 1007234, which resolves a different HA issue with the same procedure. If you want more detail, read that article.
Keywords: esxi, esx, 4.0 to 4.1, upgrade, ha error, cluster error, error while running health check script