Auto-update VMware tools installation fails?

We noticed a few machines where it was not possible to auto-update the VMware Tools from vCenter. After a time-out of 30 minutes the task failed. The result: no VMware Tools installed, a task that failed according to vCenter, yet a task that keeps lingering on the ESXi host/VM.

We had never encountered this issue before, and after looking through log files, eventvwr, etc. I didn’t find a proper explanation. Then I suddenly had a clear moment and thought about checking the advanced settings of the VM, because a while ago I changed the template settings during our yearly hassle with the security baseline 🙂
That’s easy with PowerCLI:

Get-VM <vm> | Get-AdvancedSetting

Hmm, I found a Tools setting:

Name: isolation.tools.autoinstall.disable
Value: true

This simply means that the automatic Tools installation is disabled. That explains why it won’t work automatically, while a manual install gave us no issues.

Hah, this can easily be edited with PowerCLI as well.

Get-VM <VM> | Get-AdvancedSetting -Name isolation.tools.autoinstall.disable | Set-AdvancedSetting -Value $false -Confirm:$false

DAMN!

I threw this script at our test cluster, but somehow not all machines could be edited. A little investigation led me to the fact that the machines which failed were stuck with a timed-out “Install VMware Tools” task.

So what about killing the task? That was not possible through vCenter, and restarting the management agents on the host wasn’t a success either.
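For reference, restarting the management agents from the ESXi shell is normally done with the standard service scripts (in my case it did not clear the stuck task):

/etc/init.d/hostd restart
/etc/init.d/vpxa restart

or, for all management agents at once:

services.sh restart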

Then I came across this VMware KB article:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1038542

  1. Run this command to identify the virtual machine ID:
    # vim-cmd vmsvc/getallvms 

    You see an output similar to:

    2256   testvm          [datastore] testvm/testvm.vmx              winNetStandardGuest       vmx-07    
  2. Note the ID of the virtual machine with the ongoing VMware Tools installation. In this example, the virtual machine ID is 2256.
  3. Run this command to stop the VMware Tools installation:
    # vim-cmd vmsvc/tools.cancelinstall <vm_ID>

    Where <vm_ID> is the ID of the virtual machine noted in Step 2.
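To double-check that the stuck task is really gone afterwards, you can list the active tasks for that VM in the same vim-cmd namespace (command from memory, so verify it on your build):

# vim-cmd vmsvc/get.tasklist <vm_ID>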

Now the Tools upgrade task is killed, and it is possible again to vMotion, change advanced settings, etc.

So now it’s possible to edit the settings. I started the script again and it ran flawlessly. The next step is to try the VMware Tools installation again.

I picked a machine, started the VMware Tools install through vCenter and waited… after 30 minutes, again: TIME-OUT!
Hmm, maybe we need to power the VM off and on for it to pick up the new setting. So again: cancelled the task, checked the setting. All OK.

Shut down the guest, power it on again and reinstall VMware Tools through vCenter: that seems to be the magical combination.

To wrap it up:

– Cancel any running tasks on the ESX host/VM
– Change the isolation.tools.autoinstall.disable setting
– Power off/on the VM
– Reinstall VMware tools

The best approach is of course to change the setting while the machine is powered off.
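Roughly, the whole sequence for a single VM could look like this in PowerCLI. This is just a sketch based on the steps above; it assumes the stuck task has already been cancelled on the host and that VMware Tools is present in the guest so Shutdown-VMGuest works:

# Sketch: fix the setting and retry the Tools upgrade for one VM
$vm = Get-VM "<VM>"

# Shut the guest down cleanly and wait until it is powered off
Shutdown-VMGuest -VM $vm -Confirm:$false
while ((Get-VM -Name $vm.Name).PowerState -ne "PoweredOff") { Start-Sleep -Seconds 5 }

# Re-enable automatic Tools installation while the VM is powered off
$vm | Get-AdvancedSetting -Name isolation.tools.autoinstall.disable |
    Set-AdvancedSetting -Value $false -Confirm:$false

# Power the VM back on and kick off the Tools upgrade
Start-VM -VM $vm
$vm | Update-Tools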

Or just do a manual install 🙂

Last but not least, I created a little script that retrieves the key and, if the key exists and the value is not already “false”, sets it to false.


# Loop over all VMs in the cluster and fix the setting where needed
foreach ($VM in Get-Cluster *199* | Get-VM) {
    # Retrieve the advanced setting (returns $null if the key does not exist)
    $Setting = $VM | Get-AdvancedSetting -Name isolation.tools.autoinstall.disable | Select-Object Name, Value
    if ($Setting -ne $null) {
        # Only touch VMs where auto-install is not already enabled
        if ($Setting.Value -ne "false") {
            $VM | Get-AdvancedSetting -Name isolation.tools.autoinstall.disable | Set-AdvancedSetting -Value $false -Confirm:$false
        }
    }
}

Root disk full, WTF?

Suddenly we had a virtual machine which couldn’t be powered on; the vCenter events showed messages like:

Power On virtual machine
<Computer>
A general system error occurred: Unknown error

Hmm, strange, but I had seen it before. I used the KB articles listed in each section below to find the solution.

Part 1

I started by looking at the /var/spool/snmp directory, but it was empty, so that could not be what was filling up the root disk.

  1. Connect to the ESXi host using SSH. For more information, see Using ESXi Shell in ESXi 5.0 and 5.1 (2004746).
  2. Check if SNMP is creating too many .trp files in the /var/spool/snmp directory on the ESXi host by running the command:
    # ls /var/spool/snmp | wc -l

    Note: If the output indicates that the value is 2000 or more, this may be causing the full inodes.
  3. Delete the .trp files in the /var/spool/snmp/ directory by running the commands:
    # cd /var/spool/snmp
    # for i in $(ls | grep trp); do rm -f $i;done
  4. Change directory to /etc/vmware/ and back up the snmp.xml file by running the commands:
    # cd /etc/vmware
    # mv snmp.xml snmp.xml.bkup
  5. Create a new file named snmp.xml and open it using a text editor. For more information, see Editing files on an ESX host using vi or nano (1020302).
  6. Copy and paste these contents into the file:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <config>
    <snmpSettings><enable>false</enable><port>161</port><syscontact></syscontact><syslocation></syslocation>
    <EnvEventSource>indications</EnvEventSource><communities></communities><loglevel>info</loglevel><authProtocol></authProtocol><privProtocol></privProtocol></snmpSettings>
    </config>
  7. Save and close the file.
  8. Reconfigure SNMP on the affected host by running the command:
    # esxcli system snmp set --enable=true
  9. To confirm the SNMP services are running normally again, run the command:
    # esxcli system snmp get

    Here is an example of the output:
    /etc/vmware # esxcli system snmp get
    Authentication:
    Communities:
    Enable: true
    Engineid: 00000063000000a10a0121cf
    Hwsrc: indications
    Loglevel: info
    Notraps:
    Port: 161
    Privacy:
    Remoteusers:
    Syscontact:
    Syslocation:
    Targets:
    Users:
    V3targets:

But to be sure, I disabled SNMP as mentioned in the article:
ESXi 5.1 host becomes unresponsive when attempting a vMotion migration or a configuration change (2040707)
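For completeness: disabling the SNMP agent is the same esxcli call as in step 8 above, just with the agent turned off:

# esxcli system snmp set --enable false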

Part 2

In the first article there is also a hint that HP Gen8 hardware can have similar issues with a different resolution. HAH! We use that hardware, so I went on to remove the hpHelper.log file with the steps below.

To remove the hpHelper.log file:

  1. Log in to the ESXi host as the root user.
  2. Stop the HP Helper management agent by running the command:
    /etc/init.d/hp-ams.sh stop
  3. To remove the hpHelper.log file, run the command:
    rm /var/log/hpHelper.log
  4. To restart the HP Helper management agent, run the command:
    /etc/init.d/hp-ams.sh start

ESXi ramdisk full due to /var/log/hpHelper.log file size (2055924)

After I had started everything again and noticed that the hpHelper.log file was actually very small, I tried again. DAMN! Same error, so something else had to be filling the root disk, and my search continued.

Part 3

Then I got another clear moment and thought about the sfcbd-watchdog service. After searching for sfcbd and root disk full I found this article:

ESXi 5.x host is disconnected from vCenter Server due to sfcbd exhausting inodes (2037798)

With the help of the command vdf -h I noticed that there was only 8 MB free on the root ramdisk:

Ramdisk         Size    Used  Available
root             32M     24M         8M
etc              28M    224K        27M
tmp             192M     40K       191M
hostdstats      834M      3M       830M

Mmm that’s not much.
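A quick way to see whether sfcbd is the culprit is to count the files it leaves behind in /var/run/sfcb; thousands of tiny files there will exhaust the inodes of the root ramdisk:

# ls /var/run/sfcb | wc -l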

To free up inodes:
  1. Connect to the ESXi host using SSH.
  2. To stop the sfcbd service, run the command:
    /etc/init.d/sfcbd-watchdog stop
  3. Manually delete the files in the /var/run/sfcb directory to free inodes. To remove the files, run the commands:
    cd /var/run/sfcb
    rm [0-2]*
    rm [3-6]*
    rm [7-9]*
    rm [a-c]*
    rm [d-f]*
  4. To restart the sfcbd service, run the command:
    /etc/init.d/sfcbd-watchdog start

    Note: You may need to restart the management agents after step 4. For more information, see Restarting the Management agents on an ESXi or ESX host (1003490).

After doing this I had like 30MB free on /root.

This issue should be resolved in the update mentioned in the KB article. Unfortunately it was not possible for us to patch these hosts with the latest updates beforehand.

To manually resolve this issue, you can prevent the root file system from being impacted:

  1. Connect to the ESXi host using SSH.
  2. Run this command to stop the sfcbd service:
    /etc/init.d/sfcbd-watchdog stop
  3. Enable this command to be run at boot time (see the sketch after this list):
    esxcli system visorfs ramdisk add --name sfcbtickets --min-size 0 --max-size 1024 --permissions 0755 --target /var/run/sfcb

    For more information, see Modifying the rc.local or local.sh file in ESX/ESXi to execute commands while booting (2043564).
  4. Run this command to start the sfcbd service:
    /etc/init.d/sfcbd-watchdog start
  5. Run this command and verify if the new ramdisk exists:
    esxcli system visorfs ramdisk list
Note: This issue may also occur if the SNMP agent is enabled. For more information, see vCenter Server and vmkernel logs report the message: The ramdisk ‘root’ is full (2033073).
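To make the sfcbtickets ramdisk survive a reboot, the esxcli line from step 3 can be added to the local.sh boot script, along the lines of KB 2043564. A rough sketch, assuming ESXi 5.1 paths (on ESXi 5.0 the file is /etc/rc.local instead):

# edit the boot script and add the esxcli line from step 3 just before the final exit 0
vi /etc/rc.local.d/local.sh

# save the configuration change right away so it persists across the reboot
/sbin/auto-backup.sh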
After doing this I was able to successfully start up the virtual machine.
Now it’s time to make sure we do the updates asap!
Another handy article to gather file system information