Performance research on ESXi hosts with esxtop and more

Gathering & Analysing performance data

For a performance investigation I needed to gather and analyse esxtop statistics during a 2-hour window on about 10 ESXi hosts running the roughly 15 VMs I needed data from. One requirement was to capture performance data with a delay of 15 seconds, between 08:00 and 10:00 (UTC+1), with a focus on CPU and disk statistics.

So let’s break it up in some steps:

1) Gather ESXtop data on specific time using Crontab
2) Retrieve & Extract data from datastores using PowerCLI
3) Analyse data using Perfmon
4) Plot data with Datplot

Gather ESXtop data on specific time using Crontab

ESXtop batch mode command

First we need to know how to retrieve the data and which command to schedule. While scavenging the internet I saw a lot of well-explained esxtop posts, which helped me create the command below that I wanted to schedule:

export DATE=$(date +%Y-%m-%d-%H-%M-%S) && export HOSTNAME=$(hostname) && /sbin/esxtop -b -d 15 -n 480 | gzip -9c > /vmfs/volumes/Datastore/Logging/$HOSTNAME-$DATE.csv.gz

To break it up: I used the post below to create the first part of the command, which sets the hostname and date variables, executes the esxtop command and saves the output to a filename with the time and hostname added. Nothing much to explain here.

http://vbyron.com/blog/performance-analytics-esxi-esxtop-mongodb/

export DATE=$(date +%Y-%m-%d-%H-%M-%S) && export HOSTNAME=$(hostname) && <command>

I wasn’t completely happy with the esxtop command in the post, so I used Duncan’s post to adapt it to my needs. I had used it a few times before, because directly gzipping the output is extremely handy.
http://www.yellow-bricks.com/esxtop/

esxtop -b -d 15 -n 480 | gzip -9c > /vmfs/volumes/Datastore/Logging/$HOSTNAME-$DATE.csv.gz

So let’s start esxtop in batch mode (-b) with a delay (-d) of 15 seconds. Because we need to capture for 2 hours (7200 sec / 15 sec interval = 480 samples), the number of iterations (-n) is set to 480. To use the handy gzip command, pipe the output and set a location to store the data. Make sure the path you set exists; as you can see I used a separate “Logging” directory, the only flaw being that when the directory doesn’t exist, no data is gathered. You might just want to dump it in the root of the datastore.
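The iteration count follows directly from the capture window and the delay; a quick shell sanity check (plain arithmetic, nothing ESXi-specific):

```shell
# Capture window of 2 hours = 7200 seconds, one sample every 15 seconds
WINDOW=7200
DELAY=15
SAMPLES=$(( WINDOW / DELAY ))
echo $SAMPLES   # prints 480, the value for esxtop -n
```

If you change the window or the delay, recompute -n the same way; esxtop stops after that many iterations.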

Ok, to wrap it up: we now have a nice command that gathers the esxtop data according to the requirements and saves it as a gzipped file with the hostname and a date & time stamp.

Scheduling using Crontab

To schedule the command on a specific date/time we use crontab for scheduling. More explanation on how to use crontab can be found here:

http://nl.wikipedia.org/wiki/Cronjob

The important part is the diagram below, which explains how the scheduling fields work.

 # * * * * *  command to execute
 # │ │ │ │ │
 # │ │ │ │ │
 # │ │ │ │ └───── day of week (0 - 6) (0 to 6 are Sunday to Saturday, or use names; 7 is Sunday, the same as 0)
 # │ │ │ └────────── month (1 - 12)
 # │ │ └─────────────── day of month (1 - 31)
 # │ └──────────────────── hour (0 - 23)
 # └───────────────────────── min (0 - 59)


For this part too there are a few good posts around, as well as a VMware KB article with some basic explanation:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1033346

First enable SSH on the ESXi host and connect to it. Once connected, open the crontab file:

cd /var/spool/cron/crontabs/
vi ./root

Now you’re in the crontab file; there should already be some entries configured. Because we use vi to edit the file, first press <i> to enter insert mode.

Next add the line below the last one with a simple copy/paste.

0    7    10   2   *   export DATE=$(date +%Y-%m-%d-%H-%M-%S) && export HOSTNAME=$(hostname) && /sbin/esxtop -b -d 15 -n 480 | gzip -9c > /vmfs/volumes/Datastore/Logging/$HOSTNAME-$DATE.csv.gz

As you can see, I start the job at 0 minutes, 7 hours, on the 10th day of the 2nd month (February), on any day of the week.

Huh wait… 07:00? Wasn’t the requirement 08:00? Yes, that’s true, but 08:00 is our local time; as ESXi hosts run in UTC, you need to set the schedule in UTC.
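If you’d rather not do the UTC conversion in your head, GNU date on a Linux admin box can do it for you (the ESXi busybox date does not support this -d syntax; the time zone and date below are example values, not from the original setup):

```shell
# Convert 08:00 local time in a UTC+1 zone to UTC for the crontab entry.
# Requires GNU date; Europe/Amsterdam and 2014-02-10 are example values.
date -u -d 'TZ="Europe/Amsterdam" 2014-02-10 08:00' '+%H:%M'   # prints 07:00
```

The same one-liner also catches daylight-saving surprises: run it with the actual capture date and it returns the correct UTC hour for that day.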

To enable the scheduled job we need to restart the crond process. First retrieve the ID of the process using:

cat /var/run/crond.pid

Next kill that process ID:

kill -HUP <Proc ID>

And start crond again:

crond

That’s it, now disconnect from your host and disable SSH if that’s your default.

Retrieve and extract the data with PowerCLI

Because I didn’t want to open all the datastores, copy the files and extract them manually, I made a simple PowerCLI script (without error handling).

First I created an alias for 7zip which will be used later to extract the .gz files.

# Create an alias for 7-Zip and test the path
if (-not (Test-Path "C:\Program Files\7-Zip\7z.exe")) {throw "C:\Program Files\7-Zip\7z.exe needed"}
Set-Alias sz "C:\Program Files\7-Zip\7z.exe"

Now we can use the alias sz to extract the files.


$datastores = Get-Datastore Datastore1,Datastore2
foreach ($datastore in $datastores) {
    Write-Host "Mounting $datastore" -ForegroundColor Magenta
    # Remove a leftover "ds" PSDrive from a previous run before mounting
    if (Get-PSDrive | ?{$_.Name -eq "ds"}) {
        Remove-PSDrive ds -Force | Out-Null
    }
    New-PSDrive -Location $datastore -Name ds -PSProvider VimDatastore -Root "\" | Out-Null
    Set-Location ds:\
    Write-Host "Copying datastore items from $datastore" -ForegroundColor Magenta
    Copy-DatastoreItem -Item ds:\Logging\* -Destination D:\Troubleshooting\LoggingGZ -Force
    Write-Host "Done" -ForegroundColor Magenta
}

I got some errors mounting and dismounting the PSDrive, so I created a simple if statement to work around it.

Now we have the data local, we can extract it using the alias created earlier.


$path = "D:\Troubleshooting\LoggingGZ"
$GZ = gci $path | where {$_.Name -like "*.gz"}
cd $path
# Extract all GZ files and move the CSVs to the \Logging folder
Write-Host "Extracting .gz files" -ForegroundColor Magenta
$GZ | %{sz e -y $_.FullName}
Write-Host "Moving CSV files" -ForegroundColor Magenta
gci $path | ?{$_.Name -like "*.csv"} | % {Move-Item -Force $_.FullName "D:\Troubleshooting\Logging\"}
Write-Host "Done" -ForegroundColor Magenta

There we go; now we have the original .gz files retrieved as well as an unpacked CSV version.
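Before unpacking you can also sanity-check that the downloaded archives aren’t truncated; gzip’s -t option tests integrity without extracting. A minimal sketch, assuming the .gz files sit in the current directory:

```shell
# Test each downloaded archive; a truncated or corrupt copy is reported
for f in *.csv.gz; do
  if gzip -t "$f" 2>/dev/null; then
    echo "OK: $f"
  else
    echo "CORRUPT: $f"
  fi
done
```

Any file flagged as corrupt is worth re-copying from the datastore before you spend time analysing it.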


Analyse data using Perfmon

Right… the CSV files are between 100 and 500 MB; if you use the -a switch in esxtop to capture “all” statistics, they will be even larger.

As we don’t need all the data, I extract only what I need so the files become easier to handle.

First start perfmon (Start->Run->Perfmon)

Right-click on the screen and select “Properties” -> tab “Source”.
Select “Log files” -> “Add” and browse to one of your CSV files.

Next go to “tab Data” -> “Add”  to select the counters we need.

I need the counters below for 2 VMs:

Group CPU // %Costop / %Ready / %Wait / %Run
Physical Disk SCSI Device // Average Driver ms/cmd / Average Guest ms/cmd

Select the counters and instances you want; now only the data we want to work with is selected. What about saving it to a new CSV file?

Right-click on the screen and select “Save data as…”, then choose a filename, location and the filetype you want. You could also use the *.blg format, so you can later load multiple BLG files into Perfmon to compare between ESXi hosts.
Now the file has shrunk from 166 MB to 308 KB… that’s easier to handle.

You could use Perfmon for further troubleshooting, but I found another cool tool named Datplot.

Plot data using Datplot

After you have downloaded and successfully installed Datplot, it took me only a few seconds to see how things worked. Not that hard, but here are some starting tips.

#Import data
File -> Load new datasource and select your CSV file.

Next you get a few questions to answer: enter the lines where the columns and the data start. For an esxtop file that will be:

Get column (parameter) names from line : 1 (this is the header line)
Get column units from line : 0 (no unit line)
Data starts at line : 2 (here the data starts)

Column Delimiter : , (Comma)
Number decimal symbol : . (Dot)

Select “import data”, now the data is imported and you see an empty graph.

#Plot something
Next in the upper right corner there is a dropdown menu where you need to select the X-Axis. Select the first option (which is the time) and click apply.

So what’s next : let’s add some data

Right-click on the graph and select “Datacurve -> Add”.
Select the parameter you want to plot; parameters are added one by one. You can plot multiple parameters and even target each one at the left or right Y-axis, so you can combine different parameter values in one graph.

What if you want to split it out? That’s also possible: right-click and select “Graph pane -> Add” and a 2nd graph appears, where you can plot more data.

The nice thing is that you can stack different graphs on the same timeline. Another cool feature is the “Event line”.

With this line you can, for example, point out a spike: right-click on the spike -> “Event line -> Add”.

This draws a vertical line through both graphs, which also displays the values at that point in time for all lines.

Adding a box with the min/mean/max values can also be handy; add it through the “Edit” menu -> “Toggle min/max/mean” -> select the location.

Some other things you can do: save the graph in different image formats and add lines or text.

Will add some screenshots later.