VMFS-volumes with I/O-errors

Status: work in progress

Symptoms

In the worst case an I/O error on a VMFS-volume results in an unmountable volume.
In less severe cases the I/O error affects only a vmware.log or another file that can easily be recreated, such as vmx-files, vmdk descriptor files or vmsd-files.
In a typical case the error affects a flat.vmdk or a delta.vmdk.
Either of these results in a VM that cannot be started or in a vmdk that cannot be copied, cloned or used in a VM.
In practice this often means that the content of the VM is lost.

What does the VMware documentation say about this problem?

Unfortunately I am not aware of any Knowledge Base entry or other official resource that gives useful advice on this topic.
Please correct me if I am wrong.

My 2 cents …

Most admins will first of all consider hardware related issues when they see I/O errors.
In my experience only a third or less of the I/O errors I see on VMFS-volumes are really caused by hardware problems.
The majority of the cases I have seen are related to what I call “logical VMFS errors”.
In this post I will only deal with these “logical VMFS errors”.
They occur because there is an error in the VMFS metadata that describes the files and directories of the VMFS-volume.
I found ways to work around some of these errors.

DISCLAIMER:
The procedures I explain in the following may contradict the VMware documentation or Knowledgebase.

Anamnesis:

First of all, let's check how serious the problem really is.
Serious: there are I/O errors in the area used for the VMFS metadata, that is, somewhere between the start of the VMFS-volume and about 1200-1500 MB into it.
A command like
dd if=/dev/disks/vmfs-partition bs=1M count=1536 of=/tmp/vmfs-header.dump
will fail and report an I/O error. A typical location for the first I/O error is 29 MB into the VMFS-partition.
This problem typically results in an unmountable volume.

Manageable: the I/O error affects only a single file
A command like
vmkfstools -i name.vmdk clone.vmdk
will fail with an I/O error.
The VM that uses this VMDK may still work, but you can no longer copy or clone it.
It is also possible that the VM reports a more specific error when you try to start it.


Examples:

A flat.vmdk reports an I/O error

Let's assume we have already tried
vmkfstools -i /vmfs/volumes/datastoreA/directoryA/name.vmdk /vmfs/volumes/datastoreB/directoryB/clone.vmdk
and it failed with an I/O error.
Next we try
dd bs=1M if=/vmfs/volumes/datastoreA/directoryA/name-flat.vmdk of=/vmfs/volumes/datastoreB/directoryB/clone-flat.vmdk
This also fails with an I/O error. Let's say the error is at 21 MB: dd reports that 20 MB were copied and then aborts.
Now we can split the dd command into two steps: the first copies the part before the error, the second copies the part after the error.
Like this:
dd bs=1M if=/vmfs/volumes/datastoreA/directoryA/name-flat.vmdk of=/vmfs/volumes/datastoreB/directoryB/clone-flat.vmdk count=20
dd bs=1M if=/vmfs/volumes/datastoreA/directoryA/name-flat.vmdk of=/vmfs/volumes/datastoreB/directoryB/clone-flat.vmdk skip=21 seek=21
If this works, the flat.vmdk has only one error and the error range is 1 MB.
Of course in real life it will not be that easy: there can be several I/O errors with different error ranges.
If ESXi had the tool ddrescue this would be easy, but we have to make do with dd.
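With dd only, the two-step split above can be generalised into a loop that copies one 1 MiB chunk at a time and simply moves on when a chunk cannot be read, a poor man's ddrescue. This is a minimal POSIX sh sketch, not a tested ESXi tool: copy_by_chunks is a hypothetical name, it assumes you know the size of the flat.vmdk in MiB (for example from ls -l), and it assumes the busybox dd on ESXi accepts the same operands (bs, count, skip, seek, conv=notrunc) as GNU dd.

```shell
#!/bin/sh
# copy_by_chunks SRC DST CHUNKS   (hypothetical helper)
# Copies SRC to DST one 1 MiB chunk at a time with plain dd.
# Chunks that dd cannot read are skipped (they read back as zeroes
# in DST) and their positions are printed to stderr.
copy_by_chunks() {
    src=$1 dst=$2 chunks=$3
    i=0
    bad=0
    while [ "$i" -lt "$chunks" ]; do
        # skip= reads chunk i of SRC, seek= writes it to the same offset
        # in DST, conv=notrunc keeps everything copied so far.
        if ! dd if="$src" of="$dst" bs=1M count=1 skip="$i" seek="$i" \
                conv=notrunc 2>/dev/null; then
            echo "unreadable chunk at $i MiB" >&2
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad unreadable chunk(s)" >&2
    [ "$bad" -eq 0 ]    # non-zero exit status if anything was skipped
}
```

For a 40 GB disk the call would be: copy_by_chunks /vmfs/volumes/datastoreA/directoryA/name-flat.vmdk /vmfs/volumes/datastoreB/directoryB/clone-flat.vmdk 40960. Expect it to be slow when there are many errors, and note that the clone ends up shorter than the source if the very last chunk is unreadable.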
The next thing to try is a vmkfstools command like
vmkfstools -p 0 /vmfs/volumes/datastoreA/directoryA/name.vmdk > mapping-name.txt
If that works, we can use mapping-name.txt to build dd commands that copy the vmdk fragment by fragment.
Unfortunately it is very rare that this command still works at this stage.
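If mapping-name.txt does come out, every fragment has to become one dd command that reads from the VMFS partition and writes to the matching offset of the clone. The exact output format of vmkfstools -p is not reproduced here, so this sketch assumes you have first rewritten the mapping by hand into a hypothetical three-column list: file offset, length and offset inside the VMFS partition device, all in MiB. Any difference between volume offsets and partition offsets has to be accounted for during that manual step. extents_to_dd is an illustrative name; it only prints the commands so they can be reviewed before anything runs.

```shell
#!/bin/sh
# extents_to_dd PARTITION OUTFILE < EXTENTS   (hypothetical helper)
# EXTENTS is a HAND-PREPARED list, not raw vmkfstools output, one
# fragment per line:  FILE_OFFSET_MB  LENGTH_MB  PARTITION_OFFSET_MB
# Lines starting with # are ignored. One dd command is printed per fragment.
extents_to_dd() {
    part=$1 out=$2
    while read -r file_off len part_off; do
        case "$file_off" in ''|'#'*) continue ;; esac
        # read LENGTH_MB from the partition, write it at FILE_OFFSET_MB
        printf 'dd if=%s of=%s bs=1M skip=%s seek=%s count=%s conv=notrunc\n' \
            "$part" "$out" "$part_off" "$file_off" "$len"
    done
}
```

Check every printed line before running it: a wrong skip= or seek= value copies the wrong data or overwrites data that was already rescued.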

When we have tried all the built-in ESXi tools and all of them have failed, the next step is to switch to Linux.
As always I use the MOA LiveCD.
After the Linux system has finished booting and the network is up, you can access it via PuTTY or WinSCP (user: root, password: sanbarrow).
Create two directories:
sudo su
mkdir /esxi
mkdir /vmfs-out

Connect to the target ESXi host in read-only mode:
sshfs -o ro root@ip-of-esxi:/ /esxi
Connect to the same ESXi host again in writeable mode; mounting a single directory is enough:
sshfs  root@ip-of-esxi:/vmfs/volumes/datastoreB/directoryB /vmfs-out
Now we can use ddrescue, which gives us an easy way to skip the error areas.
ddrescue syntax: ddrescue <file that needs to be copied> <output file> <log file>
Make sure to create a log file!
ddrescue /esxi/vmfs/volumes/datastoreA/directoryA/name-flat.vmdk /vmfs-out/clone-flat.vmdk /vmfs-out/copy.log
This command tries to copy the source file block by block and skips blocks that cannot be read.
The result will still be a damaged flat.vmdk, but this time we at least get a vmdk that we can work with later.
This workaround will not always succeed; all I can say is that it is worth the effort.
Keep in mind that the VMDK will probably be a complete loss if you don't try …

There is a Plan C but that exceeds the scope of this blog.
Call me if necessary.


Examples:

The VMFS-metadata area has an I/O error

In this case you will not be able to mount the volume at all, or you will see corrupted directories or a bunch of corrupt vmdks.
Let's assume we have already tried the normal procedure to create a VMFS header dump
dd if=/dev/disks/vmfs-partition bs=1M count=1536 of=/tmp/vmfs-header.dump
and the command failed with an I/O error.
Working around the I/O error with several dd commands is very inconvenient with ESXi alone.
So we switch to Linux again.
Use the same sshfs commands as before to connect to the ESXi host.
Again I use ddrescue to locate the errors.
To do that I simply run
ddrescue /esxi/dev/disks/vmfs-partition /tmp/vmfs-header.dump /tmp/copy-log
I watch its progress and abort with CTRL+C once the command has copied at least 1300 MB.
Inspecting the ddrescue log then tells me how many errors there are and how large the error area actually is.
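The ddrescue log (newer ddrescue versions call it a mapfile) is plain text: comment lines start with #, the first other line holds the current position and status, and every following line describes one area as position and size (both in hex) plus a status character, where + means finished and - marks a bad area. A small sh sketch that lists everything not yet copied; list_unrescued is a hypothetical helper, and the hex conversion relies on the shell's $((...)) arithmetic accepting 0x... constants, as POSIX shells do.

```shell
#!/bin/sh
# list_unrescued MAPFILE   (hypothetical helper)
# Prints every area of a GNU ddrescue mapfile whose status is not "+"
# (finished), then the total number of bytes that are still missing.
list_unrescued() {
    total=0
    status_line_seen=0
    while read -r pos size status rest; do
        case "$pos" in
            ''|'#'*) continue ;;          # blank line or comment
        esac
        if [ "$status_line_seen" -eq 0 ]; then
            status_line_seen=1            # "current_pos current_status" line
            continue
        fi
        [ "$status" = "+" ] && continue   # area was copied completely
        start=$(($pos))                   # hex -> decimal
        len=$(($size))
        printf 'offset %s (%s MiB) size %s status %s\n' \
            "$start" "$((start / 1048576))" "$len" "$status"
        total=$((total + len))
    done < "$1"
    printf 'total unrescued: %s bytes\n' "$total"
}
```

Run it as list_unrescued /tmp/copy-log. Because the dump starts at offset 0 of the partition, the reported offsets are also offsets into the VMFS-partition.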
Now it gets a bit tricky …
The VMFS-metadata area (the first 1200-1500 MB of the VMFS-volume) contains some areas that are supposed to be filled with zeroes.
If our error is located in such an area, we can fix the VMFS metadata by simply injecting zeroes.
Errors in the more important areas cannot be fixed that easily.
The basic approach I use here is to get a header dump with an error area that is as small as possible.
Once I have such a dump, I use it in my lab environment and try to create dd commands for the VMs I need to extract.

Please understand that I will not go deeper into this matter at the moment.
If you investigate errors like this one, a single bad dd command can make matters much worse.
If you need help here, call me.

Create a VMFS header dump using Linux

Scenario

The ESXi-host is active in production.
A VMFS- datastore has a problem like:
– can not be mounted
– a directory or a file is suddenly missing
– a vmdk can not be used because of I/O errors
– other “strange effects”
The datastore can stay active; there is no need to shut down any VMs first.

Prepare the Linux-system

The Linux-system can be launched
– on a physical host
– as a VM on any other ESXi/Workstation-host
– as a VM on the target ESXi-host
Required:
– the moa.iso Linux LiveCD
– a network connection to the target ESXi
– if it is a VM: at least 4 GB RAM and 2 or more CPUs, plus a network that can reach the IP of the target ESXi host
– a VMDK is only necessary for extensive analysis

Create the VMFS-meta-data dump

After the Linux system has finished booting and the network is up, you can access it via PuTTY or WinSCP (user: root, password: sanbarrow).
Create a directory:
sudo su
mkdir /esxi

Connect to the target ESXi host in read-only mode:
sshfs -o ro root@ip-of-esxi:/ /esxi

Check the list of devices that are in use by the target ESXi:
cd /esxi/dev/disks
ls -lah | grep -v vml
You should now be able to identify the target VMFS-volume.
Dump the first 1536 MB of the 3rd partition if the VMFS-volume is part of the ESXi boot disk:
dd if=name-vmfs-partition:3 bs=1M count=1536 of=/tmp/name-vmfs-dump.1536
Dump the first 1536 MB of the 1st partition if the VMFS-volume is on a disk of its own:
dd if=name-vmfs-partition:1 bs=1M count=1536 of=/tmp/name-vmfs-dump.1536
To analyse the partition table, dump the first 1 MB of the disk:
dd if=name-vmfs-disk bs=1M count=1 of=/tmp/name-vmfs.disk

(Edit the lines above and use appropriate file names.)
If that worked, you can download the dump file with WinSCP.
Afterwards delete the dump from /tmp so that the space in /tmp stays available.
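A dump that silently stops early is easy to miss. GNU dd counts a short read as a full block unless iflag=fullblock is given, and reads over sshfs may not always return complete 1 MiB blocks, so checking the size of the result is cheap insurance. A minimal sketch; dump_head is a hypothetical name and the check is an addition to the procedure above, not part of it.

```shell
#!/bin/sh
# dump_head SRC OUT MB   (hypothetical helper)
# Copies the first MB MiB of SRC (e.g. a VMFS partition) to OUT and
# verifies that OUT really contains MB MiB.
dump_head() {
    src=$1 out=$2 mb=$3
    if ! dd if="$src" of="$out" bs=1M count="$mb" 2>/dev/null; then
        echo "dd failed (I/O error?); consider ddrescue instead" >&2
        return 1
    fi
    have=$(($(wc -c < "$out")))        # $((...)) strips padding from wc
    want=$((mb * 1048576))
    if [ "$have" -ne "$want" ]; then
        echo "short dump: $have of $want bytes" >&2
        return 1
    fi
}
```

Example: dump_head name-vmfs-partition:1 /tmp/name-vmfs-dump.1536 1536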

Locked files with VMFS 6

You probably know the error message: operation impossible since file is locked.
If a VMDK is locked, you cannot use it in a VM and you cannot copy it.
The Knowledge Base basically tells you to stop all processes that access the file and to reboot.
This is supposed to fix the problem.
That documentation is not good enough: with VMFS 6, following this advice may not be sufficient to release the lock.
In that case you have basically lost the file, as you cannot read it anymore.
Accessing it via Linux is also impossible at the moment.
Recent third-party recovery tools like UFSexplorer also failed to read the vmdks in such a case.
Here is a recent case for your reference ….
Problem with locked virtual machine after esxi host crash
I have seen a couple of these cases recently, so I started to investigate the issue.
Now I have found a way to recover a VMDK that would otherwise be a case for Ontrack or VMware Support.
My procedure requires a direct hex edit of the VMFS heartbeat section, so please do not ask for details at the moment.
I will post details once I know this approach is safe.
Anyway, I believe this is a bug in the way ESXi 6.5 handles the heartbeat section of a VMFS-volume.
This issue should never affect single-host environments.
Feel free to contact me via Skype if you run into this yourself.

Symptoms:

None of the following commands that are available on ESXi will work once a flat.vmdk is locked:
vmkfstools -i name.vmdk new.vmdk
vmkfstools -p 0 name-flat.vmdk > mapping.txt
hexdump -C name-flat.vmdk | less
dd if=name-flat.vmdk of=new-flat.vmdk bs=1M
Starting a VM that uses the locked VMDK will fail as well.

Consequences:

Effectively the VM / VMDK is lost, as you cannot even read it one more time to copy the data.

Plan A:

Try this first:
If possible, isolate the VMFS-volume so that it is presented to a single ESXi host only.
Then follow the troubleshooting steps from the VMware Knowledge Base:
https://kb.vmware.com/s/article/10051
These steps are supposed to get rid of the stale lock.
Apparently this is no longer a 100% reliable procedure with VMFS 6.

Should I try VOMA?
At the moment my answer to that question is NO.
We need to know the IP / MAC of the ESXi host that holds the lock.
To get that information, good old vmkfstools is enough, and it works without quiescing the VMFS-volume first.
In other words: in this case VOMA is most likely a waste of time.

Plan B:

This often used to work with earlier VMFS versions:
– use the Linux vmfs-tools to read the VMDK and clone it to a new location (unfortunately, even the latest builds of vmfs-tools that I am aware of do not support VMFS 6)
– use an ESXi-LiveCD to read the VMFS-volume and clone the VMDK to a new location
– do a fresh install of the same ESXi-build to a temporary USB-flash drive

Plan C:

One essential requirement for a cluster filesystem is to allow a single host inside the cluster to access a VMDK while preventing access by all other hosts.
To do this efficiently and quickly, every VMFS-volume uses a small section of the VMFS metadata for so-called heartbeats.
For this “heartbeat section” VMFS 6 uses an area of just one block, which equals 1 MB.
The “heartbeat section” can be accessed in two ways:
1. dd if=.vh.sf bs=1M count=1 skip=3 of=heartbeat-section.bin
2. dd if=vmfs-partition bs=1M count=1 skip=20 of=heartbeat-section.bin
Option 1 appears to be the more reliable one, as it does not depend on where the .vh.sf file is actually located inside the partition.

Quick-diagnosis:

Use a Linux system or a Windows system that has the command-line tool strings(.exe).
The Windows version is available here: https://docs.microsoft.com/en-us/sysinternals/downloads/strings
Run
strings heartbeat-section.bin > strings.txt
In strings.txt you should see the IP address or MAC address that you already know from the error message about the locked file.
If you find no such reference, you can stop reading here; I assume you have a different problem.
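If strings is not at hand, tr can emulate it well enough for this yes/no check: every non-printable byte becomes a line break and grep looks for the address. A minimal sketch; hb_mentions is a hypothetical name. Pass the IP or MAC exactly as shown in the error message. How a MAC is spelled inside the heartbeat section (with or without separators) is not documented here, so trying both forms may be necessary.

```shell
#!/bin/sh
# hb_mentions SECTION NEEDLE   (hypothetical helper)
# Rough equivalent of "strings SECTION | grep NEEDLE": splits the 1 MiB
# heartbeat dump at non-printable bytes and prints matching strings.
# Exit status is 0 if NEEDLE was found, 1 otherwise.
hb_mentions() {
    tr -c '[:print:]' '\n' < "$1" | grep -F -- "$2"
}
```

Example: hb_mentions heartbeat-section.bin 192.168.10.20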

Safety first:

As far as I know there is no detailed public documentation of the exact layout of the heartbeat section in a VMFS 6 partition.
That means that as long as I do not know all the fine details for certain, I have to keep my instructions as failsafe as possible.
When we edit such a critical section of the VMFS metadata, we should avoid hex editors or other tools that are a risk in the hands of an inexperienced user.
Instead I prefer a way that makes it easy to create a backup of the relevant 1 MB block first.
The command
dd if=.vh.sf bs=1M count=1 skip=3 of=heartbeat-section.bin
does that.
Then I inject a clean-heartbeat-section.bin that I created on a newly created and freshly formatted VMFS 6 volume (created by the same ESXi build).
This clears any stale locks that may exist and appears to have the desired effect.
If something still goes wrong, you can easily reinject the original section.

To inject a clean heartbeat-section use
dd of=.vh.sf bs=1M count=1 seek=3 if=clean-heartbeat-section.bin conv=notrunc

In my experience so far, the injection takes effect almost immediately.
If you see no change in behaviour, try rebooting the ESXi host.
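The backup and injection steps can be wrapped so that the injection never runs without a complete backup and the written block is read back afterwards. This is a minimal sketch built only from the two dd commands above plus cmp; hb_replace is a hypothetical name, and it makes the procedure no safer than described above. Try it on a copy of .vh.sf first.

```shell
#!/bin/sh
# hb_replace VHSF CLEAN BACKUP   (hypothetical helper)
# 1. saves 1 MiB block 3 of VHSF (the heartbeat section) to BACKUP,
# 2. aborts unless both BACKUP and CLEAN are exactly 1 MiB,
# 3. writes CLEAN over block 3 without truncating VHSF,
# 4. reads block 3 back and compares it with CLEAN.
hb_replace() {
    vhsf=$1 clean=$2 backup=$3
    if [ -e "$backup" ]; then
        echo "refusing to overwrite existing backup $backup" >&2
        return 1
    fi
    dd if="$vhsf" of="$backup" bs=1M count=1 skip=3 2>/dev/null || return 1
    for f in "$backup" "$clean"; do
        if [ "$(($(wc -c < "$f")))" -ne 1048576 ]; then
            echo "$f is not exactly 1 MiB, nothing injected" >&2
            return 1
        fi
    done
    dd if="$clean" of="$vhsf" bs=1M count=1 seek=3 conv=notrunc 2>/dev/null || return 1
    dd if="$vhsf" bs=1M count=1 skip=3 2>/dev/null | cmp -s - "$clean"
}
```

Example, run in the root directory of the affected datastore: hb_replace .vh.sf clean-heartbeat-section.bin heartbeat-section.bin. Keep a copy of the backup off the host. To roll back: dd if=heartbeat-section.bin of=.vh.sf bs=1M count=1 seek=3 conv=notrunc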

Please help …

I definitely need to see more cases of this defect before I consider offering downloads of premade fixed sections.
So if you run into this problem in the near future, please contact me.
skype: sanbarrow
I will then create a fixed section and help you inject it safely.

Ulli

You can hire me on a “per-incident” basis; my help is most useful with recovery problems.

Virtual USB-disks

A VMware feature that I have missed for at least 10 years has apparently existed for quite a while already.
See https://communities.vmware.com/thread/580985
Am I the only one who did not notice this?
So what's new?
We are used to the VMware virtual disk format (vmdk-files).
These VMDK-files can be attached to a VM so that the guest OS sees them as:
– IDE-device
– SCSI-device
– SATA-device
– NVMe-device
Until today I was not aware that there is one more option:
– USB-device
This feature is not exposed in the GUI, but using it is quite easy and straightforward.
To define an existing VMDK as USB-device you have to edit the vmx-file.
First of all, make sure that you have this line:
ehci.present = "TRUE"
You need this line as the main switch for USB 2 ports.
You should also see a line like
ehci.pciSlotNumber = "35"
Do not edit this line; if you want to reset the port, simply delete it instead.
If you assign a “bad” port you will get obscure follow-up problems, so don't do it.
Now, to assign a VMDK as a USB device, set these parameters:

ehci:0.present = "TRUE"
ehci:0.deviceType = "disk"
ehci:0.fileName = "usb-vmdk.vmdk"

ehci:1.present = "TRUE"
ehci:1.deviceType = "disk"
ehci:1.fileName = "usb2-vmdk.vmdk"

This appears to work for more than one VMDK, so the full range from ehci:0 to ehci:5 may be allowed.
This is just a first guess; I need to do more research here …
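Since these lines have to be added by hand, a small helper avoids typos, curly quotes and duplicate keys. A minimal sketch; add_usb_disk is a hypothetical name. It keeps a .bak copy of the vmx, adds the ehci.present main switch only if it is missing, and leaves ehci.pciSlotNumber alone as recommended above. Presumably the vmx should only be edited while the VM is powered off.

```shell
#!/bin/sh
# add_usb_disk VMX INDEX VMDK   (hypothetical helper)
# Appends ehci:INDEX.present/deviceType/fileName to VMX and adds the
# ehci.present main switch if it is missing. Keeps VMX.bak as a backup.
add_usb_disk() {
    vmx=$1 idx=$2 vmdk=$3
    if grep -q "^ehci:$idx\." "$vmx"; then
        echo "ehci:$idx is already defined in $vmx" >&2
        return 1
    fi
    cp "$vmx" "$vmx.bak" || return 1
    if ! grep -q '^ehci\.present' "$vmx"; then
        echo 'ehci.present = "TRUE"' >> "$vmx"
    elif ! grep -q '^ehci\.present = "TRUE"' "$vmx"; then
        echo "warning: ehci.present is not \"TRUE\" in $vmx" >&2
    fi
    {
        echo "ehci:$idx.present = \"TRUE\""
        echo "ehci:$idx.deviceType = \"disk\""
        echo "ehci:$idx.fileName = \"$vmdk\""
    } >> "$vmx"
}
```

Example: add_usb_disk name.vmx 0 usb-vmdk.vmdk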

I said that this feature is not exposed in the GUI; that is not entirely correct.
Once you have created the required vmx parameters and start the VM, you will see the disks appear in the list of removable devices:

Inside a guest both vmdks appear like this: (using my Linux-LiveCD with Ubuntu 14)


During my experiments I noticed that the USB vmdks may appear in write-protected mode.
At the moment I cannot claim to fully understand in which configurations the vmdks are write-protected.
This will require further research …

Anyway, even at this stage I would call this an extremely useful “new” feature.

1. for all those who develop USB-bootable tools

If the USB vmdks are created in the monolithicFlat VMDK format, the images can easily be transferred to real USB devices with a simple dd command.
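A sketch of that transfer with a read-back check. With monolithicFlat the data lives in name-flat.vmdk, so that is the file to write. flat_to_target is a hypothetical name; TARGET should be the whole USB device node, everything on it is overwritten, so double-check the name. The final comparison uses cmp -n, which is a GNU cmp option rather than POSIX.

```shell
#!/bin/sh
# flat_to_target FLAT TARGET   (hypothetical helper)
# Writes the data file of a monolithicFlat VMDK (name-flat.vmdk) to
# TARGET, e.g. a whole USB stick. Everything on TARGET is overwritten!
# Afterwards the first FLAT-size bytes of TARGET are compared with FLAT.
flat_to_target() {
    flat=$1 target=$2
    size=$(($(wc -c < "$flat")))
    dd if="$flat" of="$target" bs=1M conv=notrunc 2>/dev/null || return 1
    sync
    cmp -s -n "$size" "$flat" "$target"
}
```

Example on a Linux host: flat_to_target usb-vmdk-flat.vmdk /dev/sdX, with /dev/sdX replaced by the actual stick.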

2. for all users who are looking for a way to assign VMDKs as “optional”

A USB vmdk is allowed to be temporarily unavailable!
All other ways of assigning VMDKs fail if the file is not present.
With USB vmdks the VM starts even if the file is not available.
This opens new paths to obscure setups that were impossible until now.

3. for all users who missed the option to assign single-partition images as a VMDK

All other options to assign VMDKs usually require a partitioned image including a valid MBR or GPT.


Todo:
check since when this feature exists
check if this works in ESXi
find out when such a VMDK will be flagged as readonly

Ulli