Create a VMFS header dump using Linux

Scenario

The ESXi-host is active in production.
A VMFS-datastore has a problem such as:
– it cannot be mounted
– a directory or a file is suddenly missing
– a vmdk cannot be used because of I/O errors
– other “strange effects”
The datastore can stay active and there is no need to shut down any VMs first.

Prepare the Linux-system

The Linux-system can be launched
– on a physical host
– as a VM on any other ESXi/Workstation-host
– as a VM on the target ESXi-host
Required:
–  the moa.iso Linux LiveCD
– network connection to the target ESXi
– if it is a VM: assign at least 4 GB RAM and 2 or more CPUs, and use a network that can reach the IP of the target ESXi-host
– a VMDK is only necessary for extensive analysis

Create the VMFS-meta-data dump

After the Linux-system has finished booting and the network is up, you can access it via PuTTY or WinSCP (user: root – password: sanbarrow).
Create a directory:
sudo su
mkdir /esxi 

connect to the target ESXi-host
sshfs -o ro root@ip-of-esxi:/ /esxi

check the list of devices that are in use by the target ESXi
cd /esxi/dev/disks
ls -lah | grep -v vml
You should now be able to identify the target VMFS-volume
Dump the first 1536 MB of the 3rd partition if the VMFS-volume is part of the ESXi-boot disk
dd if=name-vmfs-partition:3   bs=1M count=1536 of=/tmp/name-vmfs-dump.1536
Dump the first 1536 MB of the 1st partition if the VMFS-volume is on a disk of its own.
dd if=name-vmfs-partition:1   bs=1M count=1536 of=/tmp/name-vmfs-dump.1536
To analyse the partition table, dump the first 1 MB of the disk
dd if=name-vmfs-disk   bs=1M count=1 of=/tmp/name-vmfs.disk

(Edit the lines above and use appropriate file names.)
If that worked you can download the dump-file with WinSCP.
Delete the dump from /tmp after you have downloaded it, so that /tmp does not run out of free space.
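
The dump steps above can be wrapped in a small helper. This is only a sketch: the function name dump_head is made up for this example, iflag=fullblock assumes GNU dd (used here as a precaution against short reads), and the sha256sum line is an addition so that the download can be verified afterwards.

```shell
# dump_head SRC DEST MIB
# Copy the first MIB megabytes of SRC to DEST and print a checksum.
# Hypothetical helper, not part of the original procedure.
dump_head() {
    dh_src=$1; dh_dest=$2; dh_mib=$3
    if [ ! -r "$dh_src" ]; then
        echo "cannot read $dh_src" >&2
        return 1
    fi
    # iflag=fullblock is a GNU dd flag (assumption: GNU coreutils is installed)
    dd if="$dh_src" of="$dh_dest" bs=1M count="$dh_mib" iflag=fullblock 2>/dev/null || return 1
    # checksum to compare against after the download
    sha256sum "$dh_dest"
}

# Example (device name is a placeholder, as in the commands above):
# cd /esxi/dev/disks
# dump_head name-vmfs-partition:1 /tmp/name-vmfs-dump.1536 1536
```

Comparing the printed checksum with one calculated on the downloaded copy shows whether the transfer was complete.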

Locked files with VMFS 6

You probably know the error message: operation impossible since file is locked.
If a VMDK is locked you cannot start it inside a VM and you cannot copy it.
The knowledgebase basically tells you to stop all processes that access the file and reboot.
This is supposed to fix the problem.
This documentation is not good enough.
In VMFS 6 following this advice may not be sufficient to release the lock.
This means that you have basically lost the file, as you cannot read it anymore.
Trying to access it via Linux is also impossible at the moment.
Recent third-party recovery tools like UFSexplorer also failed to read the vmdks in such a case.
Here is a recent case for your reference:
Problem with locked virtual machine after esxi host crash
I have seen a couple of these cases recently, so I started to investigate the issue.
Now I have found a way to recover a VMDK that would otherwise be a case for Ontrack or VMware Support.
My procedure requires a direct hex edit of the VMFS heartbeat section, so please do not ask for details at the moment.
I will post details once I know this approach is safe.
Anyway, I believe this is a bug in the way ESXi 6.5 handles the heartbeat section of a VMFS-volume.
This issue should never affect single host environments.
Feel free to contact me via skype if you run into this yourself.

Symptoms:

none of the following commands that are available on ESXi will work once a flat.vmdk is locked:
vmkfstools -i  name.vmdk new.vmdk
vmkfstools -p 0 name-flat.vmdk > mapping.txt
hexdump -C name-flat.vmdk | less
dd if=name-flat.vmdk of=new-flat.vmdk bs=1M
starting a VM which uses the locked VMDK will fail

Consequences:

Effectively the VM / VMDK is lost, as you cannot even read it one more time to copy the data.

Plan A:

Try this first:
If possible isolate the VMFS-volume so that it is exposed to a single ESXi.
Then follow the troubleshooting steps from the VMware Knowledgebase
https://kb.vmware.com/s/article/10051
If you follow the steps you are supposed to get rid of the stale lock.
Apparently this is no longer a 100% reliable procedure with VMFS 6.

Should I try VOMA ?
At the moment my answer to that question is NO.
We need to know the  IP / MAC of the ESXi-host that holds the lock.
To acquire that info good old vmkfstools is enough and that will work without silencing the VMFS-volume first.
In other words: in this case VOMA is most likely a waste of time.

Plan B:

This often used to work with earlier VMFS-versions:
– use  Linux vmfs-tools to read the VMDK and clone it to a new location (unfortunately even the latest builds of vmfs-tools that I am aware of do not support VMFS 6)
– use an ESXi-LiveCD to read the VMFS-volume and clone the VMDK to a new location
– do a fresh install of the same ESXi-build to a temporary USB-flash drive

Plan C:

One essential requirement for a Cluster-filesystem is the need to allow access to a VMDK for a single host inside the Cluster and prevent access for all other hosts.
To do this in an efficient and fast way, every VMFS-volume uses a small section of the VMFS-metadata for so-called heartbeats.
For this “heartbeat section” VMFS 6 uses an area of just one block which equals 1 MB.
This “heartbeat section” can be accessed in 2 ways:
1. dd if=.vh.sf bs=1M count=1 skip=3 of=heartbeat-section.bin
2. dd if=vmfs-partition bs=1M count=1 skip=20  of=heartbeat-section.bin
Option 1 appears to be the more reliable one, as it is independent of the actual location of the .vh.sf file on the partition.
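
If both the .vh.sf file and the raw partition are readable, the two access paths above can be cross-checked against each other. A minimal sketch, with a made-up function name and made-up output file names; it prints the number of bytes that differ between the two extracted blocks.

```shell
# compare_heartbeat VH_FILE PARTITION
# Extract the heartbeat block both ways (into the current directory)
# and print how many bytes differ. Hypothetical helper for illustration.
compare_heartbeat() {
    ch_vh=$1; ch_part=$2
    dd if="$ch_vh" bs=1M count=1 skip=3 of=hb-from-file.bin 2>/dev/null || return 1
    dd if="$ch_part" bs=1M count=1 skip=20 of=hb-from-partition.bin 2>/dev/null || return 1
    # cmp -l prints one line per differing byte
    ch_diff=$(cmp -l hb-from-file.bin hb-from-partition.bin | wc -l)
    echo $((ch_diff))
}

# Example (partition name is a placeholder):
# compare_heartbeat .vh.sf /esxi/dev/disks/name-vmfs-partition:1
```

On a datastore that is in use the heartbeat block is presumably rewritten all the time, so a small non-zero count does not necessarily mean that the offsets are wrong. A very large count suggests that .vh.sf does not sit at the expected location on that partition.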

Quick-diagnosis:

Use a Linux-system or a Windows-system that has the command line tool strings(.exe)
A Windows version is available here: https://docs.microsoft.com/en-us/sysinternals/downloads/strings
Run
strings heartbeat-section.bin > strings.txt
In strings.txt you should see the IP-address or MAC-address that you already know from the error-message of the locked file.
If you find no such reference you can stop reading here – I assume you have a different problem.
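
The search itself can be scripted. The sketch below uses grep -a instead of strings so that it also runs where binutils is not installed; that substitution, the function name and the idea of also trying the MAC-address without colons are assumptions made for this example, not something known about the heartbeat format.

```shell
# find_lock_owner FILE NEEDLE
# Print how often NEEDLE (the IP or MAC address from the error message)
# occurs in FILE. Hypothetical helper for illustration.
find_lock_owner() {
    fl_file=$1; fl_needle=$2
    # also try the MAC without colons, in case it is stored that way (assumption)
    fl_plain=$(printf '%s' "$fl_needle" | tr -d ':')
    # -a: treat binary as text, -o: one line per match, -i: ignore case, -F: literal
    LC_ALL=C grep -a -o -i -F -e "$fl_needle" -e "$fl_plain" "$fl_file" | wc -l
}

# Example (address is a placeholder):
# find_lock_owner heartbeat-section.bin 00:50:56:ab:cd:ef
```

A result of 0 corresponds to the "no such reference" case described above.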

Safety first:

As far as I know there is no detailed public documentation on the exact format of the heartbeat-section in a VMFS 6-partition.
That means that as long as I do not definitely know all the fine details, I have to take care that my instructions are as failsafe as possible.
When we edit such a critical section of the VMFS-metafiles we should avoid using hex editors or other tools that are risky in the hands of an inexperienced user.
Instead I prefer a method that makes it easy to create a backup of the relevant 1 MB block first.
The command
dd if=.vh.sf bs=1M count=1 skip=3 of=heartbeat-section.bin
does that.
Then I inject a clean-heartbeat-section.bin that I created on a new and freshly formatted VMFS 6-volume (created by the same ESXi-build).
This completely clears any stale locks that may exist and appears to have the desired effect.
If something still goes wrong you can easily reinject the original section.

To inject a clean heartbeat-section use
dd of=.vh.sf bs=1M count=1 seek=3 if=clean-heartbeat-section.bin conv=notrunc

According to my experience so far this injection takes effect almost immediately.
If you see no change in the behaviour try a reboot of the ESXi.
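
Backup, injection and a read-back check can be combined into one step. The following is a sketch under stated assumptions, not a tested recovery tool: the function name is made up, it relies only on the two dd commands shown above, and it should be tried on a copy before it is pointed at a real .vh.sf.

```shell
# inject_heartbeat VH_FILE CLEAN_BIN BACKUP_BIN
# Back up block 3 of VH_FILE, overwrite it with CLEAN_BIN, then verify.
# Hypothetical helper for illustration.
inject_heartbeat() {
    ih_vh=$1; ih_clean=$2; ih_backup=$3
    # the replacement must be exactly one 1 MB block
    if [ "$(wc -c < "$ih_clean")" -ne 1048576 ]; then
        echo "clean section is not exactly 1 MB" >&2
        return 1
    fi
    # never overwrite an existing backup
    if [ -e "$ih_backup" ]; then
        echo "backup $ih_backup already exists" >&2
        return 1
    fi
    dd if="$ih_vh" of="$ih_backup" bs=1M count=1 skip=3 2>/dev/null || return 1
    # conv=notrunc keeps the rest of the file untouched
    dd if="$ih_clean" of="$ih_vh" bs=1M count=1 seek=3 conv=notrunc 2>/dev/null || return 1
    # read the block back and compare it with what was written
    dd if="$ih_vh" bs=1M count=1 skip=3 2>/dev/null | cmp -s - "$ih_clean"
}

# Example:
# inject_heartbeat .vh.sf clean-heartbeat-section.bin heartbeat-section.bin
# Roll back with:
# dd of=.vh.sf bs=1M count=1 seek=3 if=heartbeat-section.bin conv=notrunc
```

Refusing to overwrite an existing backup is a deliberate choice: a second run would otherwise replace the original section with the already cleaned one.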

Please help …

I definitely need to see more cases of this defect before I consider offering downloads of premade fixed sections.
So if you run into this problem in the near future please contact me.
skype: sanbarrow
I will then create a fixed section and help you to safely inject it.

Ulli

You can hire me on a “per-incident-level” – my help is most useful with recovery-problems.

Virtual USB-disks

A VMware feature that I have missed for at least 10 years has apparently already existed for quite a while.
See https://communities.vmware.com/thread/580985
Am I the only one who did not notice this?
So what's new?
We are used to the VMware virtual disk format (vmdk-files).
These VMDK-files can be attached to a VM so that the guestOS perceives them as:
– IDE-device
– SCSI-device
– SATA-device
– NVME-device
Until today I was not aware of the fact that there is one more option:
– USB-device
This feature is not exposed in the GUI but using it is quite easy and straight forward.
To define an existing VMDK as USB-device you have to edit the vmx-file.
First of all make sure that you have this line:
ehci.present = "TRUE"
You need this line as the main switch for the USB 2 ports.
You should also see a line like
ehci.pciSlotNumber = "35"
Do not edit this line – instead simply delete it if you want to reset the port.
If you assign a “bad” port you will get obscure follow-up problems – so don’t do it.
Now, to assign a VMDK as USB-device, set these parameters:

ehci:0.present = "TRUE"
ehci:0.deviceType = "disk"
ehci:0.fileName = "usb-vmdk.vmdk"

ehci:1.present = "TRUE"
ehci:1.deviceType = "disk"
ehci:1.fileName = "usb2-vmdk.vmdk"

Using this appears to be possible for more than one VMDK – so it may be possible that the full range from ehci:0 to ehci:5 is allowed.
This is just a first guess – I need to do more research here ….
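
Adding the three ehci:N lines by hand is easy to get wrong, so here is a small helper as a sketch. The function name is made up, it assumes the vmx-file ends with a newline, and it should only be used while the VM is powered off.

```shell
# add_usb_vmdk VMX_FILE INDEX VMDK_NAME
# Append the parameters that attach VMDK_NAME as USB-disk on port ehci:INDEX.
# Hypothetical helper for illustration.
add_usb_vmdk() {
    au_vmx=$1; au_idx=$2; au_vmdk=$3
    # main switch for the USB 2 controller, added only if it is missing
    grep -q '^ehci\.present' "$au_vmx" || printf 'ehci.present = "TRUE"\n' >> "$au_vmx"
    # refuse to define the same port twice
    if grep -q "^ehci:$au_idx\.present" "$au_vmx"; then
        echo "ehci:$au_idx is already defined" >&2
        return 1
    fi
    {
        printf 'ehci:%s.present = "TRUE"\n' "$au_idx"
        printf 'ehci:%s.deviceType = "disk"\n' "$au_idx"
        printf 'ehci:%s.fileName = "%s"\n' "$au_idx" "$au_vmdk"
    } >> "$au_vmx"
}

# Example:
# add_usb_vmdk myvm.vmx 0 usb-vmdk.vmdk
# add_usb_vmdk myvm.vmx 1 usb2-vmdk.vmdk
```

The helper does not touch ehci.pciSlotNumber, in line with the advice above not to edit that line.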

I said that this feature is not exposed in the GUI – that is not entirely correct.
Once you created the required vmx-parameters and start the VM you will see the disks appear in the list of removable devices:

Inside a guest both vmdks appear like this: (using my Linux-LiveCD with Ubuntu 14)


During my experiments I noticed that the USB-vmdks may appear in a write-protected mode.
At the moment I cannot claim to have completely understood under which conditions the vmdks are write-protected.
This will require further research ….

Anyway – even at the moment I would call this an extremely useful “new” feature.

1. for all those guys that develop USB-bootable tools

If the USB-vmdks are created with the monolithicFlat VMDK-format the USB-images can be easily transferred to real USB-devices with a simple dd-command.

2. for all users who are looking for a way to assign VMDKs as “optional”

A USB-vmdk is allowed to be temporarily unavailable!
All other options to assign VMDKs will fail if the file is not present.
With USB-vmdks the VM will start even if the file is not available.
This will open new paths to achieve obscure constellations that were impossible until now.

3. for all users that missed the option to assign single-partition images as a VMDK.

All other options to assign VMDKs usually require a partitioned image including a valid MBR or GPT.
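
For point 1 above, the transfer to a real USB-device could look like the sketch below. With the monolithicFlat format the raw data sits in the name-flat.vmdk file, while name.vmdk is only a small text descriptor, so the flat file is the one to copy. The function name is made up, /dev/sdX is a placeholder, and conv=fsync assumes GNU dd.

```shell
# write_flat_to_usb FLAT_VMDK DEVICE
# Copy a monolithicFlat extent to a real USB-device.
# Hypothetical helper for illustration; it overwrites DEVICE completely.
write_flat_to_usb() {
    wf_img=$1; wf_dev=$2
    # only write to block devices, so a typo cannot create a stray file
    if [ ! -b "$wf_dev" ]; then
        echo "$wf_dev is not a block device" >&2
        return 1
    fi
    # conv=fsync flushes the data to the device before dd exits
    dd if="$wf_img" of="$wf_dev" bs=1M conv=fsync
}

# Example:
# write_flat_to_usb usb-vmdk-flat.vmdk /dev/sdX
```

The block-device check is the only safety net here, so double-check the device name with lsblk or dmesg before running it.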


Todo:
check since when this feature exists
check if this works in ESXi
find out when such a VMDK will be flagged as readonly

Ulli