Updated 12 June 2019
You probably know the error message: operation impossible since file is locked.
If a VMDK is locked you cannot start a VM that uses it and you cannot copy it.
The knowledgebase basically tells you to stop all processes that access the file and reboot.
This is supposed to fix the problem.
This documentation is not good enough.
In VMFS 6 following this advice may not be sufficient to release the lock.
This means that you have basically lost the file, as you cannot read it anymore.
Trying to access it via Linux is also impossible at the moment.
Recent third-party recovery tools like UFSexplorer also failed to read the VMDKs in such a case.
Here is a recent case for your reference ….
Problem with locked virtual machine after esxi host crash
I have seen a couple of these cases recently, so I started to investigate the issue.
I have now found a way to recover a VMDK that would otherwise be a case for Ontrack or VMware Support.
My procedure requires a direct hex edit of the VMFS heartbeat section, so please do not ask for details at the moment.
I will post details once I know this approach is safe.
Anyway I believe this is a bug in the way ESXi 6.5 handles the heartbeat section of a VMFS-volume.
This issue should never affect single-host environments.
Feel free to contact me via Skype if you run into this yourself.
None of the following commands that are available on ESXi will work once a flat.vmdk is locked:
vmkfstools -i name.vmdk new.vmdk
vmkfstools -p 0 name-flat.vmdk > mapping.txt
hexdump -C name-flat.vmdk | less
dd if=name-flat.vmdk of=new-flat.vmdk bs=1M
Starting a VM that uses the locked VMDK will fail as well.
Effectively the VM / VMDK is lost, as you cannot even read it one more time to copy the data.
Isolate the VMFS-volume so that it is exposed to a single ESXi.
Then follow the troubleshooting steps from the VMware Knowledgebase.
If you follow the steps you are supposed to get rid of the stale lock.
Apparently this is no longer a 100% reliable procedure with VMFS 6.
Should I try VOMA?
At the moment my answer to that question is NO.
We need to know the IP / MAC of the ESXi-host that holds the lock.
To acquire that info, good old vmkfstools is enough, and it works without silencing the VMFS-volume first.
In other words: in this case VOMA is most likely a waste of time.
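As a rough illustration of how the MAC address could be read from the lock information: the sketch below assumes that `vmkfstools -D <file>` is the call meant above (the text does not name the option) and that its output contains an `owner` field whose last 12 hex digits are the MAC address of the host holding the lock. The helper name `mac_from_owner` and the sample owner value are made up for this example.

```sh
#!/bin/sh
# Assumption: the lock owner printed by "vmkfstools -D <file>" looks like
#   owner 4655cd8b-3c4a19f2-17bc-00145e808070
# and the last group of 12 hex digits is the MAC of the host holding the lock.
# mac_from_owner is a hypothetical helper; the owner value is sample data.
mac_from_owner() {
    last=${1##*-}                       # keep only the text after the last '-'
    if [ "${#last}" -ne 12 ]; then
        echo "unexpected owner format: $1" >&2
        return 1
    fi
    # insert ':' after every two characters, then drop the trailing ':'
    printf '%s\n' "$last" | sed 's/../&:/g; s/:$//'
}

# Example (the path is a placeholder):
#   vmkfstools -D /vmfs/volumes/<datastore>/<vm>/name-flat.vmdk
#   mac_from_owner 4655cd8b-3c4a19f2-17bc-00145e808070
```

Under the same assumption, an owner that consists only of zeros would mean that no host currently holds the lock.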
One essential requirement for a cluster filesystem is to allow a single host inside the cluster to access a VMDK while preventing access from all other hosts.
In order to do this in an efficient and fast way, every VMFS-volume uses a small section of the VMFS-metadata for so-called heartbeats.
For this “heartbeat section” VMFS 6 uses an area inside the hidden volume header system file named .vh.sf
Structure of the .vh.sf
1. a blank area with a size of 2 MB
2. a section with a size of 1 MB starting at offset 0x200000, with the magic value 5e f1 ab 2f (as shown by hexdump -C).
This section lists volume information such as the “friendly name” of the datastore.
3. heartbeat section : lots of areas with a size of 512 bytes or 4096 bytes using the magic value 01 ef cd ab
VMware changed the size of the heartbeat section sometime between ESXi 6.0 and ESXi 6.5.
.vh.sf files that use 512 bytes for each heartbeat have a size of 4 MB (VMFS 5 and early VMFS 6).
.vh.sf files that use 4096 bytes for each heartbeat have a size of 7 MB (VMFS 6 created by ESXi 6.5 and later).
Check which version is used in your case with
ls .*.sf -lah
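The size check can also be scripted. The following is a minimal sketch, assuming only the two sizes described above occur; the function name `hb_count_for_size` is made up for this example. It maps the file size to the `count` value (in MB) that the dd commands in this article use for the heartbeat section.

```sh
#!/bin/sh
# Sketch: map the size of .vh.sf (in bytes) to the dd "count" value for the
# heartbeat section. hb_count_for_size is a hypothetical helper name.
# Assumption: only the two layouts described in the article exist.
hb_count_for_size() {
    size_mb=$(( $1 / 1048576 ))
    case "$size_mb" in
        4) echo 1 ;;    # 512-byte heartbeats: 1 MB heartbeat section
        7) echo 4 ;;    # 4096-byte heartbeats: 4 MB heartbeat section
        *) echo "unexpected .vh.sf size: $1 bytes" >&2
           return 1 ;;
    esac
}

# Example, run inside the root directory of the datastore:
#   hb_count_for_size "$(wc -c < .vh.sf)"
```

Any other size makes the function fail, which is a hint not to continue with the commands from this article without a closer look.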
This “heartbeat section” can be accessed in 2 ways.
For the 4 MB .vh.sf:
1. dd if=.vh.sf bs=1M count=1 skip=3 of=heartbeat-section.bin
2. dd if=vmfs-partition bs=1M count=1 skip=20 of=heartbeat-section.bin
For the 7 MB .vh.sf:
1. dd if=.vh.sf bs=1M count=4 skip=3 of=heartbeat-section.bin
2. dd if=vmfs-partition bs=1M count=4 skip=20 of=heartbeat-section.bin
Option 1 appears to be the more reliable one, as it is independent of the actual location of the .vh.sf file on the partition.
But the .vh.sf is rarely fragmented, so in almost all cases both commands should create the same result.
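Two quick sanity checks on the extracted files can be sketched like this. The first one assumes that the extracted section starts with a heartbeat slot and that the magic value 01 ef cd ab sits in the first four bytes of that slot, which is an assumption and not something confirmed here. The second one compares the results of option 1 and option 2 when they were written to two different files. The function and file names are made up for this example.

```sh
#!/bin/sh
# Sketch only; has_hb_magic and same_section are hypothetical helper names.

# Succeeds if the file starts with the heartbeat magic 01 ef cd ab.
# Assumption: the magic is stored in the first 4 bytes of the first slot.
has_hb_magic() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "01efcdab" ]
}

# Succeeds if both extracted sections are byte-for-byte identical.
same_section() {
    cmp -s "$1" "$2"
}

# Example (output file names are placeholders):
#   has_hb_magic from-vh-sf.bin || echo "no heartbeat magic at offset 0"
#   same_section from-vh-sf.bin from-partition.bin || echo "extractions differ"
```

If the two extractions differ, the .vh.sf is probably not stored where option 2 expects it, and only the result of option 1 should be trusted.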
Use a Linux system or a Windows system that has the command-line tool strings(.exe).
A Windows version is available here: https://docs.microsoft.com/en-us/sysinternals/downloads/strings
strings heartbeat-section.bin > strings.txt
In strings.txt you should see the IP-address or MAC-address that you already know from the error-message of the locked file.
If you find no such reference you can stop reading here – I assume you have a different problem.
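Searching strings.txt by hand works, but the lookup can be sketched as a small helper as well. It takes the IP or MAC address from the error message and searches for it case-insensitively. Since I cannot say in which notation a MAC address is stored, the sketch also tries the value without colons; that is a guess. The name `lock_owner_listed` is made up for this example.

```sh
#!/bin/sh
# Sketch only; lock_owner_listed is a hypothetical helper name.
#   $1 = text file created with "strings heartbeat-section.bin > strings.txt"
#   $2 = IP or MAC address taken from the error message
lock_owner_listed() {
    # Guess: a MAC might be stored without ':' separators, so try both forms.
    plain=$(printf '%s' "$2" | tr -d ':')
    # -F: fixed string (dots in an IP are not wildcards), -i: ignore case
    grep -i -q -F -e "$2" -e "$plain" -- "$1"
}

# Example:
#   lock_owner_listed strings.txt 00:14:5e:80:80:70 && echo "lock owner found"
```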
As far as I know, there is no detailed public documentation on the exact syntax of the heartbeat-section in a VMFS 6-partition.
That means that as long as I do not definitely know all the fine details, I have to make sure that my instructions are as failsafe as possible.
When we edit such a critical section of the VMFS-metafiles, we should avoid using hex editors or other tools that are a risk in the hands of an inexperienced user.
Instead I prefer an approach that makes it easy to create a backup of the relevant block (1 MB or 4 MB) first.
dd if=.vh.sf bs=1M count=1 skip=3 of=heartbeat-section.bin # use for a 4 MB vh.sf
dd if=.vh.sf bs=1M count=4 skip=3 of=heartbeat-section.bin # use for a 7 MB vh.sf
Then I inject a clean-heartbeat-section.bin that I created on a newly created and freshly formatted VMFS 6 volume (created by the same ESXi-build).
This completely cleans out any stale locks that may exist and appears to have the desired effect.
If something still goes wrong you can easily reinject the original section.
To inject a clean heartbeat-section use
dd of=.vh.sf bs=1M count=1 seek=3 if=clean-heartbeat-section.bin conv=notrunc # use for a 4 MB vh.sf
dd of=.vh.sf bs=1M count=4 seek=3 if=clean-heartbeat-section.bin conv=notrunc # use for a 7 MB vh.sf
WARNING: Do not inject anything if you cannot isolate the VMFS-volume so that it is connected to a single host only!
In my experience so far, the injection takes effect almost immediately.
If you see no change in the behaviour, try rebooting the ESXi host.
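The backup and inject steps above can be wrapped in a few sanity checks so that a wrong file size or an already existing backup stops the procedure before anything is written. This is only a sketch under the assumptions of this article (heartbeat section starts 3 MB into the .vh.sf, size 1 MB or 4 MB); the function name `inject_section` and its argument order are made up for this example, and it has not been tested on a real VMFS volume.

```sh
#!/bin/sh
# Sketch only; inject_section is a hypothetical helper.
# Run it only on a VMFS volume that is isolated to a single host (see WARNING).
#   $1 = target file (.vh.sf)
#   $2 = clean section (clean-heartbeat-section.bin)
#   $3 = backup file to create (heartbeat-section.bin)
#   $4 = section size in MB: 1 for a 4 MB .vh.sf, 4 for a 7 MB .vh.sf
inject_section() {
    [ -f "$1" ] && [ -f "$2" ] || { echo "missing input file" >&2; return 1; }
    # Never overwrite an existing backup: it may be the only original copy.
    [ ! -e "$3" ] || { echo "backup $3 already exists" >&2; return 1; }

    want=$(( $4 * 1048576 ))
    have=$(( $(wc -c < "$2") ))
    [ "$have" -eq "$want" ] || { echo "clean section has wrong size" >&2; return 1; }

    # Step 1: back up the current heartbeat section (offset 3 MB).
    dd if="$1" of="$3" bs=1048576 skip=3 count="$4" 2>/dev/null || return 1
    [ "$(( $(wc -c < "$3") ))" -eq "$want" ] || { echo "backup incomplete" >&2; return 1; }

    # Step 2: inject the clean section without truncating the target.
    dd if="$2" of="$1" bs=1048576 seek=3 count="$4" conv=notrunc 2>/dev/null || return 1

    # Step 3: read the section back and compare it with the clean section.
    dd if="$1" bs=1048576 skip=3 count="$4" 2>/dev/null | cmp -s - "$2"
}

# Example for a 4 MB .vh.sf:
#   inject_section .vh.sf clean-heartbeat-section.bin heartbeat-section.bin 1
```

To roll back, the same dd call as in step 2 can be used with the backup file as input.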
Please help …
I definitely need to see more cases of this defect before I consider offering downloads of premade fixed sections.
So if you run into this problem in the near future please contact me.
I will then create a fixed section and help you to safely inject it.
You can hire me on a “per-incident-level” – my help is most useful with recovery-problems.