moa-vmfs-recovery-remote-support-cd preview


It's time for a new version of the
MOA-VMFS-recovery CD.
ESXi 7 is coming …
Collecting the data for a quick analysis of a VMFS-recovery support case needs to become
way easier than it has been in the past …
I am almost done with the new build – it has
all the features that I need for my recovery work. What about you?
Do you need a remote support LiveCD too?
If you have any suggestions let me know …
Call me via Skype: “sanbarrow”
moa-live-1 my current MOA-VMFS-recovery CD
was built in 2015 and had no GUI option
the new one will be available in 2 versions:
GUI-version with a size of about 1 GB
CLI-version with a size of about 500 MB
last version:
CLI-version with a size of 460 MB
moa-live-2 experimental vmfs6-fuse support
vmfs5-fuse with support for large vmdks
moa-live-3 built for remote work:

getting help is easy:
Teamviewer and Anydesk are ready to use
when you decide to launch the GUI
with Skype you can easily call me and
discuss your problems

moa-live-4 this is not another LiveCD with
GBs of tools you will never need
during remote-support
it just comes with the tools that I
typically use when I am doing house calls
moa-live-5 VMplayer is not really needed
– but it makes a fine addon when
bundled with my mini ESXi-LiveCDs
an ESX 3.5 LiveCD VM needs
just 35 MB for the iso-file
moa-live-6 the graphical interface is optional

the LiveCD boots into command-line mode
which is preferable for recovery tasks that
are time-consuming …

to use it without GUI you can use Webmin

root account enabled
ssh and Webmin access are available right after boot
user: root and moon
pass: sanbarrow
Ubuntu 18.04 Server
LiveCD built with a heavily edited script
based on the good old remastersys-scripts
WARNING: This CD does not come with any magical tools.
VMFS-recovery first of all requires experience …

VMFS-volumes with I/O-errors

Status: work in progress


In the worst case an I/O-error on a VMFS-volume will result in an unmountable volume.
In less severe cases the I/O-error affects only a vmware.log or another file that can be recreated easily – such as vmx-files, vmdk-descriptorfiles or vmsd-files.
In a typical case the error affects a flat.vmdk or a delta.vmdk.
These last two cases result in an unstartable VM or in a vmdk that cannot be copied, cloned or used in a VM.
Effectively this often means that the content of the VM is lost.

What does the VMware documentation say about this problem ?

Unfortunately I am not aware of any knowledgebase entry or other official resource that gives any useful advice on this topic.
Please correct me if I am wrong.

My 2 cents …

Most admins will first of all consider hardware-related issues when they see I/O errors.
In my experience only a third or less of the I/O errors I see on VMFS-volumes are really caused by hardware problems.
The majority of cases that I have seen are related to what I call “logical VMFS errors”.
In this post I will only deal with these “logical VMFS errors”.
These errors occur because there is an error in the VMFS-metadata that describes the files and directories of the VMFS-volume.
I found ways to work around some of these errors.

The procedures I explain in the following may contradict the VMware documentation or Knowledgebase.


First of all let's check how serious the problem really is.
Serious: there are I/O errors in the area used for the VMFS-metadata – that means the I/O-error is located somewhere between the start of the VMFS-volume and about 1200-1500 MB
A command like
dd if=/dev/disks/vmfs-partition bs=1M count=1536 of=/tmp/vmfs-header.dump
will fail and report an I/O error. A typical location of the first I/O error is at 29 MB into the VMFS-partition.
This problem typically results in an unmountable volume.
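If you want to narrow down where the errors sit with onboard tools only, you can probe the partition block by block with dd. This is just a sketch: a healthy dummy file stands in for the real /dev/disks/vmfs-partition, and the 1 MiB granularity is an arbitrary choice.

```shell
# Probe a device or file for unreadable 1 MiB blocks.
# /tmp/vmfs-test.img is a healthy stand-in for /dev/disks/vmfs-partition.
probe_errors() {
    src="$1"; blocks="$2"; errlist=""
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # read exactly one 1 MiB block; dd exits non-zero on an I/O error
        if ! dd if="$src" bs=1M skip="$i" count=1 of=/dev/null 2>/dev/null; then
            errlist="$errlist $i"
        fi
        i=$((i + 1))
    done
    echo "$errlist"
}

# demo on a healthy 8 MiB dummy file - no errors expected here
dd if=/dev/zero of=/tmp/vmfs-test.img bs=1M count=8 2>/dev/null
bad_blocks=$(probe_errors /tmp/vmfs-test.img 8)
```

On a real partition the list of failed offsets tells you which ranges the later dd-commands have to skip.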

Manageable: the I/O error affects only a single file
A command like
vmkfstools -i name.vmdk clone.vmdk
will fail with an I/O error.
The VM that uses this VMDK may still work but you cannot copy or clone it anymore.
It is also possible that the VM will start and report a more specific error.


A flat.vmdk reports an I/O error

Let's assume we already tried
vmkfstools -i /vmfs/volumes/datastoreA/directoryA/name.vmdk /vmfs/volumes/datastoreB/directoryB/clone.vmdk
and it failed with an I/O error.
Next we try
dd bs=1M if=/vmfs/volumes/datastoreA/directoryA/name.vmdk of=/vmfs/volumes/datastoreB/directoryB/clone.vmdk
this also fails with an I/O error. We see that the error is at, let's say, 21 MB: dd reports 20 MB were copied and then it aborted.
Now we can split the dd-command into 2 steps: the first copies the part before the error, the second copies the part after the error.
Like this:
dd bs=1M if=/vmfs/volumes/datastoreA/directoryA/name.vmdk of=/vmfs/volumes/datastoreB/directoryB/clone.vmdk count=20
dd bs=1M if=/vmfs/volumes/datastoreA/directoryA/name.vmdk of=/vmfs/volumes/datastoreB/directoryB/clone.vmdk skip=21 seek=21
If this works the flat.vmdk has only one error and the error-range is 1 MB.
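The two-step copy can be rehearsed on a dummy file before touching real data. Everything here is made up for the demo: an 8 MiB file in which we simply pretend that block 3 is the unreadable one.

```shell
# Rehearse the split dd-copy on a dummy 8 MiB flat.vmdk; we pretend
# that MiB-block number 3 (0-based) is the one that throws I/O errors.
src=/tmp/src-flat.vmdk
dst=/tmp/clone-flat.vmdk
dd if=/dev/urandom of="$src" bs=1M count=8 2>/dev/null

bad=3
# step 1: copy everything before the error
dd if="$src" of="$dst" bs=1M count="$bad" 2>/dev/null
# step 2: copy everything after the error; seek/skip jump over the bad
# block, conv=notrunc keeps the already copied first part intact
dd if="$src" of="$dst" bs=1M skip=$((bad + 1)) seek=$((bad + 1)) conv=notrunc 2>/dev/null
```

The bad MiB stays behind as a zero-filled hole in the clone – still damaged, but something we can work with later.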
Of course in real life it will not be so easy – we can have several I/O-errors with different error-ranges.
If ESXi had the tool ddrescue it would be easy – but we have to work with dd only.
Next thing to try is a vmkfstools command like
vmkfstools -p 0 /vmfs/volumes/datastoreA/directoryA/name.vmdk > mapping-name.txt
If that works we can use the mapping-name.txt file to create a dd-command that copies the vmdk fragment by fragment.
Unfortunately it is very rare that this command will work at this stage.
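To illustrate the fragment-by-fragment idea, here is a sketch that turns a fragment list into dd-commands. The two-column input format is an invented simplification – real vmkfstools -p output needs more parsing before it looks like this.

```shell
# Turn a simplified fragment list (file-offset-in-MB and length-in-MB per
# line) into one dd command per fragment. The input format is an
# assumption - real 'vmkfstools -p 0' output must be massaged into it.
cat > /tmp/mapping-name.txt <<'EOF'
0 20
21 43
EOF

gen_dd_cmds() {
    while read -r off len; do
        [ -z "$off" ] && continue
        echo "dd bs=1M if=name.vmdk of=clone.vmdk skip=$off seek=$off count=$len conv=notrunc"
    done < "$1"
}
cmds=$(gen_dd_cmds /tmp/mapping-name.txt)
```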

Once we have tried all the built-in ESXi tools and all of them failed, the next step is to switch to Linux.
As always I use the MOA LiveCD.
After the Linux-system has finished booting and the network is up you can access it via putty or WinSCP (user: root – password: sanbarrow)
create 2 directories:
sudo su
mkdir /esxi
mkdir /vmfs-out

connect to the target ESXi-host in readonly mode
sshfs -o ro root@ip-of-esxi:/ /esxi
connect to the same ESXi-host again in writeable mode – mounting a single directory is enough.
sshfs  root@ip-of-esxi:/vmfs/volumes/datastoreB/directoryB /vmfs-out
Now we can use ddrescue to have an easy way to skip the error-areas.
ddrescue-syntax: ddrescue <file that needs to be copied> <output-file> <copy-log-file>
Make sure to create a log-file!
ddrescue /esxi/vmfs/volumes/datastoreA/directoryA/name-flat.vmdk /vmfs-out/clone-flat.vmdk /vmfs-out/copy.log
This command will try to copy the source file block by block and it can skip blocks that cannot be copied.
The result will still be a damaged flat.vmdk – but this time we at least get a vmdk that we can work with later.
This workaround will not always work – all I can say is that the attempt is worth the effort.
Keep in mind that the VMDK will probably be a complete loss if you don't try …

There is a Plan C but that exceeds the scope of this blog.
Call me if necessary.


The VMFS-metadata area has an I/O error

In this case you will not be able to mount the volume at all, or you see corrupted directories or a bunch of corrupt vmdks.
Let's assume we already tried the normal procedure to create a VMFS header dump
dd if=/dev/disks/vmfs-partition bs=1M count=1536 of=/tmp/vmfs-header.dump
and the command failed with an I/O error.
Trying to work around the I/O error with several dd-commands is very inconvenient using ESXi only.
So we switch to Linux again.
Use the same sshfs commands to connect to the ESXi host as we used before.
Again I use ddrescue to locate the errors.
To do that I simply run
ddrescue /esxi/dev/disks/vmfs-partition /tmp/vmfs-header.dump /tmp/copy-log
I watch how it goes and abort with CTRL + C once the command has copied at least 1300 MB.
Inspecting the ddrescue copy log now tells me how many errors there are and how large the error-area actually is.
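Reading the mapfile by hand works, but a few lines of shell can summarize it. The mapfile below is a synthetic example; the 'pos size status' data lines with status '-' are the unreadable areas.

```shell
# Summarize the bad areas in a ddrescue mapfile (synthetic example).
# Data lines have the form 'pos size status'; status '-' means unreadable.
cat > /tmp/copy.log <<'EOF'
# Mapfile. Created by GNU ddrescue
# current_pos  current_status
0x51400000     +
#      pos        size  status
0x00000000  0x01D00000  +
0x01D00000  0x00100000  -
0x01E00000  0x4F600000  +
EOF

bad_areas=0
bad_bytes=0
while read -r pos size status; do
    case "$pos" in '#'*|'') continue ;; esac   # skip comments and blanks
    if [ "$status" = "-" ]; then
        bad_areas=$((bad_areas + 1))
        bad_bytes=$((bad_bytes + size))        # sizes are hex, shell math copes
    fi
done < /tmp/copy.log
echo "$bad_areas bad areas, $bad_bytes bytes unreadable"
```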
Now it gets a bit tricky ….
The VMFS-metadata area (the first 1200-1500MB of the VMFS-volume) has some areas that are supposed to be filled with zeroes.
If our error is located in such an area we can fix the VMFS-metadata by simply injecting zeroes.
Errors in the more important areas can't be fixed that easily.
The basic approach I use here is to try to get a header-dump with an error-area as small as possible.
Once I have such a dump I use it in my lab-environment and try to create dd-commands for the VMs I need to extract.

Please understand that I do not go deeper into this matter at the moment.
If you need to investigate errors like this one bad dd-command can make matters much worse.
If you need help here – call me.






Create a VMFS header dump using Linux


The ESXi-host is active in production.
A VMFS- datastore has a problem like:
– can not be mounted
– a directory or a file is suddenly missing
– a vmdk can not be used because of I/O errors
– other “strange effects”
The datastore can be active and there is no need to shutdown any VMs first.

Prepare the Linux-system

The Linux-system can be launched
– on a physical host
– as a VM on any other ESXi/Workstation-host
– as a VM on the target ESXi-host
– from the moa.iso Linux LiveCD
Requirements:
– network connection to the target ESXi
– if it is a VM assign at least 4 GB RAM, 2 CPUs or better and use a network that can access the IP of the target ESXi-host
– a VMDK is only necessary for extensive analysis

Create the VMFS-meta-data dump

After the Linux-system has finished booting and the network is up you can access it via putty or WinSCP (user: root  –  password: sanbarrow)
create a directory:
sudo su
mkdir /esxi 

connect to the target ESXi-host
sshfs -o ro root@ip-of-esxi:/ /esxi

check the list of devices that are in use by the target ESXi
cd /esxi/dev/disks
ls -lah | grep -v vml
You should now be able to identify the target VMFS-volume
Dump the first 1536 MB of the 3rd partition if the VMFS-volume is part of the ESXi-boot disk
dd if=name-vmfs-partition:3 bs=1M count=1536 of=/tmp/name-vmfs-dump.1536
Dump the first 1536 MB of the 1st partition if the VMFS-volume is on a disk of its own.
dd if=name-vmfs-partition:1 bs=1M count=1536 of=/tmp/name-vmfs-dump.1536
To analyse the partition table dump the first 1 MB of the disk
dd if=name-vmfs-disk bs=1M count=1 of=/tmp/name-vmfs.disk

( Edit the lines above and use appropriate file names. )
If that worked you can download the dump-file with WinSCP.
Delete the dump from /tmp after you have downloaded it, to keep free space in /tmp available.

Locked files with VMFS 6 – updated for ESXi 6.5 and higher

Updated 12 June 2019

You probably know the error message: operation impossible since file is locked.
If a VMDK is locked you cannot start it inside a VM and you cannot copy it.
The knowledgebase basically tells you to stop all processes that access the file and reboot.
This is supposed to fix the problem.
This documentation is not good enough.
In VMFS 6 following this advice may not be sufficient to release the lock.
This means that you have basically lost the file, as you cannot read it anymore.
Trying to access it via Linux is also impossible at the moment.
Recent third-party recovery tools like UFS Explorer also failed to read the vmdks in such a case.
Here is a recent case for your reference ….
Problem with locked virtual machine after esxi host crash
I have seen a couple of these cases recently, so I started to investigate the issue.
Now I have found a way to recover a VMDK that would otherwise be a case for Ontrack or VMware Support.
My procedure requires a direct hexedit of the VMFS heartbeat-section so please do not ask for details at the moment.
I will post details once I know this approach is safe.
Anyway I believe this is a bug in the way ESXi 6.5 handles the heartbeat section of a VMFS-volume.
This issue should never affect single-host environments.
Feel free to contact me via Skype if you run into this yourself.


None of the following commands that are available on ESXi will work once a flat.vmdk is locked:
vmkfstools -i  name.vmdk new.vmdk
vmkfstools -p 0 name-flat.vmdk > mapping.txt
hexdump -C name-flat.vmdk | less
dd if=name-flat.vmdk of=new-flat.vmdk bs=1M
starting a VM which uses the locked VMDK will fail


Effectively the VM / VMDK is lost as you cannot even read it one more time to copy the data.

Plan A:

Isolate the VMFS-volume so that it is exposed to a single ESXi-host.
Then follow the troubleshooting steps from the VMware Knowledgebase.
If you follow the steps you are supposed to get rid of the stale lock.
Apparently this is no longer a 100% reliable procedure with VMFS 6.

Should I try VOMA?
At the moment my answer to that question is NO.
We need to know the IP / MAC of the ESXi-host that holds the lock.
To acquire that info good old vmkfstools is enough, and that works without silencing the VMFS-volume first.
In other words: in this case VOMA is most likely a waste of time.

Plan B:

One essential requirement for a cluster filesystem is to allow access to a VMDK for a single host inside the cluster while preventing access for all other hosts.
In order to do this in an efficient and fast way every VMFS-volume uses a small section of the VMFS-metadata for so-called heartbeats.
For this “heartbeat section” VMFS 6 uses an area inside the hidden volume header system file named .vh.sf

Structure of the .vh.sf
1. a blank area with a size of 2 MB

2. a section with a size of 1 MB starting at offset 0x200000 – magic value (as shown by hexdump -C): 5e f1 ab 2f
This section lists volume information such as the “friendly name” of the datastore.

3. heartbeat section: lots of records with a size of 512 bytes or 4096 bytes using the magic value 01 ef cd ab
VMware changed the size of the heartbeat records sometime between ESXi 6.0 and ESXi 6.5

vh.sf files that use 512 bytes per heartbeat have a size of 4 MB – VMFS 5 and early VMFS 6
vh.sf files that use 4096 bytes per heartbeat have a size of 7 MB – VMFS 6 created by ESXi 6.5 and later

Check which version is used in your case with
cd /vmfs/volumes/datastore
ls .*.sf -lah
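The 4 MB / 7 MB rule of thumb can be expressed as a tiny helper. This is a sketch only – it is tested here against zero-filled dummy files instead of a real .vh.sf.

```shell
# Guess the heartbeat record size from the size of .vh.sf:
# <= 4 MiB -> 512-byte records, larger -> 4096-byte records.
# Zero-filled dummies stand in for the real hidden files.
hb_record_size() {
    size=$(wc -c < "$1")
    if [ "$size" -le $((4 * 1024 * 1024)) ]; then
        echo 512      # VMFS 5 and early VMFS 6
    else
        echo 4096     # VMFS 6 created by ESXi 6.5 and later
    fi
}

dd if=/dev/zero of=/tmp/vh-new.sf bs=1M count=7 2>/dev/null
dd if=/dev/zero of=/tmp/vh-old.sf bs=1M count=4 2>/dev/null
```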

This “heartbeat section” can be accessed in 2 ways:
1. dd if=.vh.sf bs=1M count=1 skip=3 of=heartbeat-section.bin
2. dd if=vmfs-partition bs=1M count=1 skip=20  of=heartbeat-section.bin
for the 4MB vh.sf or with
1. dd if=.vh.sf bs=1M count=4 skip=3 of=heartbeat-section.bin
2. dd if=vmfs-partition bs=1M count=4 skip=20  of=heartbeat-section.bin
for the 7 MB vh.sf
Option 1 appears to be the more reliable one as this location is independent of the actual location of the .vh.sf file.
But the .vh.sf is rarely fragmented, so in almost all cases both commands should create the same result.
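To sanity-check a dumped section you can count the heartbeat magic values in it. The dump below is synthetic – two fake 512-byte records – so the sketch runs without a real VMFS-volume.

```shell
# Count occurrences of the heartbeat magic 01 ef cd ab in a dumped
# section. The dump here is synthetic: two fake 512-byte records whose
# first four bytes are the magic value, padded with zeroes.
make_record() { printf '\001\357\315\253'; head -c 508 /dev/zero; }
{ make_record; make_record; } > /tmp/heartbeat-section.bin

# dump the file as a continuous hex string and count the magic pattern
hb_count=$(od -An -tx1 -v /tmp/heartbeat-section.bin \
           | tr -d ' \n' | grep -o '01efcdab' | wc -l)
echo "$hb_count heartbeat records found"
```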


Use a Linux-system or a Windows system that has the commandline tool strings(.exe).
A Windows version is available as part of the Sysinternals suite.
strings heartbeat-section.bin > strings.txt
In strings.txt you should see the IP-address or MAC-address that you already know from the error-message of the locked file.
If you find no such reference you can stop reading here – I assume you have a different problem.
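A quick way to pull the addresses out of the dump is strings plus grep. The binary file below is fabricated so the sketch is self-contained; the fallback branch only exists in case strings is not installed.

```shell
# Extract IP-like strings from a heartbeat dump (fabricated demo file).
{ printf 'naa.600-esxi-host 192.168.1.21\000'; head -c 64 /dev/zero; } > /tmp/hb-demo.bin

if command -v strings >/dev/null 2>&1; then
    strings /tmp/hb-demo.bin > /tmp/strings.txt
else
    # crude stand-in for strings: keep printable runs of 4+ characters
    tr -c '[:print:]' '\n' < /tmp/hb-demo.bin | grep -E '.{4,}' > /tmp/strings.txt
fi
found_ip=$(grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' /tmp/strings.txt)
```

On a real dump you would compare the hits against the IP/MAC from the lock error-message.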

Safety first:

As far as I know there is no detailed public documentation on the exact layout of the heartbeat-section of a VMFS 6 partition.
That means that as long as I do not definitely know all the fine details I have to take care that my instructions are as failsafe as possible.
When we edit such a critical section of the VMFS-metafiles we should avoid hexeditors or other tools that are a risk in the hands of an inexperienced user.
Instead I prefer a way that easily allows us to create a backup of the relevant block first.
One of these commands does that:
dd if=.vh.sf bs=1M count=1 skip=3 of=heartbeat-section.bin # use for a 4 MB vh.sf
dd if=.vh.sf bs=1M count=4 skip=3 of=heartbeat-section.bin # use for a 7 MB vh.sf
Then I inject a clean-heartbeat-section.bin that I created on a newly created and freshly formatted VMFS 6-volume (created by the same ESXi-build).
This completely clears any existing stale locks and appears to have the desired effect.
If still something goes wrong you can easily reinject the original section.
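Before touching a real volume it is worth rehearsing the backup-and-reinject cycle on a scratch file. Everything below is a stand-in: a random 7 MiB dummy plays the .vh.sf, and zeroes play the clean section.

```shell
# Rehearse backup / inject / restore on a dummy 7 MiB .vh.sf.
vh=/tmp/vh-demo.sf
dd if=/dev/urandom of="$vh" bs=1M count=7 2>/dev/null
orig_sum=$(cksum < "$vh")

# 1. back up the heartbeat area (4 MiB starting at offset 3 MiB)
dd if="$vh" bs=1M count=4 skip=3 of=/tmp/heartbeat-backup.bin 2>/dev/null
# 2. inject a "clean" section; /dev/zero stands in for a section taken
#    from a freshly formatted VMFS 6 volume
dd of="$vh" bs=1M count=4 seek=3 if=/dev/zero conv=notrunc 2>/dev/null
# 3. roll back: reinject the backup
dd of="$vh" bs=1M count=4 seek=3 if=/tmp/heartbeat-backup.bin conv=notrunc 2>/dev/null
```

The checksum after the roundtrip matches the original – which is exactly the safety net you want when the target is a real VMFS-volume.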

To inject a clean heartbeat-section use
dd of=.vh.sf bs=1M count=1 seek=3 if=clean-heartbeat-section.bin conv=notrunc # use for a 4 MB vh.sf
dd of=.vh.sf bs=1M count=4 seek=3 if=clean-heartbeat-section.bin conv=notrunc # use for a 7 MB vh.sf

WARNING: Do not inject anything if you can not isolate the VMFS-volume so that it is connected to a single host only !!!

According to my experience so far this injection becomes effective almost immediately.
If you see no change in the behaviour try a reboot of the ESXi-host.

Please help …

I definitely need to see more cases of this defect before I consider offering downloads of premade fixed sections.
So if you run into this problem in the near future please contact me.
skype: sanbarrow
I will then create a fixed section and help you to safely inject it.


You can hire me on a “per-incident-level” – my help is most useful with recovery-problems.

Virtual USB-disks

A VMware feature that I have missed for at least 10 years apparently has existed for quite a while.
Am I the only one who did not notice it?
So what's new?
We are used to the VMware virtual disk format (vmdk-files).
These vmdk-files can be attached to a VM so that the guest OS perceives them as:
– IDE-device
– SCSI-device
– SATA-device
– NVME-device
Until today I was not aware of the fact that there is one more option:
– USB-device
This feature is not exposed in the GUI but using it is quite easy and straightforward.
To define an existing VMDK as a USB-device you have to edit the vmx-file.
First of all make sure that you have this line:
ehci.present = “TRUE”
You need this line as a main switch for USB 2 ports.
You should also see a line like
ehci.pciSlotNumber = “35”
Do not edit this line – instead simply delete it if you want to reset the port.
If you assign a “bad” port you will get obscure follow-up problems – so don’t do it.
Now, to assign a VMDK as a USB-device, set these parameters:

ehci:0.present = “TRUE”
ehci:0.deviceType = “disk”
ehci:0.fileName = “usb-vmdk.vmdk”

ehci:1.present = “TRUE”
ehci:1.deviceType = “disk”
ehci:1.fileName = “usb2-vmdk.vmdk”

Using this appears to be possible for more than one VMDK – so it may be that the full range from ehci:0 to ehci:5 is allowed.
This is just a first guess – I need to do more research here ….
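For experiments with several USB-vmdks a small generator saves typing. Whether ports beyond ehci:1 actually work is exactly the open question above, so the script just emits lines for whatever list it is given.

```shell
# Emit the vmx lines that attach a list of vmdks as USB disks.
# How many ehci ports really work is untested - the generator does not care.
gen_usb_vmx() {
    i=0
    for vmdk in "$@"; do
        printf 'ehci:%d.present = "TRUE"\n' "$i"
        printf 'ehci:%d.deviceType = "disk"\n' "$i"
        printf 'ehci:%d.fileName = "%s"\n' "$i" "$vmdk"
        i=$((i + 1))
    done
}
vmx_lines=$(gen_usb_vmx usb-vmdk.vmdk usb2-vmdk.vmdk)
```

Append the output to the vmx-file of a powered-off VM (remember the ehci.present main switch).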

I said that this feature is not exposed in the GUI – that is not entirely correct.
Once you have created the required vmx-parameters and started the VM you will see the disks appear in the list of removable devices:

Inside a guest both vmdks appear like this: (using my Linux-LiveCD with Ubuntu 14)

During my experiments I noticed that the USB-vmdks may appear in a write-protected mode.
At the moment I can't claim to have completely understood in which constellations the vmdks are write-protected.
This will require further research ….

Anyway – even at the moment I would call this an extremely useful “new” feature.

1. for all those guys that develop USB-bootable tools

If the USB-vmdks are created with the monolithicFlat VMDK-format the USB-images can be easily transferred to real USB-devices with a simple dd-command.

2. for all users who are looking for a way to assign VMDKs as “optional”

A USB-vmdk is allowed to be temporarily unavailable !!!
All other options to assign VMDKs will fail if the file is not present.
With USB-vmdks the VM will start even if the file is not available.
This opens new paths to achieve obscure constellations that were impossible until now.

3. for all users that missed the option to assign single-partition images as a VMDK.

All other options to assign VMDKs usually require a partitioned image including a valid MBR or GPT.

To do:
– check since when this feature exists
– check if this works in ESXi
– find out when such a VMDK will be flagged as readonly






Create a VMFS-Metadata-dump using an ESXi-Host in production

1. Required: root-access to an ESXi-host via ssh
2. Identify the device that corresponds to the affected datastore:

login with root account
cd /dev/disks
ls -lisa | grep -v vml

In many cases you can identify the correct device by inspecting the reported filesize – typically several hundred GB or several TB.
If lots of datastores with similar sizes are in use, use
esxcfg-scsidevs -m
for a more detailed description of the available devices.

3. dd command to dump the first 1536 MB of DeviceX into a file
dd if=/dev/disks/Device:1 bs=1M count=1536 of=/tmp/Casename.1536

3a. Very often there is not enough free space available in /tmp.
Workaround: dump into a compressed archive:

dd if=/dev/disks/Device:1 bs=1M count=1536 | gzip -c >  /tmp/Casename.1536.gz

3b. if that still does not work use another datastore – BUT never use the affected datastore itself!!!
dd if=/dev/disks/Device:1 bs=1M count=1536 of=/vmfs/volumes/ANOTHER-UNAFFECTED-DATASTORE/Casename.1536

4. connect to ESXiHost via WinSCP
download /tmp/Casename.1536 or /tmp/Casename.1536.gz to your admin-host and compress the file with an effective packer like 7zip or rar.
You should now have an archive that varies in size – the typical range is 50 MB – 800 MB.
Upload the archive to a free hoster, your own webserver, or any other public location with a decent download rate.
(Skype can be used too – but it is a comparably slow option)
When the upload is done provide a download link – typically this is also the perfect time for a short Skype chat.
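A sketch of the compress-and-verify step: the dump here is a random dummy file, and cksum is just one possible way to let the receiver verify that the download arrived intact.

```shell
# Compress the dump and record a checksum so the receiver can verify
# that the upload/download did not corrupt it (dummy dump for the demo).
dd if=/dev/urandom of=/tmp/Casename.1536 bs=1M count=2 2>/dev/null

gzip -c -9 /tmp/Casename.1536 > /tmp/Casename.1536.gz
cksum /tmp/Casename.1536.gz > /tmp/Casename.1536.gz.cksum
```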

You may want to check whether the dump contains any confidential data that you are not allowed to share.
To evaluate which data is contained in a VMFS header dump, download the tool strings.exe.
After the download unzip strings.exe and copy it to the same path that already has Casename.1536, open a cmd-box and execute
strings.exe Casename.1536 > Casename.1536.txt
Search through Casename.1536.txt

In most cases it takes one or two hours to get a solid overview of the prognosis and available recovery options.
There is a Knowledgebase-article that discusses the same topic – see

Latest LiveCD Download

This is the first version of a Commandline-only-LiveCD.

There is quite a large number of good Linux LiveCDs to choose from when you are looking for a toolbox for system administrators. My personal favorite surely is PartedMagic.
Unfortunately there is none that has all the tools that I need for my recovery work.
Often I could get away with a recent Ubuntu-LiveCD by adding the essential tools using apt-get.

In most recovery cases fighting to get your regular tools in place is nothing that you want to do too often.
Booting a Linux LiveCD inside a remote production-environment by giving instructions via Skype or phone can be tricky enough.

So the expected use-case for this LiveCD can be described in this typical dialogue:

CUSTOMER> I have a problem with my VMFS-volume …
ME> Let me see what I can do for you …
CUSTOMER> What do you need to get started?
ME> Enable ssh, get putty and winscp, download this ISO-file. When ready give me a Teamviewer login to your admin-host.

Depending on the type of connection there are different ways to get access to the damaged volume:

– VMFS-volume is stored on a local SCSI or SATA drive
– VMFS-volume is stored on a remote iSCSI-LUN
– VMFS-volume is stored on FibreChannel-LUN
– VMFS-volume is stored in a VMDK-file

Sometimes a Volume has to remain active as it is used for production – sometimes it can be unmounted:

– Volume is active – no exclusive access allowed/possible
– Volume is unmounted – exclusive access allowed/possible

Depending on the local environment different types of hosts can be used for the recovery environment:

– any spare physical machine
– a physical ESXi host
– a VM running on one of the local ESXi-hosts
– a VM running inside Workstation installed on the remote admin-host

In all the cases listed above the Recovery-Environment must be able to maintain reliable direct access to the damaged
VMFS-volume and offer several options to store the recovered VMs.

The term LiveCD suggests that it can only be started from a physical CD-drive or an ISO-file.
This release is not limited to CD-drives or ISO-files, which is why I prefer to call it a “Recovery Environment”:

– can boot from CD-drive or ISO
– can boot from USB-flash-drives or USB-disks
– can boot from harddisk (IDE, SATA or SCSI)
– can boot like a regular bootable vmdk – just add a vmdk-descriptor that references the ISO-file

– can be deployed as OVA

– can boot from MBR and UEFI – this allows booting it inside almost all recent VMs on ESXi, WS or Fusion
– needs 64bit support to be enabled inside the VM
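For the “boot like a regular bootable vmdk” option, a descriptor along these lines should work. All numbers are placeholders: the extent size must be the real ISO size in 512-byte sectors, and the geometry values are rough guesses that ESXi/Workstation are usually tolerant about.

```shell
# Write a flat vmdk descriptor that uses the ISO itself as its extent.
# All values are illustrative; iso_sectors must match the real ISO size.
iso_sectors=716800   # e.g. a 350 MiB ISO: 350*1024*1024/512
cat > /tmp/moa.vmdk <<EOF
# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=ffffffff
createType="monolithicFlat"

# Extent description
RW $iso_sectors FLAT "moa.iso" 0

# The Disk Data Base
ddb.virtualHWVersion = "10"
ddb.geometry.cylinders = "711"
ddb.geometry.heads = "16"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"
EOF
```

Put moa.vmdk next to moa.iso and attach it to the VM like any other disk.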

My list of required tools – to name just the essentials – looks like this:

– latest version of vmfs-tools
– sshfs
– tools to manage GPT disks
– complete iSCSI-support
– complete NFS-support
– ddrescue, testdisk, photorec and other forensic essentials
– web-interface to make some common tasks easier

The experienced admin may miss the following addons:

– vmware-mount included in the VDDK-package
– esxcli included in the VMware-CLI
– vmrun included in the VMware-VIX package

Unfortunately these packages are not redistributable so I cannot include them.
As a workaround I include the
– dependencies for non-distributable VMware-packages
so that the mentioned tools can be installed on the fly when the Recovery-environment is booted with sufficient memory.

Even without vmware-mount I think a collection of tools for the expected scenarios should be able to mount the most frequent types of virtual disks, so I added further commandline-tools like guestfish …
This allows stunts that should probably only be attempted by experts, like
– mounting vmdks – even when the vmdks are locked by ESXi


This collection of tools was created because I need this set of tools for
– recovery of damaged VMFS-volumes
– P2V using the Coldclone-approach

I do not claim that the collection includes every tool that would be useful in the VMware-context.

This collection simply includes 99 % of the tools I use whenever I offer remote-support.
Feel free to use it if you have similar needs.


Try this first – should boot on most systems

Try this version when the EFI-version fails – old hosts may prefer this