How to verify if a Raid-rebuild of a VMFS-volume failed – quick test

Scenario:

A VMFS-volume stored on a Raid-array had a failed disk. You replaced the disk and started the Raid-rebuild.
Once the rebuild finished, ESXi was no longer able to mount the volume.
All attempts to force mount and resignature the volume failed.

Problem:

If you run into a problem like the one described here, it often boils down to a question of time.
You don't have any reason to assume that your volume is damaged beyond repair, so it appears reasonable to try one of the following:
– consult VMware support
– consult support of the vendor of your Raid-controller
– try a recovery with Linux vmfs-tools
– scan the volume with Diskinternals VMFS recovery tool
– scan the volume with UFS-explorer
– consult Google and try to repair the VMFS
None of the approaches listed will give you results as fast as you need them. You need to decide whether to attempt a recovery or to start rebuilding the lost VMs.
You can't wait days for the results of a scan or for an answer from support.
My instructions will hopefully help you with this decision.

Diagnosis:

You need a dump of the VMFS-metadata – see https://vm-sickbay.com/create-a-vmfs-header-dump-using-an-esxi-host-in-production
Install the tool strings on a Windows host – see https://docs.microsoft.com/en-us/sysinternals/downloads/strings
or use a Linux host which already has that tool.
Run the command:
strings dumpfile > strings.txt
Open strings.txt in a text editor that can handle large files.
Search for “.vh.sf” or just scroll down slowly in the editor until you find a section like this:
.fbb.sf
.fdc.sf
.pbc.sf
.sbc.sf
.vh.sf
.pb2.sf
.sdd.sf
.jbc.sf

This plain-text area inside the VMFS-metadata maps filenames to inode-numbers.
The reference to .jbc.sf may not be present in your case as it is used by VMFS 6 only.
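
If strings.txt is too large to scroll through comfortably, the search can also be scripted. The following is a minimal Python sketch and not part of the procedure itself. It assumes that strings.txt was created with the command above and that the names appear one per line, as in the listing. The function names count_sections and count_sections_in_file are made up for this example.

```python
from typing import Iterable

# System file names from the listing above; .jbc.sf exists on VMFS 6 only.
SYSTEM_FILES = {
    ".fbb.sf", ".fdc.sf", ".pbc.sf", ".sbc.sf",
    ".vh.sf", ".pb2.sf", ".sdd.sf", ".jbc.sf",
}


def count_sections(lines: Iterable[str]) -> int:
    """Count runs of consecutive system file names that contain .vh.sf."""
    sections = 0
    in_group = False   # currently inside a run of system file names
    has_vh = False     # the current run contains .vh.sf
    for line in lines:
        name = line.strip()
        if name in SYSTEM_FILES:
            in_group = True
            has_vh = has_vh or name == ".vh.sf"
            continue
        # Any other line ends the current run.
        if in_group and has_vh:
            sections += 1
        in_group = False
        has_vh = False
    # Handle a run that reaches the end of the file.
    if in_group and has_vh:
        sections += 1
    return sections


def count_sections_in_file(path: str) -> int:
    """Read the output of `strings` line by line and count the sections."""
    # latin-1 decodes every byte, so odd characters cannot abort the scan.
    with open(path, "r", encoding="latin-1") as handle:
        return count_sections(handle)
```

Calling count_sections_in_file("strings.txt") returns how often the section was found. Treat the number as a hint and confirm it in the editor before you rely on it.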

Conclusion:

Case 1:

There is no such section at all inside the first 1536 MB of the volume.
Bad news: this VMFS-volume is seriously damaged – there is no hope to read the volume with Linux vmfs-fuse or commercial VMFS-recovery-tools.

Case 2:

The section appears once.
Good news: this volume may no longer be mountable with ESXi, but the data may be recoverable with Linux vmfs-fuse or commercial VMFS-recovery-tools.
Feel free to consult me for further assistance …

Case 3:

The section appears more than once and it may look like this:


dmdf
dmdf
dmdf
.fbb.sf
.fdc.sf
.pbc.sf
.sbc.sf
.vh.sf
.pb2.sf
.sdd.sf
.jbc.sf
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
.fbb.sf
.fdc.sf
.pbc.sf
.sbc.sf
.vh.sf
.pb2.sf
.sdd.sf
.jbc.sf
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs
directory-names-of-your VMs


Bad news: your Raid-rebuild failed.
There is no reason to hope that you will be able to recover the data without help from the support staff of your Raid-controller vendor or from a recovery company.

The basic approach will probably be:
– create dd-images of each single disk
– try to rebuild a virtual raid using those disk images.
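
To illustrate the first step, here is a minimal Python sketch of what creating an image of a single disk amounts to: read the source from start to end and write every byte to a file. It is only an illustration under assumptions. The function name image_disk and the paths in the comment are hypothetical, and the sketch stops at the first read error, so for a disk with unreadable sectors a dedicated imaging tool is the better choice.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # read 4 MiB at a time


def image_disk(source_path: str, image_path: str) -> str:
    """Copy source_path to image_path and return the SHA-256 of the data read.

    source_path would be a raw disk device such as /dev/sdb (hypothetical),
    opened read-only so the original disk is never written to.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break  # end of the source reached
            image.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```

The returned checksum can be stored next to each image so that later copies of the image can be checked against it.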
I highly recommend consulting your vendor's support.


Ulli Hankeln