
Friday, 13 July 2018

How to determine physical location of bad VSAN components

Sometimes as a VSAN admin you must locate a bad VSAN component. This is especially true when you stumble upon the infamous "component metadata health" error.

Using RVC you can get a nice text representation of the affected components:


vsan.health.health_summary .



+---------------------+--------------------------------------+--------+---------------+
| Host                | Component                            | Health | Notes         |
+---------------------+--------------------------------------+--------+---------------+
| host7.something.com | c30d1b5b-586b-8b74-b3c6-0cc47aa4b1b8 | Error  | Invalid state |
| host7.something.com | 5648245b-b4cb-91b4-c786-0cc47a39c320 | Error  | Invalid state |
| host1.something.com | d4e3325b-3436-4aa5-6707-0cc47aa4e64e | Error  | Invalid state |
| host1.something.com | fca2325b-fc57-cd89-4bf4-0cc47aa3cf00 | Error  | Invalid state |
| host1.something.com | 2c32355b-f810-6782-4c49-0cc47aa4e64e | Error  | Invalid state |
| host5.something.com | 59f91d5b-9c8f-362d-88e5-0cc47aa3cf00 | Error  | Invalid state |
| host5.something.com | 380e1b5b-842e-75e9-d305-0cc47aa3cf00 | Error  | Invalid state |
| host3.something.com | fcb10d5b-2cef-cb46-19a5-0cc47a39bab8 | Error  | Invalid state |
+---------------------+--------------------------------------+--------+---------------+

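When the table has many rows, it can help to pull out just the host/component pairs marked "Invalid state". A minimal sketch, assuming you have saved the RVC table above to a file (the name health_summary.txt is hypothetical):

```shell
# Extract host and component UUID for rows marked "Invalid state"
# from a saved copy of the RVC health summary table.
awk -F'|' '/Invalid state/ {
    gsub(/ /, "", $2); gsub(/ /, "", $3)   # trim the table padding
    print $2, $3
}' health_summary.txt
```

This prints one "host component-UUID" pair per line, ready to feed into vsan.cmmds_find.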
Having the component ID, let's find the disk ID:


/localhost/Datacenter/computers/Cluster> vsan.cmmds_find . -u c30d1b5b-586b-8b74-b3c6-0cc47aa4b1b8
+---+------+------+-------+--------+---------+
| # | Type | UUID | Owner | Health | Content |
+---+------+------+-------+--------+---------+
+---+------+------+-------+--------+---------+

/localhost/Datacenter/computers/Cluster> vsan.cmmds_find . -u 2c32355b-f810-6782-4c49-0cc47aa4e64e
+---+------+------+-------+--------+---------+
| # | Type | UUID | Owner | Health | Content |
+---+------+------+-------+--------+---------+
+---+------+------+-------+--------+---------+

As you can see, it's empty. Thanks to VMware support, I was able to determine where those components are located by using the one-liner below on each affected host:

for i in $(vsish -e ls /vmkModules/lsom/disks/ | sed 's/.$//'); do echo; echo "Disk:" $i; localcli vsan storage list | grep $i -B 2 | grep Displ | sed 's/   / /'; echo "  Components:"; for c in $(vsish -e ls /vmkModules/lsom/disks/"$i"/recoveredComponents/ 2>/dev/null | grep -v ^626); do vsish -e cat  /vmkModules/lsom/disks/"$i"/recoveredComponents/"$c"info/ 2>/dev/null | grep -E "UUID|state" | grep -v diskUUID; done; done

Remember, it's a one-liner; if it gets wrapped when you paste it, fix the line breaks.

The result it gives looks like this:


Disk: 52fcf2cf-a2b3-765b-16da-6b1fbc17b623
 Display Name: naa.600605b00a63535021aa24b9dbc6fdae
  Components:

Disk: 52e53b16-d317-b32e-88c3-558c05fefec3
 Display Name: naa.600605b00a63535021aa24badbddc9de
  Components:

Disk: 52d3e0e5-e9c2-d6a5-9927-b10685a53dbf
 Display Name: naa.600605b00a63535021aa24c0dc3002a6
  Components:

Disk: 527923ad-74ac-80de-c24d-2204dacb91ee
  Components:

Disk: 52023556-e225-fa66-ac31-bbf91a968dee
 Display Name: naa.600605b00a63535021aa24c9dcc2a011
  Components:
   UUID:5648245b-b4cb-91b4-c786-0cc47a39c320
   state:10

Disk: 52545e5c-9acc-dd84-7351-fe500287cdb4
 Display Name: naa.600605b00a63535021aa24c5dc882959
  Components:

Disk: 52959ffa-298b-4183-c9ca-60265bbf1363
 Display Name: naa.600605b00a63535021aa24bedc154902
  Components:

Disk: 5202f5c4-7538-5964-fd55-975289da4d9b
 Display Name: naa.600605b00a63535021aa24c2dc4db53a
  Components:

Disk: 52772abe-e8b4-ec73-40fe-a75933126534
 Display Name: naa.600605b00a63535021aa24c7dca6d2ff
  Components:

Disk: 52eb25b2-c0b5-d629-ee44-0a3048d22701
 Display Name: naa.600605b00a63535021aa24c3dc6883f7
  Components:
   UUID:c30d1b5b-586b-8b74-b3c6-0cc47aa4b1b8
   state:10

Components with state:10 are the problematic ones, and you can also see the physical disk's NAA ID. From here you can continue normal troubleshooting; in this case, removing the affected VSAN disk from its disk group.
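If the one-liner produces a long listing, you can filter it down to just the state:10 components and the disks they sit on. A minimal sketch, assuming the listing above was saved to a file (the name recovered.txt is hypothetical):

```shell
# Map each recovered component in state 10 to its disk's NAA ID.
# recovered.txt holds the saved output of the one-liner above.
awk '
/^Disk:/        { naa = "" }                        # new disk section, reset name
/Display Name:/ { naa = $NF }                       # e.g. naa.600605b00a635350...
/UUID:/         { split($0, a, ":"); uuid = a[2] }  # remember last component UUID
/state:10/      { print uuid, "->", naa }           # bad component -> physical disk
' recovered.txt
```

Each printed line pairs a bad component UUID with the NAA ID of the physical disk holding it.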


 

Thursday, 23 July 2015

Deploying multiple VMs fails with error "Error caused by file /vmfs/volumes/datastore_id/path_to/source.vmdk" on ESXi 5.5 u2

I've been puzzled over this issue for quite a long time.

The customer has a cloud solution on top of vSphere 5.5 U2, and deploying/cloning multiple VMs from a template/VM fails with the error:

"Error caused by file /vmfs/volumes/datastore_id/path_to/source.vmdk".

This error is shown in the vSphere Client during "Apply Storage DRS recommendations".

I was able to replicate this behaviour with a simple PowerCLI script:

$vmquantity = 5
$template = Get-VM -Name "template_vm"
$dsclu = Get-DatastoreCluster -Name DS_clu1_foo
$clu = Get-Cluster -Name Clu1_foo
$vmlocation = Get-Folder -Name Folder_foo

1..$vmquantity | ForEach-Object {
    $vmname = "testVM$_"
    New-VM -ResourcePool $clu -Name $vmname -Datastore $dsclu -Location $vmlocation -VM $template -RunAsync
}

The key here is the "RunAsync" parameter; the problem occurs only with asynchronous deployment.

I tried a lot of unnecessary steps, like disabling VAAI and changing datastore and storage controller queue depths, without success.

A simple solution resolved this issue: deleting the ctk.vmdk (Changed Block Tracking) files of the source VM/template fixed the problem. Also, make sure the .vmx or .vmtx file doesn't have any references to the deleted files.
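To verify nothing still points at the deleted files, you can grep the template's config for Changed Block Tracking entries. A minimal sketch (the file name template_vm.vmtx is an example; run it in the VM's directory on the datastore):

```shell
# List any leftover Changed Block Tracking references in the config file;
# after cleanup this should print nothing.
grep -iE 'ctk|changeTrack' template_vm.vmtx
```

Any line it prints (e.g. ctkEnabled or a scsiX:Y.ctkEnabled entry) should be removed from the config before redeploying.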

There is a VMware KB article with this solution, but it is not related to the error I was getting. VMware should update their KBs, because the solution was hard to find.

Update:

I noticed that backups can also be affected by this (at least TSM backups can be): snapshots are not created and, as a result, the backup fails. Deleting the ctk files can help here as well.