Friday 13 July 2018

How to determine the physical location of bad VSAN components

Sometimes, as a VSAN admin, you must locate a bad VSAN component. This is especially true when you stumble upon the infamous "component metadata health" error.

Using RVC, you can get a nice text representation of the affected components:


vsan.health.health_summary .



+---------------------+--------------------------------------+--------+---------------+
| Host                | Component                            | Health | Notes         |
+---------------------+--------------------------------------+--------+---------------+
| host7.something.com | c30d1b5b-586b-8b74-b3c6-0cc47aa4b1b8 | Error  | Invalid state |
| host7.something.com | 5648245b-b4cb-91b4-c786-0cc47a39c320 | Error  | Invalid state |
| host1.something.com | d4e3325b-3436-4aa5-6707-0cc47aa4e64e | Error  | Invalid state |
| host1.something.com | fca2325b-fc57-cd89-4bf4-0cc47aa3cf00 | Error  | Invalid state |
| host1.something.com | 2c32355b-f810-6782-4c49-0cc47aa4e64e | Error  | Invalid state |
| host5.something.com | 59f91d5b-9c8f-362d-88e5-0cc47aa3cf00 | Error  | Invalid state |
| host5.something.com | 380e1b5b-842e-75e9-d305-0cc47aa3cf00 | Error  | Invalid state |
| host3.something.com | fcb10d5b-2cef-cb46-19a5-0cc47a39bab8 | Error  | Invalid state |
+---------------------+--------------------------------------+--------+---------------+
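To work through the table host by host, it can help to reduce it to plain host/UUID pairs. The following is only a minimal sketch: it assumes the RVC table was copied into a text file on your workstation (the function name `list_error_components` and the file name `health.txt` are made up for illustration) and that the columns appear in the order shown above.

```shell
# Read the RVC health table from stdin and print "<host> <component UUID>"
# for every row whose Health column says Error.
# Assumption: columns are Host | Component | Health | Notes, as in the table above.
list_error_components() {
  awk -F'|' '
    NF >= 5 && $4 ~ /Error/ {
      gsub(/ /, "", $2)   # strip padding around the host name
      gsub(/ /, "", $3)   # strip padding around the component UUID
      print $2, $3
    }
  '
}
```

For example, `list_error_components < health.txt | sort` groups the bad components by host, so you know which hosts to log in to.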
 

With the component ID in hand, let's find the disk ID:


/localhost/Datacenter/computers/Cluster> vsan.cmmds_find . -u c30d1b5b-586b-8b74-b3c6-0cc47aa4b1b8
+---+------+------+-------+--------+---------+
| # | Type | UUID | Owner | Health | Content |
+---+------+------+-------+--------+---------+
+---+------+------+-------+--------+---------+

/localhost/Datacenter/computers/Cluster> vsan.cmmds_find . -u 2c32355b-f810-6782-4c49-0cc47aa4e64e
+---+------+------+-------+--------+---------+
| # | Type | UUID | Owner | Health | Content |
+---+------+------+-------+--------+---------+
+---+------+------+-------+--------+---------+

As you can see, the result is empty. Thanks to VMware support, I was able to determine where those components are located by running the one-liner below on the affected host:

for i in $(vsish -e ls /vmkModules/lsom/disks/ | sed 's/.$//'); do echo; echo "Disk:" $i; localcli vsan storage list | grep $i -B 2 | grep Displ | sed 's/   / /'; echo "  Components:"; for c in $(vsish -e ls /vmkModules/lsom/disks/"$i"/recoveredComponents/ 2>/dev/null | grep -v ^626); do vsish -e cat  /vmkModules/lsom/disks/"$i"/recoveredComponents/"$c"info/ 2>/dev/null | grep -E "UUID|state" | grep -v diskUUID; done; done

Remember, it's a one-liner. If it gets wrapped when you copy it, join it back into a single line before running it.

The result looks like this:


Disk: 52fcf2cf-a2b3-765b-16da-6b1fbc17b623
 Display Name: naa.600605b00a63535021aa24b9dbc6fdae
  Components:

Disk: 52e53b16-d317-b32e-88c3-558c05fefec3
 Display Name: naa.600605b00a63535021aa24badbddc9de
  Components:

Disk: 52d3e0e5-e9c2-d6a5-9927-b10685a53dbf
 Display Name: naa.600605b00a63535021aa24c0dc3002a6
  Components:

Disk: 527923ad-74ac-80de-c24d-2204dacb91ee
  Components:

Disk: 52023556-e225-fa66-ac31-bbf91a968dee
 Display Name: naa.600605b00a63535021aa24c9dcc2a011
  Components:
   UUID:5648245b-b4cb-91b4-c786-0cc47a39c320
   state:10

Disk: 52545e5c-9acc-dd84-7351-fe500287cdb4
 Display Name: naa.600605b00a63535021aa24c5dc882959
  Components:

Disk: 52959ffa-298b-4183-c9ca-60265bbf1363
 Display Name: naa.600605b00a63535021aa24bedc154902
  Components:

Disk: 5202f5c4-7538-5964-fd55-975289da4d9b
 Display Name: naa.600605b00a63535021aa24c2dc4db53a
  Components:

Disk: 52772abe-e8b4-ec73-40fe-a75933126534
 Display Name: naa.600605b00a63535021aa24c7dca6d2ff
  Components:

Disk: 52eb25b2-c0b5-d629-ee44-0a3048d22701
 Display Name: naa.600605b00a63535021aa24c3dc6883f7
  Components:
   UUID:c30d1b5b-586b-8b74-b3c6-0cc47aa4b1b8
   state:10

Components with state:10 are the problematic ones. Next to each of them you can also see the NAA ID of the physical disk that holds it. From here you can continue with normal troubleshooting, which in this case meant removing the VSAN disk from its disk group.
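On hosts with many disks, the interesting lines are easy to miss, so the output can be filtered down to just the disks that hold a state:10 component. This is a rough sketch, not something provided by VMware: the function name `find_bad_components` and the file name `disks.txt` are my own, and it assumes the output format is exactly as shown above (`Disk:`, `Display Name:`, `UUID:` and `state:` lines).

```shell
# Read the one-liner's output from stdin and print one line per bad component:
#   <disk UUID> <NAA display name, or "-" if none was listed> <component UUID>
# Assumption: the input uses the layout shown in this post.
find_bad_components() {
  awk '
    /^ *Disk:/         { disk = $2; naa = "-"; next }        # new disk section
    /^ *Display Name:/ { naa = $3; next }                    # physical disk NAA ID
    /^ *UUID:/         { sub(/^ *UUID:/, ""); uuid = $1; next }
    /^ *state:10 *$/   { print disk, naa, uuid }             # only state 10 is reported
  '
}
```

Saving the one-liner's output to a file and running `find_bad_components < disks.txt` would then list, for the host above, the two components together with their disk UUIDs and NAA IDs.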