Sunday 17 September 2017

Zerto VPG creation failure (vim.fault.CannotCreateFile)

I just solved an annoying issue. I use Zerto to replicate VMs between sites, and the target site uses vSAN as its storage. Some time ago a host there failed and we replaced it with a new one.

When creating a new VPG, Zerto throws a very generic error:




In the task details I found additional info:

Cannot complete file creation operation.. Fault: Vim25Api.CannotCreateFile.

I started to dig. The first step was to look at the /var/log/vpxa.log file, where I found this:

2017-09-15T20:27:34.791Z info vpxa[2F85EB70] [Originator@6876 sub=Default opID=326fd094-3b] [VpxLRO] -- ERROR task-internal-108298 -- vpxa -- vpxapi.VpxaService.reserveName: vim.fault.CannotCreateFile:
--> Result:
--> (vim.fault.CannotCreateFile) {
--> faultCause = (vmodl.MethodFault) null,
--> file = "ds:///vmfs/volumes/vsan:a5c518a4ceaa4b9e-8cd24fc5c9c0cad3/44e38a59-5440-a5e4-a8e5-0cc47aa432a8/256_vsanDatastore_vm-393_history_10_134_9_132_log_volume.vmdk",
--> msg = ""
--> }
--> Args:
-->
--> Arg spec:
--> (vpxapi.VmLayoutSpec) {
--> vmLocation = (vpxapi.VmLayoutSpec.Location) null,
--> multipleConfigs = <unset>,
--> basename = "Z-VRA-host2.domain.com",
--> baseStorageProfile = <unset>,
--> disk = (vpxapi.VmLayoutSpec.Location) [
--> (vpxapi.VmLayoutSpec.Location) {
--> url = "ds:///vmfs/volumes/vsan:a5c518a4ceaa4b9e-8cd24fc5c9c0cad3/44e38a59-5440-a5e4-a8e5-0cc47aa432a8/256_vsanDatastore_vm-393_history_10_134_9_132_log_volume.vmdk",
--> key = 16001,
--> sourceUrl = <unset>,
--> urlType = "exactFilePath",
--> storageProfile = <unset>
--> }
--> ],
--> reserveDirOnly = <unset>
--> }
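
With an empty 'msg', the only useful part of this block is the 'file' path. Here is a minimal sketch of pulling that path apart so the datastore and the directory name are easy to spot. The function name and the regular expression are my own illustration and assume the ds:///vmfs/volumes/<datastore>/<directory>/<file> layout seen in the log above; they are not part of any VMware or Zerto tooling.

```python
import re

# Assumed layout, taken from the vpxa.log excerpt above:
#   file = "ds:///vmfs/volumes/<datastore>/<directory>/<file name>"
FAULT_FILE_RE = re.compile(
    r'file = "ds:///vmfs/volumes/'
    r'(?P<datastore>[^/"]+)/(?P<directory>[^/"]+)/(?P<name>[^"]+)"'
)


def parse_fault_file(log_text):
    """Return the datastore, directory and file name of the first fault path.

    Returns None when the text contains no matching 'file = ...' line.
    """
    match = FAULT_FILE_RE.search(log_text)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    sample = (
        '--> file = "ds:///vmfs/volumes/vsan:a5c518a4ceaa4b9e-8cd24fc5c9c0cad3/'
        '44e38a59-5440-a5e4-a8e5-0cc47aa432a8/'
        '256_vsanDatastore_vm-393_history_10_134_9_132_log_volume.vmdk",'
    )
    print(parse_fault_file(sample))
```

The 'directory' part is the piece that matters later in this story.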
 


Not very helpful... The 'msg' line was empty, so I chased my tail for a while trying to find something.
 Then I looked into the /var/log/hostd.log file and found this:

2017-09-16T21:44:11.266Z info hostd[69401B70] [Originator@6876 sub=Solo.Vmomi opID=179ca1eb-c-cc73 user=vpxuser:VSPHERE.LOCAL\prod-Zerto-fcbdf1fb-3575-4d99-9fde-0be131222758] Result:
--> (vim.fault.FileAlreadyExists) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = (vmodl.LocalizableMessage) [
-->       (vmodl.LocalizableMessage) {
-->          key = "com.vmware.esx.hostctl.default",
-->          arg = (vmodl.KeyAnyValue) [
-->             (vmodl.KeyAnyValue) {
-->                key = "reason",
-->                value = "Failed to create directory 44e38a59-5440-a5e4-a8e5-0cc47aa432a8 (File Already Exists)"
-->             }
-->          ],
-->          message = <unset>
-->       }
-->    ],
-->    file = "44e38a59-5440-a5e4-a8e5-0cc47aa432a8"
-->    msg = ""
--> }
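
The real cause is buried in the 'reason' value of the fault block. A rough sketch of how one could fish such reasons out of a copy of hostd.log instead of scrolling through it by hand is below. The function name and both patterns are hypothetical and are based only on the format of the excerpt above, so other fault layouts may need different patterns.

```python
import re

# Both patterns are assumptions derived from the hostd.log excerpt above.
FAULT_RE = re.compile(r'\((vim\.fault\.\w+)\) \{')
VALUE_RE = re.compile(r'\bvalue = "([^"]*)"')


def fault_reasons(lines):
    """Collect (fault type, reason text) pairs from hostd.log lines."""
    current_fault = None
    found = []
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            # Remember which fault block the following lines belong to.
            current_fault = fault.group(1)
            continue
        value = VALUE_RE.search(line)
        if value and current_fault:
            found.append((current_fault, value.group(1)))
    return found


if __name__ == "__main__":
    sample = [
        '--> (vim.fault.FileAlreadyExists) {',
        '-->    faultCause = (vmodl.MethodFault) null,',
        '-->                key = "reason",',
        '-->                value = "Failed to create directory '
        '44e38a59-5440-a5e4-a8e5-0cc47aa432a8 (File Already Exists)"',
        '--> }',
    ]
    for fault, reason in fault_reasons(sample):
        print(fault, "->", reason)
```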


 Ah, it can't create the directory. A quick look at the vSAN datastore contents showed that this directory already exists, but deleting it from the Web Client failed. So I went back to ESXi and tried to list it:


ls: ./44e38a59-5440-a5e4-a8e5-0cc47aa432a8: No such device or address
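
"No such device or address" is the usual message text for the ENXIO error code, which is different from a plain "No such file or directory" (ENOENT). As a sketch only, this is how a script could tell a directory that is simply missing apart from one that has a name but cannot be opened. The function name and the "ok"/"missing"/"stale" labels are made up for illustration, and I am assuming that listing the directory from Python fails with the same error code that ls reported here.

```python
import errno
import os


def probe_directory(path):
    """Classify a directory as 'ok', 'missing' or 'stale'.

    'stale' is a label invented for this sketch: the name exists, but
    opening it fails with ENXIO ("No such device or address").
    """
    try:
        os.listdir(path)
    except FileNotFoundError:
        return "missing"
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return "stale"
        raise  # anything else is unexpected, so do not hide it
    return "ok"


if __name__ == "__main__":
    print(probe_directory("."))
```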


Hmm, something's wrong here. Can I remove it by force? Of course:

 /usr/lib/vmware/osfs/bin/osfs-rmdir 44e38a59-5440-a5e4-a8e5-0cc47aa432a8 -f

Result:

Deleting directory 44e38a59-5440-a5e4-a8e5-0cc47aa432a8 in container id a5c518a4ceaa4b9e8cd24fc5c9c0cad3 backed by vsan (force=True)

 (yes, I used The Force)

I verified in the GUI that the directory was deleted (it was) and tried to create the VPG again... Voilà, it worked!

So, leftovers from the old host/VRA caused this issue. Interestingly, vSphere gave no indication that anything was wrong. The health service reported everything as green.

Oh, BTW, some time ago I passed the VCAP6 design exam and now I'm a VCIX :P