Thursday 9 October 2014

IBM BladeCenter QLogic HBA and Brocade module quirks

This blog post is mostly for my own future reference, but if anyone finds it helpful, cool :)

Recently I worked with an old IBM BladeCenter H chassis, new HS23 blades and IBM Storwize v7000 storage.
The chassis had Brocade 8 Gb 20-port SAN switch modules installed, all running old 2010 firmware...

OK, time for some firmware update joy:
- BladeCenter H updated to BPET66K: good
- SAN switches updated to 7.0.2d6 (the latest available/supported on IBM Fix Central?): looks good!
- LAN switches updated to 5.3.5: fine
- blades updated to TKE140YUS (FW/BIOS), DSYTD8G (Diagnostics), 1AOO64N (mgmt processor): nice!
- v7000 updated nicely to the latest 7.3.06: great!

Additionally, the IBM UpdateXpress DVD updated the remaining blade components (including the QLogic HBAs, among other things): fine by me.

I THOUGHT IT DID...

Anyway, work continued: I connected the cables and everything should have worked. But it didn't. There was no light (no link) between the SAN switch and the v7000.

What?

OK, typical troubleshooting:
- cables are fine
- zoning isn't configured yet, but the uplinks to other switches work great
- external ports are enabled in the AMM
- I tried other crazy stuff too (disabling ports and switches, changing port speed, etc.), still to no avail

But then my guru said: check the fillword parameter. Et voilà, changing fillword on the switch ports from 1 to 3 worked like a charm :)

Here's the IBM link.
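A minimal Python sketch for auditing fillword settings before cabling up the storage. The mode descriptions are my paraphrase of the Brocade FOS portCfgFillWord documentation, so double-check them against the command reference for your firmware. The port numbers, the idea of typing the current modes into a dict by hand, and the helper name ports_to_fix are all hypothetical, for illustration only.

```python
# Fillword modes as described for Brocade FOS portCfgFillWord
# (paraphrased; verify against the command reference for your FOS version).
FILLWORD_MODES = {
    0: "IDLE in link init, IDLE as fill word",
    1: "ARBF in link init, ARBF as fill word",
    2: "IDLE in link init, ARBF as fill word",
    3: "try ARBF/ARBF, fall back to IDLE/ARBF",
}

# The value that fixed the link to the v7000 in this setup.
WANTED_MODE = 3


def ports_to_fix(port_modes, wanted=WANTED_MODE):
    """Return sorted (port, current_mode) pairs whose fillword differs from `wanted`.

    `port_modes` is a hypothetical dict {port_number: fillword_mode},
    e.g. typed in by hand from the switch's port configuration output.
    """
    return sorted(
        (port, mode) for port, mode in port_modes.items() if mode != wanted
    )


if __name__ == "__main__":
    # Hypothetical example: two external ports still on mode 1.
    observed = {15: 1, 16: 1, 17: 3}
    for port, mode in ports_to_fix(observed):
        print(f"port {port}: mode {mode} ({FILLWORD_MODES[mode]}) -> change to {WANTED_MODE}")
```

The script only reports; the actual change still has to be made on the switch itself.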

I thought all my problems were gone. God, I was so wrong...

Zoning went fine, hosts were added to the v7000, all should have been good. But it turned out that 2 of the 3 Windows Server 2012 R2 blades reported "Loop down" on one of their ports. The second port was fine, so the LUNs were accessible, but with no redundancy. As expected, the v7000 reported those ports as "inactive".

So the digging started:
- port status on switches? Check!
- cables? Check!
- latest drivers? Check!
- latest firmware? Che... wait, something is wrong here: those 2 servers have different FW versions than the working one, despite successfully running exactly the same UpdateXpress DVD!



The working server had the following versions:
- Running FW ver: 7.01.00
- FW preload table: 2.4.0
- FW Serdes Table: 1.0
- BIOS ver: 2.14
- FCODE ver: 3.20
- EFI ver: 2.51

After I updated the servers to the correct versions (using a single file, qlgc_fw_fc_8g-f70100-b214-e251_windows_32-64.exe, which, by the way, was on the UpdateXpress DVD), all problems went away.
(One more reset of the switch ports helped a little too :) )
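Comparing each blade against the known-good set from the working server would have caught this drift much earlier. A minimal Python sketch, assuming you have already read each blade's HBA versions into a dict keyed by the same labels (how you collect them is out of scope here). The helper name firmware_drift is hypothetical, and the "older" version string in the example is made up.

```python
# Known-good QLogic HBA firmware set, copied from the working blade.
BASELINE = {
    "Running FW ver": "7.01.00",
    "FW preload table": "2.4.0",
    "FW Serdes Table": "1.0",
    "BIOS ver": "2.14",
    "FCODE ver": "3.20",
    "EFI ver": "2.51",
}


def firmware_drift(reported, baseline=BASELINE):
    """Return {component: (expected, actual)} for every mismatch.

    `reported` is a hypothetical dict of versions read from one blade.
    A component missing from `reported` shows up with actual=None.
    """
    return {
        name: (expected, reported.get(name))
        for name, expected in baseline.items()
        if reported.get(name) != expected
    }


if __name__ == "__main__":
    # Made-up reading from a misbehaving blade: only the running FW differs.
    suspect = {**BASELINE, "Running FW ver": "hypothetical-older-version"}
    for name, (expected, actual) in firmware_drift(suspect).items():
        print(f"{name}: expected {expected}, found {actual}")
```

An empty result means the blade matches the baseline; anything else is worth a second look before blaming cables or zoning.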

Oh, and SDDDSM_x64_2435-3_140627 is also installed.

TL;DR: pay attention to the fillword parameter if you are implementing IBM Storwize v7000, and double-check the installed firmware versions!

Ugh, this took me way too much time...
 



Wednesday 23 April 2014

Paged and nonpaged pool memory leaks... caused by zombies!

Ugh, it's been over a year since I last posted here... hopefully I'll post more frequently from now on...

Anyway, I recently struggled to pinpoint the cause of a memory leak on a Windows Server 2008 R2 machine.

This leak was unusual because it wasn't caused by running applications. Instead, due to a driver bug, the system wasn't able to release all the memory of closed processes (20K of them, to be precise), and this eventually filled up all physical memory... and crashed the server.


It looked like this in RAMMap:





Thanks to this post, I was able to find the problematic driver. And indeed, it was the Aladdin HASP USB dongle driver.

So, lesson learned: if you think you have a memory leak, check it with RAMMap as well. Process Explorer and similar tools can show you a leaking active application, but not memory still held by processes that have already exited.
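The difference between the two views can be sketched in a few lines of Python: a zombie is a process that still holds memory but is no longer running. Both inputs are hypothetical (a list of (PID, image name) pairs from whatever memory snapshot you have, plus the list of live PIDs), as is the helper name zombie_summary. It only illustrates the idea and is not a RAMMap feature.

```python
from collections import Counter


def zombie_summary(processes_holding_memory, running_pids):
    """Count, per image name, processes that hold memory but are no longer running.

    processes_holding_memory: iterable of (pid, image_name) pairs
        (hypothetical input format, e.g. transcribed from a memory snapshot).
    running_pids: iterable of PIDs that are actually alive right now.
    """
    alive = set(running_pids)
    return Counter(
        name for pid, name in processes_holding_memory if pid not in alive
    )


if __name__ == "__main__":
    # Made-up data: three exited "app.exe" processes are still holding memory.
    snapshot = [(100, "app.exe"), (104, "app.exe"), (108, "app.exe"),
                (200, "svc.exe"), (300, "app.exe")]
    running = [200, 300]
    for name, count in zombie_summary(snapshot, running).most_common():
        print(f"{name}: {count} zombie process(es)")
```

A large count for one image name points at which exits are leaking, but the culprit can still be a driver holding references to those processes, as it was here.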

And just for reference: the fltmc command can help you find the active filter drivers, which may be the ones causing memory leaks.