Opened 2 years ago

Closed 3 months ago

#29 closed Problem (Done)

Pull spare disks on kb2018!

Reported by: D Delmar Davis Owned by: Joe Dumoulin
Priority: Priority Milestone: Make Shit Happen / Own Your Shit.
Component: kb2018 (hp) Keywords: #disk
Cc: Joe Dumoulin

Description (last modified by D Delmar Davis)

Either the onboard raid controller doesn't like the 1TB sas disk, or the disk itself is failing.

Slot 0                                       
1719-Slot 0 Drive Array - A controller failure event occurred prior to this 
     power-up.  (Previous lock up code = 0xab)
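
If this message ever needs to be spotted in captured console output, a small parser can pull out the slot and the lockup code. This is only a minimal sketch: the function name and the regular expression are illustrative, and they cover nothing beyond the exact wording shown above.

```python
import re

# Matches the POST message quoted above, even when it wraps across lines.
POST_RE = re.compile(
    r"(?P<post_code>\d{4})-Slot (?P<slot>\d+) Drive Array"
    r".*?Previous lock ?up code = (?P<lockup>0x[0-9a-fA-F]+)",
    re.DOTALL,
)


def parse_controller_failure(text):
    """Return post code, slot and lockup code, or None if no message is found."""
    match = POST_RE.search(text)
    if match is None:
        return None
    return {
        "post_code": int(match.group("post_code")),
        "slot": int(match.group("slot")),
        # The lockup code is printed in hex, e.g. 0xab.
        "lockup_code": int(match.group("lockup"), 16),
    }
```

Feeding it the two lines above should give slot 0 and lockup code 0xab, which makes it easy to see whether the same code shows up on every failed boot.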

Usually power cycling the box resolves this issue; however, it's a frightening PITA wondering at each reboot.

This may be a firmware issue, though the low-hanging fruit would be to remove the unused disks (bays 7 and 8).

Change History (12)

comment:1 Changed 2 years ago by D Delmar Davis

what? no mail?

comment:2 Changed 2 years ago by D Delmar Davis

Redeployed the disk as a zfs pool; however, the disk does not come up on reboot, taking all containers with it
(zfs fails -> lxd fails -> !@#$!!!!)

Powercycled system and disk is back.
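
The failure chain above (pool missing, so lxd cannot start) could be caught right after boot with a check along these lines. This is a rough sketch, assuming `zpool list -H -o name,health` is available on the host. The function names and the pool name `tank` are placeholders, not anything configured on kb2018.

```python
import subprocess


def parse_zpool_health(output):
    """Parse `zpool list -H -o name,health` output into {pool: health}."""
    pools = {}
    for line in output.splitlines():
        fields = line.split()  # -H prints tab-separated fields, no header
        if len(fields) >= 2:
            pools[fields[0]] = fields[1]
    return pools


def unhealthy_pools(expected, output):
    """Return the expected pools that are missing or not ONLINE."""
    found = parse_zpool_health(output)
    return sorted(name for name in expected if found.get(name) != "ONLINE")


def current_zpool_output():
    """Run zpool on the host; only works where zfs is installed."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On the host, `unhealthy_pools(["tank"], current_zpool_output())` returning a non-empty list would mean the pool did not import, so there is no point starting the containers yet.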

There is definitely something that the controller doesn't seem to like about this SSD.

I am currently just experimenting with the local cloud idea, so the disk doesn't need to be replaced immediately; however, we should look at getting a different disk in there at some point soon. I may have a 1T non-SSD around here...

comment:3 Changed 2 years ago by D Delmar Davis

Milestone: changed from Server Modernization Phase II to Make Shit Happen / Own Your Shit.

comment:4 Changed 2 years ago by D Delmar Davis

Joe,

Am shipping 7.2k sas disk to you.

https://www.ebay.com/itm/362452271567

Will write up MOP for transition.

Don

comment:5 Changed 2 years ago by D Delmar Davis

Owner: changed from D Delmar Davis to Joe Dumoulin

Joe,
The disk I ordered was through a company that was affected by the California wildfires.
They said they should have it to you by month's end.
Can you let me know when it arrives?

D

comment:7 Changed 2 years ago by D Delmar Davis

Cool!
As per our conversation, let me know when I can take this on.

comment:8 Changed 21 months ago by Joe Dumoulin

I have two 1 TB disks prepared for kb2018. Which bays should I put them in? They are not formatted for zfs yet.

I have two more 1 TB disks for later.

comment:9 Changed 21 months ago by D Delmar Davis

Joe,

Put them in bays 3 and 4 next to the ssd in bay 1. Then I can migrate the pool on that drive so we can pull it. When they are in, assign this back to me and I will get the zfs set up.
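
One common way to do that kind of pool migration is a recursive snapshot followed by a replication send into the new pool. The sketch below only builds the commands for review and does not run anything. The pool names passed in are placeholders, and this is not necessarily how the move will actually be done here.

```python
import shlex


def migration_commands(src_pool, dst_pool, snap="migrate"):
    """Return shell commands for a snapshot-and-replicate pool migration."""
    snapshot = f"{src_pool}@{snap}"
    return [
        # Recursive snapshot of every dataset in the source pool.
        f"zfs snapshot -r {shlex.quote(snapshot)}",
        # Replication stream into the destination pool (-F forces a rollback there).
        f"zfs send -R {shlex.quote(snapshot)} | zfs receive -F {shlex.quote(dst_pool)}",
    ]


def print_plan(src_pool, dst_pool):
    """Print the commands so they can be checked before running them by hand."""
    for command in migration_commands(src_pool, dst_pool):
        print(command)
```

For example, `print_plan("oldpool", "newpool")` prints the two commands in order. The source pool would only be destroyed after the copy has been verified.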

It will be nice to reboot without having to power cycle the machine.

At some point it would be nice to see if that drive behaves any better in bs2020, but that will require splitting one of its pools up and shuffling things around a bit.

D

Last edited 21 months ago by D Delmar Davis

comment:10 Changed 18 months ago by D Delmar Davis

I removed the /fast zpool so it should be easier to remove the ssd.

comment:11 Changed 11 months ago by D Delmar Davis

Description: modified (diff)
Priority: changed from Important to Priority
Summary: changed from Large disk stopped talking to the raid controller on kb2018 to Pull spare disks on kb2018!

comment:12 Changed 3 months ago by D Delmar Davis

Resolution: Done
Status: changed from assigned to closed

Added the disk to the controller and rebooted.

Disk behaved as before: it was not ready when the raid controller looked for it, so the controller downed it.

  • deleted logical disk freeing failed physical disk.
  • pulled defective ssd.
  • rebooted server to verify the problem went away.
  • labeled disk as dead.