Opened 2 years ago

Closed 18 months ago

#36 closed Problem (Works For Me)

console redirection has stopped on bs2020

Reported by: D Delmar Davis
Owned by: Joe Dumoulin
Priority: Major PITA
Milestone: Make Shit Happen / Own Your Shit.
Component: bs2020 (dell)
Keywords: Idrac
Cc: Joe Dumoulin

Description

Noticed that one of the recent updates walked on the boot parameters again. I fixed that, but it's still complaining that it can't connect.

I need to re-enable the web interface so we can see what is going on.

Change History (14)

comment:1 Changed 2 years ago by D Delmar Davis

/admin1-> console com2
console: failed to read Serial Over LAN Configuration.
/admin1-> 

comment:2 Changed 2 years ago by D Delmar Davis

Owner: changed from D Delmar Davis to Joe Dumoulin

Joe,
Can you at some point take a look at the IDRAC from the physical console and check the ssh/serial console settings?

Let me know and reassign this to me please.

PS Just booted Bernie.

comment:3 Changed 22 months ago by Joe Dumoulin

I was able to get onto the iDRAC through the web interface and went to change my password, but the page began to time out. I want to reboot the iDRAC. As far as I can tell, this means we need to install racadm (https://www.dell.com/community/Systems-Management-General/IDRAC-6-reboot/td-p/4698944).

Should I install racadm on Bernie? Thoughts?

comment:4 Changed 22 months ago by D Delmar Davis

If possible you should avoid using the web interface altogether and keep the ports closed.
I had looked into racadm, but most of the functionality was available through ssh.

I hadn't really thought about resetting the idrac as a solution.

Also, racadm tends to be run from a computer other than the one you are managing, so if you install it, start on kb.

Let me ssh in and see if I can reset it from there.

comment:5 Changed 22 months ago by D Delmar Davis

I can't ssh into it at all now.
It complains about my keys, which should be the same.
I am also getting timeouts during the handshake.
I wonder if it's getting hammered since it's on a well-known port.
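
One way to tell "the port is unreachable" apart from "the SSH daemon answers but the handshake stalls later" is to check whether the server sends its identification string at all. The following is a minimal sketch, assuming Python 3 is available on the machine doing the probing; the function name, the default port 22, and the hostname in the usage line are illustrative assumptions, not values from this ticket.

```python
import socket
from typing import Optional


def probe_ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> Optional[str]:
    """Return the server's SSH identification string, or None if nothing usable arrives."""
    try:
        # The timeout applies to the connect and to the recv that follows.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(256)
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None
    banner = data.decode("ascii", errors="replace").strip()
    return banner or None


if __name__ == "__main__":
    # "idrac.example.net" is a placeholder hostname.
    print(probe_ssh_banner("idrac.example.net"))
```

If a banner comes back promptly, the TCP path and the daemon are alive and the key complaint is probably a separate problem; if it returns None, the controller is more likely wedged or overloaded.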

comment:6 Changed 22 months ago by D Delmar Davis

Also, if you can ssh to the iDRAC you can just type racadm there.
No installation necessary.

https://frednotes.wordpress.com/2012/11/13/reset-dell-idrac-using-ssh/

comment:7 Changed 22 months ago by D Delmar Davis

I can't ssh or https to it from kb2020 so....

It is wedged for sure.

And I stand corrected. racadm should be able to connect directly in the same way that ssacli talks to the raid controller.

So yes.

Please install racadm on Bernie if you can.

comment:8 Changed 22 months ago by D Delmar Davis

Did you make any progress on this?

comment:9 Changed 22 months ago by Joe Dumoulin

I've attempted to apt install this a couple of times, but I am struggling to get the gpg keys to verify, so I have not completed this. Specifically, Dell's instructions say to do this:

gpg --keyserver pool.sks-keyservers.net --recv-key 1285491434D8786F

But going to that key server and attempting to get that key fails.

Still investigating but no joy.
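
When a single keyserver fails, one option is to try the same key ID against several servers in turn. The sketch below wraps the `gpg --keyserver ... --recv-keys` call shown above; the function name and the injectable `run` parameter are illustrative, and `keyserver.ubuntu.com` is only an assumed alternative (whether it carries this particular key is not confirmed anywhere in this ticket).

```python
import subprocess
from typing import Callable, Optional, Sequence


def fetch_key(
    key_id: str,
    keyservers: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[str]:
    """Try each keyserver in order; return the first one gpg reports success for."""
    for server in keyservers:
        cmd = ["gpg", "--keyserver", server, "--recv-keys", key_id]
        try:
            result = run(cmd, capture_output=True, text=True, timeout=60)
        except (OSError, subprocess.TimeoutExpired):
            # gpg could not be started or the server hung; move on to the next one.
            continue
        if result.returncode == 0:
            return server
    return None


if __name__ == "__main__":
    servers = ["pool.sks-keyservers.net", "keyserver.ubuntu.com"]  # second entry is an assumption
    print(fetch_key("1285491434D8786F", servers))
```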

comment:10 Changed 21 months ago by D Delmar Davis

I added the "trusted" workaround to /etc/apt/sources.list.d for the repository, which allowed me to attempt to install the racadm package provided. However, there were unmet dependencies which were not in the repo. Given that in a few months we will be looking at installing a new LTS (20.04) I say fuck it. It's a rathole. Resetting the idrac may just have to be done at the console itself.
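
For reference, the "trusted" workaround usually means adding the `[trusted=yes]` option to the repository's `deb` line, which tells apt to skip signature verification for that one source. A minimal sketch of what such a line looks like follows; the helper name, the repository URL, and the suite name are placeholders, since the actual entry used on bs2020 is not recorded here.

```python
def trusted_source_line(uri: str, suite: str, component: str = "main") -> str:
    """Build a one-line apt source entry that bypasses signature checks for this repo."""
    return f"deb [trusted=yes] {uri} {suite} {component}"


if __name__ == "__main__":
    # Placeholder URL and suite; substitute the real repository details.
    print(trusted_source_line("https://repo.example.invalid/openmanage", "bionic"))
```

Because this gives up the authenticity check on every package from that source, it is best kept scoped to the single repository that needs it.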

comment:11 Changed 21 months ago by D Delmar Davis

I started down the path of apt-get install --fix-missing with our trust-everything-in-the-Dell-repo policy and discovered that it was missing sfcb, which is available in "Multiverse". Who knows what hell awaits us, but the software has in theory been installed.

Will post here with the results of trying it out.

comment:12 Changed 21 months ago by D Delmar Davis

OK, so I did a racadm racreset and waited an ungodly amount of time before rebooting Bernie.

Have not been able to connect to the iDRAC or to Bernie since....
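
Rather than guessing how long to wait after `racadm racreset`, the wait can be bounded by polling until the controller answers again or a deadline passes. This is a rough sketch only: the function name, the settle, deadline, and interval values, and the `is_up` check are all assumptions, and `is_up` stands in for whatever reachability test is available (an SSH or ping probe, for example).

```python
import subprocess
import time
from typing import Callable


def reset_and_wait(
    is_up: Callable[[], bool],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    settle_s: float = 60.0,
    deadline_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Issue the reset, then poll is_up() until it succeeds or the deadline passes."""
    run(["racadm", "racreset"], check=True)
    # Give the controller time to actually go down before polling,
    # otherwise the first probe may hit the old, still-running firmware.
    sleep(settle_s)
    give_up_at = clock() + deadline_s
    while clock() < give_up_at:
        if is_up():
            return True
        sleep(interval_s)
    return False
```

A False return means the controller never came back within the deadline, which is the point at which a trip to the physical console is probably unavoidable.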

comment:13 Changed 21 months ago by D Delmar Davis

Bernie came up and I am able to get some of the dell stuff to talk.

I still can't reset the iDRAC or get racadm to work locally, but I am told that we have a couple of bad memory sticks, and we still need to replace that fan.

root@bs2020:/etc/logcheck# /opt/dell/srvadmin/sbin/srvadmin-services.sh status
● instsvcdrv.service - Systems Management Device Drivers
   Loaded: loaded (/etc/systemd/system/instsvcdrv.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2020-03-15 15:40:36 PDT; 49s ago

Mar 15 12:43:12 bs2020 systemd[1]: Starting Systems Management Device Drivers...
Mar 15 12:43:15 bs2020 systemd[1]: Started Systems Management Device Drivers.
Mar 15 15:40:36 bs2020 systemd[1]: Stopping Systems Management Device Drivers...
Mar 15 15:40:36 bs2020 systemd[1]: Stopped Systems Management Device Drivers.
● dsm_sa_datamgrd.service - Systems Management Data Engine
   Loaded: loaded (/etc/systemd/system/dsm_sa_datamgrd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2020-03-15 15:40:36 PDT; 50s ago
 Main PID: 6141 (code=exited, status=0/SUCCESS)

Mar 15 12:43:15 bs2020 systemd[1]: Starting Systems Management Data Engine...
Mar 15 12:43:26 bs2020 systemd[1]: Started Systems Management Data Engine.
Mar 15 15:40:26 bs2020 systemd[1]: Stopping Systems Management Data Engine...
Mar 15 15:40:36 bs2020 systemd[1]: Stopped Systems Management Data Engine.
● dsm_sa_eventmgrd.service - Systems Management Event Management
   Loaded: loaded (/etc/systemd/system/dsm_sa_eventmgrd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2020-03-15 15:40:36 PDT; 49s ago
 Main PID: 5886 (code=exited, status=0/SUCCESS)

Mar 15 12:43:26 bs2020 Server_Administrator[5886]: 5886 1306 - Instrumentation Service  Redundancy lost 
                                                   Redundancy unit: System Board Fan Redundancy 
                                                   Chassis location: Main System Chassis 
                                                   Previous redundancy state was: Unknown
Mar 15 12:43:26 bs2020 Server_Administrator[5886]: 5886 1104 - Instrumentation Service  Fan sensor detected a failure value 
                                                   Sensor location: System Board FAN MOD 5B RPM 
                                                   Chassis location: Main System Chassis 
                                                   Previous state was: Unknown 
                                                   Fan sensor value (in RPM): 0
Mar 15 12:43:26 bs2020 Server_Administrator[5886]: 5886 1012 - Instrumentation Service  IPMI status 
                                                   Interface: OS
Mar 15 12:43:26 bs2020 Server_Administrator[5886]: 5886 1001 - Instrumentation Service  Server Administrator startup complete
Mar 15 12:43:26 bs2020 Server_Administrator[5886]: 5886 1008 - Instrumentation Service  Systems Management Data Manager Started
Mar 15 12:46:18 bs2020 Server_Administrator[5886]: 5886 1404 - Instrumentation Service  Memory device status is critical 
                                                   Memory device location: DIMM_B2  
                                                   Possible memory module event cause:Multi bit error encountered
Mar 15 12:46:40 bs2020 Server_Administrator[5886]: 5886 1404 - Instrumentation Service  Memory device status is critical 
                                                   Memory device location: DIMM_B2  
                                                   Possible memory module event cause:Multi bit error encountered
Mar 15 15:40:26 bs2020 Server_Administrator[5886]: 5886 1009 - Instrumentation Service  Systems Management Data Manager Stopped
Mar 15 15:40:36 bs2020 systemd[1]: Stopping Systems Management Event Management...
Mar 15 15:40:36 bs2020 systemd[1]: Stopped Systems Management Event Management.
● dsm_sa_snmpd.service - Systems Management SNMP
   Loaded: loaded (/etc/systemd/system/dsm_sa_snmpd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2020-03-15 15:40:26 PDT; 59s ago
 Main PID: 8041 (code=exited, status=0/SUCCESS)

Mar 15 12:43:26 bs2020 systemd[1]: Starting Systems Management SNMP...
Mar 15 12:43:29 bs2020 systemd[1]: Started Systems Management SNMP.
Mar 15 15:40:26 bs2020 systemd[1]: Stopping Systems Management SNMP...
Mar 15 15:40:26 bs2020 systemd[1]: Stopped Systems Management SNMP.
● dsm_om_connsvc.service - DSM SA Connection Service
   Loaded: loaded (/etc/systemd/system/dsm_om_connsvc.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2020-03-15 15:40:26 PDT; 1min 0s ago
 Main PID: 5861 (code=exited, status=0/SUCCESS)

Mar 15 12:43:12 bs2020 systemd[1]: Starting DSM SA Connection Service...
Mar 15 12:43:13 bs2020 systemd[1]: Started DSM SA Connection Service.
Mar 15 15:40:26 bs2020 systemd[1]: Stopping DSM SA Connection Service...
Mar 15 15:40:26 bs2020 systemd[1]: dsm_om_connsvc.service: Killing process 5862 (dsm_om_connsvcd) with signal SIGKILL.
Mar 15 15:40:26 bs2020 systemd[1]: Stopped DSM SA Connection Service.
root@bs2020:/etc/logcheck#
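
The hardware complaints are buried in the status output above. A small parser can pull out the Server Administrator events and their indented detail lines, assuming the log keeps the layout shown here; the function and field names are illustrative and not part of any Dell tool.

```python
import re
from typing import Any, Dict, List

# Header line, e.g. "... Server_Administrator[5886]: 5886 1404 - Instrumentation Service  <message>"
EVENT_RE = re.compile(
    r"Server_Administrator\[\d+\]: \d+ (?P<event_id>\d+) - "
    r"Instrumentation Service\s+(?P<message>.+?)\s*$"
)
# Indented continuation line, e.g. "    Memory device location: DIMM_B2"
DETAIL_RE = re.compile(r"^\s+(?P<key>[^:]+):\s*(?P<value>.*?)\s*$")


def parse_events(text: str) -> List[Dict[str, Any]]:
    """Collect events plus the 'key: value' lines that directly follow each one."""
    events: List[Dict[str, Any]] = []
    current = None
    for line in text.splitlines():
        header = EVENT_RE.search(line)
        if header:
            current = {
                "event_id": int(header["event_id"]),
                "message": header["message"],
                "details": {},
            }
            events.append(current)
            continue
        detail = DETAIL_RE.match(line)
        if detail and current is not None:
            current["details"][detail["key"].strip()] = detail["value"]
        else:
            # Any other line ends the current event's detail block.
            current = None
    return events
```

Filtering the result for messages containing "critical" or "failure" would surface the DIMM_B2 and fan entries from the output above.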

comment:14 Changed 18 months ago by D Delmar Davis

Resolution: Works For Me
Status: changed from assigned to closed

Rebooted Bernie for this week's kernel mods and was able to ssh to the iDRAC, so I reset the iDRAC and also tested it by hard resetting the system.

Have console now.
