
Sun Ultra 24 - Freeze on shutdown problem

Groucho

New Member
Joined
Mar 7, 2019
Messages
3
Reaction score
2
Credits
0
Hello:

I'm trying to troubleshoot an annoying shutdown problem with my Sun Ultra 24 Workstation running under Devuan ASCII.

Code:
$ inxi -b
System:    Host: devuan Kernel: 4.9.0-8-amd64 x86_64 (64 bit) Desktop: Xfce 4.12.3
Distro: Devuan GNU/Linux ascii
Machine: Device: portable System: Sun Microsystems product: Ultra 24 v: 0.00.01
Mobo: Sun Microsystems model: Ultra 24 v: 50 BIOS: American Megatrends v: 1.56 date: 01/21/2011
--- snip ---

Obviously it's not a portable system.
It's just that this BIOS file was published post Jan 2010, date of Sun's demise.

Code:
$ uname -a
Linux devuan 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux

This seems to be a distribution-agnostic problem. It also happens on the same rig with an emergency TCore Linux I keep on a memory stick, accessible through F8 at boot time.

I don't know if it happens in MSOS installations as I don't have one, just a VM running XP for testing Win stuff.

The issue is basically this:

On shutdown, the machine will do one of two things:

1. shut down properly
2. freeze during the shutdown at this point ...

Code:
e1000e: EEE Tx LPI Timer
Preparing to enter sleep state S5
Reboot: Power Down

... with the fans blowing at full speed.

Originally this was a two-part problem: the first part was a reboot on shutdown issue but (apparently) that got fixed by disabling WoL and it has not happened again.

The second part occurs (like the first part) in a totally unpredictable manner and I have not been able to reproduce it or link it to anything in particular.

Besides disabling WoL (a hassle of sorts as it cannot be done via BIOS) I also disabled the Intel e1000e controller's EEE settings but to no avail.

Unloading the e1000e driver module with a script at shutdown has not worked either, nor has inserting a variety of reboot= stanzas in the kernel command line, e.g. reboot=force, reboot=acpi, reboot=bios, etc.
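
For anyone following along, the WoL and EEE changes mentioned above can be sketched roughly as below. This is only an illustration: it assumes the ethtool utility is installed and that the interface is called eth0 (the post names neither), and IFACE and DRY_RUN are hypothetical knobs. By default it only prints the commands; they are executed only when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: disable Wake-on-LAN and Energy-Efficient Ethernet on one NIC.
# IFACE (default eth0) and DRY_RUN (default 1) are assumptions, not from the post.
IFACE="${IFACE:-eth0}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

nic_quiet() {
    run ethtool -s "$1" wol d            # wol d = disable Wake-on-LAN
    run ethtool --set-eee "$1" eee off   # switch EEE off
}

nic_quiet "$IFACE"
```

Settings made this way are commonly lost on the next boot, so they would have to be reapplied from a startup script; the kernel command line currently in effect (including any reboot= stanza) can be checked with cat /proc/cmdline.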

To try to get a glimpse of what was going on, I decided to shut down the rig using a script that would (hopefully) isolate each of the stages of the shut down process and (maybe) give me some feedback at the terminal, much like what I did in my MS-DOS days by running config.sys and autoexec.bat in a step-by-step manner to weed out start-up issues:

Code:
#!/bin/sh
# Shut down system without the use of shutdownhelper
#
PATH=/sbin:/bin:/usr/sbin:/usr/bin
# SysRq keys: s = sync, u = remount read-only, o = power off
for i in s u s o; do echo "$i" | sudo tee /proc/sysrq-trigger; sleep 2; done

But no, after a number of shutdowns it eventually occurs again and this is what I see on screen:

Code:
s
u
sudo: unable to open log file: /var/log/sudo.log: read only file system
s
sudo: unable to open log file: /var/log/sudo.log: read only file system

... with the fans blowing at full speed.

Is there any other way to inspect the shutdown process to troubleshoot this further?

Thanks in advance,

G.
 


wizardfromoz

Administrator
Staff member
Gold Supporter
Joined
Apr 30, 2017
Messages
7,779
Reaction score
6,605
Credits
28,603
Harpo, Chico, Gummo or Zeppo might have an answer, but if Harpo knew, he probably wouldn't say anything :rolleyes:

OK, I probably could have helped that ... but naahh, I'm only 2 years your junior, and I'm Aussie :D

G'day @Groucho and welcome to linux.org.

Mate @JasKinasis is more likely to be helpful with the scripting, and by mentioning him, I have pinged him. I can put this Thread in Command Line if you wish.

Other than that, define

Is there any other way to inspect the shutdown process to troubleshoot this further?

... a little more, so we know what you have tried.

For a visual inspection, you could use "noquiet nosplash" in your /etc/default/grub (in practice, dropping "quiet splash" from GRUB_CMDLINE_LINUX_DEFAULT and then running update-grub) to see errors or warnings, but if you have 8GB or more RAM, that can scroll by pretty quickly.

Something like

Code:
grep -i shutdown /var/log/syslog

... will generate a lot of output, but can be filtered.
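
One way that filtering might look is sketched below. The pattern list and the 40-line limit are assumptions chosen for illustration, and shutdown_lines is a hypothetical helper name; only the grep and tail options themselves are standard.

```shell
#!/bin/sh
# Sketch: narrow a log down to shutdown-related lines.
# The pattern list and the default of 40 lines are assumptions; adjust to taste.
shutdown_lines() {
    # -a treats the file as text even if it contains binary bytes,
    # -i ignores case, -E enables the alternation in the pattern.
    grep -a -i -E 'shutdown|sysrq|power down|sleep state' "$1" | tail -n "${2:-40}"
}

# Example: last 40 matching lines from syslog, if it is readable.
if [ -r /var/log/syslog ]; then
    shutdown_lines /var/log/syslog
fi
```

The -a flag matters here because a log that has picked up binary bytes would otherwise make grep report only that a binary file matches.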

Sing out if you want to be "Moved" (it doesn't hurt)

Chris Turner
wizardfromoz
 
Groucho

Hello:
Harpo, Chico, Gummo ...
Nah, too unreliable.
Always running after some broad.

G'day @Groucho and welcome to linux.org.
Thank you.

@JasKinasis ... likely to be helpful with the scripting ...
Let's see if he pops in.

But although a hand with the scripting will undoubtedly be a plus, I don't think there's much to be done with the script I'm using.

It's basically a scripted use of the Magic SysRq key and has helped me nail the problem as occurring at the 'shutdown' stage.

i.e. s (sync) and u (remount read-only), which are part of the typical shutdown process, (apparently) work properly.

There's something evidently happening at o (shut off system), before 'halt'.

Unfortunately, as it has eluded repeatability, I have to wait for it to happen again.
I have to catch it in the act, so to speak.

When the freeze happens the system will write a long series of ASCII "non-text" codes to the pertinent log files, specifically 0x00 (NUL, the string-terminating character), which seems to be the standard behaviour with ungraceful halts such as the one caused by the freeze.

This screws up the log files: the usual text editors will either show the file only up to that point (Leafpad) or refuse outright to open the logfile (Pluma). You have to open it either with a hex editor or with MC, which will actually show you everything and anything there is to see in a file.
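
As an aside, a log damaged this way can also be made readable from the command line by filtering out the NUL (zero) bytes. The sketch below assumes nothing beyond the standard tr utility; strip_nuls is a hypothetical helper name and the log path in the usage line is only an example.

```shell
#!/bin/sh
# Sketch: make a log readable again after NUL bytes were written into it.
# Nothing is modified in place; the cleaned text goes to standard output.
strip_nuls() {
    tr -d '\000' < "$1"
}

# Example (path is an assumption): strip_nuls /var/log/syslog | less
```

Because the original file is left untouched, it can still be examined byte by byte in a hex editor afterwards.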

---
Nice to see that the old and trusty NC 5.0 lives on in Linux.
Probably the best from those guys. =-)
---

So what is needed now is a way to look into that part of the shutdown process with a bit more granularity and see what brings about the freeze.

Then we may be able to reliably reproduce it.

It may be caused by the Ultra 24's ugly BIOS, or maybe it's an obscure bug in the kernel, of the type that has gone unseen or been neglected by the maintainers because it did not affect a sufficient number of installations, was never reported, was assigned and then dropped, or simply was not bothered with because support for whatever it affected had ended.

I've seen examples of bugs being assigned (eg: LibreOffice), dropped by the assignee due to the lack of available time, passed on to someone else who did not take it up and then gone unassigned for years.

... use "noquiet nosplash" ...
I've always eliminated 'quiet' and 'splash' from all Linux installations and insisted on a colour output in the display at boot. PCLinuxOS did not have it so I eventually moved to Devuan.

A fast boot is not something I am in dire need of.

Like you say, they scroll by quickly, but the colour code helps to see if anything is amiss. In any case, I've already been through all the logs in /var/log.

Thank you for your input.
Regards to D.

G.
 

JasKinasis

Well-Known Member
Joined
Apr 25, 2017
Messages
1,562
Reaction score
2,224
Credits
11,517
Well, from looking at the output, the error messages are unsurprising.

The u remounts all mounted file-systems as read-only.
So after the sysrq-trigger has been called using the u parameter, the sudo command will be unable to write to sudo.log

So those error messages are nothing to worry about IMO. I'd say they are expected.

However - after unmounting the file-systems, why are you trying to sync them again?

I wonder if the second sync is the cause of the problem. Because the filesystems are already read-only at that point, the sync may not work.

o is the one that shuts the system down, but I don't think your o is getting called because the second sync is failing.

So perhaps you need to be using s,u,o instead of s,u,s,o. But hold off on that for a second.....

I know that in an emergency, on a QWERTY keyboard you can hold down Alt+SysRq (which is usually also the PrtScn key) and then use the combo r,e,i,s,u,b to safely reboot your system. Keys pressed while holding Alt+SysRq trigger the same kernel actions as writing those letters to /proc/sysrq-trigger.

What those options do is:
r - takes the keyboard out of raw mode
e - sends the SIGTERM signal to all processes except init
i - sends the SIGKILL signal to all processes except init - forcibly shutting down any that did not respond to the SIGTERM signal.
s - syncs all pending data to disc
u - remounts all mounted file-systems as read-only
b - reboots the system
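
The key list above can be turned into a small self-describing script, sketched below. It is only an illustration: DRY_RUN is a hypothetical safety switch (default 1, so nothing is written to /proc/sysrq-trigger unless DRY_RUN=0 is set and the script runs as root), and the o entry (power off) comes from earlier in this post rather than from the list.

```shell
#!/bin/sh
# Sketch: send a SysRq sequence via /proc/sysrq-trigger, naming each step.
# DRY_RUN (default 1) is a hypothetical safety switch, not from the thread.
DRY_RUN="${DRY_RUN:-1}"

sysrq_describe() {
    case "$1" in
        r) echo "keyboard out of raw mode" ;;
        e) echo "SIGTERM to all processes except init" ;;
        i) echo "SIGKILL to all processes except init" ;;
        s) echo "sync" ;;
        u) echo "remount read-only" ;;
        b) echo "reboot" ;;
        o) echo "power off" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

sysrq_send() {
    for key in "$@"; do
        echo "$key: $(sysrq_describe "$key")"
        if [ "$DRY_RUN" = "0" ]; then
            # Needs root; writing here triggers the action immediately.
            echo "$key" > /proc/sysrq-trigger
            sleep 2
        fi
    done
}

sysrq_send s u o
```

Note that a script sending e or i would normally be terminated itself at that step, since those keys signal every process except init.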

So, in the context of shutting your PC down safely in your script, perhaps you ought to try sending the sequence r,e,i,s,u,o to sysrq-trigger instead.

NOTE: Those are only for a QWERTY keyboard layout. They are different for AZERTY, Colemak and Dvorak.

For anybody who is interested in this topic, there is a Wikipedia page that lists all of the available keybinds: https://en.wikipedia.org/wiki/Magic_SysRq_key
 
Groucho

Hello:

... error messages are unsurprising.
Yes, but the problem is not with the error message (sudo.log being RO).
As you point out, it's expected.

... why are you trying to sync them again?
Good question ... =-/

You're quite right.

But the shutdown problem also occurred without the second sync and the machine shuts down even with the second sync in the script.

I had added the second sync as a test (after getting a freeze with the original script), without taking into account that a fs remounted as R/O could not sync. (!)

Note that there's nothing with respect to the second sync in the /var/log/messages log file. Same with syslog and kern.log.

A snip from /var/log/messages:

Code:
Mar  8 09:37:16 devuan kernel: [ 8831.030260] sysrq: SysRq : Emergency Sync
Mar  8 09:37:16 devuan kernel: [ 8831.051494] Emergency Sync complete
Mar  8 09:37:18 devuan kernel: [ 8833.038247] sysrq: SysRq : Emergency Remount R/O
Mar  8 09:37:18 devuan kernel: [ 8833.069992] EXT4-fs (sdb1): re-mounted. Opts: (null)
Mar  8 09:37:18 devuan kernel: [ 8833.139131] EXT4-fs (sdb6): re-mounted. Opts: (null)

I've edited it out:

Code:
#!/bin/sh
# Shut down system without the use of shutdownhelper
#
PATH=/sbin:/bin:/usr/sbin:/usr/bin
# SysRq keys: s = sync, u = remount read-only, o = power off
for i in s u o; do echo "$i" | sudo tee /proc/sysrq-trigger; sleep 2; done

Thanks for the heads-up, just slipped my mind.

... don't think your o is getting called because the second sync is failing.
No, the script works with or without the second sync.
o gets called and the rig shuts down.

... try sending the sequence r,e,i,s,u,o to sysrq-trigger instead.

It seems the system shuts down safely with just s, u and o.

Adding r, e and i will not allow me to use a script in a terminal, which I do so as to make the shutdown a simple and straightforward thing to do and see what is going on at the same time.

Thanks for your input.

G.
 

wizardfromoz

My bad - I'll clean up my act (yeah, right)

But actually, I am glad I brought the pair of them together, Groucho and Jas ... I just sit back and try to digest WTF is going on ;) (and learn something)
 