System shuts down every 30-40 minutes

Kazamu (New Member, joined Jul 16, 2023)

Hello,

My system shuts down every 30-40 minutes, and when I start it up manually, the error below pops up.

I'm using CentOS on my machine.
I had this problem with my previous OS as well and could not figure out exactly where the issue is.

Your help will be appreciated.

[ 0.071412] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ae2000000003110a
[ 0.071412] mce: [Hardware Error]: TSC 0 ADDR ffb05000 MISC 78a0000086
[ 0.071412] mce: [Hardware Error]: PROCESSOR 0:306a9 TIME 1661709811 SOCKET 0 APIC 0 microcode 21
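The 64-bit value after "Bank 6:" is the raw IA32_MCi_STATUS register of that machine-check bank. The bash sketch below decodes only its architectural fields, assuming the bit layout described in the Intel SDM (Vol. 3B, chapter 15); decode_mci_status is a hypothetical helper, and the model-specific bits 31:16 are printed without interpretation. On CentOS, mcelog --ascii (from the mcelog package) may give a fuller, CPU-model-aware decode.

```bash
#!/usr/bin/env bash
# Decode an IA32_MCi_STATUS value as printed after "Bank N:" in the kernel MCE log.
# Only architectural fields (Intel SDM Vol. 3B, ch. 15) are interpreted;
# bits 31:16 are model-specific and are printed raw.
decode_mci_status() {
  local hex="${1#0x}" spec bit name flags=""
  # Left-pad to 16 hex digits and split into two 32-bit halves, so the math
  # never depends on how bash handles values at or above 2^63.
  while (( ${#hex} < 16 )); do hex="0$hex"; done
  local hi=$(( 16#${hex:0:8} )) lo=$(( 16#${hex:8:8} ))

  # Flag bits 63..57 live in the high half (register bit N = bit N-32 of hi).
  for spec in 63:VAL 62:OVER 61:UC 60:EN 59:MISCV 58:ADDRV 57:PCC; do
    bit="${spec%%:*}"
    name="${spec#*:}"
    flags+="$name=$(( (hi >> (bit - 32)) & 1 )) "
  done
  printf '%s\n' "${flags% }"

  local mca=$(( lo & 0xFFFF )) model=$(( (lo >> 16) & 0xFFFF ))
  printf 'MCA error code=0x%04x model-specific code=0x%04x\n' "$mca" "$model"

  # Compound cache-hierarchy errors follow 000F 0001 RRRR TTLL, where F (bit 12)
  # is the correction-report filtering bit, so it is masked out of the test.
  if (( (mca & 0xEF00) == 0x0100 )); then
    local -a rr=(ERR RD WR DRD DWR IRD PREFETCH EVICT SNOOP)
    local -a tt=(instruction data generic reserved)
    local -a ll=(L0 L1 L2 generic)
    local r=$(( (mca >> 4) & 0xF )) t=$(( (mca >> 2) & 3 )) l=$(( mca & 3 ))
    printf 'cache hierarchy error: request %s, transaction %s, level %s\n' \
      "${rr[r]:-reserved}" "${tt[t]}" "${ll[l]}"
  fi
}

decode_mci_status ae2000000003110a
```

For the value in this log it reports VAL, UC and PCC set (an uncorrected error after which the processor state may not be reliable), along with MISCV and ADDRV (meaning the MISC and ADDR values on the second log line are valid), and MCA error code 0x110a, which under the cache-hierarchy pattern reads as a generic level-2 cache error. That points toward hardware rather than the OS, though on its own it does not say whether the CPU, board, power or cooling is to blame.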
 


This is lifted from the Intel support site.

"Thank you for posting on the Intel® communities.
We understand that you are experiencing some error messages on boot with your Intel® NUC10i7FNH using Proxmox*.

I need to inform you that Intel has not tested and validated this Intel NUC model with any Linux distribution; however, we have plenty of positive reports from customers that are using the unit successfully on many different Linux distros.

You can check the Intel Compatibility Tool for the NUC model in case you are interested.

With that being said, for the behavior that you are experiencing I would strongly recommend check your Linux distribution website and forums here for peer assistance with this issue.

Intel Technical Support Technician"
 
Wikipedia describes the machine check exception errors you've provided and gives possible causes that you might investigate further. In another Intel forum thread with similar errors, it is suggested to:

1) Re-install the operating system and check again if the error persists.

2) Try putting the BIOS settings back to default.

At least give your BIOS settings a careful look, if not a full reset to default. Upgrade the BIOS if you can!
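Before updating, it helps to know which BIOS version and CPU microcode you are running; the kernel prints the microcode revision in hex, so "microcode 21" in the log is revision 0x21. A sketch, assuming the x86 "microcode" line in /proc/cpuinfo and dmidecode's standard -s keywords; cpu_microcode and bios_info are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Print the CPU microcode revision from the "microcode" line of /proc/cpuinfo
# (x86 only). Another file path can be passed in, e.g. for testing.
cpu_microcode() {
  awk -F': ' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Print BIOS vendor, version and release date using dmidecode (needs root).
bios_info() {
  local key
  for key in bios-vendor bios-version bios-release-date; do
    printf '%s: %s\n' "$key" "$(dmidecode -s "$key" 2>/dev/null || echo unavailable)"
  done
}
```

cpu_microcode works as a normal user; run bios_info from a root shell (for example after sudo -i), since dmidecode needs root to read the firmware tables. Compare the version and date with the newest BIOS on your board vendor's support page.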

After looking at the BIOS, I would probably start by taking a really hard look at your RAM. Are the modules all the same, or mixed and matched? I would test the RAM by removing modules one at a time (or in pairs) and running the system for an hour or more in each configuration, moving the modules around to see whether it fails with every memory configuration. You could try running memtest, but it takes a very long time.
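To answer the "same or mixed and matched?" question without opening the case, the installed modules can be listed from the firmware's SMBIOS tables. A minimal sketch, assuming dmidecode's usual tab-indented "Key: value" layout for "Memory Device" entries (older versions print Speed in MHz rather than MT/s); summarize_dimms is a hypothetical helper that reads dmidecode output on stdin.

```bash
#!/usr/bin/env bash
# Summarize "dmidecode -t memory" output (read from stdin) as one line per slot:
# Locator | Size | Speed | Manufacturer | Part Number
summarize_dimms() {
  awk -F': ' '
    function emit_row() {
      if (in_dev) printf "%s | %s | %s | %s | %s\n", loc, size, speed, mfr, part
      in_dev = 0
    }
    /^Handle / { emit_row(); next }
    /^Memory Device[ \t]*$/ { emit_row(); in_dev = 1; loc = size = speed = mfr = part = "?"; next }
    in_dev {
      key = $1
      sub(/^[ \t]+/, "", key)   # field names are indented under each entry
      if (key == "Locator") loc = $2
      else if (key == "Size") size = $2
      else if (key == "Speed") speed = $2
      else if (key == "Manufacturer") mfr = $2
      else if (key == "Part Number") part = $2
    }
    END { emit_row() }
  '
}
```

Run it as sudo dmidecode -t memory | summarize_dimms. Rows whose size, speed, manufacturer or part number differ are the mixed-and-matched case; empty slots show up as "No Module Installed".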

Maybe even before the RAM test, it would be good to unplug all peripheral devices (printers, webcams, USB hubs, etc.) to make sure there is not an excessive power demand on your system's power supply.

If your motherboard is old, you might give it a careful visual inspection, looking for "swollen" capacitors (the bulging tin-can type), any other physical defect or evidence of failure (burn marks), and loose connections. Check that power and data cables are secure, that cards are seated properly in their slots, and that the CPU fan is firmly attached to the CPU with thermal paste between them. A fresh application of thermal paste is a simple step you could try, though without better evidence of CPU overheating it is just a guess.

If CentOS was also your previous OS with the same trouble, I'd certainly try a different distro if you get as far as reinstalling Linux, but your errors seem to indicate a hardware problem, so you may have trouble with any distro. Good luck!
 
This is lifted from the intel support site
I'm pretty certain there are more computers that haven't been "Linux certified" than have been, and I'm not sure why you posted that as a response to his problem. No offense intended. ;)

My system getting shutdown every 30-40 mins

Have you checked your CPU temps? That's one of the suggestions that actually makes sense, given that it takes 30-40 minutes to shut down. If it is overheating, it doesn't matter what OS you're using; you'll get the same result, be it Windoze or Linux (a shutdown by the BIOS due to CPU overheating, i.e. thermal protection). At least with Linux you have a log file that you can access.
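To follow up on the log file point: if the systemd journal is persistent, kernel messages from the boot that ended in a shutdown can be read after restarting. A sketch, assuming journalctl (CentOS 7 and later); the journal survives reboots only if /var/log/journal exists or Storage=persistent is set in /etc/systemd/journald.conf, and otherwise /var/log/messages is the place to look. The helper names and the search pattern are illustrative and will not catch every relevant message.

```bash
#!/usr/bin/env bash
# Keep only lines that look like machine-check or thermal events.
# The pattern is an illustrative guess, not a complete list of kernel messages.
filter_hw_messages() {
  grep -Ei 'mce:|machine check|hardware error|thermal|throttl|temperature'
}

# Kernel messages (-k) from the previous boot (-b -1), filtered as above.
previous_boot_hw_messages() {
  if ! command -v journalctl >/dev/null 2>&1; then
    echo "journalctl not found; try /var/log/messages instead" >&2
    return 1
  fi
  journalctl -k -b -1 --no-pager | filter_hw_messages
}
```

Running sudo journalctl --list-boots first shows whether any earlier boots were kept; then call previous_boot_hw_messages from a root shell. Temperature or throttling messages shortly before the log stops would support the overheating theory.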
Here are a couple more resources related to your errors. The first link actually has two suggestions that list overheating as a cause.

https://askubuntu.com/questions/928...ld-i-worry-hardware-error-processor-0406e3-ti

Excerpt from the first link:

Several things to try:


https://bbs.archlinux.org/viewtopic.php?id=266210

Excerpt from the second link:

I'm dealing with this same exact issue and trust me when I say it's not "fine" or a bug.. and has a high-risk of leading to more serious issues down the road. This error is typically going to be a bad memory controller/cache on your cpu 75% of the time, bad chipset on your motherboard, bent pins on your cpu socket, or a bad trace on your motherboard.. although on rare occasions other hardware that shorts out the motherboard such as a bad power supply, pci card, or bad caps on the motherboard can cause it also, however unless it's directly linked to the CPU it's extremely rare.).

On a side note, you'll find a plethora of posts about your errors if you simply google your error codes one line at a time.
I hope this helps? ;)
 
I agree.
My suggestion:
Open your BIOS and check the temps and fan speed.
When processors reach a higher-than-normal operating temperature, they usually start to throttle down to prevent damage, and eventually they shut the system down.
A slow-running processor fan would cause the processor to run hotter than normal, and the system would shut down to prevent damage.
Have you ever cleaned the processor fan out with compressed air to remove any dust that may be blocking the cooling airflow?
Over time, all computers, regardless of type, collect dust in their cooling fans, which can restrict airflow and cause overheating.
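If rebooting into the BIOS every time is inconvenient, Linux exposes similar readings in sysfs. A sketch, assuming the thermal zone interface under /sys/class/thermal (values in millidegrees Celsius) and, on Intel CPUs, the per-core thermal_throttle counters under /sys/devices/system/cpu; not every machine provides both, and fan speeds usually need lm_sensors (the sensors command) or inxi -s. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Print every thermal zone's type and temperature.
# sysfs reports temp in millidegrees Celsius (45500 means 45.5 C).
show_thermal_zones() {
  local base="${1:-/sys/class/thermal}" zone type milli
  for zone in "$base"/thermal_zone*; do
    # Skips the unexpanded glob and zones whose sensor cannot be read.
    [ -r "$zone/temp" ] || continue
    read -r milli 2>/dev/null < "$zone/temp" || continue
    type="unknown"
    if [ -r "$zone/type" ]; then
      read -r type < "$zone/type" || type="unknown"
    fi
    printf '%s (%s): %s C\n' "${zone##*/}" "$type" \
      "$(awk -v m="$milli" 'BEGIN { printf "%.1f", m / 1000 }')"
  done
}

# Print how many times each CPU core has been thermally throttled since boot
# (Intel thermal_throttle counters; absent on many other CPUs).
show_throttle_counts() {
  local base="${1:-/sys/devices/system/cpu}" f cpu count
  for f in "$base"/cpu[0-9]*/thermal_throttle/core_throttle_count; do
    [ -r "$f" ] || continue
    read -r count < "$f" || continue
    cpu="${f#"$base"/}"
    printf '%s: %s\n' "${cpu%%/*}" "$count"
  done
}

show_thermal_zones
show_throttle_counts
```

Running it periodically until the next shutdown shows whether temperatures climb toward the limit; a core_throttle_count that keeps rising is a strong hint that the cooling is not keeping up.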
 
I have a laptop that requires extra cooling in really hot weather. Is your ambient temperature higher than normal?

I raise the base of my laptop to allow better airflow, and sometimes use an external fan to help keep the processor temperature down.
 
I'm not sure which distro the OP is using, but on many you can run this command in a terminal to get the temps:
Code:
inxi -s
Better yet, give us the output of:
Code:
inxi -Fxxzr
 
