high cpu load

zapeador

Member
Joined
Jan 15, 2022
Hello,
I have a machine that sometimes pegs the CPU at 100%, which stops the web application and even console access.

Can you recommend an application that collects CPU statistics (and more than CPU, if that's better), so I can find out which process is consuming so much CPU during those periods?
 


htop
btop
 
The iostat command from the sysstat package provides clear output on the "user", "system" and "idle" percentages of CPU use. On systems with multiple CPUs, it averages the usage across them.
If you wish to know which programs are using the CPU most, instead of hunting through htop output you could run something like:
Code:
ps axch -o cmd,%cpu --sort=-%cpu | head
and it'll list the top ten programs using the CPU and the percentages of CPU they are using.
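If it helps, you can also include the PID in that output (handy if you later want to inspect or kill a culprit). This is just a small variant of the same command, not part of the original suggestion:

```shell
# Top ten CPU consumers, with the PID column added
ps axch -o pid,cmd,%cpu --sort=-%cpu | head
```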

If you want to collect stats at successive intervals over time, e.g. every 20 seconds, iostat can do that. For the ps command above, you could write a script with a while loop and a sleep command to create the interval between outputs.
 
I would keep an eye on memory too. If memory runs short, a machine can begin to thrash, and the symptom is complete unresponsiveness.

If an out-of-memory problem doesn't resolve itself, the Out Of Memory killer may act and terminate processes until the problem is resolved. Any OOM action will have been logged in the journal (journalctl as root).

htop can sort by memory use. More basic commands would be free, vmstat, and ps sorted by RSS. vmstat can be set to repeat:
Code:
# Simple one shot
free

# Repeat every five seconds
vmstat 5

# top 10 processes by RSS:
ps aux --sort -rss | head -10

# Or repeat the ps indefinitely displaying it in a top/htop line fashion:
watch ps aux --sort -rss
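If you would rather log a single memory number per interval, here is a quick sketch that sums RSS across all processes (only a rough figure, since shared pages get counted repeatedly):

```shell
# Total resident set size across all processes, reported in MiB
ps ax -o rss= | awk '{sum += $1} END {printf "%.0f MiB\n", sum/1024}'
```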
 
@OP -- how much RAM is in your machine, and how large is your swap partition?
FYI -- Swappiness
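Checking swappiness is a one-liner, and you can try a different value without committing to it (the value 10 below is just an example, not a recommendation for this machine):

```shell
# Current swappiness (60 is the usual default)
cat /proc/sys/vm/swappiness

# Temporary change, reverts on reboot; add vm.swappiness to /etc/sysctl.conf to persist:
# sudo sysctl vm.swappiness=10
```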
 
The solutions you give are fine, but at the time of maximum load I cannot access the console, and I need to know which process (or processes) loads the CPU like this, to see whether it is normal or something I have to fix.
 
zapeador wrote:
The solutions you give are fine, but at the time of maximum load I cannot access the console, and I need to know which process (or processes) loads the CPU like this, to see whether it is normal or something I have to fix.
In that case you need a command such as the ps ones suggested in post #3 or #4, run from boot-up to the time of "maximum load" and writing to a log file, so that when you reach the point of "maximum load" you can reboot and inspect the log.

A script such as the following should do it. You could write it into a file called: cpuLogger
Code:
#!/bin/sh
while :
do
    ps axch -o cmd,%cpu --sort=-%cpu | head
    echo "---------------------"
    sleep 30s
done >> /home/<your_username>/cpuLogFile

This script will write the percentage of cpu usage of the top ten processes into the file: cpuLogFile in your home directory once every 30 seconds. You can of course use any time interval that you think best. Replace <your_username> with your user name.
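One optional tweak, since you will be matching log entries against the times the machine slows down: stamp each sample with the time. This is just a variant of the script above; it is shown here with three samples and a one-second sleep for demonstration, but on the real machine you would use "while :" and "sleep 30s" with the redirect to your cpuLogFile as before:

```shell
#!/bin/sh
# cpuLogger variant: each separator carries a timestamp so log entries
# can be lined up with observed CPU spikes. Demo form: 3 samples only.
n=0
while [ $n -lt 3 ]
do
    echo "--------------------- $(date '+%H:%M:%S')"
    ps axch -o cmd,%cpu --sort=-%cpu | head
    sleep 1
    n=$((n+1))
done
```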

Then you need to give the script file execute permissions:
Code:
chmod 775 cpuLogger

You could run it thus from the directory in which it resides:
Code:
./cpuLogger &

The "&" at the end will run the script in the background.
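One caveat about "&": if you start the script over ssh and the session drops (likely here, given that the machine freezes), the background job can be killed along with the session. nohup protects against that. A sketch using a harmless stand-in script named demo.sh (on the real machine you would use ./cpuLogger instead):

```shell
# Create a stand-in script purely for demonstration
printf '#!/bin/sh\nsleep 2\n' > demo.sh
chmod +x demo.sh

# nohup + & : keeps the job running after the terminal or ssh session closes
nohup ./demo.sh >/dev/null 2>&1 &
echo "started as PID $!"
```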

If you want to stop the script you will need to use the kill command. When you start cpuLogger it will output a process ID number on the screen like this:
Code:
[ben@owl ~]$ ./cpuLogger &
[1] 21577
To stop the script from running, run: kill 21577.

You can always find the process ID by asking ps thus:
Code:
[ben@owl ~]$ ps aux |grep cpuLogger
ben        21577  0.0  0.0   2536  1520 pts/8    S    21:47   0:00 /bin/sh ./cpuLogger
ben        21733  0.0  0.0   6288  2168 pts/8    S+   21:49   0:00 grep --color=auto cpuLogger

But you may not need to stop the script, because you want to know what the output is when the machine reaches maximum load, so you could let it run until the machine freezes up. You could, of course, inspect the file at any point before then to get an idea of what is happening. If it runs to the end, reboot and inspect the file.

You could also do this whole investigation looking at memory instead of cpu, by replacing the cpu element in the primary ps command with mem, like this:
Code:
ps axch -o cmd,%mem --sort=-%mem | head
And it would be best to change the script and log file names to something like memLogger and memLogFile, to keep the cpu and mem outputs separate.
 
What you sent me is really very good, but something must be wrong, because the two don't match. As a rough test I set it to run every second, and as this capture shows, at 16:12:23 the mysql process was using 23% of the cpu

Screenshot_104.png


and as you can see in the log, for several seconds before and after it gives me 0.0

Code:
--------------------- 16:12-19
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-20
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-21
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-22
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
---------------------
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-23
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-24
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-25
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-26
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-27
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-28
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-29
systemd 0.0
systemd-journal 0.0
systemd-logind 0.0
rsyslogd 0.0
dbus-daemon 0.0
cron 0.0
php 0.0
php-fpm7.3 0.0
snmpd 0.0
agetty 0.0
--------------------- 16:12-30
 
FWIW (and not necessarily helpful), that's a whole lot of CPU usage for MySQL. Database load is usually an I/O issue rather than a high-CPU issue. I never see CPU usage that high for any database tasks, even when writing a great deal of data.

I have no further assistance to offer; I only have that observation.
 
The solutions you give are fine, but at the time of maximum load I cannot access the console, and I need to know which process (or processes) loads the CPU like this, to see whether it is normal or something I have to fix.
You don't need to be constrained to exactly what has been suggested; you can improvise. As others have said, it's easy enough to gather such stats periodically. If you want more formal, permanent resource monitoring, you could install sar (just google "linux sar").

As I mentioned, you may find that useful clues have been logged. This especially applies to out-of-memory events. Presumably your system is using journald logging; for example:

Code:
sudo journalctl --since "2022-02-20 10:30" --until "2022-02-20 11:30"

This example assumes you have a persistent journal (Storage=auto in /etc/systemd/journald.conf). If you don't, you can only query periods in the current boot, but that may be all you need.
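If your journal turns out not to be persistent, a minimal sketch of the change, assuming the stock journald.conf layout:

```ini
# /etc/systemd/journald.conf (excerpt)
[Journal]
Storage=persistent
```

Then restart journald (sudo systemctl restart systemd-journald). Alternatively, simply creating the directory /var/log/journal makes the default Storage=auto persist as well.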

If journald is not being used, then /var/log/messages might be where a lot of logging is going.

There may be other logs worth checking, but that depends on what the server is running.
 
zapeador wrote:
the mysql process occupied a 23% cpu
That certainly is heavy use of the cpu, but from your finding it lasted less than a second, so one wouldn't suppose it was an event that would "stop" everything to the point that you couldn't get to a console. It still seems to be a conundrum. You may need to run the computer until it actually blocks up, and then see what accumulated to cause that. Momentary 100% cpu loads are not in themselves a problem. For example, when I start htop in a terminal, the cpu reading is momentarily 100%, visible across the top of the screen in red bars; I expect that's because it gathers all its data in a burst before printing its output below. A cpu that blocks up and stops all usage is a problem of an altogether different magnitude.
 
sar seemed very interesting to me; I'm testing it in a development environment and I like it, it gives a lot of information. What I can't see (and I don't know if it's possible) is, at peak times like this, which process caused the spike.

Screenshot_106.png
 
It is very periodic, every 5 minutes as the machine's graph shows, and I am almost sure it is something from mysql and that it is not normal behaviour.

Screenshot_105.png
 
As far as I understand it, sar aggregates. You might also look into Linux process accounting, which also aggregates but per process; the relevant commands are lastcomm(1), accton(8) and sa(8).

I take it there was nothing interesting in your logs? In my own experience with Linux (30 years' worth), the most common way to make a Linux box completely unresponsive is to put it in a situation where it is short of memory. Hogging the CPU almost never causes a total lack of responsiveness unless I apply abusive settings with a tool such as stress-ng. In rare circumstances a hardware exception can also make a box unresponsive.
 
I'm back with another dumb question/point of consideration. Again, no further help than this:

Now that you see the consistency of the spikes - have you looked to see what's (if anything) scheduled at that time interval?
 
Your questions are not silly, nor are anyone's here; we all help each other a lot.

I have come to the conclusion that it is the polling that freenms does every X min. I don't think that consumption is very normal, and I have taken the easiest decision, since I don't control much of the database: as it is a virtual machine, I raised the processors from 2 to 4... it's not the best solution, I know, but it removes the problem temporarily XD
 
Well, that is *a* solution. I do love me some virtual hardware. (I have at least two VPSes sitting idle just 'cause I got 'em cheap and haven't started my next projects.) You just point and click, maybe wait for the folks at the hosting company to notice, and you've got as much horsepower as you need.
 