Can I run ls with multiple arguments

granidier

New Member
I'm having a problem where I have to sort through the logs folder on one of our servers, which holds about 32 million log files. ls -l by itself does not work, and that is the command I need in order to see each file's date of creation. I can run ls -U | head -50, for example, to show 50 files at a time, but it doesn't display the same data that ls -l would. Running ls -l | head -50 does not work either.

Does anyone know a possible solution for this as I'm not able to find anything?
 


Tolkem

Active Member
I'm no expert, but maybe
Code:
stat * | less
might work since it'll show all timestamps for every file in the directory.

Hope this helps! :)
 

JasKinasis

Well-Known Member
There are three timestamps that are associated with files in unix based systems.
ctime, mtime and atime.

ctime is the timestamp of the last time the inode for the file was changed - e.g. the last time the permissions/ownership was changed.

mtime is the timestamp of the last time the actual contents of the file were updated.

atime is the timestamp of the last time the file was accessed.

Certain file-systems MIGHT support a creation/birth-date for a file, for example - I think the NTFS file-system for Windows does.
But AFAIK - generally speaking, the actual creation date of the file is probably NOT going to be stored.

AFAIK - all of the common Unix-like file searching and listing tools can only list files by ctime, mtime or atime.

So with ls:
You can use ls -lt to list files in their order of mtime (most recently modified files first)
or ls -ltu to list files in order of atime (most recently accessed first).
or ls -ltc to list files in order of ctime (most recently changed permissions/ownership).

If you want the oldest files first (the files that have not been modified/accessed/changed for a long time) add the --reverse option.
e.g.
ls -lt --reverse, or ls -ltu --reverse, or ls -ltc --reverse

stat is an alternative, you could stat all of the files, but if you have over 32 million log files - that's probably going to yield some slightly unwieldy results.

Another option would be to use the find command to find log files that meet certain criteria and then use its -exec or -execdir options to run stat on the files that you are most interested in.

That way then, you only get a listing of the sub-set of files that you are actually interested in.

e.g.
Bash:
find /path/to/logfiles/ -type f -mtime +30 -execdir stat {} \; > /path/to/oldFiles
Above will find all log-files that have NOT been modified in the last 30 days and will stat them. The output is redirected to a file at /path/to/oldFiles - which you can then inspect using less, or more, or a text editor, or whatever you need to do!

And if you have different needs, you just need to tweak the parameters to find, in order to find the files that you are interested in.

But ultimately - it all depends on exactly what you are trying to do.... I'm assuming there's more to the task than trying to list the timestamps of the files.
e.g.
Are you trying to find log files that have been updated recently? Or ones that have NOT been updated for a while?
Or files that have (or have not) been accessed recently?

Let us know a bit more about what you're trying to do and we can help with a more specific answer.
 

granidier

New Member
Really appreciate your input Jas.

So, a little background. We have a 3rd-party program that queries these log files, which we use to troubleshoot any issues with device discovery. Our network automation platform logs all events and then archives them. The 3rd-party program makes an API call to this archive and outputs the log files in a nice HTML format. The problem is that with 32+ million files this 3rd-party program basically explodes...

My goal is just to build a Python script that would wipe out any log files that are older than X days. But to do that I need some insight into how old some of these files are. So literally all I need is the output of ls -l, which will show me the date of creation and the file size. As I mentioned before, ls -l does not work because of the size of the folder. As you can tell I'm no expert on Linux : ) so I appreciate any input. Thanks again.
 

Tolkem

Active Member
@granidier 32+ million files? Wow! That's way too many files! No wonder ls, stat or your 3rd-party program hangs. What if you redirect the output to a file? Maybe something like
Code:
stat * > logs.txt
or
Code:
ls -lt > logs.txt
might work and later you can review logs.txt to narrow your search regarding which files are the oldest ones.
 

granidier

New Member
Tried that, but I don't have permission to write to a file : (
 

Tolkem

Active Member
Maybe try again with sudo?
Code:
sudo stat * > logs.txt
or
Code:
sudo ls -lt > logs.txt
By the way, which Linux distro is this?

I just tried this here in /etc/ and it didn't work with sudo; I had to become root before it worked. I guess you'll have to become root too, run those commands, and then move the logs.txt file with something like
Code:
mv logs.txt /some/other/dir
Note that you must have read/write rights in the dir you're moving logs.txt to so that you can edit it.
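A note on why sudo alone doesn't help here: in sudo stat * > logs.txt, the > logs.txt part is opened by your own unprivileged shell before sudo even starts, so the write permission problem stays. With tens of millions of names, the shell's expansion of * will also very likely fail with "Argument list too long". A minimal sketch that sidesteps both, assuming GNU find (which CentOS ships) and an output path you can already write to, such as your home directory or /tmp. The function name save_listing is made up for illustration.

```shell
# save_listing DIR OUT
# Write an unsorted listing of the regular files directly inside DIR to OUT.
# OUT should be somewhere you can write (e.g. ~/logs.txt), not the log directory.
save_listing() {
    dir=$1
    out=$2
    # %T@ = mtime in seconds since the epoch, %s = size in bytes, %f = file name
    find "$dir" -maxdepth 1 -type f -printf '%T@ %s %f\n' > "$out"
}
```

Usage would look like save_listing /path/to/logfiles ~/logs.txt, after which sort -n ~/logs.txt | head shows the entries with the oldest mtime first.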
 

granidier

New Member
Yeah, I've tried sudo but it still doesn't grant permission.

Linux version:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

The only output I can get is from ls -U | head -50. This gives me 50 records at a time, but just file names. Is there a way to append some argument to -U that will include date/time and size?

ls -l | head -50 does not work. It just hangs. Any argument with -l does not work.
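One possible way to get date and size for a handful of files at a time is to let find print those fields itself and cut the output off with head. This is only a sketch, assuming GNU find (for -printf) and that the logs sit directly in the directory; peek_files is a made-up name. Unlike ls -l, find does not sort, and it is stopped as soon as head has read enough lines.

```shell
# peek_files DIR [COUNT]
# Print modification time, size and name for the first COUNT (default 50)
# regular files that find happens to encounter in DIR. No sorting is done.
peek_files() {
    dir=$1
    count=${2:-50}
    # %TY-%Tm-%Td %TH:%TM = mtime, %s = size in bytes, %f = file name
    find "$dir" -maxdepth 1 -type f -printf '%TY-%Tm-%Td %TH:%TM %s %f\n' |
        head -n "$count"
}
```

For example, peek_files /path/to/logfiles 50 would be the rough counterpart of ls -U | head -50 with the extra columns. The date shown is the mtime, not a creation date.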
 

Rob

Administrator
Staff member
I'm surprised you haven't run out of inodes on there. Whenever I ran into too many files for ls to handle, I always ended up using find.

I'd go w/ a combination of what @JasKinasis came up with .. and change around the filters a bit. You probably know the naming convention of them.. log files generally have a date in them.. so start listing by day, month.. delete as many as you can this way also to make it easier/faster to run future commands:

find /path/to/files -name "logfile-2002-10-??.log" -delete

You don't need to use the exec, etc.. to delete em either ^ which is nice :)
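Before deleting by name, it can help to see how the files are spread over time. The sketch below counts files per month in a single pass over the directory. It assumes GNU find and a purely hypothetical naming scheme of logfile-YYYY-MM-..., modelled on the pattern above; the real names will differ, so the pattern and the substr offsets would need adjusting. count_by_month is a made-up name.

```shell
# count_by_month DIR
# Print "YYYY-MM count" for files named like logfile-YYYY-MM-... in DIR.
count_by_month() {
    dir=$1
    find "$dir" -maxdepth 1 -type f -name 'logfile-????-??-*' -printf '%f\n' |
        # "logfile-" is 8 characters, so the YYYY-MM part starts at position 9
        awk '{ n[substr($0, 9, 7)]++ } END { for (m in n) print m, n[m] }' |
        sort
}
```

Running count_by_month /path/to/files prints one line per month, which shows which -name patterns are worth passing to -delete first.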
 

JasKinasis

Well-Known Member
@granidier - If you're looking at deleting old files then @Rob's suggestion is one way of doing it - if the date of the log's creation is included in the file name. Then you can simply search for files with file names containing dates that are older than your cut-off and remove them.

However - if you want to delete files that are older than X days - personally I'd run with this:
Bash:
find /path/to/logfiles/ -type f -mtime +X -delete
Where /path/to/logfiles is the path to the logfiles and X is the number of days e.g. 30, or 100, or whatever!

My rationale for this is simple:
If the mtime hasn't changed in over X days - then the content of the file hasn't changed in over X days - Therefore it's pretty safe to say that the log-file is at least X days old and is a solid candidate for removal!

Also you will almost certainly need to be root in order to execute the above command, so either su to root, or use sudo - if it is set-up on the CentOS system.
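Since -delete is not reversible, one cautious approach is to run the same tests with a print action first and look at a sample of what would be removed. A minimal sketch, assuming GNU find; preview_old is a made-up name and the sample size of 20 is arbitrary.

```shell
# preview_old DIR DAYS
# Show up to 20 of the files that "find DIR -type f -mtime +DAYS -delete"
# would remove, with their mtime date and size. Nothing is deleted.
preview_old() {
    dir=$1
    days=$2
    # %TY-%Tm-%Td = mtime date, %s = size in bytes, %p = path as found
    find "$dir" -type f -mtime +"$days" -printf '%TY-%Tm-%Td %s %p\n' |
        head -n 20
}
```

For example, preview_old /path/to/logfiles 30 lists a few candidates. If the dates look right, the same tests can be used with -delete.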

Also - because of the huge number of files - this process will almost certainly take a long time to complete. So if you want to be able to continue using the terminal whilst the job is running, you might want to consider running find in the background by putting an ampersand & at the end of the command.
e.g.
Bash:
find /path/to/logfiles -type f -mtime +X -delete &
Again where /path/to/logfiles is the location of the logs and X is the number of days. And once again, you'll also need to make sure you're running the command as root.

When you run that - the terminal will display a PID and a job number and you will be able to continue to use the terminal whilst the job is running in the background.

I don't know how much you know about job-control, but JIC you don't know:
You can see what jobs you have running in the background using the jobs -l command. There are a number of other options that can be used with the jobs command to allow you to view only running jobs, or only stopped jobs etc etc.
I won't list them all here, but you can read about them using the command help jobs

If you need to temporarily pause the background job, or restart it, or properly kill it - you can use the kill command to send various signals to the job using its PID.

So if your find command is running in the background and has a PID of 4587, and you want to pause it you would use:
Bash:
kill -STOP 4587
or, by number (19 is SIGSTOP on Linux for x86 and ARM; kill -l lists the numbers for your system):
Bash:
kill -19 4587
That will pause the background job by sending the process the SIGSTOP signal.

And then to resume it:
Bash:
kill -CONT 4587
or
Bash:
kill -18 4587
That will resume the job running in the background by sending the SIGCONT signal.

And if you need to completely kill it:
Bash:
kill -TERM 4587
or
Bash:
kill -15 4587
That will send the SIGTERM signal to the process, which should allow the process to shut down cleanly: it gets the chance to complete any pending file writes and clean up after itself before it exits.

After sending a signal to a job that has been started in the background - you can use the jobs or jobs -l command to list all jobs and verify that the signal you sent has had the required effect.
So after sending SIGTERM, you can check to see if the process has actually ended using the jobs command.

If sending SIGTERM fails to kill the process the absolute final resort is SIGKILL:
Bash:
kill -KILL 4587
or
Bash:
kill -9 4587
That will send the SIGKILL signal which will force the process to end no matter what - but this should only be used in extreme circumstances because it can potentially leave corrupted files behind in its wake. For example if the process is in the middle of writing to a file and it receives the SIGKILL signal, it will just end mid-write, leaving the file in a potentially corrupt/incomplete state.

To see more about the signals that can be sent via kill, read the manpage for kill: man kill.
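The pause/resume/terminate sequence can be tried out safely on a throwaway process before using it on the real job. The sketch below uses sleep as a stand-in for the long-running find and sends signals by name, which avoids depending on platform-specific numbers. signal_demo is a made-up name, and the ps calls are only there to show the process state (typically a T while stopped and an S while sleeping).

```shell
# signal_demo
# Start a harmless background process, then pause, resume and end it.
signal_demo() {
    sleep 300 &                  # stand-in for the long-running find
    pid=$!                       # $! holds the PID of the last background job

    kill -STOP "$pid"            # pause it
    sleep 1
    echo "after STOP: $(ps -o stat= -p "$pid" 2>/dev/null)"

    kill -CONT "$pid"            # let it run again
    sleep 1
    echo "after CONT: $(ps -o stat= -p "$pid" 2>/dev/null)"

    kill -TERM "$pid"            # ask it to exit
    wait "$pid" 2>/dev/null      # collect its exit status
    echo "exit status: $?"       # 128 + 15 when ended by SIGTERM
}
```

A process ended by a signal reports an exit status of 128 plus the signal number, so SIGTERM shows up as 143.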
 

granidier

New Member
So after some further investigation: the author did create a clean-up job that is supposed to keep the log file count under 1000, but at some point that process ran into a loop that killed the CPU and the VM, so the process was killed. In the meantime the logs grew out of control, to the point where the cleanup script could not query through them anymore because of the size...

So I will do what you and Rob are suggesting. I'm just going to start wiping out the files. The problem I have is that my account does not have permission to rm. However, I do have super user access and wanted to know: is it possible to execute a sudo su user command along with an rm argument, or is there a different process for that?
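On the sudo question: sudo normally takes the command to run as its arguments (for example sudo rm -- somefile, or sudo find ... -delete), so going through su first should not be needed, provided the sudoers configuration on that machine allows the command. A hedged sketch of the deletion with an optional prefix command follows. purge_old is a made-up name, GNU find is assumed, and -maxdepth 1 assumes the logs sit directly in the directory.

```shell
# purge_old DIR DAYS [PREFIX...]
# Delete regular files directly inside DIR whose mtime is more than DAYS days old.
# PREFIX is an optional command to run find under, e.g. sudo.
purge_old() {
    dir=$1
    days=$2
    shift 2
    # "$@" expands to the prefix words, or to nothing when no prefix was given
    "$@" find "$dir" -maxdepth 1 -type f -mtime +"$days" -delete
}
```

With that, purge_old /path/to/logfiles 30 sudo runs find itself as root, so no separate rm call is involved. Running it without the last argument uses your own account.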

Thanks again guys!
 

