Recursively output last modified date/time for files and folders in a directory tree to CSV

jimleslie1977

Hi,

Could I have some assistance in creating a script to recursively search through a directory tree, outputting files and sub-folders created or modified after a specific date and time to a CSV file?


Thanks in advance
 


Just use GNU find.
To find files or directories created/modified in the last 90 days:
Code:
find /path/to/search/ -mtime -90 -printf "%p\t" > /path/to/outputfile.csv

NOTE: That will output ALL results to a tab-delimited file.
But all of the results will be on a single line, with each filename separated by a tab.

And if you want to search for a very specific date/time, you can do something like this:
Code:
find /path/to/search/ -newermt "2020-04-20 06:05:15" -printf "%p\t" > /path/to/output.csv
The above will find and list all files and directories/sub-directories that were created or modified any time AFTER 5 minutes and 15 seconds past 6 in the morning on the 20th of April 2020!
 
Just a thought:
If you want to see the files in the order that they were created/modified, then you could change the output format using -printf to output the date/time of modification and the filename - still tab separated, but with one file entry per line:
Code:
find /path/to/search/ -newermt "2020-04-20 06:05:15" -printf "%TY-%Tm-%Td@%TT\t%p\n" | sort -n > /path/to/output.csv
Above uses the previous example - to find any files modified after 5 mins and 15 seconds past 6 in the morning on Apr 20th 2020, but we've changed the output format using the -printf option.
This time, each line in the file will contain a time-stamp in the format date@time (YYYY-MM-DD@HH:MM:SS.xxxxxx), a tab, and then the path to the file.
%TY is the year, %Tm is the month (as a number), %Td is the numeric day of the month, %TT is the actual time, and %p is the path/filename of the file.
All of the output is piped to sort, which will sort the list by the timestamps in each line. The sorted output is redirected to the output file.
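So, purely as an illustration (the path and timestamp below are made up), a line in the output file would look something like this:
Code:
2020-04-21@09:17:42.5191200000	/path/to/search/documents/report.odt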

Now you'll have a list of files and the exact date/times they were modified, in the exact order they were modified.
Older files will be at the top of the list, most recent will be at the bottom of the list.

Either way, if you're searching for files, GNU find is what you need.
It has a LOT of different options, you can search based on lots of different criteria.
If you only want to see the names/paths of the files - then you can use its -printf option and set what you want to see in the output (as we've just done!)
You can even use find to simultaneously search for several different types of files and have it use a separate action for each file-type.
e.g.
Find all .pngs, gifs, jpegs and then move all pngs to directoryA, gifs to directoryB, jpegs to directoryC.
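Just as a rough sketch of that idea - the directoryA/B/C paths below are placeholders, not anything from this thread:
Code:
find /path/to/search/ \
    \( -iname '*.png' -exec mv {} /path/to/directoryA/ \; \) -o \
    \( -iname '*.gif' -exec mv {} /path/to/directoryB/ \; \) -o \
    \( -iname '*.jpg' -o -iname '*.jpeg' \) -exec mv {} /path/to/directoryC/ \;
Each -iname test only lets matching files through to its own -exec action, and the -o operators chain the three searches together into a single pass over the tree.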

GNU find is an extremely useful and powerful tool.
 
Thanks for your assistance, it's much appreciated. Could I pull in a CSV list of source directories for this approach in any way? Also, the output CSV file does not seem to list the files one per line - it's more like one single string of text. Is there a way to put the file list findings one per line?

Final point: I seem to be getting these errors running your final option:

[root@RCRVNXCS0 nasadmin]# find /nas/quota/slot_2/root_vdm_2/isadhomes01/home/ph407/ -newermt "2020-04-20 15:00:00" -printf "%TY-%Tm-%Td@%TT\t%p\n" | find /nas/quota/slot_2/root_vdm_2/isadhomes01/home/ph407ot_vdm_2/isadhomes01/home/jltest/output.csv
find: invalid predicate `-newermt'
find: invalid predicate `-n'



thanks
 
I'm answering these slightly out of order.
Also, the output CSV file does not seem to list the files one per line - it's more like one single string of text. Is there a way to put the file list findings one per line?
When you said a CSV file, you didn't specify the format, so initially I went with a monolithic tab-delimited approach.
To get one file-name per line, you can simply skip using -printf and the format/field specifiers.

So my first example would become:
Code:
find /path/to/search/ -mtime -90 > /path/to/outputfile.csv

My second example would become:
Code:
find /path/to/search/ -newermt "2020-04-20 06:05:15" > /path/to/output.csv

Final point: I seem to be getting these errors running your final option:

[root@RCRVNXCS0 nasadmin]# find /nas/quota/slot_2/root_vdm_2/isadhomes01/home/ph407/ -newermt "2020-04-20 15:00:00" -printf "%TY-%Tm-%Td@%TT\t%p\n" | find /nas/quota/slot_2/root_vdm_2/isadhomes01/home/ph407ot_vdm_2/isadhomes01/home/jltest/output.csv
find: invalid predicate `-newermt'
find: invalid predicate `-n'

Hmmm, well, what you've posted above isn't in any of my posts.
From what I can see there - you've attempted to chain the output of the first find to the input of a second find via a pipe...

Syntactically - what you've posted above is:
find {searchpath} -newermt {timestamp} -printf {format specifiers} | find /path/to/outputfile.csv

Which is NOT what I posted.

The syntax of the last command in my initial post was:
find {searchpath} -newermt {timestamp} -printf {format specifiers} > /path/to/outputfile.csv

And the last command in my previous post was:
find {searchpath} -newermt {timestamp} -printf {format specifiers} | sort -n > /path/to/outputfile.csv

Both are very different to what you posted.
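For what it's worth, the command you were trying to run should have looked more like this (substitute whichever output path you actually want - I've used a placeholder here):
Code:
find /nas/quota/slot_2/root_vdm_2/isadhomes01/home/ph407/ -newermt "2020-04-20 15:00:00" -printf "%TY-%Tm-%Td@%TT\t%p\n" | sort -n > /path/to/output.csv
i.e. the results are piped to sort, and the sorted output is redirected (with >) into the output file - there is no second find command.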

Thanks for your assistance, it's much appreciated. Could I pull in a CSV list of source directories for this approach in any way?

Yes, you could do that.
This is one way:
Code:
while read searchpath; do
    if [[ -d "$searchpath" ]]; then
        find "$searchpath" -newermt "2020-04-20 15:00:00"  >> /path/to/outputfile.csv
    fi
done < /path/to/searchpaths.csv
That literally just reads the file /path/to/searchpaths.csv line by line, checks whether each line is a valid path to a directory and, if it is, performs a find and appends the paths of anything it finds to /path/to/outputfile.csv.
That would be the simplest method.
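For reference, the searchpaths.csv that the loop reads just needs one directory path per line - nothing comma-separated. So, purely as a made-up example, it might contain:
Code:
/path/to/first/searchdir
/path/to/second/searchdir
/path/to/third/searchdir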

If you wanted the output sorted by modification time, and to see the sorted output in blocks - with one block per find command - you could modify it like this:
Code:
while read searchpath; do
    if [[ -d "$searchpath" ]]; then
        echo "Found at $searchpath:" >> /path/to/outputfile.csv
        find "$searchpath" -newermt "2020-04-20 15:00:00" -printf "%TY-%Tm-%Td@%TT\t%p\n" | sort -n >> /path/to/outputfile.csv
        echo >> /path/to/outputfile.csv
    fi
done < /path/to/searchpaths.csv

That will put all of the output into a single file, with each search separated into sorted blocks. Each line would contain a time-stamp and a path/filename.

Optionally, if you want all of the results in a single file, don't care which find operation they came from, and just want ALL of them sorted by timestamp - you could simply omit the lines used to separate the blocks of results and defer the sort until the last find has been performed. That yields a complete list of paths sorted by modification date, starting with the oldest and ending with the newest.
e.g. something like this:
Code:
while read searchpath; do
    if [[ -d "$searchpath" ]]; then
        find "$searchpath" -newermt "2020-04-20 15:00:00" -printf "%TY-%Tm-%Td@%TT\t%p\n" >> /path/to/outputfile.csv
    fi
done < /path/to/searchpaths.csv
sort -n /path/to/outputfile.csv > /path/to/sortedoutputfile.csv
rm /path/to/outputfile.csv
Above writes the output from ALL of the find commands to a single file - then after all of the find commands are complete - we write the sorted output to a new file and remove the original output file.

Taking things further - you might not want to see the time-stamps in the final, sorted file. Perhaps you only wanted the time-stamps to perform the sort, but don't want them in the file.
In which case, the sort command at the end of the above script would become (awk's field separator is set to a tab here, so that paths containing spaces don't get cut short):
Code:
sort -n /path/to/outputfile.csv | awk -F'\t' '{print $2}' > /path/to/sortedoutputfile.csv

Or in the example where we're putting the results into sorted blocks, you could strip out the timestamps in the output file by modifying the piped part at the end of the line containing the find command to:
Code:
| sort -n | awk -F'\t' '{print $2}' >> /path/to/outputfile.csv
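In other words, the full line containing the find command inside the loop would end up as:
Code:
find "$searchpath" -newermt "2020-04-20 15:00:00" -printf "%TY-%Tm-%Td@%TT\t%p\n" | sort -n | awk -F'\t' '{print $2}' >> /path/to/outputfile.csv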

And that's just a few ideas, off the top of my head. The possibilities here are pretty much unlimited. It all depends on what you're looking for in the output file.
 
I seem to be getting this error when trying all of the above:

/usr/bin/userlist3: line 5: syntax error near unexpected token `done'
/usr/bin/userlist3: line 5: `done < /nas/quota/slot_2/root_vdm_2/isadhomes01/home/jltest/userlist.csv'

Any ideas?

Thanks
 
I seem to be getting this error when trying all of the above:

/usr/bin/userlist3: line 5: syntax error near unexpected token `done'
/usr/bin/userlist3: line 5: `done < /nas/quota/slot_2/root_vdm_2/isadhomes01/home/jltest/userlist.csv'

Any ideas?

Thanks

Can you please post a reply in this thread with the exact code you are using in your script?

I’m thinking maybe you’ve made some typos in there, or something? Or perhaps there’s something else in your code that is sending things awry!

If you use the insert button (...) in the toolbar at the top of the Linux.org post-editor and select the “insert code” option - a box will pop up, and you can copy/paste your script/code into it. Click the ok box and it will insert the code into your post inside code tags.

All of the shell-script snippets I’ve posted above definitely work. I use these kinds of patterns in a lot of my own scripts. I even did some testing on them to make sure they work before I posted them.

Whatever happens, this is a classic PICNIC (Problem In Chair, Not In Computer) - I’m just not sure if it’s your chair, or mine! Ha ha ha!

But, post your code and we can take a look at it and see what’s going on!
 
sorry here you go:

Code:
while read searchpath; do
    if [[ -d "$searchpath" ]]; then
        find "$searchpath" -newermt "2020-04-20 15:00:00"  >> /nas/quota/slot_2/root_vdm_2/isadhomes01/home/jltest/output2.csv
    fi
done < /nas/quota/slot_2/root_vdm_2/isadhomes01/home/jltest/userlist.csv
 
Hmmmm, that all looks OK.
I was thinking perhaps your do/done’s were mismatched, maybe you were missing a do..... But your entire script is basically exactly what I posted....

But because you have the script in a directory on $PATH - it needs a shebang at the start of the script to tell the shell which interpreter to use. I didn't include the shebang. That’s the only thing I didn’t include in my snippets!

Insert the following line, as the very first line of the script:
Code:
#!/usr/bin/env bash

Normally, I would post the shebang in full scripts, but I was posting in "snippet" mode and assumed that you would know to put the shebang at the start of a script. My bad. Turns out the problem was in my chair after all!
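So, with the shebang added as the very first line, your script from above would read:
Code:
#!/usr/bin/env bash

while read searchpath; do
    if [[ -d "$searchpath" ]]; then
        find "$searchpath" -newermt "2020-04-20 15:00:00"  >> /nas/quota/slot_2/root_vdm_2/isadhomes01/home/jltest/output2.csv
    fi
done < /nas/quota/slot_2/root_vdm_2/isadhomes01/home/jltest/userlist.csv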
 
now I get this error:
[root@RCRVNXCS0 bin]# userlist3
: No such file or directory


Unfortunately not a lot to go off, I know.


A slight change to my original request: having now looked at the available solutions using the find command, I think I can give a more definitive request here:

So we have user home drives that could be located in one of 7 directory trees, the only variable being the isadhomes part of the directory, as per below (isadhomes can be isadhomes01 – isadhomes07), but each home drive will only reside in one of these areas. We would like to output to CSV the last time a file was modified or created (like your previous scripts gave us). We would also like to output a list of files from the user's home directory that have been created or modified in the last 10 days.

The user home drive location is:

/nas/quota/slot_2/root_vdm_2/isadhomes[01-07]/home/[username]

The CSV output file should be:

/nas/quota/slot_2/root_vdm_2/isadhomes01/home/jltest/userlist.csv

Thank you once again for all your help with this.

Jim
 
Hard to tell. Looks like one of the paths you've used is incorrect, but without further information, I couldn't tell you which path it was!

Whenever you're having problems with a shell script, you should try executing it in debug mode, like this:
bash -x /path/to/script

It will show each command that is executed in the script, etc.
That should help you to narrow down the cause of the problem.

Based on what you were looking for in the second half of your post, I've come up with this:
Bash:
#!/usr/bin/env bash

outputFile="/nas/quota/slot_2/root_vdm_2/isadhomes01/home/jltest/userlist-$(date +%Y-%m-%d-%T).csv"
tmpDir=$(mktemp -d)

find /nas/quota/slot_2/root_vdm_2/ -maxdepth 1 -type d -name "isadhomes*" | sort -V > "$tmpDir/baseDirs"

while read -r baseDir ; do
    find "$baseDir/home/" -maxdepth 1 -type d | \grep -v 'home/$' | sort -V >> "$tmpDir/homeDirs"
done < "$tmpDir/baseDirs"

while read -r homeDir ; do
    { 
        echo "==========================";
        echo "$homeDir";
        echo "==========================";
        find "$homeDir" -mtime -10 -printf '%TY-%Tm-%Td@%TT\t%p\n' | sort -nr | awk -F'\t' '{print $2}'
        echo 
    } >> "$outputFile"
done < "$tmpDir/homeDirs"

rm -r "$tmpDir"

Rather than explain the code - here's another version with a ton of comments in it.
Hopefully this makes sense!
Bash:
#!/usr/bin/env bash

# Initial variables:
# Create a file-name with a time-stamp in it
# That way you get a new, uniquely named file after each run
outputFile="/nas/quota/slot_2/root_vdm_2/isadhomes01/home/jltest/userlist-$(date +%Y-%m-%d-%T).csv"

# Make a temporary directory in /tmp/
# We'll use this to store some temporary files that we'll remove at the end
# NOTE: Usually yields a path like /tmp/tmp.xxxxxx
# - Where xxxxxx is a random sequence of letters and numbers
tmpDir=$(mktemp -d)

# The directories we're interested in are in these paths:
#/nas/quota/slot_2/root_vdm_2/isadhomes[01-07]/home/[username]

# The first thing to do:
# Searching from /nas/quota/slot_2/root_vdm_2/:
# - Find all isadhomes* directories and save them to a file in our /tmp/ directory
# NOTE: Using a non-recursive search 
# - we're constraining our search to sub-dirs immediately in root_vdm_2

# - This yields an alphabetically sorted list of paths, in a file at /tmp/tmp.xxxxxx  called "baseDirs"
find /nas/quota/slot_2/root_vdm_2/ -maxdepth 1 -type d -name "isadhomes*" | sort -V > "$tmpDir/baseDirs"


# Second task:
# - Iterate through the paths in the baseDirs file 
# - Use find (non-recursively) to find all dirs immediately in the baseDir/home folder off each path
# - The grep filters out the /home dir itself. 
# - Results are alphabetically sorted and appended to a file called "homeDirs" in our /tmp/ dir 
while read -r baseDir ; do
    find "$baseDir/home/" -maxdepth 1 -type d | \grep -v 'home/$' | sort -V >> "$tmpDir/homeDirs"
done < "$tmpDir/baseDirs"

# Third task:
# Iterate through the list of home directories and find files from the last 10 days
# Notes:
# - Prints a header to the output file
# - Then find is used - and outputs a timestamp and a filename for each result
# - Output is sorted by timestamp: -n is numeric sort, -r is reverse order, so the most recent file is listed first
# - awk (splitting the line on the tab) strips out the timestamp, so the final output file only contains an ordered list of files, most recent to least
while read -r homeDir ; do
    { 
        echo "==========================";
        echo "$homeDir";
        echo "==========================";
        find "$homeDir" -mtime -10 -printf '%TY-%Tm-%Td@%TT\t%p\n' | sort -nr | awk -F'\t' '{print $2}'
        echo 
    } >> "$outputFile"
done < "$tmpDir/homeDirs"

# Final task:
# Remove the temporary directory
# If you use /tmp/ in a script - you should always clean up afterwards!
rm -r "$tmpDir"
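And just to give an idea of the result - with a completely made-up username and files - each block in the output file would look something like this:
Code:
==========================
/nas/quota/slot_2/root_vdm_2/isadhomes03/home/ab123
==========================
/nas/quota/slot_2/root_vdm_2/isadhomes03/home/ab123/Documents/notes.txt
/nas/quota/slot_2/root_vdm_2/isadhomes03/home/ab123/Documents
/nas/quota/slot_2/root_vdm_2/isadhomes03/home/ab123/.bash_history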
 
