Saying that, I've had a bit of a play with awk and come up with this little script, which does the following:
1. Ensures it is passed a single parameter
2. Ensures the parameter is a path to a valid file
3. If 1 or 2 are not met, it exits with an error
4. Creates a temporary directory in /tmp/
5. Moves into the temporary directory
6. Runs the input file through awk, which extracts everything between each line matching HEAD: and the next one (or the end of the file), and writes each group of results to a separate file in the temporary directory, e.g. /tmp/tmp.wiorueru/chest1.txt, /tmp/tmp.wiorueru/chest2.txt, etc.
7. Moves back to the original working directory
8. Displays the number of chests and the content of each chest.
9. Cleans up by removing the temporary directory.
The only thing left to do is to modify the code that displays the number of chests and their contents, so that the content of each chest is read into an array, or an array of arrays.
I've just run out of time for that part! It's getting late here and I have work in the morning! Ha ha!
Without further ado here's the script I have so far:
getchests.sh
Bash:
#!/usr/bin/env bash
# handle errors
die()
{
    echo "ERROR: $1"
    echo "Usage: getchests.sh /path/to/file"
    exit 1
}
# ensure we have a single parameter which is a path to a valid file
if [ $# -ne 1 ]; then
    die "Incorrect number of parameters!"
elif [ ! -f "$1" ]; then    # quoted so paths containing spaces still work
    die "$1 does not exist, or is NOT a valid file!"
fi
# Get the absolute path to the passed-in file
inFile="$(realpath "$1")"
# Create a temporary directory in /tmp/
tempDir=$(mktemp -d)
#echo "tempDir=$tempDir" # debug - shows the path to created temp directory
# cd into the temporary directory
cd "$tempDir" || die "Could not cd into $tempDir"
# Process the input file with awk - get everything between each HEAD: tag and write each chest out to a separate .txt file (in the temporary directory)
awk '
    /HEAD:/ { num++; next }                        # a HEAD: line starts the next chest
    num > 0 { print >> ("chest" num ".txt") }      # any other line goes into the current chest file
' "$inFile"
# cd back to the original working directory
cd - > /dev/null    # cd - prints the directory it returns to, so discard that output
# count the number of Chest-files in the temporary directory
echo "$inFile contains $(find "$tempDir" -type f | wc -l) Chests"
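# This just loops through each file and cats it to the screen
# What the OP needs to do here is read each line of each file into a separate array - so will need an array of arrays, or something.
# The label comes from the file name, because the glob sorts alphabetically (chest10.txt is listed before chest2.txt)
for chest in "$tempDir"/*
do
    name=${chest##*/}    # strip the directory part, leaving e.g. chest3.txt
    echo "${name%.txt} contains:"
    cat "$chest"
    echo
done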
# clean up - remove the temporary directory and temporary files
rm -r "$tempDir"
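As a rough sketch of the step that is still missing (reading each chest into an array), something along these lines might be a starting point. Bash has no real array of arrays, so this keeps one array per chest and reaches each one by name. It assumes bash 4.3 or newer (for mapfile and namerefs), and demoDir, chestNums, chest_N and items are names made up for the example, not part of the script above:

```shell
#!/usr/bin/env bash
# Sketch only: needs bash 4.3+ (mapfile and namerefs).
# demoDir stands in for tempDir; the chest contents are made up.
demoDir=$(mktemp -d)
printf 'sword\nshield\n' > "$demoDir/chest1.txt"
printf 'potion\n' > "$demoDir/chest2.txt"

chestNums=()
for chestFile in "$demoDir"/chest*.txt
do
    [ -f "$chestFile" ] || continue          # the glob matched nothing
    num=${chestFile##*/chest}                # /tmp/.../chest1.txt -> 1.txt
    num=${num%.txt}                          # 1.txt -> 1
    mapfile -t "chest_$num" < "$chestFile"   # one array per chest: chest_1, chest_2, ...
    chestNums+=("$num")
done

for num in "${chestNums[@]}"
do
    declare -n items="chest_$num"            # items now refers to chest_<num>
    echo "chest $num has ${#items[@]} item(s): ${items[*]}"
    unset -n items                           # drop the reference, not the array
done

rm -r "$demoDir"
```

In the real script the first loop would run over "$tempDir" before the clean-up step, and the arrays would stay available after the temporary directory is removed.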
As always, make sure the script has executable permissions with:
chmod +x ./getchests.sh
(assuming you're in the directory containing the script)
And run it as:
Bash:
./getchests.sh /path/to/inputfile
If you forget to pass it a path to a file, or pass it an invalid path to a file - it will just exit with an error message.
I hope this helps!
I've used quite a few little shell-scripting tricks in various places in the script. If you have any questions about any of it - feel free to fire away!
The awk part was the bit that took the longest. That was a little tricky! I couldn't find a clean way to pass awk the path to the temporary directory, so it could write the temporary files out there.
So instead of wasting tons of time trying to get that working, I just made the script cd into the temporary directory to make that the current working directory. Then I ran awk in there and got it to write the "chest" files out to the new current working directory.
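For what it's worth, awk's standard -v option can hand a shell value to the awk program, which is one way the temporary directory path could be passed in without changing directory. This is only a minimal sketch and not what the script above does; demoDir, demoInput, outdir and out are names invented for the example:

```shell
#!/usr/bin/env bash
# Sketch only: demoDir and demoInput are throwaway stand-ins for the
# temporary directory and the input file; the contents are made up.
demoDir=$(mktemp -d)
demoInput="$demoDir/input.txt"
printf 'HEAD:\nsword\nshield\nHEAD:\npotion\n' > "$demoInput"

# -v sets the awk variable outdir before the program starts,
# so the output path can be built inside awk
awk -v outdir="$demoDir" '
    /HEAD:/ { if (out) close(out); num++; out = outdir "/chest" num ".txt"; next }
    out     { print > out }    # write the line to the current chest file
' "$demoInput"

# keep the results in variables so they can be inspected after clean-up
chest1=$(cat "$demoDir/chest1.txt")
chest2=$(cat "$demoDir/chest2.txt")
echo "chest1: $chest1"
echo "chest2: $chest2"

rm -r "$demoDir"
```

One thing to be aware of with this approach: awk interprets backslash escapes in values passed with -v, which matters if the path could ever contain a backslash.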
Because the script changes directory, I also had to run realpath on the filename passed into the script, to ensure that we always had an absolute path to the input file.
So for example, say you were in the directory ~/someProject/ and you ran the script with a relative path to the input file, like ./file.txt.
Because the script changes the working directory to the temporary directory, the relative path ./file.txt would no longer work: the file is not in the empty temporary directory the script just moved into, it's in a sub-directory of your home directory. So before moving into the temporary directory, we had to use realpath to get the fully qualified, absolute path to the input file, e.g. it would resolve to /home/yourname/someProject/file.txt.
That way, after moving into the temporary directory - awk could still process the file because it had a fully qualified, absolute path to the file.
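The relative-path problem described above can be reproduced with a small stand-alone sketch. Here projectDir and workDir are throwaway directories standing in for ~/someProject/ and the temporary directory; they and the other variable names are made up for the example:

```shell
#!/usr/bin/env bash
# Sketch only: projectDir plays the part of ~/someProject/,
# workDir plays the part of the script's temporary directory.
startDir=$PWD
projectDir=$(mktemp -d)
workDir=$(mktemp -d)
echo "some data" > "$projectDir/file.txt"

cd "$projectDir" || exit 1
relPath=./file.txt
absPath=$(realpath "$relPath")    # resolve while still in the original directory

cd "$workDir" || exit 1           # now the relative path points somewhere else
if [ -f "$relPath" ]; then relStatus=found; else relStatus=missing; fi
if [ -f "$absPath" ]; then absStatus=found; else absStatus=missing; fi
echo "relative path: $relStatus"
echo "absolute path: $absStatus"

cd "$startDir" || exit 1
rm -r "$projectDir" "$workDir"
```

The relative path is reported as missing after the cd, while the path resolved by realpath beforehand is still found.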
Anyway, I hope this helps.
I'm off to bed now!