Sorting lines mixing alphabetical key and numeric key

Denis Crété

Using Unix commands, I want to sort picture filenames (listed in a file called Tri.txt) according to two keys: an alphabetical key Ka and a numeric key Kn. The numeric key Kn is the last numeric field before the extension E (the substring after the last dot), and everything that precedes Kn is the alphabetical key Ka. The filename pattern is therefore KaKnNs.E, where Ns is a string that contains no digits.
I did not find any option for sort that gives the following result:
Ka1Kn1Ns1.E1
Ka1Kn2Ns2.E2
Ka1Kn3Ns3.E3
Ka2Kn1Ns4.E4
Ka2Kn2Ns5.E5
Kn1Ns6.E6
The last line only shows that the prefix Ka can also be an empty string.
The sorting order for the first (alphabetical) key Ka is not important as long as identical prefixes are grouped.
The second (numeric) key Kn must be sorted by increasing numeric value.
The non-numeric string Ns and the extension E play no role in the sorting.
I wrote the following script, which seems to do the job, but I hope I have missed something that would allow a much simpler command.
Code:
sed -e 's/\(.*[^0-9]\)\([0-9]\{1,\}[^0-9]*\.[^.]*\)$/\1/;' Tri.txt |  \
sort -n | uniq | \
while read n; do \
    sed -ne 's/\('$n'\)\([0-9]\{1,\}[^0-9]*\..\{3,4\}\)$/\2--Cut_here--\1/; ta; /^'$n'$/p; d; :a;p;' Tri.txt | \
    sort -n | sed -e 's/\(.*\)--Cut_here--\(.*\)/\2\1/'; \
done;

In this pipe, the first command (sed ... Tri.txt) returns the prefixes Ka (or the complete line if the prefix is empty) for all the lines contained in the file Tri.txt.
The second command (sort -n) sorts them so that identical lines are consecutive; filenames with an empty prefix are already numerically sorted at this point.
The third command (uniq) eliminates multiple identical entries, keeping only one of them.
The fourth command is a loop done once for each prefix Ka and for each filename with empty prefix. It contains:
- a sed command to swap the prefix Ka and the rest of the line, separating them with the string --Cut_here-- , i.e. the output looks like KnNs.E--Cut_here--Ka. If the prefix is empty, it just returns the unchanged line KnNs.E;
- the sort command to do the numeric sort (on the group of lines with prefix Ka);
- and finally the last command of the loop restores the original line by swapping the strings separated by --Cut_here-- (and eliminating --Cut_here--).

Thank you for any solution that significantly improves on the above, e.g. one that does not resort to a loop structure...
Denis
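
One possible loop-free variant, offered as a sketch rather than a definitive answer: the swap/sort/restore idea described above can be applied to the whole file at once by prepending both keys to every line, sorting once on those two keys, and then cutting the keys off again. The function name sort_ka_kn is hypothetical. The sketch assumes that no line of Tri.txt contains a "/" (true for bare filenames, not for paths), because "/" is used as the temporary field separator, and it assumes the KaKnNs.E pattern described in the post.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $1 grouped by prefix Ka,
# and sorted numerically by Kn within each group.
# Assumption: no input line contains "/" (used here as a temporary separator).
sort_ka_kn() {
    # Decorate: turn "KaKnNs.E" into "Ka/Kn/KaKnNs.E".
    # The first substitution handles a non-empty Ka; "t" then skips the rest.
    # The second substitution handles an empty Ka and yields "/Kn/KnNs.E".
    sed -e 's#^\(.*[^0-9]\)\([0-9][0-9]*\)[^0-9]*\.[^.]*$#\1/\2/&#' \
        -e t \
        -e 's#^\([0-9][0-9]*\)[^0-9]*\.[^.]*$#/\1/&#' "$1" |
    # Sort: field 1 as text (groups identical prefixes), field 2 as a number.
    LC_ALL=C sort -t/ -k1,1 -k2,2n |
    # Undecorate: drop the two key fields and keep the original line.
    cut -d/ -f3-
}
```

Running sort_ka_kn Tri.txt would then print the sorted filenames. With LC_ALL=C the empty prefix sorts first rather than last, which is acceptable here since only the grouping of identical prefixes matters. Lines that do not match the pattern pass through unchanged and are sorted by their full text.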
 
