QTerminal (Kali) - Numeric sort when numbers are at the end

Niqu

New Member, joined Apr 26, 2021
Hey :)

As part of my studies, I started learning Kali Linux, my first-ever Linux distro.

Lately we learned a bit about QTerminal, pipes and sorting.

We have now been given a list of ~10,000 words.
Out of these 10,000 words, there are 100 distinct words, each appearing many times with a different number at the end.
Our task is to find the highest number for each word, so that every word appears only once:

--------------------------------
e.g.:

Banana 32
Apple 34
Car 231
Car 213
Apple 12
Banana 549

Turns into:

Apple 34
Banana 549
Car 231


---------------------------------

Now to my problem:

I've tried the different options that sort provides (-n, -s, -t, ...), but I always end up with the identical words grouped together while the numbers after them are still mixed up instead of being in ascending or descending order.

I attached a picture to show what it looks like after I try to sort it:

[Attached screenshot: Bild_2021-04-28_135529.png]

As you can see, the words are grouped, but the numbers are still mixed.
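The options tried above can do the whole task once they are combined as sort keys. A minimal sketch, assuming GNU sort, whitespace-separated "WORD NUMBER" lines and a hypothetical helper name highest_per_word: sort by the word first (-k1,1), then by the number as a numeric value, highest first (-k2,2nr), and let awk keep only the first line it sees for each word.

```shell
# highest_per_word: hypothetical helper name.
# Reads "WORD NUMBER" lines on stdin and prints each word once,
# together with its highest number, sorted by word.
highest_per_word() {
    # -k1,1    primary key is field 1 (the word), compared as text
    # -k2,2nr  secondary key is field 2, compared numerically, highest first
    # LC_ALL=C byte-wise comparison, so the order does not depend on the locale
    LC_ALL=C sort -k1,1 -k2,2nr |
        # !seen[$1]++ is true only the first time a word appears,
        # and that first line carries the highest number for the word
        awk '!seen[$1]++'
}
```

Usage, with a hypothetical input file: highest_per_word < words.txt. Setting LC_ALL=C is a design choice for predictable ordering of the words; leave it out if you prefer your locale's collation.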
 


f33dm3bits

Gold Member, Gold Supporter, joined Dec 11, 2019
You will need to write a short script that sorts the data and extracts what you need, then compares the values and prints the highest value together with its word. You will probably have to create a loop of some sort. I don't have time to look at it now, and maybe someone else will beat me to it, but on the other hand: is this a homework assignment?
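A minimal sketch of the sort-then-loop idea, written in plain POSIX sh so it also runs in Bash. It assumes whitespace-separated "WORD NUMBER" lines with integer numbers, and the function name max_per_word is hypothetical: sort puts all lines for the same word next to each other, and the loop compares each number with the largest one seen so far for that word.

```shell
# max_per_word: hypothetical helper name.
# Reads "WORD NUMBER" lines on stdin (numbers must be integers) and prints
# each word once with its highest number, sorted by word.
max_per_word() {
    # Sort by the word only, so all lines for the same word are adjacent
    LC_ALL=C sort -k1,1 | {
        prev=""
        max=""
        while read -r word num; do
            # Skip empty lines
            [ -n "$word" ] || continue
            if [ "$word" = "$prev" ]; then
                # Same word as before: keep the larger of the two numbers
                if [ "$num" -gt "$max" ]; then
                    max=$num
                fi
            else
                # New word: print the result for the previous word, if any
                if [ -n "$prev" ]; then
                    printf '%s %s\n' "$prev" "$max"
                fi
                prev=$word
                max=$num
            fi
        done
        # Print the result for the last word
        if [ -n "$prev" ]; then
            printf '%s %s\n' "$prev" "$max"
        fi
    }
}
```

Usage, with a hypothetical input file: max_per_word < words.txt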
 
Niqu (OP)

Thanks for the idea, I will try to gather some information about that! :)

This is an optional assignment we can choose to do.
Most of our studies are theoretical, but we have the option to do some hands-on Linux work to get a better understanding of what's happening.

By scripting, you mean Bash scripting, right?
 

rado84

Well-Known Member, joined Feb 25, 2019
Actually, if you take a closer look at your image, they ARE sorted, just not in the way you expect. sort compares lines alphabetically, character by character: "Volutpat" comes before "vulputate" because "o" comes before "u" in the alphabet. The numbers are compared with the same logic, as text rather than as values, which is why 974 appears before 98: the first characters match, and "7" comes before "8". I know it's a weird logic.
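A small demonstration of the character-by-character comparison, using two illustrative lines (hypothetical, not copied from the screenshot) and LC_ALL=C so the result does not depend on the locale. It also shows why -n on its own does not help here: in GNU sort, -n reads the number from the start of the line, every line here starts with a word and therefore counts as 0, so sort falls back to comparing the whole lines as text. Restricting the numeric comparison to the second field with -k2,2n gives the expected order.

```shell
# Two illustrative lines (hypothetical data)
lines='volutpat 98
volutpat 974'

# Plain sort compares text character by character: "volutpat 9" matches,
# then "7" < "8", so the 974 line comes first
text_order=$(printf '%s\n' "$lines" | LC_ALL=C sort)

# -n alone parses a number at the START of the line; "volutpat" is not a number,
# so both lines count as 0 and sort falls back to comparing the whole lines as text
n_order=$(printf '%s\n' "$lines" | LC_ALL=C sort -n)

# -k2,2n compares only field 2, as a number: 98 < 974
key_order=$(printf '%s\n' "$lines" | LC_ALL=C sort -k2,2n)

# Print the three results, separated by blank lines
printf '%s\n\n' "$text_order" "$n_order" "$key_order"
```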

The easiest way for you to sort them would be to copy and paste the list into a LibreOffice spreadsheet, which can then sort the cells automatically. I did something similar a long time ago, so I don't remember the exact steps, but I know it works. I only did it with plain words (no numbers), but it should be the same.
 

JasKinasis

Well-Known Member, joined Apr 25, 2017
I'm a little late to the party here, but the OP's problem is that they are sorting the file purely alphabetically.
If you use sort without specifying any switches/options, it does a purely alphabetical sort, which is why "volutpat 830" comes before "volutpat 89". Everything up to the 8s is the same, and 3 is less than 9, so "volutpat 830" comes first. That's all there is to it!

The OP simply needs to use the -V switch, which will treat numeric values as whole numbers rather than a series of alphanumeric characters.
e.g.
Bash:
sort -V /path/to/file

Now the file will be sorted EXACTLY as the OP wanted. Pretty trivial!
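sort -V orders the lines, but the task also asks for each word only once. Because version sort puts each word's numbers in ascending order, the highest number is the last line of every group of identical words. A minimal sketch of that last step in awk, assuming GNU sort and "WORD NUMBER" lines; the function name keep_highest is hypothetical.

```shell
# keep_highest: hypothetical helper name.
# Reads "WORD NUMBER" lines on stdin, version-sorts them, and prints
# the last (highest) line of each group of identical words.
keep_highest() {
    sort -V |
        awk '
            # The word changed, so the stored line was the highest for the previous word
            NR > 1 && $1 != prev { print line }
            # Remember the current word and line
            { prev = $1; line = $0 }
            # Print the final group (skipped for empty input)
            END { if (NR > 0) print line }
        '
}
```

Usage, with a hypothetical input file: keep_highest < /path/to/file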

Though to be fair, the man page doesn't make it exactly obvious what the -V/--version-sort switch does.
The man page just says this:
-V, --version-sort
natural sort of (version) numbers within text
It doesn't only work with version numbers. Any numbers in the text will be compared as whole numbers, rather than a series of individual alphanumeric characters.
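A quick side-by-side comparison, assuming GNU sort. LC_ALL=C makes the plain sort strictly character by character, and the disc10-live/disc2-live names are made up to show that numbers in the middle of a line are compared as whole numbers too.

```shell
# Numbers at the end of the line (the 830/89 example)
plain_end=$(printf 'volutpat 89\nvolutpat 830\n' | LC_ALL=C sort)
natural_end=$(printf 'volutpat 89\nvolutpat 830\n' | sort -V)

# Numbers in the middle of the text (hypothetical names)
natural_mid=$(printf 'disc10-live\ndisc2-live\n' | sort -V)

# Print the three results, separated by blank lines
printf '%s\n\n' "$plain_end" "$natural_end" "$natural_mid"
```

The plain sort puts "volutpat 830" first, while sort -V puts "volutpat 89" first and "disc2-live" before "disc10-live".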

sort -V comes in extremely handy if you ever need to create a .m3u playlist in a directory containing an album with a large number of numbered .mp3, .flac or .ogg tracks.

If you had a 20-song album and did an alphabetical sort,
e.g.
Bash:
ls *.flac | sort >> playlist.m3u
The above lists all of the .flac files in the current directory, sorts them alphabetically and appends them to playlist.m3u.

But when you open the .m3u file, you'll see that the track listing is in this order:
1,10,11,12,13,14,15,16,17,18,19,2,20,3,4,5,6,7,8,9
Now the songs are in the wrong order.

But if you use version sort, the track listing will be in the correct order:
e.g.
Bash:
ls *.flac | sort -V >> playlist.m3u
That will list all of the .flac files and then sort them using version sort, which treats the track numbers as natural numbers, so when we open the .m3u, all of the tracks will be listed in the correct album order.
So with a 20-song album, the tracks will be listed sequentially from 1 to 20.
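A minimal sketch of the same idea as a small reusable function, assuming GNU sort and that the current directory is the album directory; the name write_playlist is hypothetical. It passes the glob to printf instead of parsing ls output, and uses > instead of >>, so running it twice does not append a second copy of the track list to the playlist.

```shell
# write_playlist: hypothetical helper name.
# Writes the .flac files of the current directory to playlist.m3u in
# natural (version) order: 1, 2, ..., 10, 11, ... instead of 1, 10, 11, 2, ...
write_playlist() {
    # If nothing matches, the glob stays literal ("*.flac"), so stop early
    set -- *.flac
    if [ ! -e "$1" ]; then
        echo "no .flac files found" >&2
        return 1
    fi
    # printf prints one file name per line; > replaces any old playlist
    # instead of appending to it like >> would
    printf '%s\n' "$@" | sort -V > playlist.m3u
}
```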
 