1. can sort display progress?
No, sadly it cannot.
2. Does --parallel increase the speed of a sort?
Not really, it seems to be for limiting the number of processor threads/cores used.
A quick peek at sort's info page (via info coreutils 'sort invocation') shows that the default behaviour of sort is to use however many processors the machine has available, up to a maximum of 8. Apparently more than 8 processors doesn't yield much in the way of performance gains, so sort has a hard limit of 8.
So, if you run sort without using --parallel, it will just use however many processors you have available (or a maximum of 8).
One other thing to note is that the more threads you use, the more RAM sort needs - per the coreutils documentation, memory usage grows by roughly a factor of log(n) for n threads.
So really, you only need to use --parallel if you want to limit the number of processors to use.
So for example, if you're ssh'd into a critical server that's under heavy load and you need to run sort on a large file, you might want to limit it to 2 threads so the remaining cores stay free for the more critical tasks the server is performing.
In that case the sort will take longer, but it won't be hogging all of the CPU threads. You can also set the maximum buffer size for it to use.
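As a sketch of that scenario (GNU sort assumed; the file names here are made up, but --parallel, -S, and -o are real GNU sort options):

```shell
# Create a tiny stand-in for the "large file" (name is hypothetical).
printf '%s\n' banana cherry apple > big.txt

# Cap sort at 2 threads and a 512M in-memory buffer, leaving the
# remaining cores and RAM free for the server's critical workload.
sort --parallel=2 -S 512M big.txt -o big.sorted.txt

cat big.sorted.txt
```

On a real multi-gigabyte file you'd see the effect in top: sort stays pinned to at most two cores instead of saturating the box.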
So in your case - you probably don't need to use --parallel.
Not unless you want to limit the number of threads it uses.
However, considering that memory usage grows with the number of threads, if you are sorting an immense file you might want to try setting --parallel to 1.
That should make the sort less memory intensive, so there will be fewer of the expensive memory-related allocations going on, which MIGHT actually speed things up a little.
Doing the sort in one thread will take longer per comparison, but the reduced memory pressure might counterbalance that. It could be worth a shot!
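If you want to try it, a single-threaded numeric sort looks like this (GNU sort assumed; nums.txt is just an illustrative file name):

```shell
# Tiny example file standing in for an immense one.
printf '%s\n' 30 4 200 1 > nums.txt

# One sorting thread: slower, but the smallest memory footprint.
sort --parallel=1 -n nums.txt -o nums.sorted.txt

cat nums.sorted.txt
```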
3. Regarding splitting the files VS monolithic files:
Breaking the huge file up into smaller chunks will naturally speed things up, because far less memory is needed to hold and sort each chunk, so each individual sort is much less resource intensive.
With a huge, monolithic file sort, it will require a lot more memory and you're probably going to get huge amounts of data being swapped back and forth between RAM and the swap-file, which is quite a slow and resource intensive operation.
Also, internally - inside the program, there may be a lot of memory allocations and/or re-allocations going on during a large sort as things are being shifted around. And these kinds of memory operations are also computationally quite expensive.
So sorting a large number of smaller files is probably going to be quicker than sorting one extremely large file.
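The split-then-merge approach can be sketched like this (chunk sizes and file names are illustrative; split and sort -m are real coreutils tools, and -m only merges already-sorted inputs, so its memory use stays low):

```shell
# Stand-in for the huge file.
printf '%s\n' delta bravo alpha charlie echo foxtrot > huge.txt

# 1. Split into 2-line chunks: chunk_aa, chunk_ab, chunk_ac.
split -l 2 huge.txt chunk_

# 2. Sort each small chunk individually (each one is cheap).
for f in chunk_*; do sort "$f" -o "$f"; done

# 3. Merge the sorted chunks; -m merges without re-sorting.
sort -m chunk_* -o huge.sorted.txt

cat huge.sorted.txt
```

This is essentially the external merge sort that sort already does internally with temporary files, just made explicit and spread over smaller, cheaper steps.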