In a bash script, how do you match different lines in a sorted document to the numbers in their original order?

CrazedNerd

Guest
Using the concepts that @JasKinasis mentioned in my other thread here (readarray and array loops), I've been writing scripts that compare lines in separate documents and display them in a different format than the commands "comm" and "diff". I've successfully done this for showing two documents side by side, and for showing only the lines that differ by running the files through "sort". It would be better, though, if I could show the sorted lines paired with their line numbers from the original document. Line numbers from a sorted document aren't helpful to a user, because the lines are in a different order than they need to be in code or in the original document. I've been experimenting with a script for this, but so far I just have this, which either prints the lines unique to each file when the files are sorted, or compares the two documents line by line as in the side-by-side case I mentioned:

Code:
#!/bin/bash
#easier to read format for showing differences between files

#tells user how to use script
SCRIPT=$0

if [ $# -lt 2 ]; then
        echo "This script needs two files as arguments: ${SCRIPT##*/} file file2"
        exit 1
fi

if ! [ -f "$1" ] || ! [ -f "$2" ]; then
        echo "File or files do not exist."
        exit 2
fi

#strips leading whitespace; file names are quoted so paths with spaces work
sed 's/^[ \t]*//' "$1" > temp-diff
sed 's/^[ \t]*//' "$2" > temp-diff2

#assigns lines to arrays
#readarray -t DIFF < $1
#readarray -t DIFF2 < $2
readarray -t DIFF < temp-diff
readarray -t DIFF2 < temp-diff2

#prompts user for viewing unique lines or a side-by-side comparison
echo "Would you like to sort the results?  With sorting, you see unique lines"
read -p "In the document instead of a side-by-side comparison (y/n)  "  answer

until [[ $answer = [yYnN] ]]; do
        read -p "Incorrect answer, type y or n and press enter. To quit, enter q: " answer
        if [[ $answer = [Qq] ]]; then
                exit 0
        fi
done

#chooses sort
if [[ $answer = [Yy] ]]; then

        echo -e "\nThese lines are unique to $1:\n"
        diff --new-line-format="" --unchanged-line-format="" <(sort "$1") <(sort "$2") | sed '/^$/d;s/^[ \t]*//g' | uniq
        echo -e "\nThese lines are unique to $2:\n"
        diff --new-line-format="" --unchanged-line-format="" <(sort "$2") <(sort "$1") | sed '/^$/d;s/^[ \t]*//g' | uniq

fi

#chooses comparison
if [[ $answer = [Nn] ]]; then
        echo -e "\nDifferent lines in $1:\n"
        count=${#DIFF[@]} #sets limit for looping
        #loops non-matching lines in an array plus their line number from first file
        for (( j=0; j<count; j++ )); do
                line_numb=$((j + 1))
                if [ "${DIFF[j]}" != "${DIFF2[j]}" ] && [ -n "${DIFF[j]}" ]; then
                        echo "${DIFF[j]} (Line $line_numb)"
                fi
        done
fi

if [[ $answer = [Nn] ]]; then
        echo -e "\nDifferent lines in $2:\n"
        count2=${#DIFF2[@]}
        #loops non-matching lines in an array plus their line number from second file
        for (( j=0; j<count2; j++ )); do
                line_numb=$((j + 1))
                if [ "${DIFF[j]}" != "${DIFF2[j]}" ] && [ -n "${DIFF2[j]}" ]; then
                        echo "${DIFF2[j]} (Line $line_numb)"
                fi
        done
fi

rm temp-diff temp-diff2
 


It's interesting though, "sort", "uniq", and "diff" together still don't weed out identical lines all the time.
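
Two possible causes, assuming the sorted branch of the script above is what produced this: the leading whitespace is stripped only after diff has already compared the lines, and a line that occurs twice in one file but once in the other still counts as a difference. The sketch below uses made-up file contents and GNU diff and sed; the sort -u variant is just one way to test the guess, not a confirmed fix.

```bash
#!/bin/bash
# Sketch: two assumed reasons why identical-looking lines can survive
# sort + diff + uniq. File names and contents are invented for illustration.

tmpdir=$(mktemp -d)
printf '%s\n' '    echo hi' 'exit 0' 'exit 0' > "$tmpdir/a"
printf '%s\n' 'echo hi' 'exit 0' > "$tmpdir/b"

# Same pipeline as the script: whitespace is stripped after diff has compared
# the lines, and the extra copy of "exit 0" is still reported.
before=$(diff --new-line-format="" --unchanged-line-format="" \
        <(sort "$tmpdir/a") <(sort "$tmpdir/b") | sed '/^$/d;s/^[ \t]*//g' | uniq)

# Variant: strip whitespace first, then drop duplicates with sort -u.
after=$(diff --new-line-format="" --unchanged-line-format="" \
        <(sed 's/^[ \t]*//' "$tmpdir/a" | sort -u) \
        <(sed 's/^[ \t]*//' "$tmpdir/b" | sort -u) | sed '/^$/d')

printf 'before:\n%s\n' "$before"
printf 'after:\n%s\n' "$after"
rm -r "$tmpdir"
```

With these two sample files, the first pipeline still reports lines that look identical once the whitespace is gone, while the variant reports nothing.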
 
Diff isn’t a trivial thing to implement.
You’re not necessarily looking at line by line differences. You could be looking at entire blocks of lines.

So for example:
If fileA and fileB are two different versions of a source code file and you want to find the differences between them. Some of the differences could be blocks of several lines of unique code that have either been added, or removed, or possibly moved. Other changes will be smaller and more atomic. Some lines may differ by a few words, or a few characters.

So it’s not a case of doing a straightforward, line by line comparison.
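
A small sketch of why position-by-position comparison breaks down, using two invented files where the second only has one extra line at the top. Comparing index j against index j flags every line, while diff lines the files up first and reports only the insertion. The --old-line-format and --unchanged-line-format options are GNU diff options.

```bash
#!/bin/bash
# Sketch: positional comparison versus diff on files that differ by one
# inserted line. File names and contents are invented for illustration.

tmpdir=$(mktemp -d)
printf '%s\n' 'one' 'two' 'three' > "$tmpdir/old"
printf '%s\n' 'zero' 'one' 'two' 'three' > "$tmpdir/new"

readarray -t OLD < "$tmpdir/old"
readarray -t NEW < "$tmpdir/new"

# Positional comparison: index j of one file against index j of the other.
positional=0
for (( j=0; j<${#NEW[@]}; j++ )); do
        if [ "${OLD[j]}" != "${NEW[j]}" ]; then
                positional=$((positional + 1))
        fi
done

# diff aligns the two files, so only the inserted line is printed.
added=$(diff --old-line-format="" --unchanged-line-format="" "$tmpdir/old" "$tmpdir/new")

echo "positional mismatches: $positional"
echo "lines diff reports as added: $added"
rm -r "$tmpdir"
```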
I’ve literally just woken up though, so my brain isn’t properly alive yet.
I’ll have a think and will try to post later, if I can!
 
I'm trying to make something that finds lines that are totally unique (lines that just don't appear in the other document) and then also prints their line number from the original, unsorted document. I think it might work to nest one for loop inside another and compare each line from the first file to every line of the second, but I haven't tried it yet. I've realized I'm better off testing things step by step instead of trying to make it all work at once...
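
The nested-loop idea could look roughly like the sketch below. It is untested against real documents; the function name unique_lines and the sample files are made up for the example, and it strips leading whitespace the same way the script earlier in the thread does (GNU sed).

```bash
#!/bin/bash
# Sketch of the nested-loop idea: print each line of the first file that
# appears nowhere in the second, with its line number in the original,
# unsorted file. unique_lines is a hypothetical name for this example.

unique_lines() {
        local -a first second
        local i j found
        readarray -t first < <(sed 's/^[ \t]*//' "$1")
        readarray -t second < <(sed 's/^[ \t]*//' "$2")

        for (( i=0; i<${#first[@]}; i++ )); do
                [[ -n ${first[i]} ]] || continue        # skip blank lines
                found=0
                # inner loop: compare this line to every line of the second file
                for (( j=0; j<${#second[@]}; j++ )); do
                        if [[ ${first[i]} == "${second[j]}" ]]; then
                                found=1
                                break
                        fi
                done
                if (( found == 0 )); then
                        echo "${first[i]} (Line $((i + 1)))"
                fi
        done
}

# Small demonstration with invented files.
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha' '  beta' 'gamma' 'delta' > "$tmpdir/a"
printf '%s\n' 'delta' 'beta' 'epsilon' > "$tmpdir/b"
result=$(unique_lines "$tmpdir/a" "$tmpdir/b")
echo "$result"
rm -r "$tmpdir"
```

Because nothing is sorted, the array index plus one is already the original line number. The inner loop makes this quadratic in the number of lines, which should be fine for small files. For larger ones, something like grep -n -v -x -F -f file2 file1 may do the same job faster, though that is a suggestion I have not tried against this script.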
 

