Crosspost: https://www.nixcraft.com/t/converting-a-list-into-a-tab-separated-file-grouped-by-values/4517
Hello Linux.org,
This is my first post here and I'm hoping the community is inviting to novice shell scripters. So let me share.
I have a text file with a list of values. Here is a snip:
Values starting with a PIPE can be considered headers and values with a CARET are values to the header it is under.
My goal is to create a 'tab separated value' file with the headers on the left. So after much googling, man files, and some other forum help... I've managed to come up with two commands.. both output differently in my efforts.
First command:
Which produced:
The second command is:
Which produced:
Now my list has unique values in my test run. My new list has duplicates like so:
And the desired output I am looking for is this:
My intention is to group all values of the same value together and maintain the left header. I have no idea how to approach it with awk, sed, or tr. I did manage a way in excel, but the processing power it takes on my old computer is annoying. I think the cli will speed things up greatly.
My hardware is a RPi 4 running raspbian.
Any help or solutions will be greatly appreciated
Hello Linux.org,
This is my first post here and I'm hoping the community is inviting to novice shell scripters. So let me share.
I have a text file with a list of values. Here is a snip:
Code:
|01BFRUITS|
^banana
^apple
^orange
^pear
|01AELECTRONICS|
^television
^radio
^dishwasher
^computer
|01AANIMAL|
^bear
^cat
^dog
^elephant
|01ASHAPE|
^circle
^square
^diamond
^star
Values starting with a PIPE can be considered headers and values with a CARET are values to the header it is under.
My goal is to create a 'tab separated value' file with the headers on the left. So after much googling, man files, and some other forum help... I've managed to come up with two commands.. both output differently in my efforts.
First command:
Code:
cat test.txt | awk -v OFS='\t' '/^\|/{ c1=$0; gsub(/\|/,"",c1) } /^\^/{ c2=$0; sub(/^\^/,"",c2); print c1,OFS,c2 }' | sed -z s/\\r\\t\\t//g
Which produced:
Code:
01BFRUITS banana
01BFRUITS apple
01BFRUITS orange
01BFRUITS pear
01AELECTRONICS television
01AELECTRONICS radio
01AELECTRONICS dishwasher
01AELECTRONICS computer
01AANIMAL bear
01AANIMAL cat
01AANIMAL dog
01AANIMAL elephant
01ASHAPE circle
01ASHAPE square
01ASHAPE diamond
01ASHAPE star
The second command is:
Code:
cat test.txt | sed -z 's/\r\n\^/\t/g' | tr -d '|'
Which produced:
Code:
01BFRUITS banana apple orange pear
01AELECTRONICS television radio dishwasher computer
01AANIMAL bear cat dog elephant
01ASHAPE circle square diamond star
Now my list has unique values in my test run. My new list has duplicates like so:
Code:
|01BFRUITS|
^banana
^apple
^orange
^pear
^banana
^apple
^orange
^pear
|01AELECTRONICS|
^television
^radio
^dishwasher
^computer
^television
^radio
^dishwasher
^computer
^television
^radio
^dishwasher
^computer
|01AANIMAL|
^bear
^cat
^dog
^elephant
^bear
^cat
^dog
^elephant
^bear
^cat
^dog
^elephant
^bear
^cat
^dog
^elephant
|01ASHAPE|
^circle
^square
^diamond
^star
^circle
^square
^diamond
^star
^circle
^square
^diamond
^star
^circle
^square
^diamond
^star
^circle
^square
^diamond
^star
And the desired output I am looking for is this:
Code:
01BFRUITS banana banana
01BFRUITS apple apple
01BFRUITS orange orange
01BFRUITS pear pear
01AELECTRONICS television television television
01AELECTRONICS radio radio radio
01AELECTRONICS dishwasher dishwasher dishwasher
01AELECTRONICS computer computer computer
01AANIMAL bear bear bear bear
01AANIMAL cat cat cat cat
01AANIMAL dog dog dog dog
01AANIMAL elephant elephant elephant elephant
01ASHAPE circle circle circle circle circle
01ASHAPE square square square square square
01ASHAPE diamond diamond diamond diamond diamond
01ASHAPE star star star star star
My intention is to group all values of the same value together and maintain the left header. I have no idea how to approach it with awk, sed, or tr. I did manage a way in excel, but the processing power it takes on my old computer is annoying. I think the cli will speed things up greatly.
My hardware is a RPi 4 running raspbian.
Any help or solutions will be greatly appreciated