sed -n '1p' filename |sed -n 's/ //gp' |wc -m |awk '{print($1-1)}'
That's because there may be no spaces in the line: with -n and the p flag, the second sed command prints only lines in which it actually replaced something, so a line without spaces produces no output at all. The explanation of what that line achieved was clear in post #4, but the nature of the text files it was being applied to wasn't defined, so the line was really an example of how to proceed rather than a precise final answer applicable to all possible cases. In this case, if the first line of the text files contains no spaces, one would need a conditional statement that first detects whether there are spaces, deletes them if there are, and skips the relevant sed command if there are none. With such a conditional, the code could perhaps be written more clearly as a script rather than a one-liner. I'll leave it to the interested observer to achieve that relatively simple modification.
Hi
The command line doesn't work.
It returns -1, both for several files at once and for a single file:
sed -n '1p' *.fasta |sed -n 's/ //gp' |wc -m |awk '{print($1-1)}'
-1
sed -n '1p' 1Mo_UM.fasta |sed -n 's/ //gp' |wc -m |awk '{print($1-1)}'
-1
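A minimal sketch of where the -1 comes from, using a made-up sample line in place of a real file (the variable names are illustrative only). It compares the pipeline with and without the -n option and p flag on a line that contains no spaces:

```shell
# Hypothetical sample line with no spaces, standing in for a FASTA header.
line='>Vibrio_cholerae_strain_1Mo'

# With -n and the p flag, sed prints only lines where a substitution happened,
# so a line without spaces yields no output and wc -m reports 0 (0 - 1 = -1).
with_n=$(printf '%s\n' "$line" | sed -n 's/ //gp' | wc -m | awk '{print($1-1)}')

# Without -n and p, sed prints every line whether or not it changed.
without_n=$(printf '%s\n' "$line" | sed 's/ //g' | wc -m | awk '{print($1-1)}')

echo "with -n/p:    $with_n"
echo "without -n/p: $without_n"
```

Dropping -n and p this way would avoid the need for a separate conditional, since lines with and without spaces are then both passed on to wc.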
I would just add that some programs count the spaces as characters also, so your 38 would be diminished by the number of spaces as well.
Having specific targets such as these lines makes things much clearer and will enable you to be more precise. With this new info, my first suggestion is to delete the second sed command from the code in post #4, since there are no spaces. That should output the number of characters in the lines. Then you can add a conditional statement to show which lines contain 38 characters, including the symbol ">". There are a few ways of doing it. I'll have a think about it; meanwhile you may be able to get something together.
Hi
Indeed, there are no spaces in the first line.
All lines go like this:
file 1: >Vibrio_cholerae_strain_1Mo
file 2: >Vibrio_cholerae_strain_39Ki
file 3: >Vibrio_cholerae_strain_107V1216
and so on....
I need to determine which files contain more than 38 characters in the first line, counting the symbol ">".
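Use head and wc on each file, like this:
for file in ./*.fasta; do
    # Count the characters in line 1, excluding the trailing newline
    numChars=$( head -n 1 "$file" | tr -d '\n' | wc -m )
    # (( )) compares numerically; > inside [[ ]] would compare as strings
    if (( numChars > 38 )); then
        echo "$file has $numChars characters in line 1"
    fi
done
Or as a one-liner:
for file in ./*.fasta ; do numChars=$(head -n 1 "$file" | tr -d '\n' | wc -m ) ; if (( numChars > 38 )) ; then echo "$file has $numChars characters in line 1" ; fi ; done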
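One point worth knowing about this kind of test in bash: inside [[ ]], the > operator compares its operands as strings, not as numbers. A small sketch, assuming bash and a made-up count of 100, shows the difference from an arithmetic comparison:

```shell
#!/usr/bin/env bash
# Hypothetical character count, chosen so the two comparisons disagree.
n=100

# [[ $n > 38 ]] compares strings: "100" sorts before "38", so this is false.
if [[ $n > 38 ]]; then string_result=true; else string_result=false; fi

# (( n > 38 )) compares integers: 100 is greater than 38, so this is true.
if (( n > 38 )); then numeric_result=true; else numeric_result=false; fi

echo "string comparison:  $string_result"
echo "numeric comparison: $numeric_result"
```

Using (( )) or [[ $n -gt 38 ]] keeps the check numeric. Note also that wc -m counts the newline at the end of the line, so the figure from head -n 1 "$file" | wc -m is one higher than the number of visible characters unless the newline is removed first.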
Not entirely pointless: a lot of OPs do not know that whitespace, i.e. spaces, is counted.
Entirely pointless:
Take a look at my proposed solution a couple of posts above.
Hi
I removed the sed part; nevertheless, this is what I got:
sed -n '1p' *.fasta |wc -m |awk '{print($1-1)}'
32
I need to do it for several files, that's why I use *.fasta, but it seems it only counted one file and didn't display which one.
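The single count is likely because sed treats all the files named on the command line as one continuous stream, so '1p' matches only the very first line of the first file. As a rough sketch of one alternative, awk resets its per-file line counter FNR for each input file and exposes the current name as FILENAME. The directory and the two sample files below are made up for the demonstration:

```shell
# Throwaway directory with two hypothetical FASTA files for the demonstration.
dir=$(mktemp -d)
printf '>Vibrio_cholerae_strain_1Mo\nACGT\n' > "$dir/a.fasta"
printf '>Vibrio_cholerae_strain_107V1216\nACGT\n' > "$dir/b.fasta"

# sed sees one continuous stream, so '1p' prints a single line in total.
sed_lines=$(sed -n '1p' "$dir"/*.fasta | wc -l)

# awk resets FNR for every input file, so FNR == 1 matches each first line;
# length($0) does not include the trailing newline.
report=$(awk 'FNR == 1 { print FILENAME ": " length($0) }' "$dir"/*.fasta)

echo "lines printed by sed: $sed_lines"
echo "$report"
rm -rf "$dir"
```

Changing the awk condition to FNR == 1 && length($0) > 38 would list only the files whose first line exceeds 38 characters.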
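for file in ./*.fasta; do
    # Strip the newline so only visible characters are counted;
    # (( )) makes the comparison numeric rather than string-based
    if (( $(head -n 1 "$file" | tr -d '\n' | wc -m) > 38 )); then
        echo "$file"
    fi
done
for file in ./*.fasta; do if (( $(head -n 1 "$file" | tr -d '\n' | wc -m) > 38 )); then echo "$file"; fi; done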