Hi
Could you help me find a command for the following problem? In every line that starts with the symbol ">" in a plain text file (extension .fa), I need to delete everything after the first word, including the white space. This has to be done for multiple files in the same folder. For instance, the first lines of one file look like this:
>KLDFOOAE_00001 tape measure protein [Vibrio cholerae]
MANNLKTDIVLNLQGDLAQKARSYSKEMTTLATRSKAAFSMISSSAIAASRGIDTFGNRL
LFITGAAAVGFERTFVKTAAEFERYQTMLNKLQGSPEAGAKAMAWIEEFTQNTPYAIDEV
TQSFVRLKAFGIDPMDGTMQSIADQAAMIGGTAETVEGIATALGQAWTKGKLQSEEALQL
>KLDFOOAE_00002 MULTISPECIES: phage tail assembly protein [Vibrio]
MAVMTFNLEDGFKVGDAQCHEVGLKELTPKDVFDAQLASEKIGILNGRPHAYTSDVQMGM
ELLCRQVEFIGNVQGPFSVKEILKLSSRDFATLQQKARELDDIMFSDDALEGLEARGRD
>KLDFOOAE_00003 MULTISPECIES: hypothetical protein [Vibrio]
MEHVYQLVDGIVFKGKLQKQVTLHPIDSVSYDLVEQLVEEQLQHIQNQADVVLVNDSHLQ
GLKGYMLLNESAASSISKIGDENVDLMFFDLCQLKISAQDWNVILTANLAIAEYYANQAA
MLA
I need to keep the ">" symbol and the sequence identifier, but not the description, so the resulting file would look like this:
>KLDFOOAE_00001
MANNLKTDIVLNLQGDLAQKARSYSKEMTTLATRSKAAFSMISSSAIAASRGIDTFGNRL
LFITGAAAVGFERTFVKTAAEFERYQTMLNKLQGSPEAGAKAMAWIEEFTQNTPYAIDEV
TQSFVRLKAFGIDPMDGTMQSIADQAAMIGGTAETVEGIATALGQAWTKGKLQSEEALQL
>KLDFOOAE_00002
MAVMTFNLEDGFKVGDAQCHEVGLKELTPKDVFDAQLASEKIGILNGRPHAYTSDVQMGM
ELLCRQVEFIGNVQGPFSVKEILKLSSRDFATLQQKARELDDIMFSDDALEGLEARGRD
>KLDFOOAE_00003
MEHVYQLVDGIVFKGKLQKQVTLHPIDSVSYDLVEQLVEEQLQHIQNQADVVLVNDSHLQ
GLKGYMLLNESAASSISKIGDENVDLMFFDLCQLKISAQDWNVILTANLAIAEYYANQAA
MLA
I also need to apply this to every .fa file in the same folder.
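For reference, here is a minimal Python sketch of the transformation described above. It is only an illustration under some assumptions: the function names `clean_fasta_text` and `clean_folder` are made up for this example, each file is assumed to be small enough to read into memory, and files are overwritten in place, so it would be wise to try it on a copy of the folder first.

```python
import re
from pathlib import Path

# On lines starting with ">", capture ">" plus the first run of
# non-whitespace characters, then match the rest of the line
# (without touching the line ending).
HEADER_RE = re.compile(r"^(>\S*)[^\r\n]*", flags=re.MULTILINE)


def clean_fasta_text(text: str) -> str:
    """Cut every header line back to '>ID'; sequence lines are left unchanged."""
    return HEADER_RE.sub(r"\1", text)


def clean_folder(folder: str, pattern: str = "*.fa") -> None:
    """Rewrite, in place, every file in `folder` that matches `pattern`."""
    for path in sorted(Path(folder).glob(pattern)):
        original = path.read_text()
        cleaned = clean_fasta_text(original)
        # Only write the file back if something actually changed.
        if cleaned != original:
            path.write_text(cleaned)
```

Calling `clean_folder(".")` from inside the folder would process every .fa file there. Lines that do not start with ">" never match the pattern, so the sequences themselves should come through untouched.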
Thanks in advance, and best regards