How to use awk, sed, grep


xiaoxiaodong2013

Guest
Question:
I want to replace each line beginning with ">gi" with just its "NW_*.1" accession. Which Linux command should I use to make sure that only the ">gi" lines are changed?

>gi|417531841|ref|NW_004080164.1| Ovis aries breed Texel chromosome 1 genomic scaffold, Oar_v3.1 OAR1, whole genome shotgun sequence
ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
>gi|417531785|ref|NW_004080165.1| Ovis aries breed Texel chromosome 2 genomic scaffold, Oar_v3.1 OAR2, whole genome shotgun sequence
ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
……
……
……
>gi|5835554|ref|NC_001941.1| Ovis aries mitochondrion, complete genome
ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
 


That's a very good link, but I want to keep the NW_*.1 accession. It is different on every line that starts with ">gi", so the replacement has to change from line to line, and I still don't know how to do that. The output lines should be:

NW_004080164.1
ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
NW_004080165.1
ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
 
You have the lines you want to operate on, so based on what's in the tutorial, simply try some modifications via command line. That's how I figure things out.
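
For the header layout shown in the question (">gi|number|ref|accession| description"), one such modification could be sketched with sed as below. This is only a sketch under assumptions: the function name simplify_headers and the file names in the usage comment are made up for illustration, and the pattern assumes every header to be changed follows exactly that layout. The replacement keeps the leading ">" so the result is still FASTA; the output shown in the question omits it, and dropping the ">" from the replacement would reproduce that literally.

```shell
# Hypothetical helper: reads FASTA on stdin, writes FASTA on stdout.
# Only lines starting with ">gi|" can match; sequence lines pass through unchanged.
simplify_headers() {
  # -E enables extended regular expressions (GNU and BSD sed).
  # \|        a literal pipe character
  # [0-9]+    the gi number
  # ([^|]+)   the accession after "ref|", captured as \1
  # \|.*      the closing pipe and the rest of the description
  sed -E 's/^>gi\|[0-9]+\|ref\|([^|]+)\|.*/>\1/'
}

# Usage (file names are placeholders):
#   simplify_headers < genome.fa > genome.renamed.fa
```

Because the pattern captures whatever sits between "ref|" and the next "|", it also rewrites the NC_001941.1 mitochondrion header, not only the NW_ ones.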
 
In bioinformatics we do not replace a standard format (FASTA) with a non-standard one, only to write the data to disk a second time and duplicate information …
 
Yeah, I resolved it. Thanks for the link.

I needed this because the sequence IDs in the .fa file have to match those in the .gff file when I use TopHat with the two files together. If I don't change them, I get an error like: could not build bowtie index with err=1.
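
A quick way to check that kind of ID mismatch before running anything could look like the sketch below. It rests on assumptions: the function name missing_seqids is invented for illustration, the FASTA ID is taken to be the first word after ">", and the GFF is taken to be tab-separated with the sequence ID in column 1 and nine columns per feature row. It only reports mismatched names; it does not claim to diagnose the bowtie error itself.

```shell
# Hypothetical helper: print each sequence ID used in the GFF (column 1)
# that has no matching header in the FASTA file.
# Usage: missing_seqids genome.fa annotation.gff   (placeholder file names)
missing_seqids() {
  fasta=$1
  gff=$2
  awk -F'\t' '
    NR == FNR {                       # first file: the FASTA
      if (/^>/) {
        # ID = text after ">" up to the first space or tab
        split(substr($0, 2), w, /[ \t]/)
        ids[w[1]] = 1
      }
      next
    }
    /^#/ || NF < 9 { next }           # second file: skip comments and short rows
    !($1 in ids) && !seen[$1]++ { print $1 }   # report each missing ID once
  ' "$fasta" "$gff"
}
```

No output means every sequence ID in the GFF was found among the FASTA headers.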
 
