How to merge two files?


rclostio

Guest
I have two very large files that were run slightly differently. Some of the lines of data are the same, but some of them are different. I would like to merge all of the data into a new file that includes everything that is the same between the two files and everything that is different. The merge command will not work, nor will the cat command. The sdiff command probably would work, but it is interactive, and these files are way too large for me to deal with interactively. I also have several pairs of them that need to be done this way. Any advice on how to do this would be helpful.
 


I tried this and it seems to work, but is it the right way to solve this issue?

cat file1 file2 >> file3
sort file3 | uniq
 
We have no knowledge of the structure of the file. We would need more information about the two files:
  • Are the files already sorted? If not, should they be sorted before merging?
  • If so, how should they be sorted?
  • Are there duplicate lines in one of the files that need to be maintained, or...
  • Can we assume that each line in each file is unique?
  • Are we merging the files based on the first character, and subsequent characters, or...
  • By a specific field or position?
  • Can you provide us with two sample files to look at?
The more info you can provide, the better we can answer.

Thanks!
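
If every line stands on its own and the order of the output does not matter, a non-interactive line-level merge could look something like the sketch below. The names file1, file2 and merged are placeholders, the sample contents are made up for illustration, and the sketch assumes two lines count as the same only when they match byte for byte.

```shell
# Placeholder inputs for illustration only; substitute the real file pairs.
printf 'a\nb\nc\n' > file1
printf 'b\nc\nd\n' > file2

# sort -u sorts the combined lines and keeps one copy of each distinct line.
# LC_ALL=C forces byte-wise comparison so the result does not depend on locale.
LC_ALL=C sort -u file1 file2 > merged
```

Because sorting reorders every line, this only suits files where each line is an independent record. Multi-line records would be pulled apart.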
 
I attached a sample file of the data. Basically, everything from one "@" symbol to the next is one string of data, or "read", that needs to be kept together. Both files look similar to this, but some of the reads will vary between files. I need to merge these files so that all of the reads are represented in a final file. Within a single file there should be no replicate reads. Let me know if you need any further info.

Thanks!
 

Attachments

  • example.txt
    3.8 KB
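
Given that description, one possible approach is to treat everything from one "@" line up to the next as a single record and print each distinct record once. The sketch below is only that, a sketch: it assumes every read begins with a line whose first character is "@", that no other line inside a read starts with "@", and that two reads are the same only when their full text is identical. The file names and sample reads are made up for illustration.

```shell
# Tiny placeholder inputs; each read starts at a line beginning with "@".
printf '@r1\nAAA\n@r2\nCCC\n' > reads1.txt
printf '@r2\nCCC\n@r3\nGGG\n' > reads2.txt

awk '
  # Print the buffered read if its exact text has not been seen before,
  # then clear the buffer.
  function emit() {
    if (rec != "" && !(rec in seen)) {
      seen[rec] = 1
      printf "%s", rec
    }
    rec = ""
  }
  # A new "@" line, or the first line of the next file, closes the previous read.
  FNR == 1 || /^@/ { emit() }
  # Append the current line to the read being collected.
  { rec = rec $0 "\n" }
  # Do not forget the last read of the last file.
  END { emit() }
' reads1.txt reads2.txt > merged_reads.txt
```

Reads keep their original order, with all reads from the first file followed by the new ones from the second. Note that awk holds every distinct read in memory here, which may be a problem for very large files. If a line inside a read can also start with "@", the record boundary test would need to be changed.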
