Username checks out I guess, lol. Could you please shed some light on this then? It's my understanding that the reference genome is built from contigs assembled via overlapping reads - does this not mean it's a consensus sequence of the most common nucleotide at each position? Or is it more that long stretches which are similar enough to overlap help determine location along the genome?

Also, T2T is more recent, right? I was mainly referring to the 2003 method in my first comment.


This is the best answer here so far, as it mentions the consensus sequence. I.e., using multiple people, we have determined which nucleotide is the most common at every position, and that most common one is included in the reference sequence. Every human will stray from that sequence at different positions along their genome, on average about 3 million times. And as they said, this has been fine-tuned over the years with more people and faster, more accurate sequencing technology.
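
If it helps to picture the "most common nucleotide at every position" idea, here is a toy sketch in Python. It assumes the sequences are already aligned and the same length, and the names `consensus`, `count_differences`, and the sample strings are made up for illustration. It only shows the per-position majority vote described above, not how any actual reference assembly was built.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Return the most common base at each column of pre-aligned sequences."""
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be pre-aligned to the same length")
    # zip(*aligned_seqs) walks the alignment column by column.
    # Ties go to whichever base appeared first in that column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))


def count_differences(seq, reference):
    """Count positions where one sequence differs from the reference."""
    return sum(a != b for a, b in zip(seq, reference))


# Hypothetical toy data: three "people", six positions each.
samples = ["ACGTAC", "ACGTTC", "ACCTAC"]
ref = consensus(samples)
diffs = [count_differences(s, ref) for s in samples]
print(ref, diffs)
```

Each person's count in `diffs` is the toy-scale version of the "strays from the reference" number mentioned above.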