Submitted by bjardd t3_yoqp81 in askscience

As far as I'm aware, the Human Genome Project was developed using DNA from a number of volunteers.

If the project generated a patchwork map of these people's genetics, then surely the results are specific to them and not to the whole population?

Is it that the overall structure is the same but there are just variations across individuals that don't make a huge difference to the main bulk of the genome? If this is the case then why couldn't DNA from just one individual be used?




FellowConspirator t1_ivfiqoh wrote

Of 3.2 billion bases, about 10 million bases are known to be variant, and on average each person has 100 thousand or so of those variants.

We’re all genetically distinct and unique, but we’re overwhelmingly similar to one another. The reference genome provides a structure upon which we can make notations of variation, localization of features / functions, etc.


MozeeToby t1_ivfpuys wrote

In addition to this, the human genome serves as a reference genome. You don't need to say "this patient has this string of 500 base pairs representing their variant"; you can just say "at position 125 they have a GTA instead of ATA and at 244 they have 6 duplicated pairs".
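
A minimal sketch of that idea in Python, assuming a made-up 12-base reference and hypothetical variant records (the `Variant` tuple and the `apply_variants` helper are illustrative names, not part of any real genomics tool). A person's sequence is stored as a short list of differences from the reference and rebuilt on demand:

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical record: 1-based position, reference bases, alternate bases."""
    pos: int
    ref: str
    alt: str


def apply_variants(reference: str, variants: list[Variant]) -> str:
    """Rebuild an individual's sequence from the reference plus their variants.

    Assumes the variants do not overlap one another.
    """
    seq = reference
    # Apply from the right-most variant so earlier positions stay valid
    # even when an insertion or deletion changes the sequence length.
    for v in sorted(variants, key=lambda v: v.pos, reverse=True):
        start = v.pos - 1
        if seq[start:start + len(v.ref)] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        seq = seq[:start] + v.alt + seq[start + len(v.ref):]
    return seq


# Toy reference; real chromosomes are millions of bases long.
reference = "CCATAGGTTACC"
patient = [
    Variant(pos=3, ref="A", alt="G"),    # substitution: ATA becomes GTA
    Variant(pos=8, ref="T", alt="TTT"),  # insertion that duplicates "TT"
]
patient_sequence = apply_variants(reference, patient)
print(patient_sequence)
```

Two short records are enough to describe this toy patient, and the same would hold if the reference were billions of bases long.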


BeardOBlasty t1_ivhmayn wrote

Yeah, I've always understood genes as being the beginning framework, and then as you grow and develop, the little differences along the way are what make each human unique.

I always thought identical twins are the example that very similar genes can still result in very different people... or not. It's all about how the framework grows in its environment.


julie78787 t1_ivi1vcy wrote

Identical twins will not have the exact same DNA at every point in each of their chromosomes. Each time a cell divides, including the division which resulted in twinning, some number of mutations are likely to occur.


SuitableClassic t1_ivj99ay wrote

Is that why my identical twin is an ugly doofus, and I'm gorgeous beyond compare?


slouchingtoepiphany t1_ivk2qlk wrote

That's correct, the genes for beauty disproportionately went to you, unfortunately, the genes for intelligence might have gone the other way. Only kidding!!! :)


Dyvion t1_ivjgy3v wrote

See the movie, "Twins" for a true to life perfectly real example of this.


heresacorrection t1_ivgtoop wrote

On average we expect an individual to have millions of variants that differ from the reference, most of which are inconsequential (i.e. not malignant).

In addition, relative to the reference, the variability is dependent on your origin.

"Consistent with the out-of-Africa model of human origin, the number of variant sites per genome is highest among Africans (∼5 million variants) compared with individuals of East Asian, European, or South Asian ancestry (∼4.0–4.2 million variants) "


ivan_drago27 t1_ivh98dx wrote

Such a good note to add on the out-of-Africa model, thank you for including that. Been a few years since I actively studied this stuff and that made me want to dig into some theory again.


CaptainHunt t1_ivgay0v wrote

Also, while the sample size was comparatively limited, samples were taken from a wide variety of people from all over the world.


azuth89 t1_ivfglx7 wrote

The vast majority of our genome is identical. Heck, we share 44% of our DNA with a banana; with another person it's nearly 100%.

Mapping disparate individuals allows us to try and connect their traits to specific bits of the remaining, variable portions of the genome. It will also help us define what is dictated by genes, what is a genetic propensity which may or may not be activated by environmental factors or behaviors and what is purely environmental.

Once that's done, we can predict a wide variety of factors. Risk factors for various diseases and disorders are of most interest now, but this will inevitably lead to identifying other factors. Once those are identified the inevitable endpoint is editing. First to remove defects like a propensity to say...diabetes, heart failure or even something more minor like myopia but once we start down that road the line between removing risk factors and adding desirable ones is real blurry.

Fully mapping a genome is incredibly labor intensive, but the sample size will increase over time to enable this sort of thing and these initial mappings do a lot to determine what we need to investigate and what can be mostly ignored as the common background to humans.


E_B_Jamisen t1_ivfw1kv wrote

What kind of changes can we expect to see? Like, in a few hundred years, will we be able to change the DNA so someone has gills or 4 arms? Or is the extent going to be "you aren't lactose intolerant"?


azuth89 t1_ivg1c4j wrote

A good example would be to look at modern GMO crops. GMO or otherwise, fruit is fruit and veg is veg, but they're able to play a bit with size, growth rate, disease resistance, and the need for certain nutrients from the environment, within limits and only IF the GMO gets the right environment to support it. Lots of little tweaks like that.

Optimization and removing/reducing weaknesses is certainly possible, superpowers involving a whole different body plan like gills or 4 arms, not so much.


PurpleSunCraze t1_ivgmbdd wrote

Is it “it may be possible but the tech isn’t there” or “fundamentally, it is not possible”?

I’ve seen these types of questions about the possibility of genetic manipulation someday reaching the point where a human could go through a complete biological sex change. Do you think that will remain in the realm of sci-fi forever? For the stuff we can change now, is the change a slow, glacial process, or, in the case of my question, 30 minutes of the worst pain ever felt by a human being?


newappeal t1_ivosk9i wrote

Every organism that exists or ever existed came to be through the interaction of its genome and its environment, which is essentially a huge complex of chemical reactions. So if you can edit genomes and control an organism's local environment (both possible), then you can produce at least anything that has existed and unfathomably many things that don't. That doesn't mean literally anything imaginable, but it does mean many, many things.

However, the ability to grow organisms with arbitrary characteristics requires biochemical knowledge far beyond what we have now. The technical limitations we currently face are nothing compared to the knowledge gap.

Moreover, the hypothetical scenario I'm talking about here involves creating an artificial genome in an artificial cell and then growing a macroscopic organism with an arbitrary body plan from it. That's theoretically possible, for sure, because it happens literally all the time in natural contexts.

But what you're describing with this hypothetical full-body genetic-level sex change of an adult human doesn't really make sense from a technical perspective. I mean, sure, it's theoretically possible to completely deconstruct a human body to the molecular level and then construct a new one, but that has nothing to do with genetic engineering. Remember that we're not talking about growing an organism from a single germline cell in this case - we're talking about restructuring every single somatic cell in a fully-developed organism. The composition and structure of tissues and organs are not determined by their cells' current genetic makeup (even if we include epigenetics); rather, they are the result of biochemical changes across their entire genetic history. Simply swapping out every single cell's DNA in an organism (even if we could do that) would not cause the organism to suddenly transform into the organism it would have been if it had had that genome from the start.

Here's an analogy: If you change the blueprints of a house before the house is built, then you change the house. But if you change the blueprints after construction, the house doesn't change. All you would do is cause problems for anyone who wanted to repair or remodel the house, because the plans wouldn't match the actual house. Can you tear down the house and rebuild it a different way? Sure. But that's a fundamentally different process.


silent_cat t1_ivkf0db wrote

> Is it “it may be possible but the tech isn’t there” or “fundamentally, it is not possible”?

For a god it may be possible, for us, not so much. For comparison it's like modifying a program in a language we don't understand. This program has grown randomly over aeons and the strangest things relate to each other. You're basically stuck with randomly changing things and seeing what happens. But since people mature so slowly that's very slow progress.

During the maturation process of a foetus, there's lots of little tripwires to abort if something weird is going on.

That said, if you find some baby born with gill like structures or four arms, if you sequence them you might get a head start. But the slow testing phase will be a problem.

Evil science fiction villain mode: unless of course we figure out how to grow foetuses into babies without a woman being involved, you could build a factory to test 100,000 variations all at once. That would speed it up a bit.


Dont____Panic t1_ivg6k6n wrote

Some genetic manipulation can do things like turn hair into feathers and fingernails into scales, so that is possible, although it would be enormously unethical in humans.


HaV1nG15sueS t1_ivhf698 wrote

One of the more recent hot topics is CRISPR gene editing, which looks quite promising. Give it a look


turgidNtremulous t1_ivlj2nn wrote

It's worth pointing out that in humans, the jaw and (I believe) parts of the inner ear are derived from the same embryonic structures that turn into gills in fish. So, in a sense, you do have gills.


RebelClown86 t1_ivgjtaf wrote

So why is so much of our genome identical? I've always thought that a lot of our DNA is inert, so I would expect that to have accrued mutations over time. Is that not the case?


Career_Secure t1_ivgxyaj wrote

Probably because of the small population of people that today’s world population descends from, and the fact that mutation possibilities that are lethal or highly disadvantageous to survival don’t persist and are by default ruled out (core/important regions stay conserved between people).

The idea that a lot of our DNA is inert stems from when the human genome was sequenced, and they found out only a small percent of it codes for genes that go on to get translated into proteins. But, over time, scientists are finding that these non-coding regions of DNA don’t do nothing; in fact, they play many biological roles in regulating the expression of protein-coding genes and can have significant physiological impacts and relevance.


Ianisanengineer t1_ivfr9hy wrote

You are correct that the initial project involved sequencing the genomes of just a few individuals, and the limitations associated with that were recognized immediately. However, one must consider the purpose of the Human Genome Project:

  1. To map and sequence the human genome in order to have a standard map or reference sequence; to understand what genes are present and the broad structure of the human genome
  2. To develop the technology necessary to achieve aim 1. At the inception of the HGP, the prospect of sequencing the entire human genome with existing technology was essentially futile; a big part of the HGP was R&D.
  3. To enable the study of human genetics/genomics and understand the impact of genetic diversity on human health.

In order to achieve the first aim, almost any human will do. As others have pointed out, humans are very, very similar to one another genetically, sharing the vast majority of their genetic material. To get a broad reference genome, it wasn't terrifically important whose genome you had, and some of the "final" reference genomes at the end of the project were hybrids of a few individuals. Again, this doesn't really matter, because it's mostly identical anyway, and the differences are, for the most part, single nucleotides at specific locations.

Arguably the biggest reason for undertaking the HGP was aim 2. In the early 90s, the best sequencing technology available would have taken decades to complete even a single genome with global cooperation. Practically speaking, when the HGP was begun, it was technologically impossible. The imposition of such a colossal goal, however, drove the development of the next-generation sequencing technologies and techniques still in use today that enable rapid sequencing of very large genomes.

The third goal, which is ongoing, is where the limitations of a small sample size come in. To study human genetics and look at how genetics impacts health, it's not enough to have a single reference genome; we need population data for that. The HGP kickstarted that process by providing a small number of reference genomes, but since the completion of those first few genome sequences, hundreds of thousands of additional people have been sequenced and those data have been pooled. This operation is not complete: our collection of human genomics information still broadly under-represents certain groups/ethnicities of people, in particular indigenous populations of Australia and the Americas and people of sub-Saharan African descent. That last population is particularly important, because our current data suggest that the vast majority of human genetic diversity is concentrated in sub-Saharan Africa, so there's a lot to be learned by studying these populations.


OtHanski t1_ivff86u wrote

If you were to only map one person, how would you know which parts are specific to him and which parts are common between all humans?

But if you map enough genomes of people with different traits, you can start to actually figure out which genes might affect what or which genes are more or less common in a certain population.


could_use_a_snack t1_ivfqdd7 wrote

It's like a car. 99% of the parts are identical, if you looked at those parts you would be able to determine that it is a Toyota Corolla. But you need the other 1% to identify the trim package. And if something is different in that first 99% then you might want to take a real good look at it.


shadestark11 t1_ivfhd3i wrote

Not an expert, but they used multiple volunteers to build a consensus sequence, which basically means taking the most common/prevalent fragment. It’s also misleading when someone says two human genomes differ by only 0.1%, since that is 0.1% of around 3 billion base pairs, so roughly 3 million bp, which by itself is a huge number and can help explain a lot of differences.

I would also like to add that post-HGP (which ended in 2003, and the sequence it produced was filled with gaps) we have sequenced a lot more individual genomes, and the variance is now accepted to be around 0.3%-0.4%. If you’re interested, you could look into the recent publication of the gapless human genome.
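
The arithmetic behind those percentages can be checked in a few lines of Python. This is only a back-of-the-envelope sketch: the genome size is the rounded "around 3 billion" figure from the comment, and `differing_bases` is a hypothetical helper name.

```python
GENOME_SIZE_BP = 3_000_000_000  # "around 3 billion base pairs", as in the comment


def differing_bases(fraction: float, genome_size: int = GENOME_SIZE_BP) -> int:
    """Number of base pairs covered by a given fractional difference."""
    return round(fraction * genome_size)


for percent in (0.1, 0.3, 0.4):
    count = differing_bases(percent / 100)
    print(f"{percent}% of the genome is about {count:,} bp")
```

Even the smallest of these percentages corresponds to millions of positions, which is the commenter's point.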


davedeoreo t1_ivggkbu wrote

This is the best answer here so far, as it mentions the consensus sequence, i.e. using multiple people, we have determined which nucleotide is the most common at every position. That most common one is included in the reference sequence. Every human will stray from that sequence at different positions along their genome, on average about 3 million times. And, as they said, this has been fine-tuned over the years with more people and faster, more accurate sequencing technology.
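
For illustration only, here is a rough Python sketch of a per-position majority-vote consensus as this comment describes it. The three toy sequences are invented and assumed to be already aligned, and `consensus` is a hypothetical helper. Note that the reply that follows disputes that the actual reference genome is built this way.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Most common base at each position of equal-length, pre-aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) yields one column (one position across all samples) at a time.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


# Invented toy samples, already aligned position by position.
samples = ["ATGCA", "ATGTA", "ACGCA"]
reference_like = consensus(samples)
print(reference_like)

# Count how many positions each sample strays from the consensus.
for s in samples:
    print(s, sum(a != b for a, b in zip(s, reference_like)))
```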


heresacorrection t1_ivgriyw wrote

This is factually untrue. The reference genome is constructed in a way that does not necessarily include the most common variant at a given position. The telomere-to-telomere (T2T) assembly is a single female individual (excluding the Y-chromosome).


davedeoreo t1_ivgtzqx wrote

Username checks out I guess, lol. Could you please shed some light on this then? It's my understanding that the reference genome is created using contigs via overlapping reads - does this not mean it's a consensus sequence on the most common nucleotide at each position? Or is it more that long stretches which are generally similar enough to overlap aid in determining location along the genome?

Also T2T is more recent right? I was mainly referring to the 2003 method in my first comment.


AbortionSurvivor777 t1_ivhwikq wrote

A human's genome can be seen as the instruction manual for the assembly of that entire human. While those instructions are, yes, specific to that particular human, the Human Genome Project essentially gave us the chapter titles and the total length of the instruction manual. The exact words on the pages are different for everyone, but the chapter titles and length are essentially the same for every human.


DumbDekuKid t1_ivfr9jt wrote

DNA from one individual is not used to determine consensus genomes anymore, if it ever was. Many individuals' genomes are sequenced. Go to the UCSC Genome Browser and view the most recent human genome build (hg38). Look at your favorite gene and view its SNPs. In any single gene, a few single base pairs will vary. These are mapped to the reference genome. It is also very important to note that if you are a physician, for example, and you have your patient's genome sequenced by the hospital genomics core, the core needs a reference genome to map the patient's sequences to, so we can easily see where the reads fall in the complete human genome and whether there are any significant mutations (done often for cancer patients). Having even a single human genome, fully annotated, to serve as a reference for mapping new sequences to is ridiculously useful, because humans aren’t that different.
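
To make "mapping reads to a reference" concrete, here is a deliberately naive Python sketch. The reference string and the read are invented, and `map_read` is a hypothetical helper; real aligners use indexed data structures rather than this brute-force scan.

```python
def map_read(reference: str, read: str, max_mismatches: int = 1):
    """Slide the read along the reference and return the best placement.

    Returns (0-based start, list of mismatching reference positions),
    or None if no placement has at most max_mismatches differences.
    """
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = [start + i for i, (a, b) in enumerate(zip(window, read)) if a != b]
        if len(mismatches) > max_mismatches:
            continue
        if best is None or len(mismatches) < len(best[1]):
            best = (start, mismatches)
    return best


reference = "GATTACAGGCTTAACG"  # toy reference, not real sequence data
read = "ACAGCCT"                # toy patient read with one base changed
placement = map_read(reference, read)
print(placement)
```

Once the read is placed, any mismatching positions are candidate variants in the patient relative to the reference, which is what makes a single annotated reference so useful.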


powabiatch t1_ivfr1li wrote

By now we have hundreds of thousands of people’s full (whole genome) DNA sequences. Iceland alone has a project to sequence its population, for example. Many cancer sequencing projects sequence normal tissue as well.

Human DNA is so similar it’s enough to publish a few genomes and then simply annotate the differences (e.g. single nucleotide polymorphisms). All major genome webportals include these data.


Sicon3 t1_ivgr87j wrote

Everyone has the same genes, with very few exceptions. What changes from person to person is the allele, which is the specific variant of said common gene.

Here's an analogy. Say you own a blue Honda Civic and your neighbor owns a silver one. Do you have the same type of car?

Of course you do: they are both Honda Civics and they differ only in their paint color. Alleles work the same way. Most variants are only 1-5 bases removed from any other version, since greater mutation tends to break the gene outright instead of creating a variant.

To give an example of an actual allele: I have blue eyes, so I know both copies of the gene that controls eye color (this is oversimplified, but I'm trying to be quick and easy) code to produce very little melanin in my iris.

Someone with brown eyes has the exact same gene controlling their eye color but it is just ever so slightly different and as a result tells the body to produce more melanin resulting in brown eyes.

The human genome project while representing only a few different alleles represents 99.999999999999% of the genes found in the entire human population. As a result you can base a lot of conclusions on it and with the advent of widespread DNA testing we are building libraries of different alleles which will allow for even more targeted medicine going forwards.


humanspeech t1_iviibem wrote

It's complicated: just because the genome has been sequenced doesn’t mean that we all have the same gene sequences. It’s just meant to be used as a reference in order to find other mutations and to have a rough idea of what the genome should look like.

One of the problems with the current model is that it fails to account for natural variation in terms of race or location. There’s a reason each country has its own program for genome sequencing, and I think England is doing it the best by randomly asking people to join rather than relying on self-reporting, because that way you get a wider variety of variations.

In general, as a sidebar, the HGP also doesn’t account for the exome or other environmental factors. The variations in our genes are sometimes just single base pairs, but such a mutation can be very significant, so having a “healthy” reference gene or protein can help us understand why it happens.


FelipeReigosa t1_ivfr79d wrote

I think if you sample enough individuals (a relatively small number) you can capture the whole genetic diversity of a species. The uniqueness of each person is a combinatorial thing. Think of it like this (and obviously it's a huge oversimplification): suppose humans had only 10 genes and each gene had 3 variants (alleles). Then there are only 30 different gene variants. But with those you can make 3^10 = 59049 different people. You only have to map the genomes of enough people to see all 30 variants and you've captured the whole genetic diversity of the human race.
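
The toy numbers above can be played with in Python. This sketch adds assumptions that are not in the comment: each simulated person carries exactly one allele per gene, all alleles are equally common, and the helper name and random seed are arbitrary.

```python
import random

N_GENES = 10    # toy figure from the comment
N_ALLELES = 3   # variants per gene, also from the comment

total_alleles = N_GENES * N_ALLELES      # distinct (gene, allele) pairs
possible_people = N_ALLELES ** N_GENES   # distinct combinations


def people_needed_to_see_all_alleles(rng: random.Random) -> int:
    """Sample random people until every allele of every gene has been seen."""
    seen = set()
    count = 0
    while len(seen) < total_alleles:
        # Assumption: one allele per gene, each allele equally likely.
        person = [rng.randrange(N_ALLELES) for _ in range(N_GENES)]
        seen.update(enumerate(person))  # records (gene index, allele) pairs
        count += 1
    return count


print(total_alleles, possible_people)
print(people_needed_to_see_all_alleles(random.Random(0)))
```

Under these assumptions, far fewer sampled people are needed than there are possible combinations. Rare alleles in a real population would push the required sample size up considerably.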


Level_Rule2567 t1_ivfyfrr wrote

After the Human Genome Project, and with the development of new sequencing techniques (next-generation sequencing), new projects have arisen. The 1000 Genomes Project was one of the first, and it was later expanded to 2500. Other projects have appeared since then, like the 100,000 Genomes Project and the Simons Genome Diversity Project.