Ianisanengineer

Ianisanengineer t1_ivfr9hy wrote

You are correct that the initial project involved sequencing the genomes of just a few individuals, and the limitations associated with that were recognized immediately. However, one must consider the purpose of the Human Genome Project:

  1. To map and sequence the human genome in order to have a standard map or reference sequence; to understand what genes are present and the broad structure of the human genome
  2. To develop the technology necessary to achieve aim 1. At the inception of the HGP, sequencing the entire human genome with existing technology was essentially infeasible, so a big part of the HGP was R&D.
  3. To enable the study of human genetics/genomics and understand the impact of genetic diversity on human health.

In order to achieve the first aim, almost any human will do. As others have pointed out, humans are very, very similar to one another genetically, sharing the vast majority of their genetic material. To get a broad reference genome, it wasn't terrifically important whose genome you had, and some of the "final" reference genomes at the end of the project were hybrids of a few individuals. Again, this doesn't really matter, because it's mostly identical anyway, and the differences are, for the most part, single nucleotides at specific locations.
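
To make the "mostly identical, differing at single nucleotides" point concrete, here is a minimal sketch in Python. The sequences are made up for illustration (real genomes are roughly 3 billion bases long), the function names are hypothetical, and it assumes the sequences are already aligned and of equal length, which real variant-calling pipelines do not get for free.

```python
# Toy, invented sequences; not real genomic data.
REFERENCE = "ATGCCGTAAGCTTGACCGTA"
INDIVIDUAL_A = "ATGCCGTAAGCTTGACCGTA"  # identical to the reference
INDIVIDUAL_B = "ATGCCATAAGCTTGACCGTA"  # differs at one position


def single_nucleotide_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are pre-aligned and the same length.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def percent_identity(reference, sample):
    """Percentage of positions where the sample matches the reference."""
    mismatches = len(single_nucleotide_differences(reference, sample))
    return 100.0 * (len(reference) - mismatches) / len(reference)


if __name__ == "__main__":
    for name, seq in [("A", INDIVIDUAL_A), ("B", INDIVIDUAL_B)]:
        diffs = single_nucleotide_differences(REFERENCE, seq)
        print(name, percent_identity(REFERENCE, seq), diffs)
```

Either toy individual would serve about equally well as a "reference" here, which is the point of aim 1: the choice of person matters little when nearly every position agrees.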

Arguably the biggest reason for undertaking the HGP was aim 2. In the early 90s, the best sequencing technology available would have taken decades to complete even a single genome, even with global cooperation. Practically speaking, when the HGP began, the task was technologically impossible. Setting such a colossal goal, however, drove the development of the next-generation sequencing technologies and techniques still in use today that enable rapid sequencing of very large genomes.

The third goal, which is ongoing, is where the limitations of a small sample size come in. To study human genetics and look at how genetics impacts health, a single reference genome is not enough; we need population data. The HGP kickstarted that process by providing a small number of reference genomes, but since the completion of those first few genome sequences, hundreds of thousands of additional people have been sequenced and those data have been pooled. This effort is not complete: our collection of human genomics information still broadly under-represents certain groups/ethnicities of people, in particular indigenous populations of Australia and the Americas and people of sub-Saharan African descent. That last population is particularly important, because our current data suggest that the vast majority of human genetic diversity is concentrated in sub-Saharan Africa, so there's a lot to be learned by studying these populations.
