
Ah_Go_On t1_iya67is wrote

First of all, sequencing any complete reference genome is an achievement in itself, certainly in the 90s/early 2000s, because of the scale - ~3 billion base pairs (bp) in humans. I say "reference" genome because this is what the Human Genome Project (HGP) developed - a mapped, representative sequence, based on separate genomes from multiple individuals to account for genetic variability between individuals, which is then refined over time to be as complete and representative as possible. The first "completed" HGP genome (2003) covered only about 92% of the whole thing - the remaining gaps were only closed this year!

Beginning in 1976, we sequenced reference genomes for viruses (~5.3k bp), then bacteria (~1.8m bp), then yeast (~12m bp). Back then the methods for genome sequencing, now called first-generation sequencing, were extremely labour-intensive and expensive. But these early projects validated and refined the methods for obtaining and sequencing DNA and, just as importantly, for storing, mapping and analysing genetic data on this scale. Obviously, the storage and analysis capabilities developed in tandem with computing power. So another way the HGP was important was simply by pushing the development of the sequencing and analytical technologies that are critical for genetic analysis today. These would likely have developed anyway, but the concerted, collaborative effort of the HGP allowed for more focused and streamlined development, plus the "significance" of the project attracted serious funding. Also, the first reference genome from the HGP was essential for developing faster and cheaper second-generation sequencing, since the short reads these methods produce are usually pieced together by aligning them to a reference, which is also used to check (and improve) their accuracy.

Having a reference human genome has basically been the foundation of human genetics and systems biology in the 21st century.

It made possible hugely important projects like ENCODE, HapMap, 1000 Genomes, the Human Protein Atlas and the Cancer Genome Atlas. All the extensive work done on genes before the genome was sequenced can now be integrated into sophisticated, interactive databases. We have lots and lots of genome-wide association studies (GWAS), which are powerful tools for associating genetic variants with traits, including illnesses. Many clinical trials in cancer research include blood sample collection for various GWAS. Personalised medicine has been a bit slower to take off than initially predicted (or rather, initially envisioned), but having a reference genome has greatly helped establish biomarkers for predicting, diagnosing and prognosing diseases.

By sequencing other genomes, we can compare our genome to others. This is absolutely huge for our understanding of evolution and of our kinship, not only with other primates but also with very distant relatives, further confirming evolutionary theory with respect to a common ancestor. If a particular gene or sequence is basically the same across many distantly related organisms, we can reasonably infer that it is very ancient and functionally important, since natural selection has kept it from changing. Documenting small (or medium or large) differences in sequences that are similar across many organisms basically amounts to watching evolution happen. It has also taught us interesting things we could never otherwise know from history, e.g., that humans and Neanderthals interbred (which we learned after sequencing the Neanderthal genome in 2010).

From a broader sociological perspective, it was just great for science in the sense of creating a huge, international, collaborative community. In principle, any lab anywhere could volunteer to contribute to the data. The HGP built on, and greatly extended, previous efforts at fostering scientific transparency and open data (and open-source software), which is definitely a good thing.

It's been so important that I have probably forgotten some really basic things it made possible! But it is worth remembering that it created at least as many questions as answers. Huge chunks of the human (and other organisms') genome are still not understood, and separating the functional signal from the noise is a massive challenge. Having the sequence is only useful as long as you can interpret it. We've made good advances in this, but there is plenty of work still to be done.
