APPLICATION OF HIGH-THROUGHPUT SEQUENCING IN GENOME WIDE ANALYSIS
Young Researcher Corner - December 2011, by Immacolata Brigida
Knowledge of DNA sequences is one of the major tools of biological research, with applications such as the identification of mutations or of the vector integration profile in gene-corrected cells. Gene expression is a complex trait influenced by cis- and trans-acting genetic and epigenetic variation, as well as by environmental factors. The human genetic variation that affects gene expression can be characterized with different strategies, and next-generation sequencing technologies are now generally used for global functional genomics assays. To this aim, high-throughput sequencing techniques were developed for quantitative genetic and epigenetic studies, in order to deepen insight into gene function. Basic approaches to DNA sequencing started in the mid-1970s with Sanger's method, which is based on the size separation of DNA fragments that terminate at each base and allows the determination of only a few hundred nucleotides at a time.
The key principle of the Sanger method is the use of a single-stranded DNA template, a DNA primer, a DNA polymerase, deoxynucleotide triphosphates (dNTPs), and modified nucleotides (dideoxyNTPs) that terminate DNA strand elongation and can be radioactively or fluorescently labeled for detection in automated sequencing machines. Although this approach allowed the identification of thousands of interspersed genomic features and protein-DNA interactions, its major limitations are the time required to separate the fragments by size and the cloning procedures, whose efficiency depends on base composition, fragment length, and interactions with the bacterial host system. To overcome these limitations, different companies have developed alternative sequencing techniques that outperform the older Sanger-sequencing technologies by a factor of 100-1000 in daily throughput and significantly reduce the cost of analysis per sample, with the final goal of creating platforms that share high-throughput sequencing capacity. These techniques facilitate not only the study of molecular and biological questions with higher resolution and efficacy, such as the de novo sequencing of unknown genomes, but also the identification of several epigenetic features at the genome-wide level, such as methylated DNA loci and DNase I hypersensitive sites. Furthermore, they have been combined with chromatin immunoprecipitation procedures (ChIP-seq), allowing the high-throughput mapping of protein-DNA interactions and chromatin features, such as RNA polymerase and transcription factor binding sites as well as the acetylation and methylation of histone tails, in several cell types.
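The chain-termination principle of the Sanger method described above can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the function names, the `ddntp_fraction` and `copies` parameters, and the example template are hypothetical, and the primer, the labeling chemistry, and the limited resolution of the size separation are ignored. Each simulated extension stops at the first position where a dideoxynucleotide happens to be incorporated, and the sequence is then read by ordering the terminated fragments by length.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template):
    """Strand built by the polymerase on the template (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def chain_terminated_fragments(template, copies=2000, ddntp_fraction=0.1, seed=0):
    """Simulate many extension reactions on the same template.

    At every position a terminating ddNTP is incorporated with probability
    ddntp_fraction (an illustrative value), which ends that fragment.
    Returns the set of observed (fragment_length, terminal_base) pairs.
    """
    rng = random.Random(seed)
    strand = synthesized_strand(template)
    fragments = set()
    for _ in range(copies):
        for length, base in enumerate(strand, start=1):
            if rng.random() < ddntp_fraction:
                fragments.add((length, base))
                break  # elongation stops at the dideoxynucleotide
    return fragments


def read_sequence(fragments):
    """Order fragments by size and read the terminal base of each one."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    template = "GATTACAGC"  # hypothetical template
    fragments = chain_terminated_fragments(template)
    print(read_sequence(fragments))
```

In this model a fragment length that is never observed simply leaves a gap in the read, which is one way to see why long fragments, which are rare and hard to resolve by size, limit the read length to a few hundred nucleotides.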
Chromatin immunoprecipitation (ChIP) followed by massively parallel sequencing identifies specific DNA sites that interact with transcription factors or other chromatin-associated proteins (non-histone ChIP) and sites that correspond to modified nucleosomes (histone ChIP). The crosslinked proteins or modified nucleosomes of interest are enriched using an antibody specific to the protein or to the histone modification. The purified DNA is then sequenced on one of several next-generation platforms, which rely on enzyme-driven extension of all templates in parallel. After each extension, the labels that have been incorporated are detected through high-resolution imaging.
A parallelized version of pyrosequencing was developed by 454 Life Sciences, and relies on the amplification of DNA inside water droplets in an oil solution (emulsion PCR). Each droplet contains a single DNA template attached to a single primer-coated bead, on which a clonal colony then forms. The sequencing machine contains many picoliter-volume wells, each holding a single bead and the sequencing enzymes. Pyrosequencing uses luciferase to generate light that reveals the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs. Illumina (Solexa) sequencing is based on reversible dye-terminators: DNA molecules are first attached to primers on a slide and amplified to form local clonal colonies (bridge amplification). Four types of fluorescently labeled reversible-terminator nucleotides are added, and non-incorporated nucleotides are washed away. Unlike in pyrosequencing, the DNA can only be extended one nucleotide at a time. A camera takes images of the fluorescently labeled nucleotides, then the dye, along with the terminal 3' blocker, is chemically removed from the DNA, allowing the next cycle. Recent applications of these techniques also include RNA deep sequencing (RNA-seq), with the full analysis of the transcriptome and exome as well as the identification and quantification of miRNAs.
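The difference between the two read-out schemes can be sketched in Python. This is a simplified illustration rather than either platform's actual base-calling procedure: the cyclic flow order `TACG`, the noise-free integer signals, and all function names are assumptions made for the example. In the pyrosequencing model, one nucleotide species is flowed at a time and the light signal is taken as the number of bases incorporated in that flow, so a homopolymer run is read in a single flow. In the reversible-terminator model, exactly one base is read per cycle.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are flowed


def pyrosequencing_flowgram(strand, flow_order=FLOW_ORDER):
    """Return (nucleotide, signal) per flow for the strand being synthesized.

    The signal is idealized as the number of identical bases incorporated
    during that flow (0 when the flowed nucleotide does not match).
    """
    unknown = set(strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")
    flows = []
    position = 0
    flow_index = 0
    while position < len(strand):
        nucleotide = flow_order[flow_index % len(flow_order)]
        run_length = 0
        while position < len(strand) and strand[position] == nucleotide:
            run_length += 1
            position += 1
        flows.append((nucleotide, run_length))
        flow_index += 1
    return flows


def decode_flowgram(flows):
    """Rebuild the sequence from idealized flow signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


def reversible_terminator_cycles(strand):
    """One imaging cycle per base: the 3' block allows a single extension."""
    return list(enumerate(strand, start=1))


if __name__ == "__main__":
    strand = "TTAGGC"  # hypothetical nascent strand
    flows = pyrosequencing_flowgram(strand)
    print(flows)
    print(decode_flowgram(flows))
    print(len(reversible_terminator_cycles(strand)), "cycles")
```

With real data the light intensity is not an exact integer, which is why long homopolymer runs are harder to call in pyrosequencing than in a one-base-per-cycle chemistry.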
In the last few years, high-throughput sequencing has been applied to the study of chromatin structure in hematopoietic cells, in order to shed light on the genome-wide localization of epigenetic marks and their influence on lineage differentiation. DNase I hypersensitive sites (HSS) are short regions of chromatin that bind transcription factors, resulting in the displacement of histone octamers and leaving these regions more susceptible to cleavage by the DNase I enzyme. HSS are considered markers for many different types of genetic regulatory elements, including promoters, enhancers, silencers, insulators, and locus control regions, and are primarily associated with transcribed and expressed genes. Nevertheless, high-throughput genome-wide analysis also detects a number of HSS in intergenic regions, highlighting the presence of long-distance acting enhancers. The application of the ChIP-seq technique to hematopoietic cells was recently described for the identification of several histone modifications, such as the methylation of lysine and arginine residues. These modifications alter the charges on the protein surface, affecting histone-histone interactions and the strength of histone binding to DNA, and thereby influence nucleosome positioning, DNA methylation status, and chromatin condensation through the interaction with several regulatory proteins. Some histone methylations are associated with euchromatin configurations (H3K4me3 or H3K27me1), which usually mark active promoters and transcribed genes, while others are associated with heterochromatin formation (H3K9me3 and H3K27me3) and are found more often at inactive or silenced loci.
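As a minimal illustration of how the associations above could be used once marks have been called at a locus, the following Python sketch assigns a coarse chromatin label from a set of histone methylation marks. The two mark sets are taken from the paragraph above; the function name, the label strings, and the handling of loci carrying both kinds of marks are assumptions made for the example, and real annotation would rely on quantitative signal rather than simple presence or absence.

```python
# Marks listed in the text as associated with each configuration.
ACTIVE_MARKS = {"H3K4me3", "H3K27me1"}
REPRESSIVE_MARKS = {"H3K9me3", "H3K27me3"}


def classify_locus(marks):
    """Assign a coarse label to a locus from the marks detected there.

    The labels are illustrative; a locus carrying both kinds of marks is
    reported as "mixed" rather than forced into one class.
    """
    marks = set(marks)
    has_active = bool(marks & ACTIVE_MARKS)
    has_repressive = bool(marks & REPRESSIVE_MARKS)
    if has_active and has_repressive:
        return "mixed"
    if has_active:
        return "euchromatin-like"
    if has_repressive:
        return "heterochromatin-like"
    return "unclassified"


if __name__ == "__main__":
    # Hypothetical loci and mark calls, for demonstration only.
    loci = {
        "locus_1": ["H3K4me3"],
        "locus_2": ["H3K9me3", "H3K27me3"],
        "locus_3": ["H3K4me3", "H3K27me3"],
    }
    for name, marks in loci.items():
        print(name, classify_locus(marks))
```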
More recently, high-throughput sequencing was coupled with DNA barcoding to track single hematopoietic stem cells (HSCs) during in vivo differentiation in a mouse model, allowing the integration found in an HSC to be identified in its daughter cells. Moreover, the barcoding procedure, although it requires several hours, did not alter HSC function. Coupling these two methods allows a better characterization of HSCs through clonal tracking at the single-cell level, in both in vitro and in vivo processes and for virtually any cell type that can be infected by a lentivirus, thereby separating the behavior of HSCs from that of other progenitors by directly measuring their clonality. This system can be used to simultaneously track the proliferation and development of hundreds of cells in vivo with single-cell precision. High throughput is critical for the study of rare or stochastic cellular events, revealing novel features even when cell numbers are low. Moreover, the high sensitivity of the barcode tracking system allows a direct examination of the entire hematopoietic process, starting with the hematopoietic stem cells themselves. For instance, this barcode tracking system can be applied to cell and gene therapy to follow and quantify the fate and distribution of transplanted cells. The high sensitivity of this technique allows for the analysis of clinical samples with very low cell numbers and for the identification of early-stage malignancy before subsequent expansion or metastasis.
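The logic of barcode-based clonal tracking can be sketched in a few lines of Python. This is not the published analysis pipeline: the barcode position (the first bases of each read), the barcode length, the read-count threshold, and the example reads are all hypothetical choices made for illustration. The sketch counts barcodes in sequencing reads from two populations, reports which clones are shared between them, and estimates the fraction of barcoded reads contributed by each clone.

```python
from collections import Counter


def count_barcodes(reads, barcode_length=6, min_reads=2):
    """Count barcodes found at the start of each read.

    Barcodes seen in fewer than min_reads reads are discarded, as a crude
    stand-in for filtering out sequencing errors (threshold is illustrative).
    """
    counts = Counter(
        read[:barcode_length] for read in reads if len(read) >= barcode_length
    )
    return {barcode: n for barcode, n in counts.items() if n >= min_reads}


def clonal_contributions(counts):
    """Fraction of barcoded reads contributed by each clone."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {barcode: n / total for barcode, n in counts.items()}


def shared_clones(stem_cell_counts, progeny_counts):
    """Barcodes detected in both the stem cell and the progeny population."""
    return sorted(set(stem_cell_counts) & set(progeny_counts))


if __name__ == "__main__":
    # Hypothetical reads; the first six bases play the role of the barcode.
    hsc_reads = ["AAACCCGT", "AAACCCTT", "GGGTTTAA", "GGGTTTCC", "CCCAAAGG"]
    progeny_reads = ["AAACCCGG"] * 3 + ["TTTGGGAA"] * 2 + ["GGGTTTAC"]

    hsc_counts = count_barcodes(hsc_reads)
    progeny_counts = count_barcodes(progeny_reads)
    print(shared_clones(hsc_counts, progeny_counts))
    print(clonal_contributions(progeny_counts))
```

Because each clone is identified by its own barcode, the same counting step applies regardless of how many clones are present, which is what makes the approach suitable for following hundreds of cells at once.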
- Genome-wide allele-specific analysis: insights into regulatory variation. Pastinen T. Nat Rev Genet. 2010 Aug;11(8):533-8.
- A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. Sanger F, Coulson AR. J Mol Biol. 1975;94(3):441-8.
- High-throughput DNA sequencing--concepts and limitations. Kircher M, Kelso J. Bioessays. 2010 Jun;32(6):524-36.
- De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Diguistini S, Liao NY, Platt D, Robertson G, Seidel M, Chan SK, Docking TR, Birol I, Holt RA, Hirst M, Mardis E, Marra MA, Hamelin RC, Bohlmann J, Breuil C, Jones SJ. Genome Biol. 2009;10(9):R94.
- DNA methylation profiling of human placentas reveals promoter hypomethylation of multiple genes in early-onset preeclampsia. Yuen RK, Peñaherrera MS, von Dadelszen P, McFadden DE, Robinson WP. Eur J Hum Genet. 2010 Sep;18(9):1006-12.
- Genome Sequencing in Open Microfabricated High Density Picoliter Reactors. Margulies M, Egholm M, Altman WE, et al. Nature. 2005;437(7057):376-80.
- Next-generation sequencing transforms today's biology. Schuster SC. Nat Methods. 2008;5(1):16-8.
- ChIP-seq: advantages and challenges of a maturing technology. Park PJ. Nat Rev Genet. 2009 Oct;10(10):669-80.
- Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Lu R, Neff NF, Quake SR, Weissman IL. Nat Biotechnol. 2011 Oct 2;29(10):928-33.