Next Generation Sequencing (NGS) is a useful tool for determining DNA sequences, information that is valuable for furthering our understanding of biological processes. Unlike some tools, NGS is flexible and can be applied in many different situations, ranging from sequencing the exome to profiling small RNAs. This flexibility means that several parameters need to be considered before running an NGS experiment. This section outlines some of these important considerations.
Current NGS platforms, although very accurate, are still prone to error. Even at accuracies of 99% or greater, a generated sequence may contain incorrect nucleotides: at 99% accuracy, on average one base in every 100 is read incorrectly, and because NGS platforms generate very large amounts of output, these errors add up quickly. The way to compensate for this limitation is to sequence each nucleotide multiple times. The number of times a nucleotide is sequenced is referred to as “coverage” or “depth” (1). Coverage may also refer to the percentage of target bases that have been sequenced a specific number of times (1).
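The arithmetic in this paragraph can be made concrete with a short Python sketch. It is a minimal model, assuming that sequencing errors are independent and occur at a fixed per-base rate; the majority-vote calculation is deliberately pessimistic, because it assumes every erroneous read reports the same wrong base. The function names are illustrative and not part of any sequencing software.

```python
from math import comb


def expected_errors(total_bases: int, accuracy: float) -> float:
    """Expected number of miscalled bases in a run, assuming every base is
    called independently with the same per-base accuracy."""
    return total_bases * (1.0 - accuracy)


def consensus_error(depth: int, error_rate: float) -> float:
    """Probability that at least half of the reads covering one position are
    wrong, so a simple majority vote could not recover the true base.

    Pessimistic toy model: errors are independent, and every erroneous read
    is assumed to report the same wrong base.
    """
    min_wrong = (depth + 1) // 2  # ceil(depth / 2): a wrong majority or a tie
    return sum(
        comb(depth, k) * error_rate**k * (1.0 - error_rate) ** (depth - k)
        for k in range(min_wrong, depth + 1)
    )


if __name__ == "__main__":
    # 99% accuracy over one billion sequenced bases
    print(f"{expected_errors(1_000_000_000, 0.99):,.0f} expected errors")
    for depth in (1, 5, 15, 30):
        print(depth, consensus_error(depth, 0.01))
```

Under these assumptions, a single read at 99% accuracy has a 1 in 100 chance of being wrong at a given position, whereas the chance that at least half of 30 independent reads are wrong is vanishingly small, which is why repeated sequencing compensates for per-read errors.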
Coverage varies with the type of NGS and the research application. Higher coverage is generally needed when searching for rare variants (present at <1% in a sample), for example when detecting cancer mutations in tumour DNA circulating in the plasma of cancer patients (2). The appropriate coverage for an experiment is therefore determined on a case-by-case basis. As an example of how coverage depends on the NGS type, whole genome sequencing generally requires approximately 30x coverage, which detects 98% of the heterozygous single nucleotide variants identified by a microarray. Coverage can be estimated with the Lander-Waterman equation (1):
$$C = \frac{LN}{G}$$
where C is the coverage, L is the read length, N is the number of reads, and G is the haploid genome length.
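The Lander-Waterman equation (C = LN/G, with read length L, number of reads N, and haploid genome length G) can be evaluated directly, or rearranged to N = CG/L to estimate how many reads a target coverage requires. The Python sketch below does both; the ~3.1 Gb haploid human genome size and 150 bp read length in the example are illustrative assumptions. The third function uses the idealized Poisson model from Lander-Waterman theory, which assumes reads start at uniformly random positions, to estimate the fraction of bases covered at least k times; real coverage is less uniform, so observed values typically fall short of this estimate.

```python
import math


def mean_coverage(read_length: int, num_reads: int, genome_size: int) -> float:
    """Lander-Waterman estimate of mean coverage: C = L * N / G."""
    return read_length * num_reads / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Rearranged equation N = C * G / L, rounded up to a whole read."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_at_least(k: int, coverage: float) -> float:
    """Idealized Poisson estimate of the fraction of bases covered by >= k reads,
    assuming reads start at uniformly random positions."""
    below_k = sum(
        math.exp(-coverage) * coverage**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below_k


if __name__ == "__main__":
    genome = 3_100_000_000  # assumed haploid human genome size, bp
    n = reads_needed(30, 150, genome)  # assumed 150 bp reads
    print(f"{n:,} reads for 30x")
    print(f"mean coverage check: {mean_coverage(150, n, genome):.2f}x")
    print(f"bases covered >= 10x: {fraction_at_least(10, 30):.4%}")
```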
For general coverage guidelines, please refer to the table below:
| NGS Type | Application | Recommended Coverage (x) or Reads (millions) | References |
|---|---|---|---|
| Whole Genome Sequencing | Homozygous Single Nucleotide Variants (SNVs) – single nucleotide changes in genes where the alleles are identical. | 15x | Bentley et al., 2008 |
| | Heterozygous SNVs – single nucleotide changes in genes where the alleles are different from each other. | 33x | Bentley et al., 2008 |
| | Insertion/Deletion Mutations (INDELs) – mutations in the genome where nucleotides are inserted or removed. | 60x | Feng et al., 2014 |
| | Genotype calling – determination of an individual's genotype. | 35x | Ajay et al., 2011 |
| | Copy Number Variation (CNV) – variation in the number of copies of a gene between individuals. | 1-8x | Xie et al., 2009; Medvedev et al., 2010 |
| Whole Exome Sequencing | Homozygous SNVs | 100x (3x local read coverage) | Clark et al., 2011; Meynert et al., 2013 |
| | Heterozygous SNVs | 100x (13x local read coverage) | Clark et al., 2011; Meynert et al., 2013 |
| | INDELs | Not recommended | Feng et al., 2014 |
| RNA Sequencing - Transcriptome Sequencing | Differential expression profiling – quantitative measurement of gene expression across multiple genes to examine different levels of expression in the sample. | 10-25 million | Liu et al., 2014; ENCODE 2011 RNA-Seq |
| | Alternative splicing – identification of different splice variants from mRNA transcripts. | 50-100 million | Liu et al., 2014; ENCODE 2011 RNA-Seq |
| | Allele-specific expression – transcript expression that is affected by a specific gene allele. | 50-100 million | Liu et al., 2014; ENCODE 2011 RNA-Seq |
| | De novo assembly – construction of a transcriptome without the use of a reference sequence. | >100 million | Liu et al., 2014; ENCODE 2011 RNA-Seq |
| RNA Sequencing - Small RNA (microRNA) Sequencing | Differential expression – quantitative measurement of small RNA expression to examine different levels of expression in the sample. | ~1-2 million | Metpally et al., 2013; Campbell et al., 2015 |
| | Discovery of novel small RNAs. | ~5-8 million | Metpally et al., 2013; Campbell et al., 2015 |
| DNA Methylation Sequencing | Bisulfite Sequencing (Bisulfite-Seq) – sequencing performed after treating genomic DNA with bisulfite to convert non-methylated cytosines to uracil. | 5-15x per strand or per replicate; 30x total methylome | Ziller et al., 2015; Epigenomics Road Map |
Adapted from Genohub website, “Table 1: Coverage and Read Recommendations by Application”
Before a sample can be sequenced, its genomic DNA or total RNA must be converted into a sample library. A library is a collection of randomly fragmented DNA molecules that represents the sample input. Different NGS applications require different library preparation steps. Four types of NGS applications are considered below: Whole Genome Sequencing (WGS), Exome Sequencing (Exome-Seq), RNA Sequencing (RNA-Seq), and Methylation Sequencing (Methyl-Seq). We focus on the protocols used with Illumina NGS platforms, as these platforms use the most effective sequencing method, sequencing by synthesis, and generate the highest output of all the platforms currently on the market. For a more detailed explanation, please see our Next Generation Sequencing (NGS) – An Introduction knowledge base article.
Library Preparation for Whole Genome Sequencing (WGS)
Whole Genome Sequencing, or WGS, refers to sequencing an organism’s entire genome. Sample library preparation for WGS depends on two considerations: 1) the genome size of the organism from which the sample was derived, and 2) the amount of sample available for sequencing. Together, these two considerations determine which library preparation method is appropriate.
1) Illumina TruSeq PCR-free Library Preparation Kit – Any Size Genome with Large Sample Input
The Illumina TruSeq PCR-free Library Preparation Kit is ideal if 1-2 μg of genomic DNA is available, regardless of genome size. The purpose of this kit is to avoid the errors and biases that PCR amplification can introduce. Genomic DNA is isolated from the sample and fragmented physically or chemically, which leaves random 5’ and 3’ overhangs at the fragment ends. The fragments are then size-selected for the desired size of 350 bp or 550 bp using magnetic beads. Size selection works by incubating the fragmented DNA with specific ratios of magnetic beads; a higher bead-to-DNA ratio also retains smaller fragments and therefore purifies a broader size range of DNA. Next, the overhangs created by fragmentation are repaired into blunt ends using a combination of a 3’ to 5’ exonuclease and a 5’ to 3’ polymerase: the exonuclease removes 3’ overhangs, while the polymerase fills in 5’ overhangs. The 3’ ends of the fragments are then adenylated; this single-adenine overhang pairs with the single-thymine 3’ overhang of the adapters, which are then ligated to the fragments. This ligation step is critical for the later sequencing reaction, as the adapters enable the DNA to hybridize to the surface of the flow cell. The collection of adapter-ligated fragments forms a library that can be sequenced. Before sequencing, the library must be validated quantitatively and qualitatively. It is validated quantitatively with qPCR for two reasons: 1) the qPCR primer binding sites lie within the adapter sequences, so only adapter-ligated fragments are amplified, and 2) without PCR amplification, the library concentration is too low to be quantified fluorometrically. The library is additionally validated qualitatively with the Agilent Technologies 2100 Bioanalyzer before optional pooling with other libraries.
Technical details regarding library validation instruments can be found in the Quality Control section below. For further details, please refer to Figure 1 presented below.
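The end-repair rule described for this kit has a simple consequence that can be expressed in code: because the exonuclease trims 3’ overhangs back and the polymerase extends recessed 3’ ends, each repaired blunt end lies where the 5’ end of a strand was. The Python sketch below is a toy model under that assumption only. It represents each strand by hypothetical (start, end) coordinates on a reference sequence and ignores everything else that happens in a real end-repair reaction.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def end_repair(reference: str, top: tuple[int, int], bottom: tuple[int, int]) -> str:
    """Toy model of end repair on one sheared fragment.

    `top` and `bottom` are (start, end) coordinates of the two strands on
    `reference`. The top strand runs 5'->3' left to right, so its 5' end is at
    top[0]; the bottom strand runs 5'->3' right to left, so its 5' end is at
    bottom[1]. Trimming 3' overhangs and filling recessed 3' ends both stop at
    the opposite 5' end, so the blunt duplex spans top[0]..bottom[1].
    Returns the top strand of the repaired duplex.
    """
    if max(top[0], bottom[0]) >= min(top[1], bottom[1]):
        raise ValueError("strands do not overlap, so they cannot form a duplex")
    return reference[top[0]:bottom[1]]


def a_tail(blunt_top: str) -> tuple[str, str]:
    """Add one 3' adenine to each strand of a blunt duplex.

    Both strands are returned 5'->3'; the single-A overhangs are what pair
    with the single-T overhangs on the adapters.
    """
    return blunt_top + "A", reverse_complement(blunt_top) + "A"


if __name__ == "__main__":
    ref = "ACGGATCCTA"
    # The bottom strand has a 2-nt 3' overhang on the left and the top strand
    # has a 2-nt 3' overhang on the right; end repair trims both back.
    blunt = end_repair(ref, top=(2, 8), bottom=(0, 6))
    print(blunt, a_tail(blunt))
```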
2) Illumina TruSeq Nano DNA Library Prep Kit - Any Size Genome with Small Sample Input
The TruSeq Nano DNA Library Prep Kit is ideal if 100-200 ng of genomic DNA is available. The protocol is almost identical to that of the TruSeq PCR-free Library Preparation Kit, except for PCR amplification and library validation. Amplification takes place between the adapter ligation and library validation steps (see Figure 2). PCR amplification enriches for adapter-ligated DNA fragments and increases the library concentration for sequencing. Library quantification and qualitative analysis are nearly the same as for the TruSeq PCR-free Library Preparation Kit. However, the higher library concentration and, more importantly, the selective amplification of fragments ligated to correctly oriented adapters together allow quantification to be done fluorometrically.
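The effect of this PCR step on library concentration can be estimated with the standard exponential amplification model, copies = initial × (1 + E)^n, where E is the per-cycle efficiency (1.0 means perfect doubling) and n is the number of cycles. The Python sketch below is an idealization: the starting amount, efficiency, and cycle number in the example are hypothetical values rather than kit parameters, and real reactions plateau as reagents are consumed.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR: each cycle multiplies the template by
    (1 + efficiency). efficiency=1.0 is perfect doubling; real reactions are
    less efficient and eventually plateau."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical example: 1e6 adapter-ligated molecules, 8 cycles, 90% efficiency
    print(f"{pcr_copies(1e6, 8, 0.9):.3e}")
```

Because the primers anneal to adapter sequences, fragments without adapters are not amplified; in this model their count stays at the starting value while adapter-ligated fragments grow exponentially, which is the enrichment the protocol relies on.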
3) Illumina Nextera DNA Library Prep Kit - Large Genome Size with Small Sample Input
The Nextera DNA Library Prep Kit is ideal for large, complex genomes (e.g. the human genome) and offers a shorter sample preparation time than the TruSeq PCR-free and Nano Library Prep Kits. The protocol is fairly similar to that of the TruSeq Nano DNA Library Prep Kit, with a few differences. Unlike in the TruSeq kits, fragmentation of the genomic DNA and addition of adapters, together called “tagmentation”, occur in the first step. This is done with an enzyme complex called a transposome. The transposome is a transposase-transposon complex: it can cut DNA like a transposase and also insert a portion of itself into the DNA sequence like a transposon. The Nextera transposome is unique in that the transposon portion of the complex consists of adapter sequences, so during tagmentation it simultaneously cleaves the DNA and inserts the adapter sequences. A subsequent clean-up step removes any transposome still bound to the DNA so that it does not interfere with later steps. Because fragmentation and tagging occur at the same time, no fragment end repair, 3’ adenylation, or adapter ligation is needed. Library quantification is done only fluorometrically, with Qubit. For further details, please refer to Figure 3 presented below.
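Tagmentation can be pictured as random insertion events, each of which cuts the DNA and leaves adapter sequence on the new ends. The Python sketch below is a deliberately simplified model: it treats each insertion as a clean cut, gives every fragment the same pair of placeholder tags (`<A>` and `<B>`, hypothetical stand-ins for the real adapter sequences), and keeps only fragments flanked by insertions on both sides, since the two terminal pieces of a linear molecule would carry an adapter on one end only. The `seed` parameter is there only to make the random cut sites reproducible.

```python
import random


def tagment(dna: str, insertions: int, left_tag: str = "<A>",
            right_tag: str = "<B>", seed: int = 0) -> list[str]:
    """Simplified tagmentation: pick random insertion sites, cut there, and tag
    both new ends of every fragment that lies between two insertions."""
    if not 2 <= insertions < len(dna):
        raise ValueError("need at least 2 insertions and fewer than len(dna)")
    rng = random.Random(seed)
    sites = sorted(rng.sample(range(1, len(dna)), insertions))
    # Only fragments bounded by two insertion sites carry adapters on both ends.
    return [left_tag + dna[a:b] + right_tag for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    for fragment in tagment("ACGT" * 25, insertions=5, seed=1):
        print(fragment)
```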
4) Illumina Nextera DNA XT Library Prep Kit - Small Genome Size with Small Sample Input
The Nextera DNA XT Library Prep Kit is ideal for small genomes (e.g. bacteria) as well as plasmids and amplicons. The protocol is very similar to that of the Nextera DNA Library Prep Kit, with two exceptions: there is no post-tagmentation clean-up and no library quantification.
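The kit-selection logic described in this WGS section can be summarized as a small decision function. The Python sketch below only illustrates that logic: the 1 μg input cut-off follows the 1-2 μg guidance for the PCR-free kit, but the 50 Mb boundary between "small" and "large" genomes and the `prefer_fast` flag are hypothetical parameters chosen for the example, not Illumina specifications.

```python
SMALL_GENOME_BP = 50_000_000  # hypothetical boundary, not an Illumina spec
PCR_FREE_MIN_INPUT_NG = 1000  # 1 ug, the low end of the 1-2 ug guidance


def choose_wgs_kit(genome_size_bp: int, input_ng: float, prefer_fast: bool = False) -> str:
    """Pick a WGS library prep kit from genome size and available DNA input."""
    if input_ng >= PCR_FREE_MIN_INPUT_NG:
        return "TruSeq PCR-free"  # any genome size, large input
    if genome_size_bp < SMALL_GENOME_BP:
        return "Nextera DNA XT"  # small genomes, plasmids, amplicons
    if prefer_fast:
        return "Nextera DNA"  # large genome, small input, shorter prep time
    return "TruSeq Nano DNA"  # any genome size, small input


if __name__ == "__main__":
    print(choose_wgs_kit(3_100_000_000, 150, prefer_fast=True))
```

Input amount is checked first because the PCR-free kit is recommended regardless of genome size whenever enough DNA is available.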
Library Preparation for Exome Sequencing
Exome Sequencing, or Exome-Seq, is the sequencing of the protein-coding portion of the genome. It is currently a more affordable alternative to WGS, as only about 2% of the genome is sequenced. Exome-Seq can be performed in two ways: 1) sequencing of only the exons, or 2) sequencing of all the exons together with introns (non-protein-coding regions) and regulatory regions such as the 5’ and 3’ untranslated regions (5’ and 3’ UTRs) and microRNA (miRNA) sequences.
1) Illumina Nextera Rapid Capture Exome Kit
The Nextera Rapid Capture Exome Kit is ideal if only the exons are to be analyzed. As in the Nextera DNA Library Prep Kit protocol for WGS, tagmentation of genomic DNA happens in the first step. This is followed by a clean-up step that removes the transposome, which is necessary to prevent it from interfering with later steps. The tagged fragments are then amplified by PCR to enrich for adapter-tagged DNA and to increase the library concentration; the primer sequences needed for sequencing and indexing are also added during this first PCR enrichment. Once amplification is complete, the library is purified from non-amplified fragments with magnetic beads and quantified fluorometrically to confirm that there is sufficient product. Next, exome fragments are isolated by hybridizing them to biotinylated oligonucleotide probes complementary to the exome, followed by “capture” of the probe-bound fragments through the non-covalent binding of biotin to streptavidin beads. During these steps, non-specifically bound DNA is removed by washing. Hybridization and capture are then repeated a second time, so the library is enriched for exome sequences twice. The library is then purified with magnetic beads before a final round of PCR enrichment prior to sequencing, after which it is purified again. Finally, the library is validated quantitatively and qualitatively: quantification is done with either qPCR or Qubit, and qualitative analysis is performed with the Agilent Technologies 2100 Bioanalyzer. For further details, please refer to Figure 4 presented below.
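Conceptually, a capture round keeps the fragments that overlap a probe target and washes away the rest. The Python sketch below models one round as an interval-overlap filter. The fragment and target coordinates are hypothetical, and the model ignores probe affinity, non-specific binding, and capture efficiency.

```python
Interval = tuple[int, int]  # half-open (start, end) genomic coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def capture(fragments: list[Interval], targets: list[Interval]) -> list[Interval]:
    """Keep fragments that overlap at least one targeted (e.g. exonic) interval."""
    return [f for f in fragments if any(overlaps(f, t) for t in targets)]


if __name__ == "__main__":
    exons = [(100, 250), (900, 1000)]  # hypothetical probe targets
    fragments = [(50, 120), (300, 450), (950, 1100), (1200, 1350)]
    print(capture(fragments, exons))
```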
2) Illumina Nextera Rapid Capture Expanded Exome Kit
The Nextera Rapid Capture Expanded Exome Kit is ideal if a more complete analysis of the exome, including UTRs and miRNA binding regions, is desired. The protocol is almost identical to that of the Nextera Rapid Capture Exome Kit, except for the addition of specific probes and related beads that bind and capture non-protein-coding regions. Additional information about the protocol can be found in the Nextera Rapid Capture Exome Kit section above.
Library Preparation for RNA-Seq
RNA Sequencing, or RNA-Seq, consists of sequencing the RNA transcripts present in a sample from an organism. This can cover the entire collection of transcripts present or specific classes of transcripts, such as mRNA or small RNAs.
RNA-Seq is divided into three categories based on the RNA chosen to be sequenced: total RNA-Seq, mRNA-Seq, and small RNA-Seq. Each of these categories has a unique sample library preparation protocol.
1) Illumina TruSeq Stranded Total RNA Kit
The TruSeq Stranded Total RNA Kit is ideal if a complete view of the transcripts in a sample is desired. Ribosomal RNA (rRNA) is not a desired component of the total RNA sample library, so it must be depleted. rRNA is depleted by hybridizing it to magnetic beads carrying sequences complementary to rRNA. After hybridization, the beads are pulled out of solution with a strong magnet, and the supernatant is used in the subsequent preparation steps. The remaining RNA is cleaned, fragmented, and primed for cDNA synthesis in a single step. The first cDNA strand is then synthesized using random primers. During this step, Actinomycin D is added to prevent second-strand synthesis while the first strand is being made. The RNA template is then degraded so that only the second cDNA strand is produced in the next synthesis step. The second cDNA strand is synthesized with dUTP in place of dTTP, which marks it so that the two cDNA strands can be distinguished later. The resulting double-stranded cDNA is prepared for adapter ligation by adenylating its 3’ ends, which allows it to pair with the thymine overhang on the 3’ end of the adapters. Once adenylation is complete, adapters are ligated to the cDNA, and the dUTP-containing strand is removed enzymatically (an overview is available on the NEB website). Because only the first cDNA strand remains to be amplified, the subsequent PCR enrichment preserves information about which strand the original transcript came from. The resulting library is validated quantitatively with qPCR and qualitatively with the Agilent Technologies 2100 Bioanalyzer before normalization. If necessary, the library can be pooled with others for multiplexing. For further details, please refer to Figure 5 presented below.
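The strand-marking logic of this protocol can be traced with sequences. In the Python sketch below, the first cDNA strand is the reverse complement of the RNA (written with T), the second strand is copied from the first with U in place of T, and any U-containing strand is then discarded, mirroring the removal step described above. In this model the surviving strand is always antisense to the transcript. It is a string-level illustration only, assuming complete, error-free synthesis; the transcript sequence is hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def first_strand(rna: str) -> str:
    """First-strand cDNA: reverse complement of the RNA, written as DNA (T, not U)."""
    return rna.replace("U", "T").translate(DNA_COMPLEMENT)[::-1]


def second_strand(first: str) -> str:
    """Second-strand cDNA made with dUTP instead of dTTP, so it carries U."""
    return first.translate(DNA_COMPLEMENT)[::-1].replace("T", "U")


def surviving_strands(strands: list[str]) -> list[str]:
    """Remove any strand containing uracil (the dUTP-marked second strand)."""
    return [s for s in strands if "U" not in s]


if __name__ == "__main__":
    transcript = "AUGGCUUAC"  # hypothetical mRNA fragment
    s1 = first_strand(transcript)
    s2 = second_strand(s1)
    print(s1, s2, surviving_strands([s1, s2]))
```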
2) Illumina TruSeq Stranded mRNA Kit
The TruSeq Stranded mRNA Kit is ideal if the gene expression profile of a sample is desired. The protocol is identical to the TruSeq Stranded Total RNA kit, with the exception of mRNA enrichment instead of rRNA depletion in the first step. For further details, please refer to Figure 6 presented below.
3) Illumina TruSeq Small RNA Kit
The TruSeq Small RNA Kit is ideal if small, non-coding RNAs (e.g. miRNA) are to be analyzed. The protocol for this kit is very different from those of the TruSeq Stranded Total RNA and TruSeq Stranded mRNA kits. Unlike the other RNA library prep kits, the first step is the sequential ligation of blunt-ended adapters (the 3’ adapter, then the 5’ adapter) to total RNA, and there is no RNA depletion or enrichment step. The adapter-ligated RNAs are then subjected to RT-PCR to enrich for RNAs with adapters ligated in the correct orientation. The RT-PCR products are run on an agarose gel, and the desired products, at 147 bp and 157 bp, are isolated. The purified library is validated only qualitatively, using the Agilent Technologies 2100 Bioanalyzer. For further details, please refer to Figure 7 presented below.
Library Preparation for Methyl-Seq
Methylation Sequencing, or Methyl-Seq, is used to determine which regions of the genome are methylated. One way to perform Methyl-Seq is to treat genomic DNA with bisulfite, which converts non-methylated cytosines to uracils. Methylated cytosines are protected from conversion and remain as cytosines, so comparing the converted sequence with the reference reveals the methylation pattern.
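Bisulfite conversion and the resulting methylation call can be sketched in Python. The example below assumes complete conversion, a known set of methylated positions (hypothetical input), a single strand, and a read that aligns to the reference without gaps. It writes converted cytosines directly as T, as they appear after PCR amplification, where uracil is copied as thymine.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Convert unmethylated C to T (U after bisulfite, T after PCR);
    cytosines at positions listed in `methylated` are protected."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """For each reference C, report True (methylated) if the aligned read still
    shows C and False (unmethylated) if it shows T; other mismatches are skipped."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


if __name__ == "__main__":
    reference = "ACGTTCGACC"
    read = bisulfite_convert(reference, methylated={1, 5})  # hypothetical sites
    print(read, call_methylation(reference, read))
```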
1) Illumina TruSeq DNA Methylation Kit
The TruSeq DNA Methylation Kit is ideal if genome methylation is to be analyzed. Sample library preparation begins with fragmentation of the genome. The fragments then undergo bisulfite treatment, which converts non-methylated cytosines to uracils while leaving methylated cytosines unchanged. DNA is then amplified using random primers that carry the 5’ adapter sequence at their 5’ ends. Next, the 3’ adapter tag is ligated. PCR enrichment for the adapter-ligated fragments is performed, during which indexing primers for sequencing can be added if desired. The enriched library is purified with magnetic beads before quantitative validation with qPCR or a fluorometric method. Qualitative analysis is also performed with the Agilent Technologies 2100 Bioanalyzer. For further details, please refer to Figure 8 presented below.
Prior to sequencing, the sample library must be validated quantitatively and qualitatively to verify that it contains a sufficient amount of good-quality DNA. Both quantity and quality play important roles in generating data. Loading either more or less DNA than the optimal amount set by the library protocol makes the sequencing reaction run less efficiently and produces low-quality data: too much DNA causes read problems from flow cell saturation, while too little DNA reduces coverage. In terms of quality, a good library contains a diverse set of DNA fragments with minimal duplicate fragments. This matters because the PCR amplification used in some sample library preparation protocols generates duplicates of fragments, which bias the sequencing reaction towards those fragments (3). Rather than a wide range of fragments being sequenced, the same fragments are sequenced repeatedly and become overrepresented in the machine output.
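A common way to estimate how much duplication is present after sequencing is to treat reads that map to the same position and strand as copies of one original fragment. The Python sketch below applies that rule to hypothetical aligned reads represented as (chromosome, start, strand) tuples. Real duplicate-marking tools use more information, for example the positions of both mates of a paired-end read, so this is a simplified illustration.

```python
from collections import Counter

Read = tuple[str, int, str]  # (chromosome, 5' start position, strand)


def duplicate_rate(reads: list[Read]) -> float:
    """Fraction of reads that share (chromosome, start, strand) with another read.
    One read per distinct key counts as the original; the rest are duplicates."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)


if __name__ == "__main__":
    aligned = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 100, "-"),
               ("chr2", 50, "+"), ("chr1", 100, "+")]
    print(f"duplicate rate: {duplicate_rate(aligned):.0%}")
```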
Library quantification is performed using either qPCR or a fluorometric method like Qubit. Some libraries may only be quantified using one of the two methods. Sample library quality is then verified with the Agilent Technologies 2100 Bioanalyzer. Please refer to the Library Preparation section for further details.
qPCR
qPCR is one method of quantifying a sample library before sequencing. It is ideal when there is too little library for fluorometric quantification, commonly because the library was not PCR-amplified. It is also more sensitive than Qubit for quantifying the adapter-ligated fragments in a sample: because qPCR selectively amplifies such fragments, it avoids Qubit's inaccuracy, which arises from being unable to distinguish between fragments that can and cannot be sequenced. The main drawback of this procedure is that it is very time-consuming.
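qPCR quantification usually relies on a standard curve: standards of known concentration are amplified alongside the library, the quantification cycle (Cq) is fitted against log10 concentration, and the library's Cq is read off the line. The Python sketch below shows that calculation with an ordinary least-squares fit and the usual efficiency relation E = 10^(-1/slope) - 1. The standard concentrations and Cq values in the example are hypothetical, and any dilution applied to the library before qPCR would still need to be multiplied back in.

```python
import math


def fit_standard_curve(concentrations: list[float], cqs: list[float]) -> tuple[float, float]:
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_cq(cq: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate concentration from a sample's Cq."""
    return 10 ** ((cq - intercept) / slope)


def efficiency(slope: float) -> float:
    """Amplification efficiency from the slope; 1.0 means perfect doubling."""
    return 10 ** (-1 / slope) - 1


if __name__ == "__main__":
    # Hypothetical ten-fold dilution series of a standard (pM) and measured Cq values
    standards = [20.0, 2.0, 0.2, 0.02]
    cqs = [8.1, 11.5, 14.8, 18.2]
    slope, intercept = fit_standard_curve(standards, cqs)
    print(f"slope {slope:.2f}, efficiency {efficiency(slope):.1%}")
    print(f"diluted library: {concentration_from_cq(12.0, slope, intercept):.2f} pM")
```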
Qubit
Qubit is an alternative to qPCR for quantifying a sample library. It provides results faster than qPCR; however, it is not suitable for libraries without PCR enrichment, as it is less sensitive than qPCR and requires more sample. Quantification is performed by exciting fluorescent dyes that selectively bind to DNA or RNA and detecting their fluorescence. First, standards are measured with the appropriate assay. The sample, which may be diluted, is then mixed with the appropriate dye and read on the instrument. For further details, please refer to the Qubit product page.
Agilent Technologies 2100 Bioanalyzer
The Bioanalyzer is used to check the size distribution of the library before the sequencing reaction, verifying that the fragment sizes selected during sample library preparation are present. It is an instrument that reads gel chips containing diluted samples in their wells; the chips work on a principle similar to agarose gels, but in a much smaller format. DNA and RNA libraries have their own specific protocols, but the two are very similar. The first step is to introduce the gel into the chip and pressurize it, which distributes the gel evenly throughout the chip and minimizes errors in the later analysis. Markers, ladders, and samples (either diluted or undiluted) are then loaded onto the chip, along with any additional reagents the kit requires. The chip is vortexed for one minute at 2400 rpm before it is loaded onto the Bioanalyzer. The instrument analyzes each well and displays the result as a graph of peaks: the peak positions indicate the markers and the size distribution of the library, while peak height reflects the amount of fragments at a given size. For further details, please refer to the Agilent DNA and RNA analysis kit product pages and the Agilent Technologies 2100 Bioanalyzer product page.
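A library concentration from Qubit (ng/μL) and an average fragment size read from the Bioanalyzer trace are often combined to express the library's molarity, the unit in which sequencer loading concentrations are commonly given. The Python sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA and the usual C1V1 = C2V2 relation for dilution; the concentration, fragment size, and target loading concentration in the example are hypothetical.

```python
BP_MASS_G_PER_MOL = 660  # approximate mass of one base pair of dsDNA


def library_molarity_nm(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul / (BP_MASS_G_PER_MOL * mean_size_bp) * 1e6


def dilution_volume_ul(stock_nm: float, target_nm: float, final_volume_ul: float) -> float:
    """Volume of stock needed for a dilution, from C1 * V1 = C2 * V2."""
    if target_nm > stock_nm:
        raise ValueError("target concentration exceeds stock concentration")
    return target_nm * final_volume_ul / stock_nm


if __name__ == "__main__":
    stock = library_molarity_nm(2.5, 450)  # hypothetical Qubit and Bioanalyzer values
    volume = dilution_volume_ul(stock, 4.0, 20.0)
    print(f"{stock:.2f} nM stock; dilute {volume:.2f} uL to 20 uL for 4 nM")
```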
Once the sample library has been prepared, validated, and sequenced, the output data can be used in various downstream applications.
Whole Genome Sequencing
There are two downstream applications for WGS:
RNA-Sequencing
There are two downstream applications for RNA-Seq:
Exome-Sequencing
Application of the output sequence generated from Exome-Seq is limited to alignment with a reference sequence. This is done to detect variation in the exome, such as coding variants and variants underlying Mendelian disorders (9)(10).
Methyl-Seq
Application of the output generated from Methyl-Seq is limited to alignment with a reference sequence; the methylation pattern across the sequence can then be used to study, for example, DNA-protein interactions and cell lineages (9)(11).