Next Generation Sequencing (NGS) – Introduction

Deoxyribonucleic acid, commonly known as DNA, contains the blueprints of life. Within its structure are the codes required for the assembly of proteins and non-coding RNA; this molecular machinery affects all the biological systems that create and maintain life. By determining the sequence of DNA, researchers have been able to elucidate the structure and function of proteins and RNA and have gained an understanding of the underlying causes of disease. Next Generation Sequencing (NGS) is a powerful platform that enables the sequencing of thousands to millions of DNA molecules simultaneously. It is revolutionizing fields such as personalized medicine, genetic disease research, and clinical diagnostics by offering a high-throughput option with the capability to sequence multiple individuals at the same time.

Sanger Sequencing

Sanger Sequencing utilizes a high fidelity DNA-dependent polymerase to generate a complementary copy of a single stranded DNA template (1) (2) (3). In each reaction a single primer, complementary to the template, initiates a DNA synthesis reaction from its 3’ end. Deoxynucleotides, the monomers of DNA, are added one after the other in a template-dependent manner, forming phosphodiester bonds between the 3’ hydroxyl at the growing end of the primer and the 5’ triphosphate group of the incoming nucleotide (Figure 1)(1).

Each reaction also contains a mixture of four di-deoxynucleotides, one for each DNA base (i.e. A, G, T, and C). These di-deoxynucleotides resemble the DNA monomers closely enough to be incorporated into the growing strand. However, they differ from natural deoxynucleotides in two ways: 1) they lack the 3’ hydroxyl group required for further DNA extension, so the chain terminates once one is incorporated into the DNA molecule, and 2) each di-deoxynucleotide has a unique fluorescent dye attached to it, allowing automatic detection of the DNA sequence (3) (4) (5).

Many copies of different-length DNA fragments are generated in each reaction, terminated at all of the nucleotide positions of the template molecule by one of the di-deoxynucleotides (Figure 1). The reaction mixtures are loaded on the sequencing machine, either manually onto slab gels or automatically with capillaries, and are electrophoresed to separate the DNA molecules by size. The DNA sequence is read through the fluorescent emission of the di-deoxynucleotide as it flows through the gel (Figure 2) (5). Modern day Sanger Sequencing instruments use capillary based automated electrophoresis, which typically analyzes 8–96 sequencing reactions simultaneously.
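The chain-termination principle described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function names, the termination probability `p_terminate`, and the number of template copies are illustrative choices and not part of any instrument's software. Each copy of the template is extended until a di-deoxynucleotide happens to be incorporated, the fragments are sorted by size (the role of electrophoresis), and the dye on each terminal base gives the sequence.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_terminated_fragment(template, rng, p_terminate=0.2):
    """Extend a primer along the template until a ddNTP is incorporated.

    The template is written 3'->5', so the returned fragment reads 5'->3'.
    Returns None if no terminator was incorporated before the template ended.
    """
    fragment = []
    for base in template:
        fragment.append(COMPLEMENT[base])
        if rng.random() < p_terminate:  # a dye-labelled ddNTP was incorporated
            return "".join(fragment)
    return None


def read_sanger(template, copies=10_000, seed=0):
    """Reconstruct the synthesized sequence from chain-terminated fragments."""
    rng = random.Random(seed)
    fragments = [synthesize_terminated_fragment(template, rng) for _ in range(copies)]
    # Separation by size: the dye on the last base of each fragment length
    # reports the base at that position.
    terminal_base_by_length = {len(f): f[-1] for f in fragments if f is not None}
    return "".join(terminal_base_by_length[n] for n in sorted(terminal_base_by_length))


template = "TACGGATCCGTA"  # hypothetical template, written 3'->5'
print(read_sanger(template))
```

With many template copies, every fragment length is represented, which is why the full sequence can be read from a single reaction.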

Figure 1 – An illustration of Sanger Sequencing.

Similarities between different NGS Technologies (6)(7)(8)(9)

Next Generation Sequencing systems introduced in the past decade allow for massively parallel sequencing reactions. These systems are capable of analyzing millions or even billions of sequencing reactions at the same time. Although the machines differ in their technical details, they all share some common features, which are outlined below (Figure 2):

1. Sample Preparation:

All Next Generation Sequencing platforms require a library obtained either by amplification or ligation with custom adapter sequences. These adapter sequences allow for library hybridization to the sequencing chips and provide a universal priming site for sequencing primers. Learn more about sample preparation from our Next Generation Sequencing - Experimental Design knowledge base.

2. Sequencing machines:

Each library fragment is amplified on a solid surface (either beads or a flat silicon derived surface) carrying covalently attached DNA linkers that hybridize to the library adapters. This amplification creates clusters of DNA, each originating from a single library fragment; each cluster acts as an individual sequencing reaction.

The sequence of each cluster is optically read (either through the generation of light or fluorescent signal) from repeated cycles of nucleotide incorporation. Each machine has its own unique cycling condition; for example, the Illumina system uses repeated cycles of incorporation of reversibly fluorescent and terminated nucleotides followed by signal acquisition and removal of the fluorescent and terminator groups.

3. Data output:

Each machine provides the raw data at the end of the sequencing run. This raw data is a collection of the DNA sequences that were generated at each cluster. It can be analysed further to provide more meaningful results.

Figure 2 – An illustration of the similarities and differences between the different Next Generation Sequencing platforms.

Differences between different NGS Technologies (6)(7)(8)(9)

The differences between the different Next Generation Sequencing platforms lie mainly in the technical details of the sequencing reaction. Below we describe these technical differences briefly. For a full explanation, please visit the manufacturers’ webpages at the links provided in each section.

Pyrosequencing

In pyrosequencing, the sequencing reaction is monitored through the release of pyrophosphate during nucleotide incorporation. A single type of nucleotide is added to the sequencing chip at a time and is incorporated in a template dependent manner. Each incorporation releases pyrophosphate, which is used in a series of chemical reactions that generate light. The light emission is detected by a camera, which records the corresponding sequence of the cluster. Any unincorporated bases are degraded by apyrase before the addition of the next nucleotide. This cycle continues until the sequencing reaction is complete (Table 1).

Disadvantages:
High reagent cost, and a high error rate over homopolymer stretches of 6 or more identical bases.
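The cycle of single-nucleotide flows can be sketched as a toy flowgram. This is a simplified model under stated assumptions: the flow order `TACG`, the function names, and the idealised signal (exactly one unit of light per incorporated base) are illustrative and not taken from the manufacturer. It shows why homopolymers are difficult: all identical bases in a run are incorporated during one flow, so the run length has to be inferred from signal intensity alone.

```python
from itertools import cycle


def pyrosequence(read, flow_order="TACG", max_flows=400):
    """Return one (nucleotide, signal) pair per nucleotide flow.

    `read` is the strand being synthesized. The signal is the number of
    bases incorporated during that flow (0 if the flowed base does not match).
    """
    signals = []
    pos = 0
    for nucleotide in cycle(flow_order):
        if pos >= len(read) or len(signals) >= max_flows:
            break
        run = 0
        # A homopolymer run is incorporated completely within a single flow.
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
    return signals


def call_bases(signals):
    """Convert idealised flow signals back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


flows = pyrosequence("TTAGGGC")
print(flows)
print(call_bases(flows))
```

In a real instrument the signal is an analogue light intensity, and telling a run of 6 from a run of 7 is much harder than telling 1 from 2, which is consistent with the error pattern noted above.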

Table 1 — Technical details for all available pyrosequencing based NGS machines.

	GS Junior	GS Junior Plus	GS FLX+ System (GS FLX Titanium XL+)	GS FLX+ System (GS FLX Titanium XLR70)
Read Length	400bp	700bp	700bp (up to 1,000bp)	450bp (up to 600bp)
Throughput	35Mb	70Mb	700Mb	450Mb
Reads per Run	100,000 shotgun, 70,000 amplicon	100,000 shotgun, 70,000 amplicon	1,000,000 shotgun	1,000,000 shotgun, 700,000 amplicon
Accuracy	99% at 400bp	99% at 700bp	99.997%	99.995%
Run Time	10 hr	18 hr	23 hr	10 hr

Sequencing by Synthesis

Sequencing by synthesis utilizes the step-by-step incorporation of reversibly fluorescent and terminated nucleotides for DNA sequencing and is used by the Illumina NGS platforms. The nucleotides used in this method have been modified in two ways: 1) each nucleotide is reversibly attached to a single fluorescent molecule with unique emission wavelengths, and 2) each nucleotide is also reversibly terminated ensuring that only a single nucleotide will be incorporated per cycle. All four nucleotides are added to the sequencing chip and after nucleotide incorporation the remaining DNA bases are washed away. The fluorescent signal is read at each cluster and recorded; both the fluorescent molecule and the terminator group are then cleaved and washed away. This process is repeated until the sequencing reaction is complete. This system is able to overcome the disadvantages of the pyrosequencing system by only incorporating a single nucleotide at a time (Table 2).

Disadvantages:
As the sequencing reaction proceeds, the error rate of the machine also increases. This is due to incomplete removal of the fluorescent signal which leads to higher background noise levels.

Table 2 — Technical details for all available sequencing by synthesis based NGS machines:

	MiSeq	NextSeq 500	NextSeq 500	HiSeq 2500	HiSeq 2500	HiSeq 3000	HiSeq 4000
Run Mode	N/A	Mid-Output	High-Output	Rapid Run	High-Output	N/A	N/A
Flow Cells Per Run	1	1	1	1 or 2	1 or 2	1	1 or 2
Output Range	0.3-15 Gb	20-39 Gb	30-120 Gb	10-300 Gb	50-1000 Gb	125-750 Gb	125-1500 Gb
Run Time	5-55 hrs	15-26 hrs	12-30 hrs	7-60 hrs	<1-6 days	<1-3.5 days	<1-3.5 days
Reads per Flow Cell	25 million	130 million	400 million	300 million	2 billion	2.5 billion	2.5 billion
Maximum Read Length	2 x 300bp	2 x 150bp	2 x 150bp	2 x 250bp	2 x 125bp	2 x 150bp	2 x 150bp

For more information, please visit the Illumina website.

Sequencing by Ligation

Sequencing by ligation is different from the other two methods since it does not utilize a DNA polymerase to incorporate nucleotides. Instead, it relies on short oligonucleotide probes that are ligated to one another. Each probe consists of 8 bases (from 3’ to 5’): two probe-specific bases (there are a total of 16 8-mer probes, which all differ at these two base positions) and six degenerate bases; one of four fluorescent dyes is attached at the 5’ end of the probe. The sequencing reaction commences with binding of the primer to the adapter sequence, followed by hybridization of the appropriate probe. Hybridization is guided by the two probe-specific bases and, upon annealing, the probe is ligated to the primer by a DNA ligase. Unbound oligonucleotides are washed away, the signal is detected and recorded, the fluorescent dye is removed by cleaving the last 3 bases of the probe, and then the next cycle commences. After approximately 7 cycles of ligation the DNA strand is denatured and another sequencing primer, offset by one base from the previous primer, is used to repeat these steps; in total 5 sequencing primers are used (Table 3).

Disadvantages:
This method leads to very short sequencing reads.
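Because 16 probes share only four dyes, each dye corresponds to four different base pairs, and a colour on its own does not identify a base. The sketch below illustrates one way such a two-base encoding can work. The specific colour assignment (an XOR of 2-bit base codes) follows the commonly described SOLiD colour-space scheme and is an assumption here, since the mapping is not given in this article; the function names are illustrative.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(sequence):
    """Translate a base sequence into one colour (0-3) per adjacent base pair."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])]


def decode_colors(first_base, colors):
    """Recover the bases from the colours, given the known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


# Each of the four colours is shared by four of the 16 possible dinucleotides.
dinucleotides_per_color = {}
for a in "ACGT":
    for b in "ACGT":
        dinucleotides_per_color.setdefault(encode_colors(a + b)[0], []).append(a + b)

colors = encode_colors("ATGGC")
print(colors)
print(decode_colors("A", colors))
print(dinucleotides_per_color)
```

Decoding needs a known starting base, which in this sketch is passed in explicitly; every later base follows from the previous base and the observed colour.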

Table 3 — Technical details for all available sequencing by ligation based NGS machines:

	Genetic Analyzer V2.0
	5500W System	5500xl W System
Instrument Throughput
1 x 50	80 Gb	160 Gb
1 x 75	120 Gb	240 Gb
2 x 50 MP	160 Gb	320 Gb
50 x 50 PE	160 Gb	320 Gb
Accuracy	99.99%	99.99%
Run Time	7 days	7 days

For more information, please visit the Applied Biosystems website.

Ion Semiconductor Sequencing

Ion semiconductor sequencing utilizes the release of hydrogen ions during the sequencing reaction to detect the sequence of a cluster. Each cluster is located directly above a semiconductor transistor which is capable of detecting changes in the pH of the solution. During nucleotide incorporation, a single H+ is released into the solution and it is detected by the semiconductor. The sequencing reaction itself proceeds similarly to pyrosequencing but at a fraction of the cost (Table 4).

Disadvantages:
High error rate over homopolymeric stretches of nucleotides.

Table 4 — Technical details for all available ion semiconductor sequencing based NGS machines:

	Ion Proton System
Output	up to 10 Gb
Reads	60-80 million Reads
Read Length	up to 200bp
Run time	2-4 hrs

For more information, please visit the Life Technologies website.

Comparisons between Different NGS Platforms

It is difficult to see the differences between the NGS instruments based on the above data. In this section we attempt to simplify comparisons between instruments by showing how each system performs when given the task of sequencing the human (3,300,000,000 bases), mouse (2,800,000,000 bases), Arabidopsis thaliana (135,000,000 bases), and E. coli (4,639,221 bases) genomes (Table 5). Coverage per run is estimated as the maximum output of the instrument divided by the genome size. To be able to use the sequencing data, coverage of at least 30x is required; anything lower than this number is marked in red and anything higher is marked in green.

Table 5 — Coverage of genome per run

Roche	GS Junior	GS Junior Plus	GS FLX+ System (GS FLX Titanium XL+)	GS FLX+ System (GS FLX Titanium XLR70)
Human	0	0	0	0
Mouse	0	0	0	0
Arabidopsis thaliana	0	1	5	3
E. coli	8	15	151	97

Illumina	MiSeq	NextSeq 500		HiSeq 2500		HiSeq 3000	HiSeq 4000
Human	5	12	36	91	303	227	455
Mouse	5	14	43	107	357	268	536
Arabidopsis thaliana	111	289	889	2,222	7,407	5,556	11,111
E. coli	3,233	8,407	25,866	64,666	215,553	161,665	323,330

Applied Biosystems (Genetic Analyzer V2.0)	5500W System	5500xl W System
Human	48	97
Mouse	57	114
Arabidopsis thaliana	1,185	2,370
E. coli	34,489	68,977

	Ion Proton System
Human	3
Mouse	4
Arabidopsis thaliana	74
E. coli	2,156
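The values in Table 5 appear to follow from dividing the maximum output per run by the genome size and rounding to the nearest whole number. The short sketch below reproduces a few entries on that assumption; the output figures are taken from Tables 1, 2 and 4, while the dictionary and function names are illustrative.

```python
GENOME_SIZES = {
    "Human": 3_300_000_000,
    "Mouse": 2_800_000_000,
    "Arabidopsis thaliana": 135_000_000,
    "E. coli": 4_639_221,
}

# Maximum output per run in bases, taken from Tables 1, 2 and 4.
MAX_OUTPUT = {
    "GS FLX Titanium XL+": 700_000_000,
    "MiSeq": 15_000_000_000,
    "HiSeq 4000": 1_500_000_000_000,
    "Ion Proton System": 10_000_000_000,
}

REQUIRED_COVERAGE = 30  # minimum fold coverage used in this section


def fold_coverage(output_bases, genome_size):
    """Average number of times each base of the genome is sequenced in one run."""
    return output_bases / genome_size


for instrument, output in MAX_OUTPUT.items():
    for organism, size in GENOME_SIZES.items():
        cov = fold_coverage(output, size)
        status = "sufficient" if cov >= REQUIRED_COVERAGE else "too low"
        print(f"{instrument:22s}{organism:22s}{cov:12.0f}x  {status}")
```

This is an average over the whole genome; it does not say how evenly the reads are distributed, which is covered under Uniformity in the terms below.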

Useful Terms

Next Generation Sequencing is a young field, with the first machines marketed in 2005. However, in less than a decade NGS has become a cornerstone of molecular biology and genetics. As such, being familiar with its technical terms will help in better understanding the available literature and becoming a member of its ever expanding community. In this section the most common terms used in this field are explained:

Next Generation Sequencing:

Next Generation Sequencing, or NGS, is a sequencing method where millions of sequencing reactions are carried out in parallel, increasing the sequencing throughput.

Reads:

The output of an NGS sequencing reaction. A read is a single uninterrupted series of nucleotides representing the sequence of the template.

Read Length:

The length of each sequencing read. This variable is always represented as an average read length since individual reads have varying lengths.

Coverage:

The number of times a particular nucleotide is sequenced. Because sequencing reactions are error-prone, random errors can occur. Therefore, 30x coverage is typically required to ensure that each nucleotide in the sequence is called accurately.
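Coverage at individual positions can be counted directly from where reads align. The following is a minimal sketch, assuming each alignment is given as a hypothetical `(start, length)` pair on a reference of known length; real pipelines derive this from alignment files.

```python
def depth_per_position(reference_length, alignments):
    """Count how many reads cover each position of the reference.

    `alignments` is a list of (start, length) tuples with 0-based starts.
    """
    depth = [0] * reference_length
    for start, length in alignments:
        for position in range(start, min(start + length, reference_length)):
            depth[position] += 1
    return depth


alignments = [(0, 5), (3, 5), (4, 6)]  # three hypothetical reads
depth = depth_per_position(10, alignments)
mean_coverage = sum(depth) / len(depth)
print(depth)
print(mean_coverage)
```

The per-position list shows that a given average coverage can hide positions that were sequenced far fewer times than the mean.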

Deep Sequencing:

Sequencing where the coverage is greater than 30x. This is used when dealing with rare polymorphisms, where only a subset of the sample carries the mutation. This method increases the range, complexity, sensitivity, and accuracy of the result.

Paired-End Sequencing:

Sequencing from both ends of a fragment while keeping track of the paired data. With this method the sequencing reaction commences from one end of the fragment. Once completed, the fragment is denatured and a sequencing primer is hybridized to the adapter on the opposite end. The fragment is then sequenced again. This method can be used either to further confirm the accuracy of the sequence or to increase the overall read length.
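When the two reads of a pair overlap, they can be joined into one longer sequence. The sketch below assumes error-free reads and an exact overlap of at least `min_overlap` bases; the function names and the example fragment are illustrative, and production tools also have to tolerate mismatches and use quality scores.

```python
_COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT_TABLE)[::-1]


def merge_pair(read1, read2, min_overlap=4):
    """Join a read pair through the longest exact overlap, or return None.

    `read2` is sequenced from the opposite end, so it is reverse-complemented
    before being compared with the end of `read1`.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


fragment = "ACGTTGCAAGGCTTAC"  # hypothetical 16-base library fragment
read1 = fragment[:10]  # forward read from one end
read2 = reverse_complement(fragment[-10:])  # reverse read from the other end
print(merge_pair(read1, read2))
```

Here two 10-base reads are combined into the full 16-base fragment, which is the read-length gain mentioned above.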

Mate-Paired reads:

A sample preparation step where large DNA fragments (~10kb) are circularized with an adapter sequence, followed by fragmentation of the circular DNA. This method links DNA segments that are separated from each other by a certain distance, and it is used in applications such as de novo assembly, structural variant detection, and identification of complex genomic rearrangements.

Adapter:

Unique sequences used to cap the ends of fragmented DNA. The adapter’s functions are as follows: 1) allow hybridization to the solid surface; 2) provide a priming location for both amplification and sequencing primers; and 3) provide barcoding for multiplexing different samples in the same run.
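The barcoding function can be illustrated by sorting reads back into their samples. This sketch assumes, for simplicity, that each read starts with an inline barcode and that barcodes match exactly; the barcode sequences, sample names, and function name are hypothetical, and on some platforms the barcode is read separately from the insert.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by their leading barcode and trim the barcode.

    `barcodes` maps barcode sequence -> sample name. Reads without a known
    barcode are collected under "undetermined".
    """
    bins = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:  # no barcode matched this read
            bins["undetermined"].append(read)
    return bins


barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical barcodes
reads = ["ACGTGGGTTTAA", "TGCACCCAAATT", "ACGTCCCGGG", "GGGGAAAACCCC"]
print(demultiplex(reads, barcodes))
```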

Library:

A collection of DNA fragments with adapters ligated to each end. Library preparation is required before a sequencing run. Our next knowledge base will delve into the different sample and library preparation methods available.

Alignment:

Mapping a sequence read to a known reference genome.
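As a rough illustration of what mapping means, the sketch below slides a read along a reference and reports the position with the fewest mismatches. It is a brute-force toy, not how production aligners work (those rely on indexed data structures and handle insertions and deletions); the function name, the `max_mismatches` parameter, and the sequences are illustrative.

```python
def align_read(read, reference, max_mismatches=1):
    """Return (start, mismatches) of the best ungapped match, or None.

    Every possible start position is tested; ties keep the leftmost position.
    """
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


reference = "ACGTTGCAAGGCTTA"  # hypothetical reference sequence
print(align_read("GCAAGG", reference))  # exact match
print(align_read("GCATGG", reference))  # one sequencing error
print(align_read("TTTTTT", reference))  # does not map
```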

Reference sequence/genome:

A fully sequenced and mapped genome used for the mapping of sequence reads.

De Novo Assembly:

Assembly of the sequence reads to generate a reference sequence.

Specificity:

The percentage of sequenced bases that map to the intended targets, out of the total bases per run.

Uniformity:

The variability in sequence coverage across target regions. When performing whole genome sequencing or exome sequencing, it is expected that the result will be highly uniform (as there should be a 1:1 ratio in the starting material). However, RNA sequencing will not be uniform since differences in expression alter its starting material.

Homopolymer:

A stretch of identical consecutive bases, such as AAAA or GGGGGG.
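Homopolymers are easy to locate in a known sequence, which can help flag regions where pyrosequencing or ion semiconductor reads deserve extra scrutiny. The following is a small sketch; the function name and the `min_length` parameter are illustrative, and the threshold of 6 in the example mirrors the pyrosequencing limitation described earlier.

```python
from itertools import groupby


def homopolymer_runs(sequence, min_length=2):
    """Return (start, base, length) for each run of identical bases.

    Only runs of at least `min_length` bases are reported; starts are 0-based.
    """
    runs = []
    position = 0
    for base, group in groupby(sequence):
        length = len(list(group))
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


sequence = "ACAAAAGGGGGGT"  # hypothetical sequence
print(homopolymer_runs(sequence, min_length=4))
print(homopolymer_runs(sequence, min_length=6))
```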

References

1. Sequences, sequences, and sequences. Sanger, F. Annu Rev Biochem, 1988, Vol. 57, pp. 1-28.
2. Nucleotide sequence of bacteriophage phi X174 DNA. Sanger, F, Air, GM and Barrell, BG. Nature, 1977, Vol. 265, pp. 687-695.
3. DNA sequencing with chain-terminating inhibitors. Sanger, F, Nicklen, S and Coulson, AR. Proc Natl Acad Sci USA, 1977, Vol. 74, pp. 5463-5467.
4. Overview of DNA sequencing strategies. Shendure, JA, Porreca, GJ and Church, GM. Chapter 7, John Wiley & Sons, 2011.
5. Energy transfer primers: a new fluorescence labeling paradigm for DNA sequencing and analysis. Ju, J, Glazer, AN and Mathies, RA. 2, Nat Med, 1996, pp. 998-999.
6. 454 Sequencing. [Online] 2015. [Cited: 6 2, 2015.] http://www.454.com/.
7. illumina. [Online] 2015. [Cited: 6 2, 2015.] http://www.illumina.com/.
8. SOLiD. Applied Biosystems. [Online] 2015. [Cited: 6 2, 2015.] http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing.html.
9. Ion Torrent. Applied Biosystems. [Online] 2015. [Cited: 6 2, 2015.] http://www.lifetechnologies.com/ca/en/home/brands/ion-torrent.html.
