If you're starting a sequencing project, you may encounter terms such as "Sequencing Coverage" or "Mapping Coverage". In this blog post, we'll explain how you can estimate these numbers for your own project!
Let's start by defining some of the basic terms introduced earlier!
Raw Sequencing Coverage
$$\text{Raw Coverage} = \frac{\text{Number of Reads} \times \text{Read Length}}{\text{Target Region Size}}$$
The raw coverage refers to the number of times the target region is sequenced, without taking alignment efficiency into account. The raw coverage can change depending on the read length used, the number of reads sequenced, the machine used (e.g. a HiSeq generates more data than a MiSeq) and whether a whole lane or a partial lane is used.
In reality, however, alignment efficiency is rarely exactly 100%: not every read that is sequenced maps back to the reference. This leads us to the topic of mapping coverage.
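As a rough illustration of the raw coverage formula, here is a minimal Python sketch. The function name `raw_coverage` and the example numbers (1 million reads of 150 bp over a 5 Mb target) are hypothetical and chosen only to make the arithmetic easy to follow; they do not describe any particular instrument or run.

```python
def raw_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Raw coverage = (number of reads x read length) / target region size.

    read_length and target_size must be in the same unit (here: base pairs).
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return num_reads * read_length / target_size


# Hypothetical project: 1,000,000 reads of 150 bp over a 5,000,000 bp target.
coverage = raw_coverage(num_reads=1_000_000, read_length=150, target_size=5_000_000)
print(f"Raw coverage: {coverage:.1f}x")  # Raw coverage: 30.0x
```

Doubling the number of reads or the read length doubles the raw coverage, while doubling the target region size halves it.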
$$\text{Mapping Coverage} = \frac{\text{Number of Reads Mapped to the Reference Genome} \times \text{Read Length}}{\text{Target Region Size}}$$
For example, if only 700 out of 1,000 reads pass the alignment / mapping quality thresholds, then only those 700 reads count toward the coverage of that region, even though 1,000 reads were produced during sequencing. Reads may not align back to the reference region for a variety of reasons, including but not limited to:
- Sequencing errors preventing reads from accurately aligning back to the reference
- Contaminated samples: for example, bacteria inside C. elegans which, if not removed, are also sequenced and generate reads that do not align to the reference genome
- Differences between studies in the thresholds set for filtering out "bad" alignments
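To make the 700-out-of-1,000 example concrete, the sketch below compares raw and mapping coverage for the same run. The read counts come from the example above; the read length (100 bp), the target region size (1,000 bp) and the function name `coverage` are assumptions added purely for illustration.

```python
def coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Coverage = (number of reads x read length) / target region size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return num_reads * read_length / target_size


total_reads = 1_000   # reads produced during sequencing (from the example)
mapped_reads = 700    # reads passing alignment / mapping quality thresholds
read_length = 100     # assumed read length in bp, for illustration only
target_size = 1_000   # assumed target region size in bp, for illustration only

raw = coverage(total_reads, read_length, target_size)
mapped = coverage(mapped_reads, read_length, target_size)
mapping_rate = mapped_reads / total_reads

print(f"Raw coverage:     {raw:.0f}x")           # Raw coverage:     100x
print(f"Mapping coverage: {mapped:.0f}x")        # Mapping coverage: 70x
print(f"Mapping rate:     {mapping_rate:.0%}")   # Mapping rate:     70%
```

With read length and target size held fixed, mapping coverage is simply raw coverage multiplied by the fraction of reads that map.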
In addition, the following can affect how much coverage is generated for a particular region:
- How easily the region of interest can be captured and fragmented, which affects the coverage distribution
- Coverage bias introduced during PCR
- The expression level of a gene: the higher the expression level, the higher the coverage
Applied Biological Materials, Inc. is an Illumina Certified Service Provider, dedicated to ensuring the delivery of the highest-quality data available for genetic analysis applications. Learn more about our NGS services here.