Molecular Minutes

Q&A: NGS Data Analysis 101

Posted by Applied Biological Materials (abm) on November 27, 2019

Here is the full list of answers to the questions we received during the "NGS Data Analysis 101: RNA-Seq, WGS, and more" webinar:


  1. What file format will I get from sequencing on a MiSeq, FastQ or BCL? Do you provide the BCL files for sequencing?

  2. Do Q30 scores vary on different sequencers?

  3. I don’t have a reference genome but there is a related species that does have a reference genome. Can I use that for alignment/analysis?

  4. For analysis do you use GATK or VarScan2? Can I pick which one you use for my analysis?

  5. Do I have to use JBrowse? Does anyone still use GBrowse?

  6. What happens if I don’t normalize my RNA-Seq data using FPKM?

  7. What should I do if I do not have appropriate “control” samples for RNA-Seq? 

  8. Can StringTie be used to identify fusions, or do I need to use another program for this?

  9. If I want to identify alternative-splicing transcripts from RNA-Seq data, do I have to do anything differently for sequencing?

  10. If I did single-end RNA-Seq, do I still normalize the data the same way?

  11. Can you do custom analysis? I want standard and custom analysis for a project.

  12. For RNA-Seq data alignment, can I use BWA-MEM as the aligner or can I use something else, like STAR (Spliced Transcripts Alignment to a Reference) software?

  13. Do I need to use replicates for my RNA-Seq experiment?

  14. Do I have to normalize my data before using DESeq2 for looking at differential gene expression? Or can I input FastQ data and the software can process it anyway?

1. What file format will I get from sequencing on a MiSeq, FastQ or BCL? Do you provide the BCL files for sequencing?

Our standard deliverable for all sequencing projects is FastQ files. If you would prefer to receive BCL files, please let our team know before placing your order so that we can discuss further. In most cases we can provide these, depending on your project and the sequencing platform.

 

2. Do Q30 scores vary on different sequencers?

Yes, Q30 scores generally differ between sequencers; as a rough ordering, NovaSeq > HiSeq > NextSeq > MiSeq. However, the choice of sequencer affects Q30 scores less than other factors such as read length, sample quality, or even the actual sequence of the sample.
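
For context, a Q score is a Phred-scaled error probability, Q = -10 x log10(P), so a base call at Q30 has a 1 in 1,000 chance of being wrong, and "%Q30" is the share of bases at or above that score. The Python sketch below shows one way to compute it from the quality lines of a FastQ file. It assumes the Phred+33 encoding used in current Illumina FastQ output; the function names are illustrative and not part of any abm pipeline.

```python
def phred_scores(quality_line, offset=33):
    """Decode one FastQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33).
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Convert a Phred score into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def fraction_q30(quality_lines, threshold=30):
    """Return the fraction of all bases whose Phred score is >= threshold."""
    total = 0
    passing = 0
    for line in quality_lines:
        scores = phred_scores(line)
        total += len(scores)
        passing += sum(1 for q in scores if q >= threshold)
    return passing / total if total else 0.0


# Hypothetical quality strings: 'I' decodes to Q40, '?' to Q30, '5' to Q20, '#' to Q2.
example = ["II??", "55##"]
share = fraction_q30(example)
```

Here four of the eight example bases are at Q30 or above, so `share` is 0.5.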


 

3. I don’t have a reference genome but there is a related species that does have a reference genome. Can I use that for alignment/analysis?

Even among closely related species, there are often many differences that make using a related reference genome challenging. Aligning to a related reference may produce an analysis that shows thousands or even millions of mutations/variations between samples, and it will not give you a reliable result.

Once you are ready to publish your results, most reputable journals/publications will not accept an analysis based on a closely related species (instead of the appropriate reference genome).


If there is no reference genome for your species of interest, you can usually do WGS, generate a reference genome, and publish your results in a journal relatively easily. And your colleagues in your field would appreciate your efforts!

 

4. For analysis do you use GATK or VarScan2? Can I pick which one you use for my analysis?

For human samples, we use both as part of our standard variant calling; for all other samples, we use VarScan2. If you have a preference for your analysis, we can also accommodate that.

 

5. Do I have to use JBrowse? Does anyone still use GBrowse?

A lot of researchers still use GBrowse. However, GBrowse has a number of glitches and bugs (most of which JBrowse has addressed), so we suggest using JBrowse instead if possible.


 

6. What happens if I don’t normalize my RNA-Seq data using FPKM?

If you do not normalize your data, your results that show purported gene expression levels will be very misleading and essentially unusable. Journals and reviewers also will not accept this for a study. Always normalize your RNA-Seq data.
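
To illustrate why raw counts mislead, here is a minimal Python sketch of the standard FPKM formula: fragments per kilobase of transcript per million mapped fragments. It assumes that the total number of mapped fragments is simply the sum of the counts passed in; the gene names, counts, and lengths are made up for illustration. In practice a tool such as StringTie calculates these values for you.

```python
def fpkm(fragment_counts, gene_lengths_bp):
    """Compute FPKM for each gene.

    FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)

    Assumption: total mapped fragments is the sum of the supplied counts.
    """
    total = sum(fragment_counts.values())
    if total == 0:
        raise ValueError("no mapped fragments to normalize against")
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total)
        for gene, count in fragment_counts.items()
    }


# Hypothetical example: two genes with identical raw counts but different lengths.
counts = {"geneA": 100, "geneB": 100}
lengths = {"geneA": 1000, "geneB": 2000}
values = fpkm(counts, lengths)
```

Both genes have the same raw count, but geneB is twice as long, so its FPKM is half that of geneA. Comparing the raw counts alone would hide this difference.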

 


7. What should I do if I do not have appropriate “control” samples for RNA-Seq?

Appropriate controls are absolutely necessary for all experiments; if you do not have appropriate controls ready in time for your RNA-Seq experiment, you should delay doing RNA-Seq until you have appropriate controls.

 


8. Can StringTie be used to identify fusions, or do I need to use another program for this?

StringTie cannot be used to identify fusions, but other programs can, such as JAFFA, MapSplice, or SOAPfuse. You can also write custom scripts for this type of analysis, but this would require a data scientist, bioinformatician, or computer programmer.

 


9. If I want to identify alternative-splicing transcripts from RNA-Seq data, do I have to do anything differently for sequencing?

To search for alternative splicing events, deeper sequencing and paired-end reads are often required. StringTie can still be used for the analysis in this type of project.

 


10. If I did single-end RNA-Seq, do I still normalize the data the same way?

For single-end sequencing, you can normalize the same way, using StringTie to calculate FPKM. You may also have heard of another way to normalize data, RPKM, which predates FPKM and was used for single-end sequencing. FPKM can be used for either single-end or paired-end sequencing; for single-end data each read is one fragment, so FPKM and RPKM give the same result.

 


11. Can you do custom analysis? I want standard and custom analysis for a project.

Yes, we have a dedicated in-house bioinformatics team that can assist with nearly any analysis you would need for your NGS project. Simply let us know the number of samples you have, the data format, and the type of analysis you are interested in, and one of our specialists can assist you further.


12. For RNA-Seq data alignment, can I use BWA-MEM as the aligner or can I use something else, like STAR (Spliced Transcripts Alignment to a Reference) software?

BWA is generally used for WGS alignment. RNA-Seq reads come from spliced transcripts, so reads that span an exon-exon junction must be split across the intron when aligned to the genome (spliced alignment). For this reason we suggest using a splice-aware aligner such as STAR. Avoid using BWA for this type of alignment.
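
As a concrete picture of spliced alignment: in SAM/BAM output, a read that spans an intron is reported with an `N` (skipped reference region) operation in its CIGAR string. The Python sketch below parses a CIGAR string and reports any such gaps. The example CIGAR strings and helper names are hypothetical and are not taken from STAR itself.

```python
import re

# One CIGAR operation is a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def intron_gaps(cigar):
    """Return the lengths of all skipped reference regions (N operations)."""
    return [length for length, op in parse_cigar(cigar) if op == "N"]


def is_spliced(cigar):
    """True if the alignment skips at least one reference region."""
    return bool(intron_gaps(cigar))


# Hypothetical read: 50 bases aligned, a 1200 bp intron skipped, 50 more bases aligned.
spliced_read = "50M1200N50M"
# Hypothetical read aligned entirely within one exon.
unspliced_read = "100M"
```

Here `intron_gaps(spliced_read)` finds one 1200 bp gap, while `is_spliced(unspliced_read)` is false.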

 


13. Do I need to use replicates for my RNA-Seq experiment?

Yes. To increase confidence and reduce experimental error it is suggested that you submit at least 3 replicates per sample. Note that this is to serve as a guideline only and the final number of replicates and samples is to be determined by the end user based on their final experimental conditions.

For example, it would be better to sequence 3 replicates with 10 million read pairs each than 1 sample with 30 million read pairs. This would give you greater confidence in the results, allow meaningful statistical analysis, and generally make the work easier to publish in a journal once your project is complete.

 


14. Do I have to normalize my data before using DESeq2 for looking at differential gene expression? Or can I input FastQ data and the software can process it anyway?

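DESeq2 uses the StringTie output for differential gene expression (DEG) analysis, but the data it takes as input is not normalized. Instead, DESeq2 takes the raw read counts, normalizes them internally, and then performs the DEG analysis, so you should not normalize the data beforehand. DESeq2 does not process FastQ files directly; the reads must first be aligned and counted.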
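
To give a sense of what normalization from raw counts involves, the Python sketch below implements a simplified version of the median-of-ratios method that DESeq2 uses by default to estimate per-sample size factors. It is a teaching sketch with made-up counts, not the DESeq2 code itself, and it assumes at least one gene has a non-zero count in every sample.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios method.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log of the geometric mean of each gene across samples (None if skipped).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        # Median ratio of this sample's counts to the per-gene geometric means.
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's raw counts by that sample's size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Hypothetical raw counts: sample_b was sequenced twice as deeply as sample_a.
raw = {"sample_a": [10, 20, 30], "sample_b": [20, 40, 60]}
factors = size_factors(raw)
normalized = normalize(raw)
```

In this example the size factor of `sample_b` is twice that of `sample_a`, so after normalization the two samples have matching counts for every gene.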

 

 

Got more questions? Feel free to contact our Technical Support Team at ngs@abmgood.com or leave your question in the section below. We'll be happy to help.

Topics: Next Generation Sequencing, NGS, Data Analysis

Molecular Minutes

Educational resources for life scientists and interviews with scientists/science communicators in the field.

For more in-depth articles, check out our knowledge base, which covers topics such as CRISPR, Next Generation Sequencing, PCR, Cell Culture, and more.

Blog managed by Applied Biological Materials (abm). 
