
A Statistical Method for Analyzing Low Quality Scores in DNA Sequencing Reads


Affiliations
1 Department of Computer Science and Applications, S.D. College, Ambala Cantt., Kurukshetra University, Kurukshetra, India
     



The exponential growth of new DNA sequencing technologies is changing the biological sciences by allowing investigators to sequence large amounts of raw DNA, work that previously required a major genome sequencing effort. Next-generation sequencing produces much higher output at significantly lower cost because millions of reactions run in parallel in much smaller reaction volumes [1]. These new techniques produce an unmatched amount of data, but the sequencing data contains errors. A better knowledge of the error profiles is essential for sequence analysis and necessary for making well-founded decisions [19]. Unterminated bases in sequencing cycles have been reported to be the major source of errors. In this paper we analyze the sequence quality scores of sequencing reads from a real human sample. We compute quality scores, detect low-quality clusters in the DNA sequencing reads, and present a graphical analysis. We also infer the factors that lead to the presence of many low-quality clusters in the sample. This statistical analysis allows us to study and compare the errors introduced by different next-generation sequencers. The ability to analyze error profiles for sequencing reads has the potential to significantly improve the accuracy of sequence analysis.
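As an illustration of what computing quality scores and detecting low-quality clusters can look like in practice, here is a minimal Python sketch. It is not the method used in the paper, which the abstract does not specify. It assumes Phred+33 encoded quality strings (the common FASTQ convention), where a score Q relates to the base-call error probability P by Q = -10 log10(P). The function names, the threshold of 20, and the minimum run length of 3 are hypothetical choices made only for this example.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Convert a Phred score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def low_quality_runs(
    scores: List[int], threshold: int = 20, min_len: int = 3
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where at least `min_len`
    consecutive scores fall below `threshold` (illustrative parameters)."""
    runs = []
    start = None
    for i, q in enumerate(scores):
        if q < threshold:
            if start is None:
                start = i  # a new low-quality run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the read.
    if start is not None and len(scores) - start >= min_len:
        runs.append((start, len(scores)))
    return runs


# A made-up quality string, not data from the paper.
quality = "IIIIII###IIII5$$$$"
scores = phred_scores(quality)
print(low_quality_runs(scores))  # [(6, 9), (14, 18)]
```

In this toy read, positions 6 to 8 and 14 to 17 form low-quality runs. A real analysis would apply the same idea across all reads and aggregate the results per sequencing cycle.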


Keywords

Next Generation Sequencing, DNA Bases, Sequencing Errors, Quality Scores, Base Caller, Sequencing Reads.

Authors

Sangharsh Saini