
A two-step procedure for detecting change points in genomic sequences


Affiliations
1 Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
2 Agricultural Education Division, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
3 Division of Design of Experiments, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
4 Fishery Resources Assessment Division, ICAR-Central Marine Fisheries Research Institute, Kochi 682 018, India

Abstract

Change-point detection is a current focus of whole-genome studies, and various segmentation techniques have been proposed over time to identify change points. Locating segments within a genome requires pinpointing the boundaries between them, which are known as change points. These change points can be identified by treating them as outliers. Anomalies, or outliers, are observations that differ significantly from the rest of the observations in a dataset; they can be attributed to measurement errors or to properties of the data themselves. Studying the fluctuations over different segments also reveals the heterogeneity between consecutive segments. In this paper, an anomaly identification (influential point detection) approach is discussed and applied to cow genome data from chromosome 25. The observed anomalies are then checked to confirm whether or not they are true change points. This two-step technique identifies change points based on the observed anomalies and is efficient in terms of computation time and cost. The study aims to detect anomalies in genomic data and to determine the exact points at which a data segment differs significantly from the rest of the segments. We have developed R code for the data processing and the applied methodologies.

Keywords

Anomalies, change points, genomic sequences, segmentation, two-step procedure
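
The abstract does not name the anomaly detector or the confirmation test, and the authors' R code is not reproduced on this page. Purely as an illustration of the two-step idea (flag anomalies first, then confirm which of them are true change points), the sketch below uses Python rather than R and makes several assumptions that are not from the paper: windowed GC content as the signal, a robust z-score on jumps between neighbouring windows for step 1, and a Welch-type t statistic on the flanking windows for step 2. All function names, the thresholds 3.5 and 3.0, and the span of 5 windows are hypothetical.

```python
import statistics


def gc_content(seq, window):
    """GC fraction in consecutive non-overlapping windows (assumed signal)."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def candidate_points(values, threshold=3.5):
    """Step 1: flag indices where the jump from the previous value is an outlier.

    Outliers are judged with a robust z-score (median and MAD) computed on
    the absolute first differences; only unusually large jumps are flagged.
    """
    jumps = [abs(b - a) for a, b in zip(values, values[1:])]
    med = statistics.median(jumps)
    mad = statistics.median([abs(j - med) for j in jumps])
    if mad == 0:
        return []  # no spread in the jumps, so the score is undefined
    return [i + 1 for i, j in enumerate(jumps)
            if 0.6745 * (j - med) / mad > threshold]


def confirm(values, idx, span=5, t_crit=3.0):
    """Step 2: accept a candidate only if the two flanks differ in mean."""
    left = values[max(0, idx - span):idx]
    right = values[idx:idx + span]
    if len(left) < 2 or len(right) < 2:
        return False
    diff = abs(statistics.mean(left) - statistics.mean(right))
    se = (statistics.variance(left) / len(left)
          + statistics.variance(right) / len(right)) ** 0.5
    if se == 0:
        return diff > 0
    return diff / se > t_crit


def two_step(values, threshold=3.5, span=5, t_crit=3.0):
    """Run step 1, then keep only the candidates that pass step 2."""
    return [i for i in candidate_points(values, threshold)
            if confirm(values, i, span, t_crit)]


def synthetic_series():
    """Made-up data: a level shift at index 21 and a lone spike at index 10."""
    noise = [0.0, 0.012, -0.008, 0.005, -0.011, 0.009, -0.003]
    series = [(0.40 if i < 21 else 0.60) + noise[i % 7] for i in range(42)]
    series[10] += 0.15  # isolated anomaly, not a true change point
    return series


if __name__ == "__main__":
    data = synthetic_series()
    print("step 1 candidates:", candidate_points(data))
    print("step 2 confirmed: ", two_step(data))
```

On the made-up series, the isolated spike produces large jumps and is flagged in step 1, but its flanking windows do not differ enough in mean to pass step 2, whereas the level shift passes both steps. Confirming only the flagged candidates, instead of testing every position, is one plausible reading of why a two-step scheme can save computation.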

Authors

Arfa Anjum
Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
Seema Jaggi
Agricultural Education Division, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
Shwetank Lall
Division of Design of Experiments, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
Eldho Varghese
Fishery Resources Assessment Division, ICAR-Central Marine Fisheries Research Institute, Kochi 682 018, India
Anil Rai
Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
Arpan Bhowmik
Division of Design of Experiments, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
Dwijesh Chandra Mishra
Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India


DOI: https://doi.org/10.18520/cs%2Fv126%2Fi1%2F54-58