DNA Lossless Differential Compression Algorithm Based on Similarity of Genomic Sequence Database

Heba Afify; Muhammad Islam; Manal Abdel Wahed

Affiliations
Department of Systems and Biomedical Engineering, Cairo University, Egypt

Abstract

Modern biological science produces vast amounts of genomic sequence data, fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the related techniques from information theory are often perceived as relevant mainly to data communication and storage. In recent years, however, substantial effort has gone into applying textual data compression techniques to a range of computational biology tasks, from storage and indexing of large datasets to comparison of genomic databases. This paper presents a differential compression algorithm that produces difference sequences according to an op-code table in order to optimize the compression of homologous sequences in a dataset. Instead of storing each sequence individually, the algorithm stores a reference sequence, the set of differences, and the locations of those differences. The algorithm does not require a priori knowledge of the statistics of the sequence set. Applied to three different datasets of genomic sequences, it achieved a compression ratio of up to 195-fold, corresponding to a 99.4% space saving.
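The abstract describes storing one reference sequence plus, for every other homologous sequence, a list of differences and their locations. A minimal Python sketch of that idea follows, under stated assumptions: the op-code table (`R` for replace, `D` for delete, `I` for insert) and the functions `encode` and `decode` are hypothetical, since the paper's actual op-code table is not given here, and the differences are found with the standard library's `difflib.SequenceMatcher`, a general-purpose matcher swapped in for illustration rather than the authors' own difference-extraction step.

```python
from difflib import SequenceMatcher

# Hypothetical op-code table (assumption): the paper defines its own table,
# whose exact codes are not given in the abstract.
OPCODES = {"replace": "R", "delete": "D", "insert": "I"}

Edit = tuple[int, str, int, str]  # (location, op-code, reference span, bases)


def encode(reference: str, target: str) -> list[Edit]:
    """Encode `target` as a list of differences against `reference`.

    Each edit records the location in the reference, an op-code, how many
    reference bases it consumes, and the bases to emit in their place.
    Regions identical to the reference are not stored at all.
    """
    # autojunk=False: with only four letters, the default junk heuristic
    # would discard every base on long inputs and destroy the matching.
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    edits: list[Edit] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # shared with the reference: nothing to store
        edits.append((i1, OPCODES[tag], i2 - i1, target[j1:j2]))
    return edits


def decode(reference: str, edits: list[Edit]) -> str:
    """Rebuild the original sequence losslessly from reference and edits."""
    out = []
    cursor = 0
    for pos, _code, span, bases in edits:
        # The op-code is informational here; span and bases suffice to decode.
        out.append(reference[cursor:pos])  # copy the unchanged stretch
        out.append(bases)                  # new bases ('' for a deletion)
        cursor = pos + span                # skip replaced/deleted bases
    out.append(reference[cursor:])
    return "".join(out)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    dataset = ["ACGTACCTTAGCCGATAGGCTA", "ACGTTAGCCGATAGGCTA"]
    for seq in dataset:
        edits = encode(reference, seq)
        assert decode(reference, edits) == seq  # lossless round trip
        print(seq, edits)
```

Only the edits are kept for each sequence, so sequences that closely resemble the reference cost little beyond the shared reference itself. A practical encoder would also need to pack locations and bases into a compact binary form, which this sketch omits, and `SequenceMatcher` can become slow on long sequences.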

Keywords

Data Compression, Genomic Sequences, Differential Compression Algorithm.

AIRCC's International Journal of Computer Science and Information Technology
