
Biological Sequence Compression Based On Properties of Unique and Repeated Similarities of Sequences Using Variable Length LUT


Affiliations
1 Department of Computer Science & Engineering, B.C.T. Kumaon Engineering College, Dwarahat, Almora, Uttarakhand, India
2 ECE Department, B.C.T. Kumaon Engineering College, Dwarahat, Almora, Uttarakhand, India
     



A genome may contain several copies of the same gene. Although the human genome contains about 3 billion base pairs, only 3% of it encodes protein. The human genome has only about 25,000 genes, which encode about 100,000 proteins through alternative splicing. Biological sequences are commonly of two types: unique and repeated. We exploit these properties of the sequences. Earlier algorithms work on either unique-repeat or repeated-repeat sequences. We merge both methodologies into a new algorithm that compresses both types of sequences, so the same compression algorithm can be applied to all types of sequences. This reduces the effort of developing separate algorithms, and a single algorithm is easier to apply than several different ones. In this paper, a biological sequence compression algorithm is proposed to compress both unique sequences, which are repeated in one area, and repeated sequences that are interspersed throughout the genome. The algorithm is also compared with existing ones and is found to achieve a better compression ratio than the others.

Keywords

Genome, Sequence, Uniqueness, Compression Ratio, DNACompress, GenCompress, LUT, Base Pair.


Authors

Rajendra Kumar Bharti
Department of Computer Science & Engineering, B.C.T. Kumaon Engineering College, Dwarahat, Almora, Uttarakhand, India
Archana Verma
Department of Computer Science & Engineering, B.C.T. Kumaon Engineering College, Dwarahat, Almora, Uttarakhand, India
R. K. Singh
ECE Department, B.C.T. Kumaon Engineering College, Dwarahat, Almora, Uttarakhand, India
