Biological Sequence Compression Based On Properties of Unique and Repeated Similarities of Sequences Using Variable Length LUT
A genome may contain several copies of the same gene. Although the human genome contains about 3 billion base pairs, only 3% of it encodes protein. The human genome has only about 25,000 genes, which encode about 100,000 proteins through alternative splicing. Biological sequences are commonly of two types: unique and repeated. We exploit these properties of the sequences. Earlier algorithms work on either unique repeats or repeated repeats. We merge the two methodologies into a new algorithm that compresses both types of sequence, so the same compression algorithm can be applied to all types of sequences. This reduces the effort of developing separate algorithms, and applying a single algorithm is easier than using different ones. In this paper, a biological sequence compression algorithm is proposed to compress both unique sequences, which are repeated in one area, and repeated sequences, which are interspersed throughout the genome. The algorithm is also compared with existing ones and is found to achieve a better compression ratio than the others.
Keywords
Genome, Sequence, Uniqueness, Compression Ratio, DNA Compress, Gen Compress, LUT, Base Pair.