
Analyzing the Effect of Adding Noise on Compressed Textual Data


Affiliations
1 Thapar University, Patiala, India
2 CSED, Thapar University, Patiala, India
     



Compression is one technique for making better use of storage devices, since it saves storage space. This paper addresses compression through the Normalized Compression Distance (NCD). The NCD is derived from the Normalized Information Distance, which is based on the algorithmic complexity developed by Kolmogorov. It can be used to cluster objects of any kind, such as music, texts, or gene sequences (microarray classification). The NCD between two binary strings is defined in terms of the compressed sizes of the two strings and of their concatenation; it is designed to be an effective approximation of the non-computable but universal Kolmogorov distance between two strings. This paper studies the influence of noise on the NCD, a measure that uses compressors to compute the degree of similarity of two files. This influence is approximated by a first-order differential equation, which gives rise to a complex effect that explains why the NCD may give values greater than 1. Finally, the effect of adding noise to compressed textual data is analyzed using the CompLearn Toolkit; the findings are that the NCD performs well even in the presence of quite high noise levels.
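
The definition in terms of compressed sizes can be illustrated with a minimal sketch. It uses the commonly cited form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. Several things here are assumptions and not taken from the paper: Python's zlib stands in for the compressors of the CompLearn Toolkit, the helper names (compressed_size, ncd, add_noise) are hypothetical, and the noise model (replacing each byte with a random byte with a given probability) is only one plausible choice, since the abstract does not specify how noise is added.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance from the compressed sizes of x, y and x + y."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def add_noise(data: bytes, level: float, seed: int = 0) -> bytes:
    """Replace each byte with a random byte with probability `level`.

    Hypothetical noise model, chosen only for illustration.
    """
    rng = random.Random(seed)
    out = bytearray(data)
    for i in range(len(out)):
        if rng.random() < level:
            out[i] = rng.randrange(256)
    return bytes(out)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 40
    for level in (0.0, 0.1, 0.3, 0.5, 1.0):
        noisy = add_noise(text, level)
        print(f"noise level {level:.1f}: NCD = {ncd(text, noisy):.3f}")
```

Because a real compressor is not ideal, C(xy) can exceed C(x) + C(y) by a small overhead, so values slightly above 1 can occur in practice, which is consistent with the observation in the abstract.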

Keywords

Kolmogorov Complexity, Normalized Information Distance, Normalized Compression Distance, Compression.

Authors

Sudesh Kumar
Thapar University, Patiala, India
Shalini Batra
CSED, Thapar University, Patiala, India
