An Advanced Dictionary Based Lossless Compression Technique for English Text Data


Affiliations
1 IEM, Kolkata, India
     



Abstract

Data compression reduces the size of large volumes of data, which lowers both network bandwidth usage and storage requirements. Text compression is therefore an important concept in data management. This paper presents a new lossless compression technique for English text. The technique is a two-step process. First, the text is reduced using a dictionary-based look-up table, which is provided as part of the operating system and replaces each word with an 18-bit address. This reduction achieves more than 50% compression in most cases, and its output is stored in a binary file. Second, the binary file is compressed with a modified Huffman algorithm that reads the data in 6-bit blocks to build the Huffman tree. Together, the two steps compress the file to around 32-38% of its original size. The paper also compares the new technique with other well-known compression methods.
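The two steps can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not settle: a small hypothetical word list (`WORDS`) stands in for the operating-system dictionary, words are lowercase and separated by single spaces, and every word is present in the table (the handling of unknown words, capitalization and punctuation is not described). The second step is plain textbook Huffman coding over 6-bit blocks, because the abstract does not say what the authors' modification consists of. All function names are illustrative, not taken from the paper.

```python
import heapq
from collections import Counter

ADDRESS_BITS = 18  # dictionary address width stated in the abstract (2**18 = 262,144 words)
BLOCK_BITS = 6     # block size the abstract gives for the Huffman step


def to_bitstring(data):
    """Expand bytes into a string of '0'/'1' characters."""
    return "".join(format(byte, "08b") for byte in data)


def to_bytes(bits):
    """Zero-pad a bitstring to a byte boundary and pack it into bytes."""
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Step 1: dictionary-based reduction (hypothetical in-memory table)
def dictionary_reduce(text, table):
    """Replace each space-separated word with its 18-bit address."""
    codes = [table[word] for word in text.split(" ")]
    bits = "".join(format(code, f"0{ADDRESS_BITS}b") for code in codes)
    return len(codes), to_bytes(bits)  # word count lets the decoder skip padding


def dictionary_expand(count, packed, words):
    """Read `count` 18-bit addresses back and look up their words."""
    bits = to_bitstring(packed)
    return " ".join(
        words[int(bits[i * ADDRESS_BITS:(i + 1) * ADDRESS_BITS], 2)] for i in range(count)
    )


# Step 2: textbook Huffman coding over 6-bit blocks (not the paper's modified variant)
def to_blocks(data):
    """Split bytes into 6-bit symbols (0..63), zero-padding the last block."""
    bits = to_bitstring(data)
    bits += "0" * (-len(bits) % BLOCK_BITS)
    return [int(bits[i:i + BLOCK_BITS], 2) for i in range(0, len(bits), BLOCK_BITS)]


def huffman_codes(symbols):
    """Build a prefix-free code table {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) <= 1:  # empty input or a single distinct symbol
        return {symbol: "0" for symbol in freq}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}); the unique
    # tie_breaker keeps heapq from ever comparing the dicts.
    heap = [(weight, i, {symbol: ""}) for i, (symbol, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(data):
    """Return the code table and the encoded bitstring for the 6-bit blocks of data."""
    blocks = to_blocks(data)
    codes = huffman_codes(blocks)
    return codes, "".join(codes[s] for s in blocks)


def huffman_decode(codes, bitstring):
    """Decode greedily; this works because Huffman codes are prefix-free."""
    inverse = {code: symbol for symbol, code in codes.items()}
    symbols, current = [], ""
    for bit in bitstring:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    return symbols


# Usage with a tiny hypothetical dictionary standing in for the OS-level table
WORDS = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
TABLE = {word: index for index, word in enumerate(WORDS)}
text = "the quick brown fox jumps over the lazy dog"

count, reduced = dictionary_reduce(text, TABLE)
codes, encoded = huffman_encode(reduced)
assert huffman_decode(codes, encoded) == to_blocks(reduced)
assert dictionary_expand(count, reduced, WORDS) == text
print(len(text), len(reduced))  # prints: 43 21
```

For the nine-word sentence, the 43 bytes of text reduce to 21 bytes (nine 18-bit addresses, 162 bits, padded to 168), which shows how the dictionary step alone can exceed 50% compression. An input this small cannot show a gain from the Huffman step; a real file format would also need to store the code table (or the symbol frequencies) and the word count so that the decoder can rebuild the tree and discard padding bits.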


Keywords

Text Segment, Reduction, Dictionary Table, Data Compression.





Authors

Dipanjan Bhattacharya
IEM, Kolkata, India
Sanjay Chakraborty
IEM, Kolkata, India
Pinkar Roy
IEM, Kolkata, India
Animesh Kairi
IEM, Kolkata, India
