
Survey on Data Compression in Big Data by Using Various Methods


Authors

N. Kavitha
Department of Computer Science, Nehru Arts and Science College, Coimbatore, India

K. Subhadra
PG & Research Department of Computer Science, Nehru Arts and Science College, Coimbatore, India
     

Abstract


Nowadays, big sensing data is prevalent in both industry and scientific research applications, where data is generated with high volume and velocity. Cloud computing offers a stack of massive computation, storage, and software services in a scalable manner, and current big sensing data systems rely on cloud techniques for this purpose. Based on specific on-cloud data compression requirements, we propose a scalable data compression approach based on similarity among data chunks, implemented with the MapReduce algorithm to achieve extra scalability on the cloud. MapReduce, a prominent parallel data processing tool, is gaining significant momentum from both industry and academia as the volume of data to analyze grows rapidly; this survey intends to assist the database and open source communities in understanding various technical aspects of the MapReduce framework. As the name suggests, processing takes place in two phases, a map phase and a reduce phase.

In general, big data is a collection of data sets so large and complex that it becomes extremely difficult to process them with on-hand database management systems or traditional data processing tools. It represents the progress of human cognitive processes, and it usually includes data sets with sizes beyond the ability of current technology, methods and theory to capture, manage and process within a tolerable elapsed time. The big sensing data produced by different kinds of sensing systems is highly heterogeneous, and it has the typical characteristics of real-world big data, the five 'V's: Volume, Variety, Velocity, Veracity and Value. Data chunk similarity can significantly improve data compression efficiency with affordable data accuracy loss, but due to the size and speed of real-world big sensing data, current data compression and reduction techniques still need to be improved. It has been well recognized that big sensing data, or big data sets from mesh networks such as sensor systems and social networks, can take the form of big graph data; to process such big graph data, current techniques normally introduce complex and multiple iterations. In order to cope with this huge volume of big sensing data, different techniques have been developed, on-line or off-line, centralized or distributed.
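To make the two ideas above concrete, the following is a minimal Python sketch, under stated assumptions, of how chunk-similarity compression can be split into MapReduce-style map and reduce phases: the map phase turns each sensing-data chunk into a coarse signature, and the reduce phase stores one compressed representative per group of similar chunks, trading a bounded accuracy loss for space. The signature scheme, the threshold and all helper names are assumptions made for illustration, not the exact algorithm of the surveyed approach.

# A minimal sketch of similarity-based chunk compression, organised as a
# map phase and a reduce phase in the spirit of MapReduce. All names here
# (chunk_signature, SIMILARITY_THRESHOLD, the histogram signature) are
# illustrative assumptions, not the method of any one surveyed paper.

import zlib

SIMILARITY_THRESHOLD = 0.7   # assumed tolerance for treating chunks as similar

def chunk_signature(chunk, buckets=16):
    """Map step: summarise a numeric chunk as a coarse value histogram."""
    lo, hi = min(chunk), max(chunk)
    width = (hi - lo) / buckets or 1.0   # avoid zero width for constant chunks
    sig = [0] * buckets
    for v in chunk:
        sig[min(int((v - lo) / width), buckets - 1)] += 1
    return tuple(sig)

def similarity(sig_a, sig_b):
    """Histogram overlap in [0, 1]; 1.0 means identical signatures."""
    shared = sum(min(a, b) for a, b in zip(sig_a, sig_b))
    total = max(sum(sig_a), sum(sig_b))
    return shared / total if total else 1.0

def map_phase(chunks):
    """Emit one (signature, chunk) pair per incoming sensing-data chunk."""
    return [(chunk_signature(c), c) for c in chunks]

def reduce_phase(pairs):
    """Keep one compressed representative per group of similar chunks."""
    kept = []   # list of (signature, compressed payload)
    for sig, chunk in pairs:
        if any(similarity(sig, s) >= SIMILARITY_THRESHOLD for s, _ in kept):
            continue   # a similar chunk is already stored: accepted accuracy loss
        kept.append((sig, zlib.compress(repr(chunk).encode())))
    return kept

if __name__ == "__main__":
    # Three chunks of sensor readings; the first two are near-duplicates.
    chunks = [[20.1, 20.2, 20.1, 20.3],
              [20.1, 20.2, 20.2, 20.3],
              [35.0, 36.1, 35.5, 36.0]]
    stored = reduce_phase(map_phase(chunks))
    print(f"{len(chunks)} chunks in, {len(stored)} representatives stored")

On this sample input, the two near-duplicate chunks collapse into a single stored representative, which is the source of the compression gain the abstract describes.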
References

  • “Big data: science in the petabyte era: community cleverness required,” Nature, 455(7209): 1, 2008.
  • S. Tsuchiya, Y. Sakamoto, Y. Tsuchimoto and V. Lee, “Big data processing in cloud environments,” FUJITSU Science and Technology Journal, 48(2): 159-168, 2012.
  • K. H. Lee, Y. J. Lee, H. Choi, Y. D. Chung and B. Moon, “Parallel data processing with MapReduce: a survey,” ACM SIGMOD Record, 40(4): 11-20, 2012.
  • “Big Data beyond MapReduce: Google’s Big Data Papers,” http://architects.dzone.com/articles/big-data-beyond-mapreduce, accessed November 20, 2015.


