
Pre-Requisites of Big Data and Hadoop




Authors

Jagbir Singh
Department of Computer Engineering, Engineering Wing, Punjabi University, Patiala, India
Rakesh Singh
Department of Computer Engineering, Engineering Wing, Punjabi University, Patiala, India

Abstract


Data is growing at a rate that is difficult even to capture in today's world, driven by the continuous uploading of data to the internet; social networking websites are among the main sources of this huge volume of data. "Big Data" refers to collections of data sets characterized by volume (their extreme size), variety (the many different forms the data takes), and velocity (the speed at which the data is generated and collected). Data at this scale is difficult to handle with traditional software packages, so a new and efficient method of handling it became necessary, and Hadoop came into existence to meet that need. Hadoop is an open-source project that enables the distributed processing of large data sets using a simple programming model, with a high degree of fault tolerance. Hadoop consists of two main parts: HDFS, which stores large amounts of data across a cluster, and MapReduce, which analyses that data in parallel. With the help of this Java-based software, even petabytes or zettabytes of data can be handled with ease.
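
To make the MapReduce programming model mentioned in the abstract concrete, the following is a minimal sketch of the canonical word-count job written against the standard Apache Hadoop Java API (org.apache.hadoop.mapreduce). It is an illustrative example based on the mainline Hadoop distribution, not code taken from this paper: the mapper emits a count of 1 for every word in its HDFS input split, and the reducer sums the counts for each word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for each line of input, emit a (word, 1) pair per word.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum all counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output locations on HDFS are passed as command-line arguments.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged as a JAR and submitted to a cluster (for example, hadoop jar wordcount.jar WordCount /input /output, where both paths are HDFS directories), the mappers run close to the HDFS blocks that hold the input, and the framework takes care of scheduling, shuffling intermediate pairs, and re-running failed tasks, which is the fault tolerance the abstract refers to.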