
Monitoring Aspects of Cloud Over the Big Data Analytics Using the Hadoop for Managing Short Files


Authors

Prerna Kumari
Banasthali Vidyapith, Tonk, Rajasthan, India
Himanshu Sharma
NIC, Delhi, India
Aishwarya Shekhar
Manipal University, Jaipur, India
     



Abstract

This paper presents a review of cloud computing and big data analytics using Hadoop. Hadoop is an open-source framework for storing and processing unstructured data; it can be viewed as the engineering side of big data, providing the processing and analysis on which predictive analytics is built. It has two core components: HDFS (Hadoop Distributed File System), which stores large amounts of data reliably, and MapReduce, a programming model for processing that data in parallel. Hadoop does not perform well with short files: a large number of short files places a heavy burden on the NameNode of HDFS and increases MapReduce execution time. Because Hadoop is designed to handle large files, it suffers a performance penalty when dealing with many short files. This work introduces HDFS, describes the short file problem, and surveys existing ways of dealing with it. Today storage itself is no longer the main issue; the real issues are how to make sense of the data and how to assure industry that the cloud is safe.
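Why do short files burden the NameNode? The NameNode keeps every file, directory, and block in HDFS as an in-memory object, each costing roughly 150 bytes; ten million short files, each occupying its own block, therefore consume on the order of 3 GB of NameNode heap before a single byte of data is processed. One widely used mitigation, sketched below, is to pack many short files into a single Hadoop SequenceFile keyed by the original file names, so the NameNode tracks one large, splittable file instead of millions of tiny ones. The sketch is a minimal Java illustration, not the method proposed in this paper; the paths and class name are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ShortFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path inputDir = new Path("/data/short-files"); // hypothetical directory holding many short files
        Path packed = new Path("/data/packed.seq");    // single consolidated SequenceFile

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(inputDir)) {
                if (status.isDirectory()) continue; // pack regular files only
                byte[] contents = new byte[(int) status.getLen()];
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    IOUtils.readFully(in, contents, 0, contents.length);
                }
                // Key: original file name; value: the file's raw bytes.
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(contents));
            }
        }
    }
}

Hadoop also ships with Hadoop Archives (HAR files), which layer a virtual filesystem over packed small files; the SequenceFile approach shown here has the added benefit that the consolidated file is directly splittable across MapReduce tasks and can be block-compressed.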

Keywords

Big Data Analytics, Cloud Computing, Hadoop, Short Files.