
Enhancing HiveQL Engine Using Map-Join-Reduce


Affiliations
1 Pune Institute of Computer Technology, Pune, India
     



Today we face an information explosion, which brings the challenge of building systems that can handle huge volumes of data. Hive is a data warehouse infrastructure built on the Hadoop platform. It provides mechanisms for organizing large data sets, extracting data using MapReduce, and analyzing large data sets stored in HDFS.

HiveQL is the query language that Hive provides for data extraction. It also allows custom MapReduce functions to be plugged in alongside the built-in MapReduce functionality. This paper considers enhancing HiveQL's MapReduce processing with Map-Join-Reduce, which leads to a detailed study of performance improvement. The MapReduce processing strategy frequently checkpoints and shuffles intermediate result data. MapReduce can be made more scalable and efficient by improving the strategy for handling intermediate data.

The proposed solution is Map-Join-Reduce. Map-Join-Reduce simplifies the data handling mechanism by removing the burden of presenting complex join algorithms. We first present UML class diagrams for the HiveQL engine; these diagrams shed light on the HiveQL query execution process. We then present the debugging issues encountered while reverse engineering the Hive system and the Hive build patch given for the errors. Finally, we present the proposed solution for Map-Join-Reduce.
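
As a rough illustration of the idea, the following toy sketch runs a map step, then an explicit join step, then a reduce step, entirely in memory. It is not Hive code and not the design proposed in the paper: the function names, the record fields (`cust_id`, `city`, `amount`), and the hash-join strategy are all assumptions chosen only to show where a join phase could sit between map and reduce.

```python
from collections import defaultdict


def map_phase(records, key_fn, tag):
    """Emit (join_key, (tag, record)) pairs for one input data set."""
    return [(key_fn(r), (tag, r)) for r in records]


def join_phase(left_pairs, right_pairs):
    """Inner-join two mapped data sets on their key (simple hash join)."""
    right_by_key = defaultdict(list)
    for key, (_, rec) in right_pairs:
        right_by_key[key].append(rec)

    joined = []
    for key, (_, left_rec) in left_pairs:
        # Keys with no match on the right side produce no output rows.
        for right_rec in right_by_key.get(key, []):
            joined.append({**left_rec, **right_rec})
    return joined


def reduce_phase(rows, group_fn, value_fn):
    """Aggregate the joined rows: sum value_fn(row) per group_fn(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[group_fn(row)] += value_fn(row)
    return dict(totals)


def map_join_reduce(orders, customers):
    """Hypothetical pipeline: total order amount per customer city."""
    left = map_phase(orders, lambda r: r["cust_id"], "orders")
    right = map_phase(customers, lambda r: r["cust_id"], "customers")
    joined = join_phase(left, right)
    return reduce_phase(joined, lambda r: r["city"], lambda r: r["amount"])
```

In this sketch the user supplies only the key, grouping, and value functions; the join logic lives in one dedicated phase rather than being hand-coded inside the map or reduce functions.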


Keywords

Hadoop, Hive, HiveQL.

Authors

Amruta Kulkarni
Pune Institute of Computer Technology, Pune, India
Shweta Dharmadhikari
Pune Institute of Computer Technology, Pune, India
