Enhancing HiveQL Engine Using Map-Join-Reduce

Amruta Kulkarni; Shweta Dharmadhikari

Affiliations
1 Pune Institute of Computer Technology, Pune, India

Abstract

We are facing an information explosion, which brings the challenge of building systems that can handle huge volumes of data. Hive is a data warehouse infrastructure built on the Hadoop platform. It provides mechanisms for organizing large data sets stored in HDFS, extracting data from them using MapReduce, and analyzing them.

HiveQL is Hive's query language for data extraction. In addition to the built-in MapReduce functionality, it allows custom MapReduce functions to be plugged in. We consider this HiveQL MapReduce layer for a Map-Join-Reduce enhancement, which calls for a detailed study of the possible performance improvement. The MapReduce processing strategy frequently checkpoints and shuffles intermediate result data. MapReduce can be made more scalable and efficient by improving how this intermediate data is handled.

The proposed solution is Map-Join-Reduce. Map-Join-Reduce simplifies data handling by removing the burden of implementing complex join algorithms. We first present UML class diagrams for the HiveQL engine, which illustrate the HiveQL query execution process. We then describe the debugging issues encountered while reverse engineering the Hive system and the Hive build patch provided for the errors. Finally, we present the proposed Map-Join-Reduce solution.
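
The idea of a dedicated join step between map and reduce can be illustrated with a small in-memory sketch. It is not Hive's implementation or the design proposed here. The function names (`map_phase`, `join_phase`, `reduce_phase`), the sample tables, and the single-process execution are all assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(records, key_field, source):
    """Map: emit (join key, (source tag, record)) for every input record."""
    for rec in records:
        yield rec[key_field], (source, rec)


def join_phase(tagged_left, tagged_right):
    """Join: inner-join the two tagged streams on their keys."""
    # Index the right-hand stream by key (assumed small enough for memory).
    right_by_key = defaultdict(list)
    for key, (_, rec) in tagged_right:
        right_by_key[key].append(rec)
    # Stream the left-hand side and emit one merged row per matching pair.
    for key, (_, left_rec) in tagged_left:
        for right_rec in right_by_key.get(key, ()):
            yield key, {**left_rec, **right_rec}


def reduce_phase(joined, group_field, value_field):
    """Reduce: sum value_field per group_field over the joined rows."""
    totals = defaultdict(int)
    for _, row in joined:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


def run_example():
    # Hypothetical sample tables, used only for this illustration.
    customers = [
        {"cust_id": 1, "name": "alice"},
        {"cust_id": 2, "name": "bob"},
    ]
    orders = [
        {"cust_id": 1, "amount": 30},
        {"cust_id": 1, "amount": 12},
        {"cust_id": 2, "amount": 5},
        {"cust_id": 3, "amount": 99},  # no matching customer, dropped by the join
    ]
    left = map_phase(customers, "cust_id", "customers")
    right = map_phase(orders, "cust_id", "orders")
    return reduce_phase(join_phase(left, right), "name", "amount")


if __name__ == "__main__":
    print(run_example())
```

In this sketch the join logic lives in its own phase, so the map and reduce functions stay free of join-specific code. A distributed engine would additionally have to partition and shuffle the tagged records, which the sketch leaves out.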