Enhancing HiveQL Engine Using Map-Join-Reduce
Today we face an information explosion, which brings the challenge of building systems that can handle huge volumes of data. Hive is a data warehouse infrastructure built on the Hadoop platform. It provides mechanisms for organizing large data sets, extracting data using MapReduce, and analyzing large data sets stored in HDFS.
HiveQL is the query language Hive provides for data extraction. It also allows custom MapReduce functions to be plugged in alongside the built-in MapReduce functionality. We consider this HiveQL MapReduce layer for a Map-Join-Reduce enhancement, which leads to a detailed study of performance improvement. The MapReduce processing strategy frequently checkpoints and shuffles intermediate result data. MapReduce can be made more scalable and efficient by improving how this intermediate data is handled.
The proposed solution is Map-Join-Reduce. Map-Join-Reduce simplifies data handling by removing the burden of implementing complex join algorithms. We first present UML class diagrams for the HiveQL engine. These diagrams shed light on the HiveQL query execution process. We then describe the debugging issues encountered while reverse engineering the Hive system and the Hive build patch supplied for the errors. Finally, we present the proposed Map-Join-Reduce solution.
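To make the idea concrete, the following is a minimal in-memory sketch of a map, join, and reduce pipeline in which joined records flow straight into the aggregation step rather than being written out as an intermediate result. It is a toy illustration only, not Hive's implementation or the design proposed in this work. The table layouts, the function names (`map_customers`, `map_orders`, `join`, `reduce_sum`, `map_join_reduce`), and the use of a hash join are all assumptions made for the example.

```python
from collections import defaultdict


def map_customers(rows):
    # Hypothetical mapper: emit (join key, region) for each customer row.
    for cust_id, region in rows:
        yield cust_id, region


def map_orders(rows):
    # Hypothetical mapper: emit (join key, amount) for each order row.
    for _order_id, cust_id, amount in rows:
        yield cust_id, amount


def join(customer_pairs, order_pairs):
    # Assumed hash join: build a lookup table from the customer side,
    # then stream the order side through it. Orders without a matching
    # customer are dropped (inner-join semantics).
    regions = dict(customer_pairs)
    for cust_id, amount in order_pairs:
        if cust_id in regions:
            yield regions[cust_id], amount


def reduce_sum(joined_pairs):
    # Aggregate joined records as they arrive; the joined data is never
    # materialized as a separate intermediate result.
    totals = defaultdict(float)
    for region, amount in joined_pairs:
        totals[region] += amount
    return dict(totals)


def map_join_reduce(customers, orders):
    # Chain the three stages as generators so records are pulled through
    # map -> join -> reduce in a single pass.
    return reduce_sum(join(map_customers(customers), map_orders(orders)))


# Made-up sample data: (cust_id, region) and (order_id, cust_id, amount).
customers = [(1, "EU"), (2, "US"), (3, "EU")]
orders = [(100, 1, 20.0), (101, 2, 5.0), (102, 1, 10.0),
          (103, 3, 7.5), (104, 9, 99.0)]

result = map_join_reduce(customers, orders)
print(result)
```

In a cascade of plain MapReduce jobs, the joined records would typically be checkpointed by one job and re-read by the next; the sketch only illustrates why folding the join into the pipeline avoids that extra round of intermediate data.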