Enhancing HiveQL Engine Using Map-Join-Reduce
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL. At the same time, the language allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
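As an illustration of the custom-mapper hook described above, the following is a minimal sketch of a row-level mapper of the kind that could be invoked through HiveQL's TRANSFORM ... USING clause, which exchanges tab-separated rows with an external script over standard input and output. The column layout (user_id, page_url) and the names map_line and run are assumptions made for this example only and do not come from the paper.

```python
def map_line(line):
    """Turn one tab-separated input row into zero or more output rows."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return []  # skip malformed rows instead of failing the whole job
    user_id, page_url = fields[0], fields[1]
    # Keep only the host part of the URL, e.g. "http://a.org/x" -> "a.org".
    domain = page_url.split("/")[2] if "://" in page_url else page_url
    return ["\t".join([user_id, domain])]


def run(lines):
    """Apply the mapper to every input line and yield the output rows."""
    for line in lines:
        yield from map_line(line)


# Under TRANSFORM, `lines` would be sys.stdin and each row would be written
# to sys.stdout; a small in-memory sample stands in for that here.
sample = ["u1\thttp://example.com/index.html\n", "u2\thttps://hive.example.org/docs\n"]
for row in run(sample):
    print(row)
```

In a Hive session such a script would typically be registered with ADD FILE and then referenced from a SELECT TRANSFORM(...) USING '...' AS (...) query; the table and script names are left unspecified here.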
HiveQL allows MapReduce to be enhanced to MapJoinReduce, which leads us to a detailed study of the resulting performance improvement.
The programmer is only required to write specialized map and reduce functions as part of the MapReduce job; the framework takes care of the rest. MapReduce, however, suffers from performance issues, mainly because of its sequential data processing strategy, which frequently checkpoints and shuffles intermediate results. MapReduce can therefore be improved to increase scalability and efficiency.
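The division of labour described above can be sketched with a small in-memory simulation: the programmer supplies only a mapper and a reducer, while the framework performs the grouping (shuffle) step between them. This is a toy model under stated assumptions, not Hadoop's API; run_mapreduce, order_mapper, sum_reducer, and the sample data are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy single-process model of one MapReduce job."""
    # Map phase: each record yields zero or more (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle phase: group all values by key. In a real cluster this is the
    # step that moves intermediate results between nodes.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one reducer call per key.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def order_mapper(record):
    customer, amount = record
    return [(customer, amount)]


def sum_reducer(key, values):
    return [(key, sum(values))]


orders = [("alice", 10), ("bob", 5), ("alice", 20)]
totals = run_mapreduce(orders, order_mapper, sum_reducer)
print(totals)
```

A query that needs several such jobs in sequence repeats the materialize-and-shuffle step once per job, which is the cost the abstract attributes to the sequential processing strategy.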
The proposed solution is Map-Join-Reduce, which removes the burden of presenting complex join algorithms to the system. We first propose a filter-join-aggregate mathematical model, which is an extension of the MapReduce model. To support this model, we present a MapJoinReduce architecture design for the HiveQL engine. This design sheds light on the query processing strategy of the Hive and Hadoop systems.
The benefit of this approach is that checkpointing and shuffling of intermediate results are minimized, which in turn improves system performance.
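The filter-join-aggregate idea can be illustrated with a minimal single-process sketch in which the three stages run as one pipeline, without storing the joined rows as a separate intermediate dataset. The abstract does not specify the join algorithm or data layout, so the hash join, the field names (customer_id, amount, id, region), and the function filter_join_aggregate below are assumptions for illustration rather than the authors' design.

```python
from collections import defaultdict


def filter_join_aggregate(orders, customers, min_amount):
    """Filter orders, join them with customers, and sum amounts per region."""
    # Filter: drop rows that cannot contribute to the result.
    kept = (o for o in orders if o["amount"] >= min_amount)

    # Join: build a hash table on the smaller input and probe it with the
    # filtered rows (a hash join is assumed here for simplicity).
    customer_by_id = {c["id"]: c for c in customers}
    joined = (
        (o, customer_by_id[o["customer_id"]])
        for o in kept
        if o["customer_id"] in customer_by_id
    )

    # Aggregate: consume joined pairs directly, one at a time.
    totals = defaultdict(int)
    for order, customer in joined:
        totals[customer["region"]] += order["amount"]
    return dict(totals)


orders = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 2, "amount": 50},
    {"customer_id": 1, "amount": 40},
    {"customer_id": 3, "amount": 70},
]
customers = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": "south"},
    {"id": 3, "region": "north"},
]
result = filter_join_aggregate(orders, customers, min_amount=20)
print(result)
```

Because the generators feed each stage into the next, the joined rows are never written out as a whole, which mirrors on a small scale the reduction in checkpointing and shuffling that the approach aims for.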
Keywords
CPU and Memory Analysis, Hadoop, HiveQL.