Built-in Big Data Applications Using Restful Web Services
This paper presents a tool that builds the execution profile of individual Hive queries by extracting information from Hive and Hadoop logs. The profile consists of detailed information about the MapReduce jobs, tasks, and attempts belonging to a query. It is stored as a JSON document in MongoDB and can be retrieved to generate reports as charts or tables. The profiling tool was tested in several experiments on AWS with TPC-H datasets and queries. The experiments show that the tool can assist developers in comparing Hive queries written in different formats, run on different data sets, and configured with different parameters. It can also compare tasks and attempts within the same job to diagnose performance issues.
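As a rough illustration of the nested profile described above, the following Python sketch builds one query profile as a JSON-style document (query, then jobs, then tasks, then attempts) and compares attempts within a job. The field names (`query_id`, `elapsed_ms`, and so on), the identifiers, and the timing values are all hypothetical; the paper's actual schema is not given here. The sketch uses only the standard `json` module and does not connect to MongoDB.

```python
import json

# Hypothetical profile layout: query -> jobs -> tasks -> attempts.
# All field names and values are illustrative, not taken from the paper.
profile = {
    "query_id": "q1_example",
    "jobs": [
        {
            "job_id": "job_0001",
            "tasks": [
                {
                    "task_id": "task_m_000000",
                    "type": "MAP",
                    "attempts": [
                        {"attempt_id": "attempt_m_000000_0", "elapsed_ms": 4200},
                        {"attempt_id": "attempt_m_000000_1", "elapsed_ms": 3900},
                    ],
                },
                {
                    "task_id": "task_r_000000",
                    "type": "REDUCE",
                    "attempts": [
                        {"attempt_id": "attempt_r_000000_0", "elapsed_ms": 9100},
                    ],
                },
            ],
        }
    ],
}


def slowest_attempt(job):
    """Return (task_id, attempt) for the longest-running attempt in a job."""
    candidates = [
        (task["task_id"], attempt)
        for task in job["tasks"]
        for attempt in task["attempts"]
    ]
    return max(candidates, key=lambda pair: pair[1]["elapsed_ms"])


def to_json(doc):
    """Serialize the profile to the JSON text a document store would hold."""
    return json.dumps(doc, indent=2, sort_keys=True)


if __name__ == "__main__":
    task_id, attempt = slowest_attempt(profile["jobs"][0])
    print(task_id, attempt["attempt_id"], attempt["elapsed_ms"])
    print(to_json(profile))
```

Keeping attempts nested under their task and job means a single document holds everything needed to compare attempts within one job, which is the kind of diagnosis the abstract describes.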