
Survey on Big Data and Machine Intelligence Tools


Affiliations
1 Department of CSE, Sri Chandrasekharendra Saraswathi Viswa University, Enathur, Kanchipuram, Tamil Nadu, India
2 Sri Chandrasekharendra Saraswathi Viswa University, Enathur, Kanchipuram, Tamil Nadu, India
     



Data is growing at an exponential pace today, posing challenges in analysis, handling and sharing. Choosing the right machine learning tools for such huge datasets is difficult. Each tool has its own limitations, and traditional tools fail to process huge datasets in real time. This paper is intended for readers who want to learn about machine intelligence tools and how they relate to big data analytics. We give an overview of each available tool, with its latest versions and releases. We begin with an introduction to big data, Hadoop and machine intelligence techniques. We then turn to the machine intelligence tools and the application areas in which they can be used. We discuss the key features of each tool and provide a comparative study of all the tools. The paper thus aims to help users choose among the tools more easily.


Keywords

Big Data, Hadoop, Machine Learning.




Authors

Shyam Mohan
Department of CSE, Sri Chandrasekharendra Saraswathi Viswa University, Enathur, Kanchipuram, Tamil Nadu, India
P. Shanmugapriya
Sri Chandrasekharendra Saraswathi Viswa University, Enathur, Kanchipuram, Tamil Nadu, India

