Various Tools and Techniques to Assess Information from Big Data


Affiliations
1 Department of Computer Engineering, Punjabi University, Patiala, Punjab, India
2 Department of Computer Engineering, Punjabi University, Punjab, India
 

Abstract

Big data refers to data sets that are computationally complex to process and too large for traditional software tools to capture, store, and analyze. It includes both structured and unstructured data. Big data is commonly characterized by three Vs: velocity (speed), volume (quantity), and variety (types of data). Sources include social media (Facebook, Twitter, etc.) and networks such as the IoT (Internet of Things). This paper examines the tools available for big data, the languages used to explore it, and the mining techniques needed to fetch and analyze it. The methodology uses Beautiful Soup, a Python library for parsing HTML pages and web scraping. Web scraping helps transform unstructured data into structured form. The study observes that Python is currently the most powerful language for fetching and analyzing big data because it can handle zettabytes (ZB) of data, whereas Java and other languages cannot handle more than gigabytes (GB). Hadoop is the most useful and powerful tool for distributed storage and processing of large datasets; with various plug-ins, it makes big data easy to analyze.

Keywords

Big Data, Web Mining, Web Scraping, Beautiful Soup Python Library.
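
The abstract describes using Beautiful Soup to parse HTML and turn unstructured page content into structured form. The paper's own code is not shown here, so the following is only a minimal sketch of that idea. The sample HTML, the class names ("post", "user", "text"), and the function name extract_posts are hypothetical and chosen purely for illustration; it assumes the third-party bs4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a fetched web page.
SAMPLE_HTML = """
<html><body>
  <div class="post"><span class="user">alice</span><p class="text">Big data is growing.</p></div>
  <div class="post"><span class="user">bob</span><p class="text">Hadoop stores it.</p></div>
</body></html>
"""


def extract_posts(html):
    """Turn loosely structured HTML into a list of records (dicts)."""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for post in soup.find_all("div", class_="post"):
        user = post.find("span", class_="user")
        text = post.find("p", class_="text")
        if user is None or text is None:
            continue  # skip blocks that lack the expected fields
        records.append({
            "user": user.get_text(strip=True),
            "text": text.get_text(strip=True),
        })
    return records


posts = extract_posts(SAMPLE_HTML)
for record in posts:
    print(record["user"], "|", record["text"])
```

In a real scraper the HTML would come from a fetched page rather than an inline string, and the resulting records could then be written to a CSV file or a database for later analysis.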

Authors

Gagandeep Kaur
Department of Computer Engineering, Punjabi University, Patiala, Punjab, India
Harpreet Kaur
Department of Computer Engineering, Punjabi University, Punjab, India
