Open Access Open Access  Restricted Access Subscription Access

Feature Influence Based ETL for Efficient Big Data Management


Affiliations
1 SRM Institute of Science and Technology, Kattankulathur, Chengalpat 603 203, Tamil Nadu, India
 

The increased volume of big data introduces various challenges for its maintenance and analysis. There exist various approaches to the problem, but they fail to achieve the expected results. To improve the big data management performance, an efficient real time feature influence analysis based Extraction, Transform, and Loading (ETL) framework is presented in this article. The model fetches the big data and analyses the features to find noisy records by preprocessing the data set. Further, the method performs feature extraction and applies feature influence analysis to various data nodes and the data present in the data nodes. The method estimates Feature Specific Informative Influence (FSII) and Feature Specific Supportive Influence (FSSI). The value of FSII and FSSI are measured with the support of a data dictionary. The class ontology belongs to various classes of data. The value of FSII is measured according to the presence of a concrete feature on a tuple towards any data node, whereas the value of FSSI is measured based on the appearance of supportive features on any data point towards the data node. Using these measures, the method computes the Node Centric Transformation Score (NCTS). Based on the value of NCTS the method performs map reduction and merging of data nodes. The NCTS_FIA method achieves higher performance in the ETL process. By adapting feature influence analysis in big data management, the ETL performance is improved with the least amount of time complexity.

Keywords

Cloud, FSII, FSSI, FIA, NCTS, Ontology.
User
Notifications
Font Size


  • Feature Influence Based ETL for Efficient Big Data Management

Abstract Views: 201  |  PDF Views: 115

Authors

M Vijayalakshmi
SRM Institute of Science and Technology, Kattankulathur, Chengalpat 603 203, Tamil Nadu, India
R I Minu
SRM Institute of Science and Technology, Kattankulathur, Chengalpat 603 203, Tamil Nadu, India

Abstract


The increased volume of big data introduces various challenges for its maintenance and analysis. There exist various approaches to the problem, but they fail to achieve the expected results. To improve the big data management performance, an efficient real time feature influence analysis based Extraction, Transform, and Loading (ETL) framework is presented in this article. The model fetches the big data and analyses the features to find noisy records by preprocessing the data set. Further, the method performs feature extraction and applies feature influence analysis to various data nodes and the data present in the data nodes. The method estimates Feature Specific Informative Influence (FSII) and Feature Specific Supportive Influence (FSSI). The value of FSII and FSSI are measured with the support of a data dictionary. The class ontology belongs to various classes of data. The value of FSII is measured according to the presence of a concrete feature on a tuple towards any data node, whereas the value of FSSI is measured based on the appearance of supportive features on any data point towards the data node. Using these measures, the method computes the Node Centric Transformation Score (NCTS). Based on the value of NCTS the method performs map reduction and merging of data nodes. The NCTS_FIA method achieves higher performance in the ETL process. By adapting feature influence analysis in big data management, the ETL performance is improved with the least amount of time complexity.

Keywords


Cloud, FSII, FSSI, FIA, NCTS, Ontology.

References