
OSI Text Document Clustering Values Based on Frequent Greedy Technique Algorithm


Affiliations
1 Department of Computer Science, GATE College, Tirupati, Andhra Pradesh, India
     



Abstract

The principal objective of data mining is to extract information and patterns from large amounts of data. Data mining is used to analyze relevant data and to predict future information. Missing values (MVs) are common in real-world datasets, and they hinder the use of many statistical and machine learning algorithms for data analysis because those algorithms cannot handle incomplete datasets [1, 2]. Many MV imputation algorithms have been developed to address this problem. However, these techniques perform poorly when most of the incomplete tuples are clustered together, a situation termed here the Clustered Missing Values Phenomenon, which arises from the lack of sufficient complete tuples near an MV to use for imputation. In this paper, we propose the Order-Sensitive Imputation for Clustered Missing Values (OSICM) framework, in which missing values are imputed sequentially so that values filled in earlier in the process are also used to impute other MVs later. The order of imputations is therefore critical to both the effectiveness and the efficiency of the OSICM framework. We formulate the search for the optimal imputation order as an optimization problem and show that it is NP-hard. We further devise an algorithm that finds the exact optimal order and propose two approximate/heuristic algorithms that trade effectiveness for efficiency [3]. Finally, we conduct extensive experiments on real and synthetic datasets to demonstrate the strength of the OSICM framework.
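
The central idea of the abstract, that values imputed earlier are reused when imputing later MVs so that the order of imputation matters, can be illustrated with a small sketch. This is not the OSICM algorithm from the paper. It assumes a single numeric attribute to impute, fully observed feature vectors, a k-nearest-neighbour mean as the per-value imputer, and a hypothetical greedy ordering rule that always fills the pending tuple whose nearest donor is closest. The names `order_sensitive_impute`, `knn_estimate`, and `distance` are illustrative only.

```python
import math
from typing import List, Optional, Tuple

# A tuple = (fully observed feature vector, target value or None if missing).
Row = Tuple[List[float], Optional[float]]
Donor = Tuple[List[float], float]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_estimate(x: List[float], donors: List[Donor], k: int) -> Tuple[float, float]:
    """Return (mean target of the k nearest donors, distance to the nearest donor)."""
    ranked = sorted(donors, key=lambda d: distance(x, d[0]))[:k]
    estimate = sum(value for _, value in ranked) / len(ranked)
    return estimate, distance(x, ranked[0][0])


def order_sensitive_impute(rows: List[Row], k: int = 2) -> Tuple[List[float], List[int]]:
    """Impute missing targets one at a time; each imputed tuple becomes a donor.

    Hypothetical greedy order (an assumption, not the paper's heuristic): at
    every step, fill the pending tuple whose nearest donor is closest.
    Returns (filled targets, imputation order as row indices).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    donors: List[Donor] = [(x, y) for x, y in rows if y is not None]
    if not donors:
        raise ValueError("at least one complete tuple is required")
    filled: List[Optional[float]] = [y for _, y in rows]
    pending = [i for i, (_, y) in enumerate(rows) if y is None]
    order: List[int] = []
    while pending:
        # Greedy step: pick the pending tuple closest to any current donor.
        best_i, best_est, best_gap = -1, 0.0, math.inf
        for i in pending:
            est, gap = knn_estimate(rows[i][0], donors, k)
            if gap < best_gap:
                best_i, best_est, best_gap = i, est, gap
        filled[best_i] = best_est
        # Order sensitivity: the value just imputed is available to later steps.
        donors.append((rows[best_i][0], best_est))
        pending.remove(best_i)
        order.append(best_i)
    return [float(v) for v in filled], order


if __name__ == "__main__":
    # Two complete tuples and a cluster of three incomplete ones further along.
    rows: List[Row] = [([0.0], 10.0), ([1.0], 12.0), ([2.0], None), ([3.0], None), ([4.0], None)]
    filled, order = order_sensitive_impute(rows, k=2)
    print(filled, order)  # [10.0, 12.0, 11.0, 11.5, 11.25] [2, 3, 4]
```

In the example, the three incomplete tuples form a cluster away from the complete ones, and the tuple at 4.0 is imputed from the values just filled in at 2.0 and 3.0. A one-pass k-NN imputer restricted to the two original complete tuples would give all three the same estimate, 11.0.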

Keywords

Clustered MVs Phenomenon, Missing Value, Order-Sensitive Imputation, OSICM System.
References

  • X. Su, R. Greiner, T. M. Khoshgoftaar, and A. Napolitano, “Using classifier-based nominal imputation to improve machine learning,” In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’11), pp. 124-135, 2011.
  • X. L. Dong, E. Gabrilovich, G. Heitz, W. Horn, ….., and W. Zhang, “Knowledge vault: A web-scale approach to probabilistic knowledge fusion,” In The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’14), New York, NY, USA, pp. 601-610, 2014.
  • J. Tang, B. Jiang, A. Zheng, and B. Luo, “Graph matching based on spectral embedding with missing value,” Pattern Recognition, vol. 45, no. 10, pp. 3768-3779, 2012.
  • D. W. Joenssen, and U. Bankhofer, “Hot deck methods for imputing missing data - The effects of limiting donor usage,” In International Workshop on Machine Learning and Data Mining in Pattern Recognition (MLDM’12), pp. 63-75, 2012.
  • T. Aittokallio, “Dealing with missing values in large-scale studies: Microarray data imputation and beyond,” Briefings in Bioinformatics, vol. 11, no. 2, pp. 253-264, 2010.
  • X. Zhu, S. Zhang, Z. Jin, and Z. Xu, “Missing value estimation for mixed-attribute data sets,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 1, pp. 110-121, 2011.
  • X.-P. Zhang, A. S. Khwaja, J. Luo, A. S. Housfater, and A. Anpalagan, “Convergence analysis of multiple imputations particle filters for dealing with missing data in nonlinear problems,” In 2014 IEEE International Symposium on Circuits and Systems (ISCAS’14), pp. 2567-2570, 2014.
  • D. Sovilj, E. Eirola, Y. Miche, K.-M. Björk, R. Nian, A. Akusok, and A. Lendasse, “Extreme learning machine for missing data using multiple imputations,” Neurocomputing, vol. 174, part-A, pp. 220-231, 2016.
  • C. Zhang, X. Zhu, J. Zhang, Y. Qin, and S. Zhang, “GBKII: An imputation method for missing values,” In Advances in Knowledge Discovery and Data Mining, pp. 1080-1087, 2007.
  • X. Zhang, X. Song, H. Wang, and H. Zhang, “Sequential local least squares imputation estimating missing value of microarray data,” Computers in Biology and Medicine, vol. 38, no. 10, pp. 1112-1120, 2008.
  • L. van der Maaten, “Accelerating t-SNE using tree-based algorithms,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3221-3245, 2014.
  • D. T. Searls, “The utilization of a known coefficient of variation in the estimation procedure,” Journal of the American Statistical Association, vol. 59, no. 308, pp. 1225-1226, 1964.
  • W. Fan, J. Li, S. Ma, N. Tang, and W. Yu, “Towards certain fixes with editing rules and master data,” The VLDB Journal, vol. 21, no. 2, pp. 213-238, 2012.
  • C. Mayfield, J. Neville, and S. Prabhakar, “ERACER: A database approach for statistical inference and data cleaning,” In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’10), Indianapolis, Indiana, USA, pp. 75-86, 2010.
  • S. Song, A. Zhang, L. Chen, and J. Wang, “Enriching data imputation with extensive similarity neighbors,” Proceedings of the VLDB Endowment, vol. 8, no. 11, pp. 1286-1297, 2015.


Authors

Gnana Prasunamba Jyosyula
Department of Computer Science, GATE College, Tirupati, Andhra Pradesh, India
