
Improved Feature Extraction on Text Documents using Neural Network Model


Affiliations
1 Department of Computer and Information Science, Annamalai University, India
     



In natural language processing, text clustering plays a major role in reducing text dimensionality. However, the lack of data models has left clustering algorithms facing sparsity problems. Integration with deep learning has resolved the problem of scarce knowledge about text documents, but deeper architectures tend to learn redundant features, which limits the efficiency of the resulting solutions. This paper presents a method for complete extraction of features from text documents using a neural network model. The model combines a feed-forward mechanism with a form of unsupervised learning that denoises corrupted input features. The reconstructed features are then used to initialize the feed-forward network. This method reduces the manual labelling required in the screening process. For evaluation, a series of experiments is conducted to compare the performance of the method on text datasets against various conventional algorithms.

Keywords

Text Document, Feature Extraction, Neural Network, Denoising.
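
The abstract describes unsupervised denoising of corrupted input features followed by initialization of a feed-forward network. The sketch below illustrates that general idea with a single-layer denoising autoencoder in NumPy. It is not the authors' implementation: the masking noise, sigmoid activations, squared-error loss, layer sizes, toy term-presence matrix, and every function name (`corrupt`, `train_denoising_autoencoder`, `init_feed_forward`, `forward`) are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, drop_prob, rng):
    # Masking noise (an assumption): zero each feature with probability drop_prob.
    return x * (rng.random(x.shape) >= drop_prob)

def train_denoising_autoencoder(x, n_hidden, drop_prob=0.3, lr=0.5,
                                epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_features = x.shape
    w_enc = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        x_noisy = corrupt(x, drop_prob, rng)
        h = sigmoid(x_noisy @ w_enc + b_enc)      # encode the corrupted input
        x_hat = sigmoid(h @ w_dec + b_dec)        # reconstruct
        err = x_hat - x                           # compare with the CLEAN input
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Gradients of the squared error through both sigmoid layers.
        d_out = 2.0 * err * x_hat * (1.0 - x_hat) / n_samples
        d_hid = (d_out @ w_dec.T) * h * (1.0 - h)
        w_dec -= lr * (h.T @ d_out)
        b_dec -= lr * d_out.sum(axis=0)
        w_enc -= lr * (x_noisy.T @ d_hid)
        b_enc -= lr * d_hid.sum(axis=0)
    return w_enc, b_enc, losses

def init_feed_forward(w_enc, b_enc, n_classes, seed=0):
    # First layer starts from the pretrained encoder; output layer is random.
    rng = np.random.default_rng(seed)
    return {
        "w1": w_enc.copy(),
        "b1": b_enc.copy(),
        "w2": rng.normal(0.0, 0.1, (w_enc.shape[1], n_classes)),
        "b2": np.zeros(n_classes),
    }

def forward(params, x):
    h = sigmoid(x @ params["w1"] + params["b1"])
    z = h @ params["w2"] + params["b2"]
    z = z - z.max(axis=1, keepdims=True)          # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy term-presence matrix: 6 documents x 9 terms (hypothetical data).
docs = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
], dtype=float)

w_enc, b_enc, losses = train_denoising_autoencoder(docs, n_hidden=4)
params = init_feed_forward(w_enc, b_enc, n_classes=3)
probs = forward(params, docs)
```

Here `probs` holds class probabilities from the untrained output layer. In a pipeline of this kind, the network would next be fine-tuned on whatever labelled documents are available, which is where pretraining on unlabelled text could reduce the labelling effort.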



Authors

V. Kumaresan
Department of Computer and Information Science, Annamalai University, India
R. Nagarajan
Department of Computer and Information Science, Annamalai University, India

