
Text Classification: A Concept Matrix based Approach


Affiliations
1 Dept. of Computer Science and Engineering, Anna University, Chennai 600 025, India
2 A.A.M. Engineering College, Kovilvenni 614 403, India
     



Abstract

Text classification is an important task in modern information systems. Most existing classification techniques are keyword-based and statistical. Keywords alone cannot always distinguish relevant from irrelevant text, because the meaning of a keyword is context dependent: within a phrase it can differ considerably from its meaning as an individual word. In many cases the keyword-based approach therefore fails to bring out the semantic meaning of a document. This paper designs and develops a new automated classification method that is patterned on a hierarchy of concept matrices and is not based solely on keywords. The design involves developing an agent by adapting the ACM Computing Review classification method. A new concept matrix is formulated from the phrases occurring in the title, abstract and keywords of an experimental research document. The columns of the matrix represent the different phrases, and each row holds the frequency of occurrence of each phrase in a document. One such row matrix is created for every document in a corpus. To classify a new document, its concept matrix is created and appended to the existing matrix. A singular value decomposition (SVD) is then performed on the concept matrix, decomposing it into three matrices. The original matrix is reconstructed using the rows of the diagonal and orthonormal matrices with the highest convergence values. After reconstruction, the correlation between the columns represents the relatedness of the document. Over time the system is able to learn and formulate new concept matrices at the various levels of the hierarchy. This concept-matrix hierarchy pattern also conveys semantic meaning. An agent is able to decide the relevance of a document with respect to its own hierarchy, and to add the phrases and the required concept matrix to the servers that are predicted under a certain category.
The system is tested with a set of articles collected through the Internet.
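
The matrix steps described in the abstract can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the authors' implementation. The function names, the toy phrases and documents, the plain substring counting, and the choice of k = 3 retained singular values are all hypothetical. The abstract speaks of a correlation measure; the sketch substitutes cosine similarity between the document rows of the reconstructed matrix, which is an assumption about the intended comparison.

```python
import numpy as np


def build_concept_matrix(docs, phrases):
    """Rows are documents, columns are phrases, entries are occurrence counts."""
    return np.array(
        [[doc.lower().count(p.lower()) for p in phrases] for doc in docs],
        dtype=float,
    )


def low_rank_reconstruction(matrix, k):
    """Rebuild the matrix from its k largest singular values (truncated SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


def similarity_to_corpus(corpus_matrix, new_row, k):
    """Append the new document's row, reconstruct, compare it with each corpus row.

    Cosine similarity is an assumed stand-in for the paper's correlation measure.
    """
    stacked = np.vstack([corpus_matrix, new_row])
    approx = low_rank_reconstruction(stacked, k)
    new_vec = approx[-1]
    sims = []
    for row in approx[:-1]:
        denom = np.linalg.norm(row) * np.linalg.norm(new_vec)
        sims.append(float(row @ new_vec / denom) if denom > 0 else 0.0)
    return sims


# Hypothetical toy data, chosen only to make the steps visible.
phrases = ["text classification", "support vector",
           "information retrieval", "neural network"]
corpus = [
    "text classification with support vector machines for text classification",
    "information retrieval and text classification",
    "neural network training for neural network models",
]
new_doc = "a support vector approach to text classification"

corpus_matrix = build_concept_matrix(corpus, phrases)
new_row = build_concept_matrix([new_doc], phrases)
sims = similarity_to_corpus(corpus_matrix, new_row, k=3)

for text, score in zip(corpus, sims):
    print(f"{score:6.3f}  {text}")
```

In this toy run the new document scores highest against the first corpus document, which shares both of its phrases, and close to zero against the third, which shares none. A real system would need proper phrase extraction and a principled way of choosing k, neither of which the abstract specifies.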



Authors

R. Ponnusamy
Dept. of Computer Science and Engineering, Anna University, Chennai 600 025, India
T. V. Gopal
A.A.M. Engineering College, Kovilvenni 614 403, India

