
Text Classification: A Concept Matrix based Approach


Affiliations
1 Dept. of Computer Science and Engineering, Anna University, Chennai 600 025, India
2 A.A.M. Engineering College, Kovilvenni 614 403, India
     



Text classification is an important task in modern information systems. Most existing classification techniques are keyword-based and statistical. Keywords alone cannot always distinguish relevant from irrelevant text, because the meaning of a keyword depends on context: within a phrase, a word can mean something quite different from what it means on its own. In many cases the keyword-based approach therefore fails to bring out the semantic meaning of a document. This paper presents the design and development of a new automated classification method based on a hierarchical concept-matrix pattern, which does not rely solely on keywords. The design involves developing an agent by adapting the ACM Computing Review classification method. A new concept matrix is proposed, built from the phrases occurring in the title, abstract and keywords of an experimental research document. The columns of the matrix represent the different phrases, and each row holds the frequencies with which those phrases occur in a document. Such row matrices are created for every document in a corpus. To classify a new document, its concept matrix is created and appended to the existing matrix. A singular value decomposition (SVD) is then performed on the concept matrix, decomposing it into three matrices. The original matrix is then reconstructed using certain rows, those with the highest convergence values, of the diagonal and orthonormal matrices. After the reconstruction, the correlation between the columns represents the relatedness of the document. Over time the system is able to learn and formulate new concept matrices at the various levels of the hierarchy. This concept-matrix hierarchy pattern also conveys semantic meaning. An agent is able to decide the relevance of a document with respect to its own hierarchy, and to add the phrases and the required concept matrix to the servers that are predicted under a certain category. The system is tested with a set of articles collected from the Internet.
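
The pipeline described in the abstract (phrase-frequency rows, appending the new document, SVD, reduced reconstruction, comparison) can be illustrated with a minimal Python sketch using NumPy. This is not the authors' implementation. Several choices are assumptions made only for illustration: the phrase list and sample texts are invented, phrases are counted by plain substring matching, "highest convergence value" is read as keeping the `k` largest singular values, and relatedness is measured as cosine similarity between the new document's row and each corpus row of the reconstructed matrix. The function names are hypothetical.

```python
import numpy as np

# Hypothetical phrase vocabulary (columns of the concept matrix).
PHRASES = ["text classification", "support vector",
           "operating system", "process scheduling"]


def build_concept_matrix(docs, phrases):
    """One row per document, one column per phrase; entries are phrase counts.

    Assumption: a phrase is counted by simple case-insensitive substring match.
    """
    return np.array(
        [[doc.lower().count(p) for p in phrases] for doc in docs],
        dtype=float,
    )


def low_rank_reconstruction(matrix, k):
    """Rebuild the matrix from its k largest singular values and vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


def similarity_to_corpus(corpus_docs, new_doc, phrases, k=2):
    """Append the new document's row, reduce with SVD, and compare rows.

    Returns the cosine similarity between the new document and each
    corpus document, measured on the reconstructed matrix.
    """
    matrix = build_concept_matrix(corpus_docs + [new_doc], phrases)
    approx = low_rank_reconstruction(matrix, k)
    new_row = approx[-1]
    sims = []
    for row in approx[:-1]:
        denom = np.linalg.norm(row) * np.linalg.norm(new_row)
        sims.append(float(row @ new_row / denom) if denom > 0 else 0.0)
    return sims


# Invented sample corpus: two documents per topic.
corpus = [
    "Text classification with support vector machines; text classification survey",
    "Support vector methods for text classification",
    "Process scheduling in an operating system",
    "Operating system kernels and process scheduling; process scheduling policies",
]
new_document = "A text classification agent using support vector learning"

sims = similarity_to_corpus(corpus, new_document, PHRASES, k=2)
for text, score in zip(corpus, sims):
    print(f"{score:6.3f}  {text}")
```

In this toy corpus the new document scores close to 1 against the two text-classification documents and close to 0 against the two operating-system documents, because the rank-2 reconstruction keeps one dominant direction per topic. A real system would need proper phrase extraction and a principled choice of `k`.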


Authors

R. Ponnusamy
Dept. of Computer Science and Engineering, Anna University, Chennai 600 025, India
T. V. Gopal
A.A.M. Engineering College, Kovilvenni 614 403, India

