
Semi-Automatic Domain Ontology Construction for Tamil Documents


Affiliations
1 College of Engineering, Guindy, India
2 Anna University, Chennai, India


Ontology is an explicit specification of a conceptualization; that is, an ontology describes the concepts and relationships that can exist for an agent or a community of agents. Ontology construction is a challenging task, and this paper employs a new technique for the semi-automatic construction of an ontology. The technique involves two modules: ontological word selection and semantic relationship extraction. Ontological nodes and semantically related words are selected from a Tamil text corpus. The input to the system is a set of Tamil text documents. Each document is word-segmented and then morphologically analyzed to determine the parts of speech, because ontological words are assumed to be nouns. The resulting noun list is then pruned using TF-IDF. Semantically related words are identified from the serial clustering of words in text, and the value of such clustering is explored as an indicator of whether a word is content-bearing. The approach is flexible in that it is sensitive to context: a term may be assessed as content-bearing within one collection but not within another. In this way, a domain ontology is constructed semi-automatically for Tamil text documents.
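The two corpus-level scores named above, TF-IDF pruning of the noun list and serial clustering as a content-bearing indicator, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes word segmentation and morphological analysis have already reduced each document to a list of noun tokens. It ranks each noun by its maximum TF-IDF across documents, which is one of several possible aggregation choices. It measures serial clustering as the ratio of observed to chance-expected co-occurrences in adjacent sentences, which is one simple formulation and not necessarily the paper's. The function names `tfidf_prune` and `serial_clustering`, the `top_k` cutoff, and the use of sentences as text units are all hypothetical.

```python
import math
from collections import Counter


def tfidf_prune(docs, top_k):
    """Return the top_k nouns ranked by their maximum TF-IDF over the corpus.

    docs: list of documents, each a list of noun tokens (assumed to be the
    output of word segmentation plus morphological analysis).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each noun.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    best = {}
    for doc in docs:
        if not doc:
            continue  # skip empty documents to avoid dividing by zero
        tf = Counter(doc)
        for term, count in tf.items():
            # Length-normalised term frequency times inverse document frequency.
            score = (count / len(doc)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score; break ties by the term string for determinism.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:top_k]


def serial_clustering(units, term):
    """Observed / expected count of adjacent text units that both contain term.

    units: ordered text units (here assumed to be sentences), each a list of
    tokens. If the k units containing the term were placed at random among
    n units, the expected number of adjacent pairs containing it in both
    units would be k * (k - 1) / n.
    """
    n = len(units)
    hits = [term in unit for unit in units]
    k = sum(hits)
    if n < 2 or k < 2:
        return 0.0  # too few occurrences to measure clustering
    observed = sum(1 for a, b in zip(hits, hits[1:]) if a and b)
    expected = k * (k - 1) / n
    return observed / expected


if __name__ == "__main__":
    # Toy corpus of Tamil nouns: tree, leaf, flower, root.
    docs = [["மரம்", "இலை", "மரம்"], ["மரம்", "பூ"], ["பூ", "இலை", "வேர்"]]
    print(tfidf_prune(docs, 2))  # → ['வேர்', 'மரம்']

    sentences = [["a"], ["a", "b"], ["a"], ["c"], ["b"], ["c"]]
    print(serial_clustering(sentences, "a"))  # → 2.0
    print(serial_clustering(sentences, "b"))  # → 0.0
```

A ratio well above 1 suggests that the term recurs in consecutive units more often than chance would predict, which is the intuition behind treating it as content-bearing. Because both scores are computed per collection, the same term can rank high in one corpus and low in another. This matches the context sensitivity described above.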

Keywords

Ontology, Semi-Automatic Ontology, Semantic Relationship Extraction, Content-Bearing Words, TF-IDF, Morphological Analysis, Clustering.




Authors

M. S. Girija
College of Engineering, Guindy, India
T. Mala
Anna University, Chennai, India
T. V. Geetha
Anna University, Chennai, India
