PuDO: Development of an Ontology for Hierarchical Organization and Representation of Domains for Punjabi Words
Natural Language Processing (NLP) is the area of research concerned with understanding, extracting and retrieving information from unstructured text. It relies on a range of tools, resources and methodologies to perform these tasks, and NLP applications depend heavily on resources in addition to tools and methodologies. Like many other Indian languages, Punjabi has a rich literary history, but it is relatively under-resourced technologically, and much work remains to be done in Punjabi language processing. Many researchers, groups and organizations are working on different aspects of Punjabi language processing, yet the language still lacks many NLP resources of its own, such as annotated corpora, rich dictionaries, sentiment lexicons and conceptualized domains.
Our present work is an attempt to develop a controlled vocabulary of concepts or topics (domains) for Punjabi words and to present it in the form of a 'domains ontology', PUDO (Punjabi Domains Ontology). Ontologies capture and describe the current state of knowledge about a domain of interest, and represent it in terms of concepts and relationships in ways that computers can process efficiently and humans can understand easily. This paper presents our work on identifying concepts, termed domains, for Punjabi words and organizing them hierarchically. The hierarchy is based on a relation of specificity for the Punjabi language. We developed the domains ontology by first assigning concepts as top-level domains and then defining lower-level domains, with more granular conceptualization, under the higher-level domains. The ontology is then populated with words as instances that evoke these domains. The resulting resource can be used in a range of semantics-based NLP tasks for Punjabi.
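The structure described above (top-level domains, more specific sub-domains beneath them, and words attached as instances) can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Domain` class, the `build_word_index` helper, the domain names and the word assignments are hypothetical and are not taken from the PuDO resource itself.

```python
from typing import Dict, List, Optional


class Domain:
    """A node in a hypothetical domain hierarchy (not the actual PuDO schema)."""

    def __init__(self, name: str, parent: Optional["Domain"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Domain"] = []
        self.words: set = set()  # words attached as instances of this domain
        if parent is not None:
            parent.children.append(self)

    def path(self) -> List[str]:
        """Return domain names from the top-level ancestor down to this node."""
        names: List[str] = []
        node: Optional["Domain"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


def build_word_index(root: Domain) -> Dict[str, List[Domain]]:
    """Map each word to every domain it is attached to, by walking the tree."""
    index: Dict[str, List[Domain]] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for word in node.words:
            index.setdefault(word, []).append(node)
        stack.extend(node.children)
    return index


# Illustrative hierarchy; these domains and word assignments are assumptions.
root = Domain("Top")
sports = Domain("Sports", root)
cricket = Domain("Cricket", sports)    # more specific than "Sports"
medicine = Domain("Medicine", root)

cricket.words.add("ਕ੍ਰਿਕਟ")    # "cricket"
medicine.words.add("ਡਾਕਟਰ")   # "doctor"

index = build_word_index(root)
doctor_paths = [d.path() for d in index["ਡਾਕਟਰ"]]
```

Looking a word up in `index` gives the domains it evokes, and `path()` shows where each of those domains sits in the specificity hierarchy. A word may be attached to several domains, which is why the index maps to a list.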