
Culling Scientific and Technical Terms from Text Corpora for Compiling a TermBank in Bangla


Affiliations
1 Linguistic Research Unit, Indian Statistical Institute, Kolkata, India
 

In this paper I describe the steps we adopted to develop a digital TermBank by culling Scientific and Technical Terms (STTs) from a Bangla text corpus. By following these stages and methods of corpus processing and analysis, we have succeeded in developing a TermBank that now contains nearly 10,000 terms for use in various works of linguistics and language technology. The strategy can be applied effectively to corpora of other Indian languages for the same purposes, which confirms its utility and relevance for NLP work on Indian languages.

Keywords

Scientific and Technical Terms, Corpus, POS Tagging, Collocation, Lemmatization, Treebank, Terminology, Frequency.

Authors

Niladri Sekhar Dash
Linguistic Research Unit, Indian Statistical Institute, Kolkata, India
