Culling Scientific and Technical Terms from Text Corpora for Compiling a TermBank in Bangla
In this paper, I describe the steps we adopted to develop a digital TermBank by culling Scientific and Technical Terms (STTs) from a text corpus of Bangla. By following these stages and methods of corpus processing and analysis, we have developed a TermBank that now contains nearly 10,000 terms for use in various works of linguistics and language technology. The same strategy can be applied effectively to corpora of other Indian languages for the same purposes, which confirms its utility and relevance for NLP work on Indian languages.
Keywords
Scientific and Technical Terms, Corpus, POS Tagging, Collocation, Lemmatization, Treebank, Terminology, Frequency.
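The abstract does not spell out the culling procedure itself. The sketch below is illustrative only and is not the paper's method. It shows one simple frequency-based way of proposing candidate terms from a tokenized corpus. It assumes whitespace tokenization and a small general-vocabulary list (`GENERAL_VOCAB`), and it uses a hypothetical helper, `candidate_terms`, with a `min_freq` threshold. None of these names come from the paper. A fuller pipeline along the lines of the keywords would also apply POS tagging, lemmatization, and collocation analysis before any manual validation.

```python
from collections import Counter

# Hypothetical general-vocabulary (function/common) words; an assumption for
# illustration, not a resource described in the paper.
GENERAL_VOCAB = {"এবং", "ও", "এই", "করে"}


def candidate_terms(sentences, general_vocab, min_freq=2):
    """Return (token, count) pairs, most frequent first, for tokens that
    occur at least `min_freq` times and are not general vocabulary.

    Assumes whitespace tokenization; no lemmatization or POS filtering.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    return [
        (tok, n)
        for tok, n in counts.most_common()
        if n >= min_freq and tok not in general_vocab
    ]


# Toy corpus (invented for illustration): "atom", "molecule", "cell".
corpus = [
    "পরমাণু এবং অণু",
    "অণু ও পরমাণু",
    "কোষ এবং পরমাণু",
]

print(candidate_terms(corpus, GENERAL_VOCAB))
```

Raw frequency is only a first filter. A candidate such as "কোষ" (cell), which is below the threshold here, would still need a human reviewer or another signal such as collocation strength to decide whether it belongs in the TermBank.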