Event Extraction from Social Media Text in Malayalam using Neural Conditional Random Fields
This paper describes a Neural Conditional Random Fields (NCRF) approach to the Event Extraction (EE) task, which aims to discover different types of events, along with their arguments, in user-generated text (tweets) in Malayalam. The data for this work was obtained from the FIRE (Forum for Information Retrieval and Evaluation) 2017 shared task [12] on Event Extraction from Newswires and Social Media Text in Indian Languages. An NCRF combines a Recurrent Neural Network (RNN) with Conditional Random Fields (CRF). In addition to detecting events, the system extracts the event arguments, which carry information related to the event such as when it occurred (Time), where it occurred (Place), Reason, Casualty, and Aftereffect. The proposed Event Extraction system achieves an F-score of 79.74%. The results are encouraging and comparable with the state of the art.
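To make the RNN-plus-CRF combination concrete, the following is a minimal sketch of the decoding step of a CRF output layer, not the authors' implementation. It assumes the RNN has already produced one score per label for each token (the emission scores) and that the CRF contributes a score for each label-to-label transition. The label set, the transition scores, and the emission scores below are all hypothetical values chosen for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]   : score of label j at token t (would come from the RNN)
    transitions[i][j] : score of moving from label i to label j (CRF layer)
    """
    n_labels = len(transitions)
    # best[j] = score of the best path that ends in label j at the current token
    best = list(emissions[0])
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = [], []
        for j in range(n_labels):
            # choose the previous label that maximizes the path score into j
            prev = max(range(n_labels), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + scores[j])
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final label to the first token
    path = [max(range(n_labels), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical BIO-style label set; the paper's actual tag set is not given here.
LABELS = ["O", "B-EVENT", "I-EVENT"]

# Hypothetical transition scores: O -> I-EVENT is strongly penalized.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, -1.0, 1.0],   # from B-EVENT
    [0.0, -1.0, 1.0],   # from I-EVENT
]

# Hypothetical emission scores for a four-token tweet.
EMISSIONS = [
    [2.0, 0.5, 0.0],
    [0.25, 1.5, 1.75],
    [0.0, 0.5, 1.25],
    [2.0, 0.0, 0.5],
]

decoded = [LABELS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(decoded)  # ['O', 'B-EVENT', 'I-EVENT', 'O']
```

With these made-up scores, picking the best label for each token independently would yield O, I-EVENT, I-EVENT, O, where an event span starts without a B- tag. Scoring whole label sequences lets the transition scores rule that out, which is the usual motivation for placing a CRF on top of a neural encoder.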
Keywords
Event Extraction, Social Media Text, Indian Languages, Malayalam, Neural Conditional Random Fields (NCRF).
- Banko M, Cafarella MJ, Soderland S. (2007). Open information extraction for the web. IJCAI 2007; 7:2670–2676.
- Collobert R, Weston J, Bottou L. (2011). Natural language processing (almost) from scratch. The Journal of Machine Learning Research 2011; 12:2493–2537.
- Mark Dredze, Tim Oates, and Christine Piatko, (2010). “We’re not in Kansas anymore: detecting domain changes in streams”. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp 585–595. Association for Computational Linguistics (ACL).
- Erhan D, Bengio Y, Courville A. (2010). Why does unsupervised pre-training help deep learning? The Journal of Machine Learning Research 2010; 11:625–660.
- Hege Fromreide, Dirk Hovy, and Anders Søgaard, (2014). “Crowdsourcing and annotating NER for twitter#drift”. European language resources distribution agency
- Hinton G, Osindero S, Teh Y-W. (2006). A fast learning algorithm for deep belief nets. Neural computation 2006; 18:1527–1554
- Krizhevsky A, Sutskever I, Hinton GE. (2012). Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 2012; 1097–1105.
- John Lafferty, Andrew McCallum and Fernando Pereira. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. 18th International Conference on Machine Learning, Morgan Kaufmann, San Francisco, USA, pp. 282–289.
- Lamblin P, Bengio Y. (2010). Important gains from supervised fine-tuning of deep architectures on large labeled sets. NIPS 2010 Deep Learning and Unsupervised Feature Learning Workshop 2010
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. (2013). Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR.
- Pattabhi R K Rao T, Vijay Sundar Ram R, Vijayakrishna R and Sobha L. (2007). 'A Text Chunker and Hybrid POS Tagger for Indian Languages'. In the Proceedings of IJCAI Workshop on Shallow Parsing for South Asian Languages, Hyderabad. pp. 9-12.
- Pattabhi RK Rao and Sobha Lalitha Devi. (2017). 'EventXtract-IL: Event Extraction from Newswires and Social Media Text in Indian Languages@ FIRE 2017 - An Overview', In the Forum for Information Retrieval and Evaluation-2017.
- Salakhutdinov R, Mnih A, Hinton G. (2007). Restricted Boltzmann Machines for Collaborative Filtering. Proceedings of the 24th International Conference on Machine Learning 2007; 791–798
- Socher R, Lin CC, Manning C. (2011). Parsing natural scenes and natural language with recursive neural networks. Proceedings of the 28th international conference on machine learning (ICML-11) 2011; 129–136.
- Tang B, Wu Y, Jiang M. (2013). Recognizing and Encoding Disorder Concepts in Clinical Text using Machine Learning and Vector Space Model. Working Notes for CLEF 2013 Conference 2013; 1179.
- Jie Yang and Yue Zhang. (2018). NCRF++: An Open-source Neural Sequence Labeling Toolkit. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics-System Demonstrations, pages 74–79 Melbourne, Australia, July 15 - 20, 2018
- Uzuner Özlem, South BR, Shen S. (2011). 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association 2011; 18:552–556.