Author Details

Text semantic similarity remains a challenging task, as is evident from the sustained interest of the NLP research community, whose active participation in the SemEval task series over the past decade has produced achievable milestones. Amid these developments, it became clear that comparing the semantics of texts depends largely on the grammatical validity of sentences and on the types of sentence formulation. This paper addresses the computation of text semantic similarity by devising a novel set of generic similarity metrics based both on the word senses of the phrases constituting the text and on the grammatical layout and sequencing of those phrases into meaningful text. We applied the combination of word-sense and grammatical similarity metrics to benchmark sentential datasets. The combined metrics obtained the highest Pearson’s correlation coefficient (0.89) with mean human similarity scores when compared against the equivalent scores of closely competing structured-approach models. To give a generic-utility perspective on these metrics, we then revisited the plagiarism-detection classification task with our model on the well-known paragraph-phrased Rewrite corpus compiled by Clough and Stevenson (2011). Here too, the nearly competitive classification performance (accuracy 76.8%) encouraged us to pursue more promising directions, in which performance can be enhanced by improving the dependency (grammatical relations) component so as to raise the count of true positives and lower the count of false negatives.
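The two evaluation measures named in the abstract, Pearson’s correlation against mean human scores and classification accuracy, can be illustrated with a minimal sketch. The function names and all score values below are hypothetical placeholders chosen for illustration; they are not the paper’s data or implementation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def accuracy(tp, tn, fp, fn):
    """Fraction of correct decisions among all classified cases."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical similarity scores for five sentence pairs (not from the paper).
human_scores = [0.10, 0.35, 0.50, 0.80, 0.95]
model_scores = [0.15, 0.30, 0.55, 0.70, 0.90]
print(f"Pearson r = {pearson_r(human_scores, model_scores):.2f}")
```

With fixed totals, accuracy rises as false negatives are converted into true positives, which is the effect the abstract expects from a stronger dependency component.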

Keywords

Structural Features, Word-Sense Similarity, Grammatical Similarity, Generic Similarity Metrics, Wikipedia Rewrite Corpus.

References

R. Mihalcea, C. Corley and C. Strapparava, “Corpus-Based and Knowledge-Based Measures of Text Semantic Similarity”, Proceedings of American Association for Artificial Intelligence, pp. 775-780, 2006.

Y. Li, D. McLean, Z.A. Bandar, J.D. O’Shea and K. Crockett, “Sentence Similarity based on Semantic Nets and Corpus Statistics”, IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 8, pp. 1138-1145, 2006.

A. Islam and D. Inkpen, “Semantic Text Similarity using Corpus based Word Similarity and String Similarity”, ACM Transactions on Knowledge Discovery from Data, Vol. 2, No. 2, pp. 1-10, 2008.

M.C. Lee, “A Novel Sentence Similarity Measure for Semantic based Expert Systems”, Expert Systems with Applications, Vol. 38, No. 5, pp. 6392-6399, 2011.

D. Gupta, “Detection of Idea Plagiarism using Syntax-Semantic Concept Extractions with Genetic Algorithm”, Expert Systems with Applications, Vol. 73, No. 3, pp. 11-26, 2017.

S. Ozates, A. Ozgur and D. Radev, “Sentence Similarity based on Dependency Tree Kernels for Multi-document Summarization”, Proceedings of International Conference on Language Resources and Evaluation, pp. 2833-2838, 2016.

P. Zhang, X. Huang and L. Zhang, “Information Mining and Similarity Computation for Semi- Un-Structured Sentences from the Social Data”, IEEE Internet of Things, Vol. 34, No. 2, pp. 2352-8648, 2020.

S. Alzahrani, M. Salmon and A. Abraham, “An Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods”, IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, Vol. 42, No. 2, pp. 133-149, 2012.

E. Canhasi, “Measuring the Sentence Level Similarity”, Master Thesis, Department of Computer Science, University of Prizren, pp. 1-42, 2013.

S. Alzahrani, N. Salim, and V. Palade, “Uncovering Highly Obfuscated Plagiarism Cases using Fuzzy Semantic-Based Similarity Model”, Journal of King Saud University - Computer and Information Sciences, Vol. 27, pp. 248-268, 2015.

A. Pawar and V. Mago, “Calculating the Similarity between Words and Sentences using a Lexical Database and Corpus Statistics”, IEEE Transactions on Knowledge and Data Engineering, Vol. 18, pp. 1-14, 2018.

K. Vani and D. Gupta, “A Study on Extrinsic Text Plagiarism Detection Techniques and Tools”, Journal of Engineering Science and Technology, Vol. 9, No. 4, pp. 150-164, 2013.

S. Alzahrani and N. Salim, “Fuzzy Semantic-Based String Similarity for Extrinsic Plagiarism Detection”, Proceedings of International Conference and Workshop on Multilingual and Multimodal Information Systems, pp. 145-155, 2010.

M. Potthast, B. Stein, A. Eiselt, A. Barron-Cedeno and P. Rosso, “Overview of the 1st International Competition on Plagiarism Detection”, Proceedings of International Conference on Spanish Society for Natural Language Processing, pp. 1-69, 2009.

M. Potthast, B. Stein, A. Eiselt, A. Barron-Cedeno and P. Rosso, “Overview of the 2nd International Competition on Plagiarism Detection”, Proceedings of International Conference on Spanish Society for Natural Language Processing, pp. 1-71, 2010.

M. Potthast, B. Stein, A. Eiselt, A. Barron-Cedeno and P. Rosso, “Overview of the 3rd International Competition on Plagiarism Detection”, Proceedings of International Conference on Spanish Society for Natural Language Processing, pp. 1-78, 2011.

M. Potthast, T. Gollub, M. Hagen, J. Grabegger, J. Kiesel, M. Michel, A. Barron-Cedeno and P. Rosso, “Overview of the 4th International Competition on Plagiarism Detection”, Proceedings of International Conference on Spanish Society for Natural Language Processing, pp. 1-68, 2012.

M. Potthast, T. Gollub, M. Hagen, M. Tippmann, J. Kiesel, P. Rosso, E. Stamatatos and B. Stein, “Overview of the 5th International Competition on Plagiarism Detection”, Proceedings of International Conference on Spanish Society for Natural Language Processing, pp. 1-58, 2013.

M. Potthast, M. Hagen, A. Beyer, M. Busse, M. Tippmann, P. Rosso and B. Stein, “Overview of the 6th International Competition on Plagiarism Detection”, Proceedings of International Conference on Spanish Society for Natural Language Processing, pp. 1-66, 2014.

R. Gaizauskas, J. Foster and Y. Wilks, “The METER Corpus: A Corpus for Analyzing Journalistic Text Reuse”, Proceedings of International Conference on Corpus Linguistics, pp. 214-223, 2001.

Brown Corpus Information, Available at http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html, Accessed in 2005.

P. Clough and M. Stevenson, “Developing a Corpus of Plagiarized Short Answers”, Language Resources and Evaluation: Special Issue on Plagiarism and Authorship Analysis, Vol. 45, No. 1, pp. 5-24, 2011.

S. Burrows, M. Potthast, B. Stein and A. Eiselt, “Webis Crowd Paraphrase Corpus 2011”, Available at https://webis.de/data/webis-cpc-11.html, Accessed in 2013.

B. Dolan, C. Quirk and C. Brockett, “Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources”, Proceedings of International Conference on Computational Linguistics, pp. 350-355, 2004.

B. Pang, K. Knight and D. Marcu, “Syntax-Based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences”, Proceedings of International Conference on Human Language Technology, pp. 181-188, 2003.

P. Resnik, “Using Information Content to Evaluate Semantic Similarity in a Taxonomy”, Proceedings of International Joint Conference on Artificial Intelligence, pp. 448-453, 1995.

D. Lin, “An Information-Theoretic Definition of Similarity”, Proceedings of International Conference on Machine Learning, pp. 296-304, 1998.

J.J. Jiang and D.W. Conrath, “Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy”, Proceedings of International Conference on Research in Computational Linguistics, pp. 19-33, 1997.

C. Leacock, M. Chodorow and G.A. Miller, “Combining Local Context and WordNet Sense Similarity for Word Sense Identification”, MIT Press, 1998.

Z. Wu and M. Palmer, “Verb Semantics and Lexical Selection”, Proceedings of Annual Meeting of the Association for Computational Linguistics, pp. 133-138, 1994.

M. Honnibal, “Spacy (version 1.3.0)”, Available at https://spacy.io/, Accessed in 2016.

Bangla Handwritten Character Recognition Using Convolution Neural Network

Authors

Shankha De ¹, Arpana Rawal ¹

Affiliations

1 Department of Computer Science and Engineering, Bhilai Institute of Technology, India

Source

ICTACT Journal on Soft Computing, Vol 12, No 2 (2022), Pagination: 2545-2550

Abstract

Over the last decade, numerous deep learning models have been designed for handwritten character recognition in languages such as English, Chinese, Arabic, Japanese and Russian. Recognition of Bengali handwritten characters from document image datasets nevertheless remains an open and challenging task. Advances in neural networks have led to many new models, and performance continues to improve. LeNet is a pioneering work in handwritten document image recognition, especially the recognition of handwritten digits from images using a CNN. This paper focuses on designing a convolutional neural network, with refinements to its layers and tuning of its parameters, for a Bengali character recognition system that classifies 50 different fonts. Our revised CNN model outperforms some existing approaches and shows a font-recognition accuracy of 98.46%.
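The abstract does not give the layers of the revised network, so the sketch below only traces feature-map sizes through the feature extractor of the classic LeNet-5 (32x32 input, two 5x5 convolutions, two 2x2 poolings). The helper names are hypothetical, and the layer list is the standard LeNet-5 layout, assumed here as a baseline rather than taken from the paper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling window over a square map."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_lenet5(input_size=32):
    """Return (layer, channels, height, width) after each LeNet-5 feature layer."""
    # (name, kernel, stride, output channels); classic LeNet-5, assumed baseline.
    layers = [
        ("conv1", 5, 1, 6),
        ("pool1", 2, 2, 6),
        ("conv2", 5, 1, 16),
        ("pool2", 2, 2, 16),
    ]
    shapes = []
    size = input_size
    for name, kernel, stride, channels in layers:
        size = conv_out(size, kernel, stride)
        shapes.append((name, channels, size, size))
    return shapes


shapes = trace_lenet5()
for name, channels, height, width in shapes:
    print(f"{name}: {channels} x {height} x {width}")

# Number of features fed to the fully connected classifier layers.
_, channels, height, width = shapes[-1]
print("flattened features:", channels * height * width)
```

Refinements to layers, such as different kernel sizes or added padding, change these sizes through the same formula, so the trace can be rerun by editing the layer list.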

Keywords

Convolution Neural Network, Handwritten Character, LeNet



Informatics Publishing Limited

Rawal, Arpana

Generic Approach of Measuring Text Semantic Similarity
