
Classification, Information Extraction and Similarity Analysis of Indian Legal Cases


Affiliations
1 Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Mumbai, Maharashtra, India
     



Computer technology can facilitate legal analysis in the law system. Lakhs of case files pertaining to Indian High Courts from the past decade are available in digital form. Indian lawyers and legal personnel still have to go through the routine of identifying the type of each document and comparing relevant or similar cases. Research has been done around the world to automate text classification and information extraction for legal cases, although fully automatic or semi-automatic systems that carry out semantic text analysis are far less common. To our knowledge, however, no research has been done to automate the tedious document review process in India. In this paper, we aim to provide a hybrid approach that combines clustering and classification techniques, and to develop a system that automates this process in India using Natural Language Processing and Machine Learning.

Keywords

Classifiers, Clusters, Features, Law System, Legal Cases, Regular Expressions, Similarity Analysis.
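
As an illustration of the similarity analysis named in the keywords, the following is a minimal sketch, not the authors' system: it assumes a plain TF-IDF weighting with cosine similarity, and the function names (tokenize, tfidf_vectors, cosine) and the three sample case snippets are hypothetical, invented only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphabetic tokens only (a regular expression does the splitting).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency, so no weight is zero or undefined.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical case snippets, made up for illustration only.
cases = [
    "The appellant challenged the land acquisition order passed by the collector.",
    "Compensation for land acquisition was disputed by the appellant before the court.",
    "The accused was convicted of theft and sentenced to imprisonment.",
]

vecs = tfidf_vectors(cases)
sim_land = cosine(vecs[0], vecs[1])   # two land acquisition matters
sim_other = cosine(vecs[0], vecs[2])  # land acquisition vs. a theft conviction
print(sim_land > sim_other)
```

In this sketch the two land acquisition snippets score higher against each other than against the theft snippet, which is the kind of signal a clustering or classification step could then build on.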



Authors

Aksheya Rajamani
Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Mumbai, Maharashtra, India
Peeyusha Rathi
Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Mumbai, Maharashtra, India
Richa Nagda
Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Mumbai, Maharashtra, India
Utkarsha Nerkar
Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Mumbai, Maharashtra, India
Mahesh Shirole
Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Mumbai, Maharashtra, India

