Arindam Pal, Senior Research Scientist at Data61

AI for Legal Data Analytics: From Text to Citation Networks and Beyond

Abstract

Legal Data Analytics (LDA) is the branch of data analytics in which techniques from artificial intelligence, machine learning, and information retrieval are applied to problems involving legal documents, such as court cases and patent applications. In this talk, we will describe the landscape of legal data analytics and state some important problems in this area, such as catchphrase extraction, document similarity, prior-case retrieval, and text summarization. Next, we will go through the important problem of computing similarity between legal court case documents. This problem has several downstream applications, such as prior-case retrieval, citation prediction, and recommendation of legal articles. There are two broad approaches to the task: text-based and citation network-based. We will illustrate some methods from each of these classes (such as TF-IDF and dispersion, respectively) and discuss their shortcomings. We will then discuss some advanced embedding-based methods, such as Doc2Vec, Node2Vec, Metapath2Vec, and BERT.
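
As a rough illustration of the text-based family, the sketch below scores pairs of documents by the cosine similarity of their TF-IDF vectors. It is a minimal, dependency-free sketch under stated assumptions: the whitespace tokenizer, the smoothed IDF formula, and the toy "case" snippets are illustrative choices, not the preprocessing or weighting used in the work described in this talk.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption: real legal text needs
    # proper tokenization, stop-word removal, and domain-specific normalization).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy snippets standing in for case documents.
docs = [
    "the appellant challenged the land acquisition order",
    "land acquisition compensation was challenged by the appellant",
    "the accused was convicted under the penal code",
]
vecs = tfidf_vectors(docs)
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(f"sim(doc{i}, doc{j}) = {cosine(vecs[i], vecs[j]):.3f}")
```

Because the score depends only on shared vocabulary, two judgments that discuss the same legal issue in different words can still score low, which is one of the shortcomings that motivates network-based and embedding-based methods.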

In the next part, we will discuss Hier-SPCNet, our proposed algorithm for computing legal document similarity. Our method constructs a precedent citation network among case documents. It augments this network with the hierarchy of legal statutes to form a heterogeneous network, Hier-SPCNet, which has citation links between case documents and statutes, as well as citation and hierarchy links among the statutes. We then apply a random-walk-based algorithm to this network to compute the final similarity values. We have performed extensive experiments on a set of Indian Supreme Court case documents. The results show that our proposed heterogeneous network enables significantly better document similarity estimation compared to existing approaches. We also show that the proposed network-based method can complement text-based measures for better estimation of legal document similarity.
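
The abstract does not spell out the random-walk procedure used on Hier-SPCNet, so the following is only a generic illustration of the idea, not the authors' algorithm. It builds a toy heterogeneous graph of case and statute nodes and scores cases by random walk with restart (personalized PageRank), a standard random-walk proximity measure swapped in here. The node names, the restart probability of 0.15, and the choice to treat all links as undirected are assumptions.

```python
from collections import defaultdict

# Generic random walk with restart on a toy case/statute graph.
# This is NOT the Hier-SPCNet algorithm; it only illustrates the idea.


def build_graph(edges):
    """Undirected adjacency sets. Assumption: link types and directions
    (case->case, case->statute, section->parent act) are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def rwr_scores(adj, source, restart=0.15, max_iter=200, tol=1e-12):
    """Random walk with restart (personalized PageRank) from `source`,
    computed by power iteration. Returns node -> visiting probability."""
    nodes = list(adj)
    p = {n: 0.0 for n in nodes}
    p[source] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            if p[u] == 0.0:
                continue
            # With prob. (1 - restart), step to a uniformly random neighbour.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # With prob. restart, jump back to the source case.
        nxt[source] += restart
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: cases C1..C5, statute sections S1 and S2 that
# belong to a parent act A (hierarchy links), plus one case-case citation.
edges = [
    ("C1", "S1"), ("C2", "S1"),  # C1 and C2 both cite section S1
    ("C3", "S2"),                # C3 cites sibling section S2
    ("S1", "A"), ("S2", "A"),    # statute hierarchy: section -> act
    ("C4", "C5"),                # precedent citation, unrelated to C1
]
graph = build_graph(edges)
scores = rwr_scores(graph, "C1")
for case in ["C2", "C3", "C4"]:
    print(case, round(scores[case], 4))
```

In this toy graph, C2 shares a cited section with C1 and so scores higher than C3, which reaches C1 only through the statute hierarchy (S2 to A to S1); C4 lies in a separate component and scores zero. This mirrors the intuition that hierarchy links let two cases be related even when they cite different sections of the same statute.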

Bio

Arindam Pal is a Senior Research Scientist at Data61, part of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), and a Senior Research Fellow at the Cyber Security Cooperative Research Centre (CSCRC). He is also a Conjoint Senior Lecturer in the School of Computer Science and Engineering at UNSW Sydney. His research interests are in Artificial Intelligence, Cyber Security, and Machine Learning. He works on business and research problems at CSIRO and collaborates with faculty members at universities in Australia and abroad. He earned his PhD in Computer Science from the Indian Institute of Technology Delhi. He has over 13 years of industrial research experience at software companies such as Microsoft, Yahoo!, and Novell. He has published academic papers in reputed conferences and journals and has filed patents in jurisdictions such as India, the USA, and Europe. He serves on the technical program committees of several reputed conferences and as a technical reviewer for many renowned journals. He is a Senior Member of both ACM and IEEE.