A Comparative Analysis and Evaluation of Natural Language Processing Document Embedding Techniques on Philippine Supreme Court Case Decisions

Lorenz Timothy Barco Ranera, Mathematical and Computing Sciences Unit, Department of Physical Sciences and Mathematics, College of Arts and Sciences, University of the Philippines Manila


This study explores the application of Natural Language Processing (NLP) to Philippine law as a means of expediting legal research. It focuses on three document embedding techniques, Doc2Vec, TF-IDF, and OpenAI embeddings (text-embedding-ada-002), applied to a dataset of 4,400 Philippine Supreme Court case decisions from 2015 to 2020. The objective is to uncover and evaluate semantic relationships between case decisions. Importantly, this paper proposes two evaluation measures, “similarity classification” and “similarity comparison,” to evaluate the four embedding models built on these techniques and to determine how well they capture the semantic similarity between cases. The results show that the embedding models achieved high accuracy in “similarity classification” but performed relatively poorly on the second measure, “similarity comparison,” with low to moderate accuracy. The best-performing model is Doc2Vec, with 94% accuracy in “similarity classification” and 72.92% accuracy in “similarity comparison.” Future studies can focus on improving performance on the “similarity comparison” measure and on additional preprocessing techniques such as text reorganization (e.g., summaries, sections). These results demonstrate the potential of document embeddings to enhance the efficiency of legal research in the Philippines and in similar domains.
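The abstract does not describe the embedding pipeline itself. As a rough illustration only, the following is a minimal sketch, assuming plain TF-IDF weighting and cosine similarity, of how one of the three techniques (TF-IDF) can turn case texts into vectors whose pairwise similarity can then be compared. The toy case snippets, the simple tokenizer, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) are hypothetical. They are not the study's data, preprocessing, or evaluation measures.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical minimal tokenizer: whitespace split, strip punctuation, lowercase.
    # Real legal texts would need far richer preprocessing.
    tokens = (tok.strip(".,;:()\"'").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        # Term frequency normalized by document length, times IDF.
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(query_idx: int, vectors: list[dict[str, float]]) -> list[int]:
    """Rank the other documents by cosine similarity to the query document."""
    others = [i for i in range(len(vectors)) if i != query_idx]
    return sorted(others, key=lambda i: cosine(vectors[query_idx], vectors[i]), reverse=True)


# Hypothetical toy "case decisions" (not taken from the study's dataset).
cases = [
    "The petition for certiorari is dismissed for lack of merit.",
    "The petition for review on certiorari is denied for lack of merit.",
    "The accused is convicted of murder and sentenced to reclusion perpetua.",
]
vecs = tfidf_vectors(cases)
sim_01 = cosine(vecs[0], vecs[1])  # two procedural dismissals
sim_02 = cosine(vecs[0], vecs[2])  # dismissal vs. criminal conviction
ranking = most_similar(0, vecs)
print(f"sim(0,1)={sim_01:.3f}  sim(0,2)={sim_02:.3f}  ranking for case 0: {ranking}")
```

The two petition dismissals share many weighted terms, so they score as more similar to each other than to the conviction. Doc2Vec or OpenAI embeddings would replace `tfidf_vectors` with dense vectors, while the cosine comparison step stays the same.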