Cadlaws – An English–French Parallel Corpus of Legally Equivalent Documents


  • Francina Sole-Mauri Autonomous University of Barcelona
  • Pilar Sánchez-Gijón Autonomous University of Barcelona
  • Antoni Oliver Open University of Catalonia



corpus construction, parallel corpus, Neural Machine Translation (NMT), English–French translation, Cadlaws


This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes the corpus construction process and preliminary statistics obtained from it. The corpus contains over 16 million words in each language and has a distinctive feature: it is composed of documents that are legally equivalent in both languages but are not the result of translation. The corpus is built upon enactments co-drafted by two jurists to ensure the legal equality of each version and to reflect the concepts, terms and institutions of two legal traditions. The article also discusses why the corpus is defined as a parallel corpus rather than a comparable one. Cadlaws has been pre-processed for machine translation, and baseline Bilingual Evaluation Understudy (BLEU) scores, which compare a candidate translation of a text with a gold-standard translation, are reported for a neural machine translation system. To the best of our knowledge, this is the largest parallel corpus of texts which convey the same meaning in this language pair, and it is freely available for non-commercial use.
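As an illustration of the BLEU score mentioned above, the following is a minimal sketch of corpus-level BLEU in plain Python. It assumes whitespace tokenization, a single reference per sentence, n-gram orders up to 4 and no smoothing; the function names `ngrams` and `corpus_bleu` are hypothetical, and this is not necessarily the configuration or tool used for the baseline scores reported in the article.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) found in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per candidate, no smoothing.

    Assumption: sentences are tokenized by splitting on whitespace.
    """
    matches = [0] * max_n  # clipped n-gram matches, one slot per order
    totals = [0] * max_n   # candidate n-gram counts, one slot per order
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_tok, ref_tok = cand.split(), ref.split()
        cand_len += len(cand_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each candidate n-gram count
            # to its count in the reference.
            matches[n - 1] += sum((cand_counts & ref_counts).values())
            totals[n - 1] += sum(cand_counts.values())
    # Without smoothing, a single order with no matches gives a score of zero.
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty lowers the score of candidates shorter than the reference.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Invented example sentences, not taken from the Cadlaws corpus.
    system_output = ["the minister may make regulations"]
    gold_standard = ["the minister may make regulations respecting fees"]
    print(round(corpus_bleu(system_output, gold_standard), 2))
```

In practice, an established implementation with standardized tokenization is usually preferable to a hand-written one, since BLEU scores are only comparable when they are computed the same way.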

Author Biographies

Francina Sole-Mauri, Autonomous University of Barcelona

Doctoral student in the Translation and Intercultural Studies programme at the Autonomous University of Barcelona (UAB). Her main research areas are machine translation and computational linguistics. She is a member of the DESPITE-MT project: Describing PostEditese in Machine Translation (Ministry of Science and Innovation).

Pilar Sánchez-Gijón, Autonomous University of Barcelona

Degree in Modern Applied Languages from "Babes-Bolyai" University, Cluj Napoca, Romania, and Doctorate in Specialized Translation Studies from Pompeu Fabra University, Barcelona. She is currently a permanent professor in the Department of Translation and Interpreting of the National School of Languages, Linguistics and Translation at UNAM, where she teaches courses in translation, translation theory, documentation and terminology. In addition, she is the coordinator of the distance-learning Diploma in English-Spanish Legal Translation at UNAM. Her work focuses on legal translation studies, computing and documentation for translators and interpreters, lexicography and terminology applied to translation, and forensic interpreting.

Antoni Oliver, Open University of Catalonia

Associate Professor of Arts and Humanities Studies at the Open University of Catalonia (UOC) and director of the Master's degree in Translation and Technologies at this university. His main research areas are machine translation and the automatic generation of lexical and terminological resources.


Sole-Mauri, F., Sánchez-Gijón, P., & Oliver, A. (2021). Cadlaws – An English–French Parallel Corpus of Legally Equivalent Documents. Mutatis Mutandis. Revista Latinoamericana De Traducción, 14(2), 494–508.