


Train, retrieve, or both? What it takes to make a language model cite the law correctly


We ran a four-arm head-to-head on Ontario tenancy law. Fine-tuning alone is not enough. Retrieval is the lever, and the right design drives hallucinated citations to zero.

Quick summary: Self-represented tenants, landlords, and help-desk staff need to be pointed to the provision that governs their question, with a correct statutory citation. Using the Ontario Residential Tenancies Act, 2006 and its core regulation, we tested whether you get there by fine-tuning a model, by retrieval, or by both. We ran four arms on Qwen2.5-7B-Instruct: base zero-shot, LoRA fine-tuning (SFT), retrieval-augmented generation (RAG), and an SFT+RAG hybrid. The base model cannot cite the law: it scores 0.00 on citation exact-match, and 81% of its citations point to provisions that do not exist. Fine-tuning alone teaches the citation format but often recalls the wrong section, reaching only 0.148. Retrieval is the decisive lever: it lifts exact-match to 0.44 and, because the model may cite only from a verified inventory of provisions, drives hallucinated citations to zero by construction. The SFT+RAG hybrid scores highest at 0.481. The honest limit: 0.481 is well short of the 0.70 target we set, and the evaluation set has only 27 items, was run once, and is still pending human verification, so treat the fine-grained numbers as preliminary.
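The two metrics in the summary, citation exact-match and the share of citations pointing to provisions that do not exist, can be made concrete with a small sketch. It assumes exact-match is a per-item comparison of one predicted citation against one gold citation after light normalization, and that a "hallucinated" citation is any citation not found in the verified inventory. The function names, the normalization rule, and the citation strings are all hypothetical illustrations, not the authors' evaluation code. `restrict_to_inventory` shows one simple way to get "zero hallucinations by construction" (drop anything outside the inventory); the post does not say this is the exact mechanism used.

```python
from typing import Iterable


def normalize(citation: str) -> str:
    """Hypothetical canonical form: lowercase, collapse runs of whitespace."""
    return " ".join(citation.lower().split())


def exact_match(predicted: list[str], gold: list[str]) -> float:
    """Fraction of items whose predicted citation equals the gold one after normalization."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


def hallucination_rate(citations: Iterable[str], inventory: set[str]) -> float:
    """Fraction of emitted citations that name no provision in the verified inventory."""
    canon = {normalize(c) for c in inventory}
    cited = [normalize(c) for c in citations]
    if not cited:
        return 0.0
    return sum(c not in canon for c in cited) / len(cited)


def restrict_to_inventory(citations: Iterable[str], inventory: set[str]) -> list[str]:
    """Keep only inventory citations, so hallucination_rate() on the result is 0.0."""
    canon = {normalize(c) for c in inventory}
    return [c for c in citations if normalize(c) in canon]


if __name__ == "__main__":
    # Illustrative citation strings only; not checked against the Act.
    inventory = {"s. 1", "s. 2", "s. 3"}
    gold = ["s. 1", "s. 2", "s. 3"]
    pred = ["S. 1", "s. 9", "s. 3"]  # "s. 9" is outside the inventory

    print(round(exact_match(pred, gold), 3))  # 0.667
    print(round(hallucination_rate(pred, inventory), 3))  # 0.333
    print(hallucination_rate(restrict_to_inventory(pred, inventory), inventory))  # 0.0
```

Note that filtering guarantees a zero hallucination rate but not a correct citation: a model can still pick a real provision that does not govern the question, which is why exact-match stays well below 1.0 even in the retrieval arms. If exact-match is a simple per-item mean over the 27 evaluation items, each item moves the score by 1/27, about 0.037, so a gap of that size between two arms amounts to a single item.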