L-MARS stands for Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search.
📄 Paper: L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
🤗 Dataset: LegalSearchQA
L-MARS is a multi-agent legal question answering system designed for grounded answers over current legal information. It combines structured query decomposition, agentic web search, evidence filtering, and cited answer synthesis. The project also includes optional local retrieval over user-provided documents and CourtListener integration for case-law search.
L-MARS supports two operating modes:
- Simple Mode: a single-pass retrieval pipeline that decomposes the question, searches for evidence, and synthesizes a grounded answer.
- Multi-Turn Mode: an iterative search-and-verify loop that refines queries until the evidence is sufficient or a maximum number of iterations is reached.
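
The control flow of Multi-Turn Mode can be sketched as a bounded loop. This is a minimal illustration, not the project's implementation: `search`, `judge`, and `refine` are hypothetical callables standing in for retrieval, the sufficiency check, and query refinement, and the default of 3 iterations is an arbitrary choice.

```python
from typing import Callable, List, Tuple


def multi_turn(
    question: str,
    search: Callable[[str], List[str]],
    judge: Callable[[str, List[str]], Tuple[bool, str]],
    refine: Callable[[str, str], str],
    max_iterations: int = 3,
) -> Tuple[List[str], int]:
    """Collect evidence until the judge accepts it or the budget runs out.

    Returns the accumulated evidence and the number of iterations used.
    All three callables are hypothetical placeholders.
    """
    evidence: List[str] = []
    query = question
    for iteration in range(1, max_iterations + 1):
        evidence.extend(search(query))
        sufficient, missing = judge(question, evidence)
        if sufficient:
            return evidence, iteration
        # Fold the judge's feedback into the next query.
        query = refine(question, missing)
    return evidence, max_iterations
```

Simple Mode corresponds roughly to calling `search` once and skipping the loop; the iteration cap plays the role of `--max-iterations` on the command line.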
The system can use the following evidence sources:
- Web search via Serper
- Local RAG over user-provided documents using BM25
- CourtListener for case-law retrieval
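
To make the local retrieval step concrete, here is a small pure-Python sketch of Okapi BM25 scoring. It is not the project's code: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the non-negative IDF variant are common defaults assumed for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting documents by these scores in descending order gives the ranked list a retrieval step would pass on as evidence.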
The workflow is divided among four agents:
- Query Agent parses the question into structured search intents.
- Search Agent retrieves evidence from the enabled sources.
- Judge Agent checks whether the evidence is sufficient and flags missing information.
- Summary Agent writes the final answer with citations and rationale.
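
The hand-off between the agents can be sketched as follows. Every type and function name here is hypothetical and chosen for illustration; the agent bodies are stubs where a real system would call an LLM or a search backend.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    source: str  # e.g. "web", "local", "courtlistener"
    url: str
    text: str


@dataclass
class Verdict:
    sufficient: bool
    missing: List[str]


def query_agent(question: str) -> List[str]:
    # Stub: a real agent would ask an LLM for structured search intents.
    return [question]


def search_agent(intents: List[str]) -> List[Evidence]:
    # Stub: a real agent would query the enabled sources.
    return [Evidence("web", "https://example.com", f"result for {i}") for i in intents]


def judge_agent(question: str, evidence: List[Evidence]) -> Verdict:
    # Stub: in practice sufficiency would be decided by an LLM.
    return Verdict(sufficient=bool(evidence), missing=[] if evidence else ["any evidence"])


def summary_agent(question: str, evidence: List[Evidence]) -> str:
    # Number each source so the answer can cite it as [1], [2], ...
    cites = "\n".join(f"[{n}] {e.url}" for n, e in enumerate(evidence, start=1))
    return f"Answer to: {question}\n{cites}"


def run_once(question: str) -> str:
    """One pass through the four stub agents."""
    evidence = search_agent(query_agent(question))
    verdict = judge_agent(question, evidence)
    note = "" if verdict.sufficient else f"\nMissing: {', '.join(verdict.missing)}"
    return summary_agent(question, evidence) + note
```

Keeping the judge's verdict as structured data (a flag plus a list of gaps) is one way to let an outer loop decide whether to search again.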
The paper evaluates L-MARS in two settings:
- LegalSearchQA: a 50-question benchmark that requires time-sensitive legal knowledge postdating model training.
- Bar Exam QA: a reasoning-focused benchmark where retrieval provides only limited gains.
The metrics reported in the paper focus on accuracy. The evaluation targets grounded legal QA rather than classification, so metrics such as micro F1 are not the focus.
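
For reference, accuracy is the fraction of questions answered correctly. The sketch below assumes each prediction and gold answer is a short label such as a multiple-choice letter, compared after trimming whitespace and ignoring case; that format and the normalization are assumptions for illustration, not a description of the paper's scoring script.

```python
from typing import Iterable, Tuple


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs that match after normalization."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no predictions to score")
    correct = sum(p.strip().upper() == g.strip().upper() for p, g in pairs)
    return correct / len(pairs)
```

For example, three matches out of four pairs gives an accuracy of 0.75.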
Install the dependencies:

    pip install -r requirements.txt

Quick legal research with online search only:

    python main.py "Your legal question"

Enable offline RAG for local documents:

    python main.py --offline-rag "Your legal question"

Enable all sources (offline RAG + CourtListener + web search):

    python main.py --all-sources "Your legal question"

Verbose output:

    python main.py -v "Your legal question"

Run iterative research with refinement:

    python main.py --multi "Complex contract dispute..."

Set a custom number of iterations:

    python main.py --multi --max-iterations 5 "Your question"

If you are reproducing the paper's evaluation pipeline:

    python run/single_turn_pipeline.py \
      --dataset legalsearchqa \
      --model openai:gpt-4o-mini \
      --use-cache true \
      --output results/lmars_preds.jsonl

    python eval/run_eval.py \
      --preds results/lmars_preds.jsonl \
      --judge-sample 20 \
      --llm_model openai:gpt-4o-mini

If you use L-MARS in your research, please cite:
@misc{wang2025lmarslegalmultiagentworkflow,
title={L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search},
author={Ziqi Wang and Boqin Yuan},
year={2025},
eprint={2509.00761},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2509.00761},
}