# Multi-Agent AI System for Policy-Grounded Compliance Assistance
This project implements a prototype AI compliance copilot designed to answer Employer-of-Record (EOR) policy questions using internal policy documentation.
The system demonstrates how a multi-agent architecture can provide reliable, evidence-grounded answers while enforcing verification, safety checks, and escalation paths required in HR and legal compliance environments.
Agents in the system:
- Retriever – finds relevant policy sections using hybrid retrieval (BM25 + embeddings)
- Generator – produces grounded answers using retrieved evidence
- Verifier – validates citations and evidence consistency
- Safety Layer – handles audit logging, escalation, and PII protection
Quick links:
- System design: SYSTEM_DESIGN.md
- Evaluation plan: evaluation/EVALUATION.md
```mermaid
flowchart TD
    A[User Query]
    A --> B[Country Detection]
    B --> C[Retriever Agent<br>Hybrid Search: BM25 + Embeddings]
    C --> D[Relevant Policy Sections]
    D --> E[Generator Agent<br>LLM Answer Generation]
    E --> F[Draft Answer + Citations]
    F --> G[Verifier Agent]
    G --> H{Evidence Valid?}
    H -->|Yes| I[Final Answer + Citations + Confidence]
    H -->|No| J[Escalate to Legal / Compliance]
    I --> K[Audit Logging]
    J --> K
```
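The pipeline in the diagram can be sketched as a simple sequential orchestration. The function and parameter names below are illustrative, not the repository's actual `app.py` API:

```python
def answer_query(query, retriever, generator, verifier, safety):
    """Sequential orchestration matching the flowchart: retrieve,
    generate, verify, then either answer or escalate; always log."""
    evidence = retriever(query)                # Retriever Agent
    draft = generator(query, evidence)         # Generator Agent
    verdict = verifier(draft, evidence)        # Verifier Agent
    if verdict["decision"] == "answer":
        result = {"status": "answered", "answer": draft}
    else:
        # Fail-safe path: unsupported evidence goes to humans.
        result = {"status": "escalated", "reason": verdict}
    safety(query, result)                      # Audit logging + PII redaction
    return result
```

Keeping each agent behind a plain callable makes the pipeline easy to test with stubs and to rewire if stages are added or removed.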
Policy documents are retrieved using:
- BM25 lexical search
- Sentence-transformer semantic embeddings
Hybrid scoring improves accuracy for compliance queries containing both legal terminology and natural language.
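One common way to blend the two signals (a sketch of the general technique, not the project's actual scoring code; `alpha` is an illustrative parameter) is min-max normalisation followed by a weighted sum:

```python
def hybrid_scores(lexical, semantic, alpha=0.5):
    """Blend BM25 (lexical) and embedding-similarity (semantic) scores
    for the same list of candidate documents.

    Both lists are min-max normalised first so the two score scales
    are comparable; `alpha` weights the lexical signal.
    """
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    lex, sem = minmax(lexical), minmax(semantic)
    return [alpha * l + (1 - alpha) * s for l, s in zip(lex, sem)]
```

Normalisation matters because raw BM25 scores are unbounded while cosine similarities live in [-1, 1]; without it, one signal silently dominates.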
The generator produces answers strictly from retrieved policy sections with explicit citations:
`doc_id | section | timestamp`
The verifier agent checks:
- citation validity
- conflicting policy sources
- outdated policy versions
- unsupported claims
If the answer cannot be validated, the system escalates instead of guessing.
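The fail-safe decision can be sketched as a pure function over the draft's citations (hypothetical names; the repository's `verifier.py` interface may differ):

```python
def verify_answer(citations, retrieved_ids):
    """Fail-safe check: every citation in the draft answer must point
    at a document that was actually retrieved. Any unsupported
    citation triggers escalation instead of a best-effort answer."""
    unsupported = [c for c in citations if c["doc_id"] not in retrieved_ids]
    if unsupported:
        return {"decision": "escalate", "unsupported": unsupported}
    return {"decision": "answer", "unsupported": []}
```

A real verifier would also compare policy versions and check claim-level entailment, but the core contract is the same: no evidence, no answer.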
The system includes:
- audit logging
- PII redaction
- escalation logic
- confidence scoring
These safeguards are essential for HR, payroll, and employment compliance systems.
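PII redaction can be sketched with pattern substitution. The patterns below are illustrative only; production PII detection needs far more than regexes (names, addresses, locale-specific ID formats):

```python
import re

# Illustrative patterns; a real system would use a dedicated PII
# detection library and locale-aware rules.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Replace matched PII spans with placeholder tokens before the
    text reaches logs or the audit trail."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text
```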
Example:

**Query**

Can we terminate during probation in Germany?

**Retrieved Evidence**

DE_termination_v1 – Termination During Probation

**Answer**

Employees in Germany may be terminated during the probation period with a two-week notice period unless otherwise specified in the employment contract.

**Citation**

`DE_termination_v1 | Termination During Probation | 2025-01-01`

Confidence: Medium
Reason: Answer supported by retrieved policy evidence
Escalation: None
Project layout:

```
src/
  agents/
    retriever.py   # Hybrid policy retrieval
    generator.py   # LLM answer generation
    verifier.py    # Evidence validation
    safety.py      # Logging + PII protection
data/
  policies/        # Example policy documents
app.py             # Query pipeline orchestration
```
Install dependencies:

```shell
pip install -e .
```

Run the demo queries:

```shell
python app.py
```

Run a single query:

```shell
python app.py --query "What is the payroll cutoff for Italy?"
```

Run offline evaluation:

```shell
python -m evaluation.retrieval_eval
```

Run unit tests:

```shell
pytest
```
Notes:

- The first run downloads the all-MiniLM-L6-v2 embedding model; a Hugging Face warning is expected without `HF_TOKEN`.
- The generator uses the OpenAI API; set `OPENAI_API_KEY` for full evaluation.
- Queries can be multilingual; the system translates them to English for retrieval and returns answers in the user's language.
- Optional offline translation: install `argostranslate` and its language packs to avoid API calls.
Generate a policy knowledge graph export:

```shell
python -m scripts.policy_graph
```

Generate a policy coverage report (missing policy types per country):

```shell
python -m scripts.policy_coverage
```

Optional: render the DOT graph (requires Graphviz):

```shell
dot -Tpng outputs/policy_graph.dot -o outputs/policy_graph.png
```

Multilingual demo (German/Spanish/French):

```shell
python -m scripts.multilingual_demo
```
Design principles:

- **Grounded generation** – all answers must reference retrieved policy evidence.
- **Fail-safe behavior** – if evidence is insufficient or conflicting, the system escalates rather than hallucinating.
- **Auditability** – all decisions are logged for compliance review.
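An audit entry can be sketched as one structured JSON line per decision. The field names, and the choice to store only a SHA-256 hash of the query (keeping possible PII out of the log while still allowing de-duplication), are illustrative assumptions, not the repository's actual `safety.py` format:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(query: str, decision: str, citations: list) -> str:
    """Build one JSON audit-log line for a pipeline decision.

    The raw query is stored only as a SHA-256 hash so user-supplied
    text (which may contain PII) never lands in the log verbatim."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "decision": decision,
        "citations": citations,
    }
    return json.dumps(record, sort_keys=True)
```

One-JSON-object-per-line logs are easy to ship to standard log pipelines and to replay during compliance review.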
Future work:

- Vector database integration (FAISS / pgvector)
- Automated policy ingestion pipeline
- Retrieval evaluation metrics (Recall@k, citation accuracy)
- Structured output validation with JSON schemas
- Human-in-the-loop review workflows
This prototype demonstrates how a multi-agent AI system can support compliance-sensitive workflows by combining retrieval, generation, verification, and safety layers.
The architecture emphasizes reliability, traceability, and controlled escalation—key requirements for deploying AI in HR, payroll, and employment compliance environments.