πŸ›‘οΈ Internal Policy Intelligence System

A Deterministic, Governance-Gated Retrieval-Augmented Generation (RAG) Architecture for Enterprise Operations.

Live Demo · Python 3.10+ · Groq API · FAISS · License: MIT


[Insert Demo Video Placeholder / Replace with Video or GIF later]


## 📖 Table of Contents

- 🎯 Overview
- 🚀 Why Use This?
- 🏛️ System Architecture
- ✨ Core Capabilities
- 🛠️ Tech Stack
- 📁 Folder Structure
- 📈 Evaluation Baseline (Frozen v1)
- ⚠️ Known Limitations
- 💻 Getting Started


## 🎯 Overview

The **Internal Policy Intelligence System** is a tightly controlled RAG engine designed specifically for policy-bound internal operations (e.g., HR, Finance, IT helpdesks).

Unlike typical conversational chatbots, which prioritize fluency, this system prioritizes correctness, traceability, and architectural clarity over raw generative capability. It strictly enforces governance so the system knows exactly when to answer, when to refuse, and when to escalate, and, importantly, why.


## 🚀 Why Use This?

In corporate environments, large language models (LLMs) cannot be allowed to guess. If an employee asks about a refund policy or a security protocol, giving a "creative" but incorrect answer is a liability.

This project solves that by enforcing:

- **Zero-Trust Generation:** The LLM is only allowed to generate a response after deterministic risk gates confirm the query is safe and the retrieved context is the latest, owner-approved version. Every generated sentence is lexically verified against the source text.


πŸ›οΈ System Architecture

Execution Flow

The pipeline enforces strict separation of concerns. No layer mixes responsibilities.

1. 🧭 **Intent Detection** → What is the user asking?
2. 🚦 **Routing (Owner Scoping)** → Which departments are allowed to answer this?
3. 🔎 **Retrieval (FAISS)** → Version dominance (v2 > v1) & owner filtering.
4. 🛡️ **Constraint Filtering & Governance** → Risk gate: Is it safe? Should we refuse? Should we escalate?
5. ⚖️ **Verdict Handler** → Execute the decision made by the gate.
6. ✍️ **Strict JSON Generation** → Formulate an objective answer.
7. 🔍 **Lexical Grounding Check** → Verify every generated clause against the source text.

*(Architecture flow diagram)*
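The seven stages above can be sketched as a thin orchestrator in which each stage is an injected function. All names below (`run_pipeline`, `Verdict`, the stage callables) are illustrative, not the project's actual `rag_pipeline.py` API:

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str   # "SAFE" | "REFUSE_POLICY" | "REFUSE_INVALID" | "ESCALATE"
    reason: str


def run_pipeline(query, detect_intent, route_owners, retrieve, govern, generate, ground):
    """Illustrative orchestration: each stage is a single-purpose function."""
    intent = detect_intent(query)              # 1. What is being asked?
    owners = route_owners(intent)              # 2. Which departments may answer?
    docs = retrieve(query, owners=owners)      # 3. Owner-filtered, version-dominant retrieval
    verdict = govern(query, intent, docs)      # 4. Deterministic risk gate
    if verdict.label != "SAFE":                # 5. Verdict handler executes the gate's decision
        return {"verdict": verdict.label, "reason": verdict.reason, "answer": None}
    answer = generate(query, docs)             # 6. Strict JSON generation
    grounded = ground(answer, docs)            # 7. Lexical grounding check
    return {"verdict": "SAFE", "answer": answer, "grounded": grounded}
```

Because the stages are passed in, each layer can be unit-tested with stubs without touching the others.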

## ✨ Core Capabilities

πŸ—‚οΈ Version-Aware Retrieval

Older document versions are automatically suppressed in favor of the most recent policy version using deterministic dominance rules during the ranking phase.
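A minimal sketch of such a dominance rule, assuming each chunk carries `doc_id` and integer `version` metadata (an illustrative schema, not necessarily the project's exact one):

```python
def apply_version_dominance(chunks):
    """Keep only chunks belonging to the newest version of each document.

    Deterministic: for every doc_id, find the maximum version seen, then
    drop any chunk from an older version before ranking.
    """
    latest = {}
    for c in chunks:
        if c["doc_id"] not in latest or c["version"] > latest[c["doc_id"]]:
            latest[c["doc_id"]] = c["version"]
    return [c for c in chunks if c["version"] == latest[c["doc_id"]]]
```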

πŸ” Owner-Based Scoping

Documents are rigidly filtered based on routed intent ownership, preventing cross-domain contamination (e.g., Support agents accessing Finance internal memos).
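Owner scoping can be sketched as a hard filter driven by a routing table; the table entries and the `scope_documents` helper below are hypothetical, standing in for the configuration maps under `system/`:

```python
# Illustrative routing table; the real mapping lives in the system/ config maps.
ROUTING_TABLE = {
    "refund_policy": {"finance"},
    "password_reset": {"it"},
    "leave_policy": {"hr"},
}


def scope_documents(intent, chunks):
    """Drop any chunk whose owning department is not routed for this intent.

    Unknown intents resolve to an empty owner set, so nothing leaks through.
    """
    allowed = ROUTING_TABLE.get(intent, set())
    return [c for c in chunks if c["owner"] in allowed]
```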

πŸ›‘οΈ Deterministic Governance Gating

A structured governance layer classifies queries into strict verdicts:

- ✅ **SAFE**
- 🚫 **REFUSE_POLICY**
- ❌ **REFUSE_INVALID**
- 🚨 **ESCALATE** (limited strictly to security, legal, or compromise cases)
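A sketch of what a deterministic gate with these four verdicts might look like; the trigger phrases and validity checks are invented for illustration and are not the project's actual rules:

```python
# Hypothetical escalation triggers, not the project's real rule set.
ESCALATION_TRIGGERS = ("security breach", "account compromise", "legal hold")


def governance_gate(query, retrieved_chunks):
    """Classify a query into one of the four strict verdicts, deterministically."""
    q = query.lower()
    if any(trigger in q for trigger in ESCALATION_TRIGGERS):
        return "ESCALATE"          # security/legal/compromise cases bypass generation
    if not q.strip() or len(q) < 3:
        return "REFUSE_INVALID"    # malformed or empty query
    if not retrieved_chunks:
        return "REFUSE_POLICY"     # no owner-approved context supports an answer
    return "SAFE"
```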

βš–οΈ Evidence-Backed Refusals & Lexical Grounding

Policy denials must cite exact clauses from retrieved documents. Generated SAFE answers undergo sentence-level lexical grounding checks. Unsupported sentences are flagged, preventing silent hallucination drift.
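One simple way to implement a sentence-level lexical check is token-overlap coverage; this is a sketch of the general technique, not the project's exact algorithm, and the 0.7 threshold is an assumption:

```python
import re


def lexical_grounding(answer, source_text, threshold=0.7):
    """Return the answer sentences whose tokens are not sufficiently present
    in the source text. Flagged sentences are candidate hallucinations."""
    source_tokens = set(re.findall(r"[a-z0-9]+", source_text.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        if not tokens:
            continue
        coverage = sum(t in source_tokens for t in tokens) / len(tokens)
        if coverage < threshold:
            flagged.append(sentence)  # unsupported by the source text
    return flagged
```

Being purely lexical, the check is cheap and deterministic, at the cost of missing paraphrases (a limitation the project acknowledges below).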

### 📊 Interactive Dashboard & Document Viewer

A responsive Streamlit web interface provides basic session-based anti-spam rate limiting, real-time metrics visibility (confidence, verdicts, execution latency), and an interactive document explorer for direct comparison against raw markdown policies.


πŸ› οΈ Tech Stack

- **Frontend:** Streamlit (for rapid dashboarding & prototyping)
- **LLM Engine:** Groq (`llama-3.3-70b-versatile` for blazing-fast structured generation)
- **Vector Database:** FAISS (CPU) (for local, high-speed similarity search)
- **Embeddings:** Sentence-Transformers (`all-MiniLM-L6-v2`)
- **Evaluation:** Custom offline Python test harnesses

πŸ“ Folder Structure

```
Internal-Policy-Intelligence-System/
├── app.py                      # Main Streamlit Dashboard UI
├── run_app.sh                  # UI Launcher script
├── data/
│   ├── index/                  # Compiled FAISS vectors & metadata (Git-ignored)
│   └── raw_docs/               # Source-of-truth markdown files
│       ├── policies/
│       ├── sops/
│       ├── faqs/
│       └── notes/
├── evaluation/
│   ├── metrics.py              # Recall@k, MRR calculation logic
│   └── run_evaluation.py       # Offline evaluation harness
├── src/                        # Core Application Conveyor Belt
│   ├── intent_detection/
│   ├── pipeline/               # Main rag_pipeline.py orchestration
│   ├── retrieval/              # Indexer, Embedder, Chunker, Reranker
│   ├── routing/
│   ├── rules/                  # Governance & Constraint logic
│   └── utils/
└── system/                     # Configuration Maps (Prompts, Routing tables)
```

## 📈 Evaluation Baseline (Frozen v1)

Retrieval and governance performance are measured offline against structured test cases to prevent regressions.

### Retrieval Evaluation

| Metric | Value |
| --- | --- |
| Total Queries Evaluated | 18 |
| Recall@5 | 0.8889 |
| MRR | 0.5259 |
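For reference, Recall@k and MRR can be computed as follows, assuming one relevant document per query; this is a sketch of the kind of logic `evaluation/metrics.py` contains, not its actual code:

```python
def recall_at_k(results, relevant, k=5):
    """Fraction of queries whose relevant doc appears in the top-k ranked results."""
    hits = sum(1 for ranked, rel in zip(results, relevant) if rel in ranked[:k])
    return hits / len(results)


def mrr(results, relevant):
    """Mean reciprocal rank of the first relevant document (0 if never retrieved)."""
    total = 0.0
    for ranked, rel in zip(results, relevant):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(results)
```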

### Governance Evaluation

| Metric | Value |
| --- | --- |
| Verdict Accuracy | 85% (20 test cases) |

(No further tuning was performed after freezing the v1 baseline.)


## ⚠️ Known Limitations

This system intentionally avoids overengineering. Current limitations include:

- Governance is semantic-only and operates at the query level.
- Multi-intent queries may bias dense retrieval toward dominant semantic clusters.
- Hybrid retrieval (BM25 + vector) is implemented but not enabled by default.
- Grounding is lexical, not semantic.
- The core pipeline has no production hardening (async execution, distributed databases); the Streamlit frontend uses simple in-memory session state for basic spam prevention.

These limitations are explicitly documented to maintain architectural transparency.


## 💻 Getting Started

### Prerequisites

- Python 3.10+
- A Groq API key

**1. Install Dependencies**

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

**2. Set Environment Variables**

Create a `.env` file in the root directory:

```
GROQ_API_KEY=gsk_your_api_key_here
```
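The app needs this key at runtime, so it pays to fail fast when it is missing. A defensive way to read it (an illustrative helper, not the project's code), assuming the `.env` contents have been loaded into the process environment, e.g. via python-dotenv's `load_dotenv()`:

```python
import os


def get_groq_api_key():
    """Read GROQ_API_KEY from the environment, raising a clear error if unset
    instead of letting a Groq call fail later with an opaque auth error."""
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; create a .env file as described above.")
    return key
```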

**3. Build the Vector Index**

Run this to parse `/data/raw_docs/` into FAISS before running the app:

```bash
python -m src.retrieval.index_builder
```

**4. Launch the Interactive Dashboard**

```bash
./run_app.sh
```

**5. (Optional) Run Evaluations**

Test the codebase against the frozen test suite to generate offline metrics:

```bash
python -m evaluation.run_evaluation
```

*Engineered for architectural clarity, strict governance, and measurable performance.*
