6 changes: 6 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,6 @@
version: 2
updates:
- package-ecosystem: "pip"
directory: "/autoTriage"
schedule:
interval: "weekly"
182 changes: 182 additions & 0 deletions Feedback/CMS-Alternative-Approaches-Research-V1.md
@@ -0,0 +1,182 @@
# CMS Knowledge Accelerator - Alternative Approaches Research V1

**Report ID:** CMS-ALT-V1
**Date:** 2026-02-20
**Researcher:** CMS Watchdog Team - Technology Researcher Agent
**Scope:** Research into alternative and complementary approaches for knowledge agent design
**Classification:** INTERNAL

---

## Executive Summary

The current architecture (declarative agent + SharePoint grounding + MCP server) is fundamentally sound and aligned with Microsoft's strategic direction. Several complementary technologies could significantly improve retrieval quality, reduce maintenance burden, and enhance user experience. Top three recommendations: **Azure AI Search** for hybrid retrieval, **Copilot Tuning** for legal terminology, and **built-in Agent Evaluation** for automated testing.

---

## Technologies Evaluated

### 1. Azure AI Search (Hybrid Retrieval) -- STRONGLY RECOMMENDED

**What it is:** Enterprise-grade search with keyword + vector + semantic ranking. Can be added as a Copilot Studio knowledge source alongside SharePoint grounding.

**How it would work:**
1. Index SharePoint content with chunking and vector embeddings
2. Use Azure OpenAI embeddings for vectorization
3. Enable semantic ranker for re-ranking
4. Connect as Copilot Studio knowledge source
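
The four steps above converge on a single hybrid request per user query. A minimal sketch of the request body for the Azure AI Search REST API is below; the vector field name (`contentVector`), semantic configuration name (`legal-semantic`), and embedding dimension are hypothetical placeholders, not values from this project.

```python
# Sketch of a hybrid (keyword + vector + semantic) query body for the
# Azure AI Search REST API. Field and configuration names are assumptions
# for illustration only.

def build_hybrid_query(query_text: str, query_vector: list[float],
                       top: int = 10) -> dict:
    """Combine a BM25 keyword leg, a vector leg, and semantic re-ranking."""
    return {
        "search": query_text,                       # keyword (BM25) leg
        "vectorQueries": [{
            "kind": "vector",
            "vector": query_vector,                 # embedding of the query
            "fields": "contentVector",              # hypothetical vector field
            "k": 50,                                # candidates before fusion
        }],
        "queryType": "semantic",                    # enable semantic ranker
        "semanticConfiguration": "legal-semantic",  # hypothetical config name
        "top": top,
    }

# 1536 dims would match text-embedding-3-small; placeholder vector here.
body = build_hybrid_query("assignment clause in LMA facility agreement",
                          [0.0] * 1536)
```

The two retrieval legs are fused server-side (reciprocal rank fusion), then the semantic ranker re-orders the fused top results — which is why `k` is set well above `top`.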

| Aspect | Detail |
|--------|--------|
| Pros | Hybrid search outperforms native SharePoint for legal docs. Proper chunking for large documents. Vector search finds conceptually similar content without keyword matches. |
| Cons | Cost: S1 ~GBP 200/month + embeddings. SP Online indexer in preview. Added infrastructure. Data staleness risk. |
| Effort | 2-3 weeks |
| Impact | VERY HIGH -- step change in retrieval quality |
| Verdict | **Phase 2 priority #1** |

### 2. Copilot Tuning (Legal Terminology) -- RECOMMENDED

**What it is:** Low-code fine-tuning of LLMs with company data for domain-specific terminology, tone, and relevance.

| Aspect | Detail |
|--------|--------|
| Pros | Teaches model CMS legal terminology (LMA, facility agreement, clause types). Zero architecture change. |
| Cons | Preview feature. Requires curated training data. |
| Effort | 1-2 weeks |
| Impact | HIGH -- accuracy improvement with minimal effort |
| Verdict | **Phase 2 priority #2** |

### 3. Built-in Agent Evaluation -- STRONGLY RECOMMENDED

**What it is:** Copilot Studio automated evaluation (public preview) with AI-powered grading: relevance, completeness, groundedness scores.

| Aspect | Detail |
|--------|--------|
| Pros | Replaces manual 81-question testing. Enables rapid iteration. Built-in scoring. |
| Cons | Preview feature. May not match legal domain evaluation nuance. |
| Effort | 1 week |
| Impact | HIGH -- enables evaluation-driven development |
| Verdict | **Implement immediately** |

### 4. SharePoint Premium (Document Processing) -- RECOMMENDED

**What it is:** AI-powered content processing: automatic metadata extraction, classification, summarisation, enhanced search.

| Aspect | Detail |
|--------|--------|
| Pros | Auto-classifies documents. Auto-generates summaries. Reduces KM manual tagging. Free tier through June 2026. |
| Cons | Per-document cost after free tier. Classification accuracy for legal docs needs validation. |
| Effort | 2-3 weeks |
| Impact | MEDIUM-HIGH |
| Verdict | **Phase 2 for metadata enrichment** |

### 5. Microsoft Graph Connectors -- PARK FOR FUTURE

**What it is:** Index external data into Microsoft Graph for native Copilot reasoning with security trimming.

| Aspect | Detail |
|--------|--------|
| Pros | Brings in knowledge from iManage, practice management, matter databases. Native M365 security. |
| Cons | CMS knowledge already in SharePoint. Custom development required. |
| Effort | 3-4 weeks |
| Impact | LOW (current scope) |
| Verdict | **Future consideration for non-SharePoint sources** |

### 6. Multi-Agent Orchestration -- NOT NOW

**What it is:** Parent agent routes to specialized child agents based on intent (Build 2025).

| Aspect | Detail |
|--------|--------|
| Pros | Better per-area prompt engineering. Add specialists without modifying main agent. |
| Cons | Complexity. Routing mistakes. New in Copilot Studio. Overkill for 2 areas. |
| Effort | 2-3 weeks |
| Impact | LOW (2 areas), HIGH (5+ areas) |
| Verdict | **Design for future evolution, don't implement now** |

### 7. GraphRAG (Knowledge Graphs) -- ASPIRATIONAL

**What it is:** Microsoft open-source combining vector search with knowledge graphs. Legal documents have natural citation relationships.

| Aspect | Detail |
|--------|--------|
| Pros | Precision reported as high as 99% in some evaluations. Captures legal citation relationships. |
| Cons | Complex. Requires knowledge graph construction. Significant engineering. |
| Effort | 4-6 weeks |
| Impact | HIGH but complex |
| Verdict | **Phase 3 aspirational** |

### 8. Microsoft Agents SDK (Custom Engine) -- NOT APPROPRIATE

**What it is:** Full custom engine agent with own AI model, multi-channel deployment, complex orchestration.

| Aspect | Detail |
|--------|--------|
| Pros | Full model control. Multi-channel. Complex reasoning pipelines. |
| Cons | Massive scope increase. Hosting costs. Overkill for knowledge retrieval. |
| Effort | 6-8 weeks |
| Impact | HIGH but massive scope |
| Verdict | **Not for current scope.** Revisit for custom fine-tuned legal model or non-M365 deployment. |

---

## Competitor Landscape

### Harvey AI (Allen & Overy, PwC)
- Custom-trained case law models (fine-tuned on all U.S. case law)
- Multi-agent pipeline for knowledge ingestion
- Hallucination detection: decomposes responses into individual factual claims and cross-references each
- Scaled from 6 to 60+ jurisdictions

### DeepJudge
- Focuses on unlocking knowledge trapped inside law firms' own work product
- True advantage comes from proprietary data, not public data

### Industry Trends (2025/2026)
- 79% of legal professionals now use AI (Clio Legal Trends 2025)
- Big law firms building internal LLM fine-tuning strategies
- Context (playbooks, precedents, templates) is the main value driver
- Human oversight remains central

### CMS Could Adopt
- **Hallucination mitigation** through better citation requirements and confidence scoring
- **Automated knowledge health checking** (currency tool is a start)
- **Copilot Tuning** as lightweight alternative to full fine-tuning
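
The first adoption item — citation requirements as hallucination mitigation — can start very simply: flag any answer sentence that carries no citation marker, in the spirit of Harvey's claim decomposition. The bracketed `[Doc N]` citation format below is an assumption for illustration; the agent's actual citation style may differ.

```python
import re

# Assumes citations appear as bracketed source markers like "[Doc 3]";
# the real citation format is an assumption for illustration.
CITATION = re.compile(r"\[[^\]]+\]")

def uncited_sentences(answer: str) -> list[str]:
    """Return answer sentences that carry no citation marker."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer)
                 if s.strip()]
    return [s for s in sentences if not CITATION.search(s)]

answer = ("The LMA facility agreement template was updated in 2024 [Doc 1]. "
          "Clause 7 covers mandatory prepayment.")
# The second sentence would be flagged as uncited.
```

A flagged sentence could then lower a confidence score or trigger a re-grounding pass, rather than being shown to the user as-is.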

---

## Recommended Architecture (If Starting From Scratch)

```
Copilot Studio Agent (Unified)
|
+---------------+---------------+
| | |
SharePoint Grounding Azure AI Search MCP Server
(site-level URL) (Hybrid RAG) (Specialized Tools)
| | |
+-------+-------+ |
| |
SharePoint Online Microsoft Graph
(Document Repository) (Metadata, Analytics)
```

---

## Priority Roadmap

| Priority | Enhancement | Effort | Impact | Phase |
|----------|-------------|--------|--------|-------|
| 1 | Agent Evaluation (automated testing) | 1 week | High | Now |
| 2 | Copilot Tuning for legal terminology | 1-2 weeks | High | Phase 2 |
| 3 | Azure AI Search (hybrid retrieval) | 2-3 weeks | Very High | Phase 2 |
| 4 | SharePoint Premium for auto-metadata | 2-3 weeks | Medium-High | Phase 2 |
| 5 | MCP server tool refinement | 1 week | Medium | Phase 2 |
| 6 | Multi-agent orchestration | 2-3 weeks | Low (now) | Phase 3 |
| 7 | GraphRAG | 4-6 weeks | High | Phase 3 |

---

*Report Version: V1*
*Generated by CMS Watchdog Team - Technology Researcher Agent*
*Date: 2026-02-20*
176 changes: 176 additions & 0 deletions Feedback/CMS-Architecture-Review-V1.md
@@ -0,0 +1,176 @@
# CMS Knowledge Accelerator - Architecture Review (Devil's Advocate) V1

**Report ID:** CMS-ARCH-V1
**Date:** 2026-02-20
**Reviewer:** CMS Watchdog Team - Architecture Reviewer Agent
**Scope:** Full architecture review across all project components
**Classification:** INTERNAL

---

## Executive Summary

The CMS Knowledge Accelerator is genuinely impressive work for a fixed-price POC. The MCP server architecture, config-driven tool system, KQL optimization, and 81-question evaluation framework exceed typical POC quality. However, documentation is inconsistent, CI/CD has material gaps, there are no Python-level tests, and the security model uses broader permissions than documented.

---

## 1. Overall Design Decisions

### What's Good

- **Dual-channel architecture** (SharePoint grounding + MCP server) is belt-and-suspenders, not redundant
- **Config-driven tool registration** (tools.json). Adding a new tool requires zero Python changes
- **KQL keyword extraction** (stop-word stripping, AND-to-OR fallback). A non-obvious fix most POCs miss
- **Document currency/staleness warnings** integrated at the tool level. Critical for legal knowledge
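
The KQL keyword extraction praised above can be sketched in a few lines — stop-word stripping plus a strict AND query with a broader OR fallback. Function names and the stop-word list are illustrative, not the actual server code.

```python
# Illustrative sketch of the stop-word stripping and AND-to-OR fallback
# described above; not the production implementation.
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "on", "what", "is",
              "are", "how", "do", "i", "to", "and", "or"}

def extract_keywords(question: str) -> list[str]:
    """Strip stop words so natural-language questions become KQL terms."""
    tokens = [t.strip("?.,!").lower() for t in question.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

def build_kql(question: str) -> tuple[str, str]:
    """Return a strict AND query plus a broader OR fallback."""
    keywords = extract_keywords(question)
    return (" AND ".join(keywords), " OR ".join(keywords))

strict, fallback = build_kql(
    "What are the key clauses in a facility agreement?")
# strict   -> "key AND clauses AND facility AND agreement"
# fallback -> "key OR clauses OR facility OR agreement"
```

The fallback matters because a strict AND over four terms can easily return zero hits on a small corpus; retrying with OR trades precision for recall instead of returning nothing.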

### What's Questionable

- **62 libraries in one agent's `items_by_url`** -- exceeds documented 20-item limit
- **Application permissions instead of OBO** -- ARCHITECTURE.md describes OBO but implementation uses client credentials with tenant-wide access
- **ARCHITECTURE.md is stale** -- still describes two agents with OBO auth

---

## 2. MCP Server Architecture

### Chosen Approach
Python MCP server on Azure Container Apps, Streamable HTTP, 12 tools, client credentials outbound, API key inbound.

### Why It's Defensible

| Aspect | Assessment |
|--------|-----------|
| MCP standard | Microsoft's strategic direction (GA in Copilot Studio) |
| Streamable HTTP | Correct -- SSE deprecated August 2025 |
| Stateless sessions | Matches Copilot Studio invocation pattern |
| Container Apps scale-to-zero | Cost-appropriate (~GBP 4-9/month) |

### Devil's Advocate Challenges

| Challenge | Detail |
|-----------|--------|
| Why both server.py AND function_app.py? | Two complete parallel implementations. Which is deployed? Maintenance burden. |
| Why not Power Automate? | Stays within M365 boundary. No external container, no API key, no CORS. |
| Single point of failure | MCP server down = all 12 tools gone. No circuit breaker or degraded-mode fallback. |
| Cold start | Scale-to-zero: 5-15 second first request after idle. No mitigation deployed. |

---

## 3. Deployment Pipeline Gaps

| Component | Status | Issue |
|-----------|--------|-------|
| validate.yml | EXISTS | Checks JSON/PowerShell only. No Python testing. |
| deploy-sharepoint.yml | EXISTS | Has environment input but doesn't use it. No approval gates. |
| deploy-agents.yml | BROKEN | Matrix deploys banking-agent/corporate-agent but actual agent is cms-knowledge-agent. |
| deploy-mcp-server.yml | **MISSING** | README claims it exists. It does not. Manual deployment only. |
| Integration tests | MISSING | No MCP server startup verification in CI. |
| Rollback automation | MISSING | Manual procedures only. |

---

## 4. Agent Design: Unified vs Split

**Verdict:** Unified was the right call for 2 practice areas.

**Concerns for scaling:**
- System prompt is 4,000+ words, pushing Copilot Studio token limits
- Adding practice areas requires rewriting the entire prompt
- Internal Banking/Corporate routing relies on LLM intent parsing
- Will not scale to 5+ practice areas without multi-agent orchestration

---

## 5. Documentation Contradictions

| Document | Claims | Reality |
|----------|--------|---------|
| ARCHITECTURE.md | Two agents (Banking + Corporate) | One unified agent |
| ARCHITECTURE.md | OBO authentication | Client credentials |
| CLIENT-HANDOVER.md | Two separate agents | One unified agent |
| deploy-agents.yml | Matrix: banking-agent, corporate-agent | Actual: cms-knowledge-agent |
| README.md | deploy-mcp-server.yml exists | File does not exist |

---

## 6. Testing Strategy

### What Exists
- 81-question test suite (41 client + 40 additional)
- 700-line EVALUATION-GUIDE.md with three-point scoring
- Copilot Studio evaluation CSV format

### What's Missing

| Gap | Impact |
|-----|--------|
| No Python unit tests | Zero coverage for MCP server. Typo in tools.json breaks 12 tools undetected. |
| No integration tests | No verification MCP server starts and responds to JSON-RPC. |
| No load testing | Unknown concurrent behaviour. httpx client per request = no pooling. |
| No regression testing | Prompt changes could degrade one category while improving another. |
| Tests evaluate agent, not server | Broken tool = FAIL score but never identifies root cause. |
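
The first gap — a tools.json typo silently breaking all 12 tools — is cheap to close with a config-validation unit test. The required field names below (`name`, `description`, `parameters`) are assumed for illustration; the real tools.json schema may differ.

```python
import json

# Hypothetical required fields; adjust to the actual tools.json schema.
REQUIRED_FIELDS = {"name", "description", "parameters"}

def validate_tools(raw: str) -> list[str]:
    """Return a list of validation errors; an empty list means the config loads."""
    try:
        tools = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    errors, names = [], set()
    for i, tool in enumerate(tools):
        missing = REQUIRED_FIELDS - tool.keys()
        if missing:
            errors.append(f"tool {i}: missing {sorted(missing)}")
        if tool.get("name") in names:
            errors.append(f"tool {i}: duplicate name {tool['name']!r}")
        names.add(tool.get("name"))
    return errors

good = '[{"name": "search_documents", "description": "d", "parameters": {}}]'
bad = '[{"name": "search_documents"}]'
```

Run under pytest in validate.yml, this catches the broken-config case before deployment rather than as a FAIL score in the 81-question suite.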

---

## 7. Scalability at 10x Documents

| Component | Issue at Scale |
|-----------|---------------|
| Graph Search API | `size: 20` max. Documents ranked 21+ invisible. No pagination. |
| list_library_contents | Fetches ALL items. 5,000 items = 25 API calls, each creating a new httpx client. |
| generate_briefing_note | Limited to top 20 search results. |
| Query logger | Local JSON file. Multiple replicas = split logs. Lost on restart. |
| items_by_url | Already at 47 entries against the documented 20-item limit. 150 libraries at 10x is unworkable. |
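
The pagination gap in the first row is addressable without changing the tool surface: the Graph `/search/query` request body takes a `from` offset alongside `size`, so results ranked 21+ can be fetched in subsequent pages instead of being invisible. A sketch of building the paged request body, with hypothetical page values:

```python
def graph_search_body(query: str, page: int, page_size: int = 20) -> dict:
    """Request body for Microsoft Graph /search/query with paging.

    Graph caps `size` per request, so results beyond the first page must be
    fetched with an increasing `from` offset rather than silently dropped.
    """
    return {
        "requests": [{
            "entityTypes": ["driveItem"],
            "query": {"queryString": query},
            "from": page * page_size,   # offset of the first result
            "size": page_size,          # results per page
        }]
    }

# page 0 -> results 1-20, page 1 -> 21-40, and so on.
```

A bounded loop over two or three pages would cover the 10x-documents case without unbounded fan-out.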

---

## 8. Maintenance Burden After Handover

| Requirement | Expertise Needed | Risk if Missing |
|-------------|-----------------|-----------------|
| MCP server patches | Python developer | Security vulnerabilities accumulate |
| System prompt tuning | AI engineering | Changing one rule cascades into degraded behaviour |
| Secret rotation | Azure expertise | Silent auth failure, degraded agent |
| Monitoring | DevOps | Nobody knows when MCP server is down |

**Biggest risk:** Secret expiry with no alerting. MCP tools fail silently, OneDriveAndSharePoint still works, agent appears "mostly fine" but all value-add features vanish.

---

## 9. Production Failure Scenarios

| Scenario | Impact | Mitigation Status |
|----------|--------|-------------------|
| Client secret expires | All MCP tools fail silently | Not deployed |
| SharePoint index lag (up to 24h) | Agent cites old/archived documents | Currency tool partially mitigates |
| Large .docx OOM | Container memory exhaustion | MAX_DOWNLOAD_BYTES exists but may be insufficient |
| CORS misconfiguration | *.microsoft.com matches unintended domains | API key second layer |
| asyncio.run() conflict | Azure Functions crash | Latent bug |
| practice_areas_found bug | Briefing notes always generic | Code bug, unnoticed |

---

## 10. What's Missing

| Gap | Effort to Fix |
|-----|---------------|
| Rate limiting on MCP endpoint | 1 day |
| Request/response logging with redaction | 2-3 days |
| Deep health check (verify Graph API + SharePoint reachability) | 1 day |
| Retry logic with exponential backoff | 1-2 days |
| PDF text extraction | 2-3 days |
| HTTP connection pooling (shared httpx.AsyncClient) | 1 day |
| Distributed tracing (OpenTelemetry) | 3-5 days |
| deploy-mcp-server.yml workflow | 1-2 days |
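
The retry-with-backoff gap can be sketched as a small generic wrapper; in the real server the `call` would be a request on a single module-level shared `httpx.AsyncClient` (closing the connection-pooling gap at the same time), and the caught exceptions would be narrowed to `httpx.TransportError` plus 5xx/429 status checks. All names here are illustrative.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

def backoff_delays(retries: int, base: float) -> list[float]:
    """Delay before each retry: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(retries)]

async def with_retry(call: Callable[[], Awaitable[T]],
                     retries: int = 3, base: float = 0.5) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    last_exc: Exception | None = None
    for attempt in range(retries + 1):
        if attempt:
            await asyncio.sleep(backoff_delays(retries, base)[attempt - 1])
        try:
            return await call()
        except Exception as exc:  # real server: httpx.TransportError / 5xx, 429
            last_exc = exc
    raise RuntimeError(f"still failing after {retries} retries") from last_exc
```

With `base=0.5` the retry schedule is 0.5s, 1s, 2s — enough to ride out Graph throttling without masking a genuinely down dependency.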

---

## Verdict

Strong POC work demonstrating genuine engineering quality. The MCP architecture, tool design, and evaluation framework are above average. The gaps between documentation and reality, the missing CI/CD pipeline, the absence of Python tests, and security model concerns need addressing before production. The biggest strategic risk is maintenance handover.

---

*Report Version: V1*
*Generated by CMS Watchdog Team - Architecture Reviewer Agent*
*Date: 2026-02-20*