6 changes: 6 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,6 @@
version: 2
updates:
- package-ecosystem: "pip"
directory: "/autoTriage"
schedule:
interval: "weekly"
182 changes: 182 additions & 0 deletions Feedback/CMS-Alternative-Approaches-Research-V1.md
@@ -0,0 +1,182 @@
# CMS Knowledge Accelerator - Alternative Approaches Research V1

**Report ID:** CMS-ALT-V1
**Date:** 2026-02-20
**Researcher:** CMS Watchdog Team - Technology Researcher Agent
**Scope:** Research into alternative and complementary approaches for knowledge agent design
**Classification:** INTERNAL

---

## Executive Summary

The current architecture (declarative agent + SharePoint grounding + MCP server) is fundamentally sound and aligned with Microsoft's strategic direction. Several complementary technologies could significantly improve retrieval quality, reduce maintenance burden, and enhance user experience. Top three recommendations: **Azure AI Search** for hybrid retrieval, **Copilot Tuning** for legal terminology, and **built-in Agent Evaluation** for automated testing.

---

## Technologies Evaluated

### 1. Azure AI Search (Hybrid Retrieval) -- STRONGLY RECOMMENDED

**What it is:** Enterprise-grade search with keyword + vector + semantic ranking. Can be added as a Copilot Studio knowledge source alongside SharePoint grounding.

**How it would work:**
1. Index SharePoint content with chunking and vector embeddings
2. Use Azure OpenAI embeddings for vectorization
3. Enable semantic ranker for re-ranking
4. Connect as Copilot Studio knowledge source
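
The four steps above converge on a single hybrid request per user query. A minimal sketch of the request body for the Azure AI Search REST API is below; the vector field name (`contentVector`), semantic configuration name (`legal-semantic`), and embedding dimension are hypothetical placeholders, not values from this project.

```python
# Sketch of a hybrid (keyword + vector + semantic) query body for the
# Azure AI Search REST API. Field and configuration names are assumptions
# for illustration only.

def build_hybrid_query(query_text: str, query_vector: list[float],
                       top: int = 10) -> dict:
    """Combine a BM25 keyword leg, a vector leg, and semantic re-ranking."""
    return {
        "search": query_text,                       # keyword (BM25) leg
        "vectorQueries": [{
            "kind": "vector",
            "vector": query_vector,                 # embedding of the query
            "fields": "contentVector",              # hypothetical vector field
            "k": 50,                                # candidates before fusion
        }],
        "queryType": "semantic",                    # enable semantic ranker
        "semanticConfiguration": "legal-semantic",  # hypothetical config name
        "top": top,
    }

# 1536 dims would match text-embedding-3-small; placeholder vector here.
body = build_hybrid_query("assignment clause in LMA facility agreement",
                          [0.0] * 1536)
```

The two retrieval legs are fused server-side (reciprocal rank fusion), then the semantic ranker re-orders the fused top results — which is why `k` is set well above `top`.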

| Aspect | Detail |
|--------|--------|
| Pros | Hybrid search outperforms native SharePoint for legal docs. Proper chunking for large documents. Vector search finds conceptually similar content without keyword matches. |
| Cons | Cost: S1 ~GBP 200/month + embeddings. SP Online indexer in preview. Added infrastructure. Data staleness risk. |
| Effort | 2-3 weeks |
| Impact | VERY HIGH -- step change in retrieval quality |
| Verdict | **Phase 2 priority #1** |

### 2. Copilot Tuning (Legal Terminology) -- RECOMMENDED

**What it is:** Low-code fine-tuning of LLMs with company data for domain-specific terminology, tone, and relevance.

| Aspect | Detail |
|--------|--------|
| Pros | Teaches model CMS legal terminology (LMA, facility agreement, clause types). Zero architecture change. |
| Cons | Preview feature. Requires curated training data. |
| Effort | 1-2 weeks |
| Impact | HIGH -- accuracy improvement with minimal effort |
| Verdict | **Phase 2 priority #2** |

### 3. Built-in Agent Evaluation -- STRONGLY RECOMMENDED

**What it is:** Copilot Studio automated evaluation (public preview) with AI-powered grading: relevance, completeness, groundedness scores.

| Aspect | Detail |
|--------|--------|
| Pros | Replaces manual 81-question testing. Enables rapid iteration. Built-in scoring. |
| Cons | Preview feature. May not match legal domain evaluation nuance. |
| Effort | 1 week |
| Impact | HIGH -- enables evaluation-driven development |
| Verdict | **Implement immediately** |

### 4. SharePoint Premium (Document Processing) -- RECOMMENDED

**What it is:** AI-powered content processing: automatic metadata extraction, classification, summarisation, enhanced search.

| Aspect | Detail |
|--------|--------|
| Pros | Auto-classifies documents. Auto-generates summaries. Reduces KM manual tagging. Free tier through June 2026. |
| Cons | Per-document cost after free tier. Classification accuracy for legal docs needs validation. |
| Effort | 2-3 weeks |
| Impact | MEDIUM-HIGH |
| Verdict | **Phase 2 for metadata enrichment** |

### 5. Microsoft Graph Connectors -- PARK FOR FUTURE

**What it is:** Index external data into Microsoft Graph for native Copilot reasoning with security trimming.

| Aspect | Detail |
|--------|--------|
| Pros | Brings in knowledge from iManage, practice management, matter databases. Native M365 security. |
| Cons | CMS knowledge already in SharePoint. Custom development required. |
| Effort | 3-4 weeks |
| Impact | LOW (current scope) |
| Verdict | **Future consideration for non-SharePoint sources** |

### 6. Multi-Agent Orchestration -- NOT NOW

**What it is:** Parent agent routes to specialized child agents based on intent (Build 2025).

| Aspect | Detail |
|--------|--------|
| Pros | Better per-area prompt engineering. Add specialists without modifying main agent. |
| Cons | Complexity. Routing mistakes. New in Copilot Studio. Overkill for 2 areas. |
| Effort | 2-3 weeks |
| Impact | LOW (2 areas), HIGH (5+ areas) |
| Verdict | **Design for future evolution, don't implement now** |

### 7. GraphRAG (Knowledge Graphs) -- ASPIRATIONAL

**What it is:** Microsoft open-source combining vector search with knowledge graphs. Legal documents have natural citation relationships.

| Aspect | Detail |
|--------|--------|
| Pros | Precision reported as high as 99% in some evaluations. Captures legal citation relationships. |
| Cons | Complex. Requires knowledge graph construction. Significant engineering. |
| Effort | 4-6 weeks |
| Impact | HIGH but complex |
| Verdict | **Phase 3 aspirational** |

### 8. Microsoft Agents SDK (Custom Engine) -- NOT APPROPRIATE

**What it is:** Full custom engine agent with own AI model, multi-channel deployment, complex orchestration.

| Aspect | Detail |
|--------|--------|
| Pros | Full model control. Multi-channel. Complex reasoning pipelines. |
| Cons | Massive scope increase. Hosting costs. Overkill for knowledge retrieval. |
| Effort | 6-8 weeks |
| Impact | HIGH but massive scope |
| Verdict | **Not for current scope.** Revisit for custom fine-tuned legal model or non-M365 deployment. |

---

## Competitor Landscape

### Harvey AI (Allen & Overy, PwC)
- Custom-trained case law models (fine-tuned on all U.S. case law)
- Multi-agent pipeline for knowledge ingestion
- Hallucination detection: decomposes responses into individual factual claims and cross-references each
- Scaled from 6 to 60+ jurisdictions

### DeepJudge
- Focuses on unlocking knowledge trapped inside law firms' own work product
- True advantage comes from proprietary data, not public data

### Industry Trends (2025/2026)
- 79% of legal professionals now use AI (Clio Legal Trends 2025)
- Big law firms building internal LLM fine-tuning strategies
- Context (playbooks, precedents, templates) is the main value driver
- Human oversight remains central

### CMS Could Adopt
- **Hallucination mitigation** through better citation requirements and confidence scoring
- **Automated knowledge health checking** (currency tool is a start)
- **Copilot Tuning** as lightweight alternative to full fine-tuning
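
The first adoption item — citation requirements as hallucination mitigation — can start very simply: flag any answer sentence that carries no citation marker, in the spirit of Harvey's claim decomposition. The bracketed `[Doc N]` citation format below is an assumption for illustration; the agent's actual citation style may differ.

```python
import re

# Assumes citations appear as bracketed source markers like "[Doc 3]";
# the real citation format is an assumption for illustration.
CITATION = re.compile(r"\[[^\]]+\]")

def uncited_sentences(answer: str) -> list[str]:
    """Return answer sentences that carry no citation marker."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer)
                 if s.strip()]
    return [s for s in sentences if not CITATION.search(s)]

answer = ("The LMA facility agreement template was updated in 2024 [Doc 1]. "
          "Clause 7 covers mandatory prepayment.")
# The second sentence would be flagged as uncited.
```

A flagged sentence could then lower a confidence score or trigger a re-grounding pass, rather than being shown to the user as-is.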

---

## Recommended Architecture (If Starting From Scratch)

```
Copilot Studio Agent (Unified)
|
+---------------+---------------+
| | |
SharePoint Grounding Azure AI Search MCP Server
(site-level URL) (Hybrid RAG) (Specialized Tools)
| | |
+-------+-------+ |
| |
SharePoint Online Microsoft Graph
(Document Repository) (Metadata, Analytics)
```

---

## Priority Roadmap

| Priority | Enhancement | Effort | Impact | Phase |
|----------|-------------|--------|--------|-------|
| 1 | Agent Evaluation (automated testing) | 1 week | High | Now |
| 2 | Copilot Tuning for legal terminology | 1-2 weeks | High | Phase 2 |
| 3 | Azure AI Search (hybrid retrieval) | 2-3 weeks | Very High | Phase 2 |
| 4 | SharePoint Premium for auto-metadata | 2-3 weeks | Medium-High | Phase 2 |
| 5 | MCP server tool refinement | 1 week | Medium | Phase 2 |
| 6 | Multi-agent orchestration | 2-3 weeks | Low (now) | Phase 3 |
| 7 | GraphRAG | 4-6 weeks | High | Phase 3 |

---

*Report Version: V1*
*Generated by CMS Watchdog Team - Technology Researcher Agent*
*Date: 2026-02-20*
176 changes: 176 additions & 0 deletions Feedback/CMS-Architecture-Review-V1.md
@@ -0,0 +1,176 @@
# CMS Knowledge Accelerator - Architecture Review (Devil's Advocate) V1

**Report ID:** CMS-ARCH-V1
**Date:** 2026-02-20
**Reviewer:** CMS Watchdog Team - Architecture Reviewer Agent
**Scope:** Full architecture review across all project components
**Classification:** INTERNAL

---

## Executive Summary

The CMS Knowledge Accelerator is genuinely impressive work for a fixed-price POC. The MCP server architecture, config-driven tool system, KQL optimization, and 81-question evaluation framework exceed typical POC quality. However, documentation is inconsistent, CI/CD has material gaps, there are no Python-level tests, and the security model uses broader permissions than documented.

---

## 1. Overall Design Decisions

### What's Good

- **Dual-channel architecture** (SharePoint grounding + MCP server) is belt-and-suspenders, not redundant
- **Config-driven tool registration** (tools.json). Adding a new tool requires zero Python changes
- **KQL keyword extraction** (stop-word stripping, AND-to-OR fallback). A non-obvious fix most POCs miss
- **Document currency/staleness warnings** integrated at the tool level. Critical for legal knowledge
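
The KQL keyword extraction praised above can be sketched in a few lines — stop-word stripping plus a strict AND query with a broader OR fallback. Function names and the stop-word list are illustrative, not the actual server code.

```python
# Illustrative sketch of the stop-word stripping and AND-to-OR fallback
# described above; not the production implementation.
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "on", "what", "is",
              "are", "how", "do", "i", "to", "and", "or"}

def extract_keywords(question: str) -> list[str]:
    """Strip stop words so natural-language questions become KQL terms."""
    tokens = [t.strip("?.,!").lower() for t in question.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

def build_kql(question: str) -> tuple[str, str]:
    """Return a strict AND query plus a broader OR fallback."""
    keywords = extract_keywords(question)
    return (" AND ".join(keywords), " OR ".join(keywords))

strict, fallback = build_kql(
    "What are the key clauses in a facility agreement?")
# strict   -> "key AND clauses AND facility AND agreement"
# fallback -> "key OR clauses OR facility OR agreement"
```

The fallback matters because a strict AND over four terms can easily return zero hits on a small corpus; retrying with OR trades precision for recall instead of returning nothing.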

### What's Questionable

- **62 libraries in one agent's `items_by_url`** -- exceeds documented 20-item limit
- **Application permissions instead of OBO** -- ARCHITECTURE.md describes OBO but implementation uses client credentials with tenant-wide access
- **ARCHITECTURE.md is stale** -- still describes two agents with OBO auth

---

## 2. MCP Server Architecture

### Chosen Approach
Python MCP server on Azure Container Apps, Streamable HTTP, 12 tools, client credentials outbound, API key inbound.

### Why It's Defensible

| Aspect | Assessment |
|--------|-----------|
| MCP standard | Microsoft's strategic direction (GA in Copilot Studio) |
| Streamable HTTP | Correct -- SSE deprecated August 2025 |
| Stateless sessions | Matches Copilot Studio invocation pattern |
| Container Apps scale-to-zero | Cost-appropriate (~GBP 4-9/month) |

### Devil's Advocate Challenges

| Challenge | Detail |
|-----------|--------|
| Why both server.py AND function_app.py? | Two complete parallel implementations. Which is deployed? Maintenance burden. |
| Why not Power Automate? | Stays within M365 boundary. No external container, no API key, no CORS. |
| Single point of failure | MCP server down = all 12 tools gone. No circuit breaker or degraded-mode fallback. |
| Cold start | Scale-to-zero: 5-15 second first request after idle. No mitigation deployed. |

---

## 3. Deployment Pipeline Gaps

| Component | Status | Issue |
|-----------|--------|-------|
| validate.yml | EXISTS | Checks JSON/PowerShell only. No Python testing. |
| deploy-sharepoint.yml | EXISTS | Has environment input but doesn't use it. No approval gates. |
| deploy-agents.yml | BROKEN | Matrix deploys banking-agent/corporate-agent but actual agent is cms-knowledge-agent. |
| deploy-mcp-server.yml | **MISSING** | README claims it exists. It does not. Manual deployment only. |
| Integration tests | MISSING | No MCP server startup verification in CI. |
| Rollback automation | MISSING | Manual procedures only. |

---

## 4. Agent Design: Unified vs Split

**Verdict:** Unified was the right call for 2 practice areas.

**Concerns for scaling:**
- System prompt is 4,000+ words, pushing Copilot Studio token limits
- Adding practice areas requires rewriting the entire prompt
- Internal Banking/Corporate routing relies on LLM intent parsing
- Will not scale to 5+ practice areas without multi-agent orchestration

---

## 5. Documentation Contradictions

| Document | Claims | Reality |
|----------|--------|---------|
| ARCHITECTURE.md | Two agents (Banking + Corporate) | One unified agent |
| ARCHITECTURE.md | OBO authentication | Client credentials |
| CLIENT-HANDOVER.md | Two separate agents | One unified agent |
| deploy-agents.yml | Matrix: banking-agent, corporate-agent | Actual: cms-knowledge-agent |
| README.md | deploy-mcp-server.yml exists | File does not exist |

---

## 6. Testing Strategy

### What Exists
- 81-question test suite (41 client + 40 additional)
- 700-line EVALUATION-GUIDE.md with three-point scoring
- Copilot Studio evaluation CSV format

### What's Missing

| Gap | Impact |
|-----|--------|
| No Python unit tests | Zero coverage for MCP server. Typo in tools.json breaks 12 tools undetected. |
| No integration tests | No verification MCP server starts and responds to JSON-RPC. |
| No load testing | Unknown concurrent behaviour. httpx client per request = no pooling. |
| No regression testing | Prompt changes could degrade one category while improving another. |
| Tests evaluate agent, not server | Broken tool = FAIL score but never identifies root cause. |
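
The first gap — a tools.json typo silently breaking all 12 tools — is cheap to close with a config-validation unit test. The required field names below (`name`, `description`, `parameters`) are assumed for illustration; the real tools.json schema may differ.

```python
import json

# Hypothetical required fields; adjust to the actual tools.json schema.
REQUIRED_FIELDS = {"name", "description", "parameters"}

def validate_tools(raw: str) -> list[str]:
    """Return a list of validation errors; an empty list means the config loads."""
    try:
        tools = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    errors, names = [], set()
    for i, tool in enumerate(tools):
        missing = REQUIRED_FIELDS - tool.keys()
        if missing:
            errors.append(f"tool {i}: missing {sorted(missing)}")
        if tool.get("name") in names:
            errors.append(f"tool {i}: duplicate name {tool['name']!r}")
        names.add(tool.get("name"))
    return errors

good = '[{"name": "search_documents", "description": "d", "parameters": {}}]'
bad = '[{"name": "search_documents"}]'
```

Run under pytest in validate.yml, this catches the broken-config case before deployment rather than as a FAIL score in the 81-question suite.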

---

## 7. Scalability at 10x Documents

| Component | Issue at Scale |
|-----------|---------------|
| Graph Search API | `size: 20` max. Documents ranked 21+ invisible. No pagination. |
| list_library_contents | Fetches ALL items. 5,000 items = 25 API calls, each creating a new httpx client. |
| generate_briefing_note | Limited to top 20 search results. |
| Query logger | Local JSON file. Multiple replicas = split logs. Lost on restart. |
| items_by_url | Already at 47 entries against the documented 20-item limit. 150 libraries at 10x is unworkable. |
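
The pagination gap in the first row is addressable without changing the tool surface: the Graph `/search/query` request body takes a `from` offset alongside `size`, so results ranked 21+ can be fetched in subsequent pages instead of being invisible. A sketch of building the paged request body, with hypothetical page values:

```python
def graph_search_body(query: str, page: int, page_size: int = 20) -> dict:
    """Request body for Microsoft Graph /search/query with paging.

    Graph caps `size` per request, so results beyond the first page must be
    fetched with an increasing `from` offset rather than silently dropped.
    """
    return {
        "requests": [{
            "entityTypes": ["driveItem"],
            "query": {"queryString": query},
            "from": page * page_size,   # offset of the first result
            "size": page_size,          # results per page
        }]
    }

# page 0 -> results 1-20, page 1 -> 21-40, and so on.
```

A bounded loop over two or three pages would cover the 10x-documents case without unbounded fan-out.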

---

## 8. Maintenance Burden After Handover

| Requirement | Expertise Needed | Risk if Missing |
|-------------|-----------------|-----------------|
| MCP server patches | Python developer | Security vulnerabilities accumulate |
| System prompt tuning | AI engineering | Changing one rule cascades into degraded behaviour |
| Secret rotation | Azure expertise | Silent auth failure, degraded agent |
| Monitoring | DevOps | Nobody knows when MCP server is down |

**Biggest risk:** Secret expiry with no alerting. MCP tools fail silently, OneDriveAndSharePoint still works, agent appears "mostly fine" but all value-add features vanish.

---

## 9. Production Failure Scenarios

| Scenario | Impact | Mitigation Status |
|----------|--------|-------------------|
| Client secret expires | All MCP tools fail silently | Not deployed |
| SharePoint index lag (up to 24h) | Agent cites old/archived documents | Currency tool partially mitigates |
| Large .docx OOM | Container memory exhaustion | MAX_DOWNLOAD_BYTES exists but may be insufficient |
| CORS misconfiguration | *.microsoft.com matches unintended domains | API key second layer |
| asyncio.run() conflict | Azure Functions crash | Latent bug |
| practice_areas_found bug | Briefing notes always generic | Code bug, unnoticed |

---

## 10. What's Missing

| Gap | Effort to Fix |
|-----|---------------|
| Rate limiting on MCP endpoint | 1 day |
| Request/response logging with redaction | 2-3 days |
| Deep health check (verify Graph API + SharePoint reachability) | 1 day |
| Retry logic with exponential backoff | 1-2 days |
| PDF text extraction | 2-3 days |
| HTTP connection pooling (shared httpx.AsyncClient) | 1 day |
| Distributed tracing (OpenTelemetry) | 3-5 days |
| deploy-mcp-server.yml workflow | 1-2 days |
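
The retry-with-backoff gap can be sketched as a small generic wrapper; in the real server the `call` would be a request on a single module-level shared `httpx.AsyncClient` (closing the connection-pooling gap at the same time), and the caught exceptions would be narrowed to `httpx.TransportError` plus 5xx/429 status checks. All names here are illustrative.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

def backoff_delays(retries: int, base: float) -> list[float]:
    """Delay before each retry: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(retries)]

async def with_retry(call: Callable[[], Awaitable[T]],
                     retries: int = 3, base: float = 0.5) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    last_exc: Exception | None = None
    for attempt in range(retries + 1):
        if attempt:
            await asyncio.sleep(backoff_delays(retries, base)[attempt - 1])
        try:
            return await call()
        except Exception as exc:  # real server: httpx.TransportError / 5xx, 429
            last_exc = exc
    raise RuntimeError(f"still failing after {retries} retries") from last_exc
```

With `base=0.5` the retry schedule is 0.5s, 1s, 2s — enough to ride out Graph throttling without masking a genuinely down dependency.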

---

## Verdict

Strong POC work demonstrating genuine engineering quality. The MCP architecture, tool design, and evaluation framework are above average. The gaps between documentation and reality, the missing CI/CD pipeline, the absence of Python tests, and security model concerns need addressing before production. The biggest strategic risk is maintenance handover.

---

*Report Version: V1*
*Generated by CMS Watchdog Team - Architecture Reviewer Agent*
*Date: 2026-02-20*