Commit 01b4b59

Merge pull request #56 from w7-mgfcode/dev
feat: Phase 9 Agentic Layer - Release v0.3.0
2 parents ce3da0d + beee99c commit 01b4b59

37 files changed

Lines changed: 9299 additions & 136 deletions

.env.example

Lines changed: 46 additions & 0 deletions
@@ -53,5 +53,51 @@ RAG_INDEX_TYPE=hnsw
 RAG_HNSW_M=16
 RAG_HNSW_EF_CONSTRUCTION=64

+# =============================================================================
+# Agentic Layer Configuration (PydanticAI v1.48.0)
+# =============================================================================
+
+# Model Configuration
+# Model identifier format: "provider:model-name"
+# Supported providers:
+# - anthropic: Claude models (claude-sonnet-4-5, claude-opus-4-5, etc.)
+# - openai: GPT models (gpt-4o, gpt-4o-mini, etc.)
+# - google-gla: Gemini models via Google AI Studio (gemini-2-5-flash, gemini-3-flash, gemini-3-pro)
+# - google-vertex: Gemini models via Vertex AI (gemini-*) [requires GCP auth]
+AGENT_DEFAULT_MODEL=anthropic:claude-sonnet-4-5
+AGENT_FALLBACK_MODEL=openai:gpt-4o
+
+# API Keys (only one needed based on your chosen provider)
+ANTHROPIC_API_KEY=sk-ant-your-anthropic-api-key-here
+# OPENAI_API_KEY=sk-your-openai-api-key-here
+# GOOGLE_API_KEY=your-google-api-key-here # For google-gla:* models
+
+# Gemini Extended Reasoning (optional)
+# Thinking mode for Gemini 2.5+ models (requires additional tokens)
+# Set a token budget (e.g., 2000-8000) or leave unset to disable
+# Recommended: 4000 tokens for complex agent planning tasks
+# AGENT_THINKING_BUDGET=4000
+
+# Model parameters
+AGENT_TEMPERATURE=0.1
+AGENT_MAX_TOKENS=4096
+
+# Execution settings
+AGENT_MAX_TOOL_CALLS=10
+AGENT_TIMEOUT_SECONDS=120
+AGENT_RETRY_ATTEMPTS=3
+AGENT_RETRY_DELAY_SECONDS=1.0
+
+# Session settings
+AGENT_SESSION_TTL_MINUTES=120
+AGENT_MAX_SESSIONS_PER_USER=5
+
+# Human-in-the-loop actions (JSON array format required for safe parsing)
+AGENT_REQUIRE_APPROVAL=["create_alias","archive_run"]
+AGENT_APPROVAL_TIMEOUT_MINUTES=60
+
+# Streaming
+AGENT_ENABLE_STREAMING=true
+
 # Frontend (Vite)
 VITE_API_BASE_URL=http://localhost:8123
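The variables above arrive as strings, yet several carry structure: `AGENT_REQUIRE_APPROVAL` is a JSON array, `AGENT_ENABLE_STREAMING` is a boolean, and `AGENT_THINKING_BUDGET` is an optional integer. The project reads them through its settings module (`app/core/config.py`, per PRP Task 2), whose code is not part of this hunk. What follows is only a standard-library sketch of that parsing; `AgentEnvSettings` and `load_agent_settings` are hypothetical names, and the defaults are copied from `.env.example`.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentEnvSettings:
    """Hypothetical container for a subset of the AGENT_* variables."""

    default_model: str
    fallback_model: str
    temperature: float
    max_tokens: int
    max_tool_calls: int
    timeout_seconds: int
    require_approval: tuple[str, ...]
    enable_streaming: bool
    thinking_budget: int | None


def _parse_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def load_agent_settings(env: dict[str, str]) -> AgentEnvSettings:
    """Parse AGENT_* strings from an env mapping such as os.environ."""
    # json.loads raises json.JSONDecodeError (a ValueError) on "a,b" style input.
    approvals = json.loads(env.get("AGENT_REQUIRE_APPROVAL", "[]"))
    if not isinstance(approvals, list) or not all(isinstance(a, str) for a in approvals):
        raise ValueError("AGENT_REQUIRE_APPROVAL must be a JSON array of strings")
    budget = env.get("AGENT_THINKING_BUDGET")
    return AgentEnvSettings(
        default_model=env.get("AGENT_DEFAULT_MODEL", "anthropic:claude-sonnet-4-5"),
        fallback_model=env.get("AGENT_FALLBACK_MODEL", "openai:gpt-4o"),
        temperature=float(env.get("AGENT_TEMPERATURE", "0.1")),
        max_tokens=int(env.get("AGENT_MAX_TOKENS", "4096")),
        max_tool_calls=int(env.get("AGENT_MAX_TOOL_CALLS", "10")),
        timeout_seconds=int(env.get("AGENT_TIMEOUT_SECONDS", "120")),
        require_approval=tuple(approvals),
        enable_streaming=_parse_bool(env.get("AGENT_ENABLE_STREAMING", "true")),
        # An unset or empty budget leaves Gemini thinking mode disabled.
        thinking_budget=int(budget) if budget else None,
    )
```

Passing `os.environ` (or a dict loaded from the file) yields a frozen object. A malformed value such as `AGENT_REQUIRE_APPROVAL=create_alias,archive_run` fails fast with `ValueError` instead of being split on commas, which is what the "JSON array format required" comment guards against.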

.github/workflows/cd-release.yml

Lines changed: 4 additions & 1 deletion
@@ -31,7 +31,10 @@ jobs:
       - uses: googleapis/release-please-action@v4
         id: release
         with:
-          token: ${{ secrets.GITHUB_TOKEN }}
+          # Use PAT to trigger CI workflows on release PRs
+          # GITHUB_TOKEN won't trigger workflows (GitHub security feature)
+          # If RELEASE_PAT is not set, falls back to GITHUB_TOKEN (CI won't auto-trigger)
+          token: ${{ secrets.RELEASE_PAT || secrets.GITHUB_TOKEN }}
           config-file: release-please-config.json
           manifest-file: .release-please-manifest.json

PRPs/PRP-10-agentic-layer.md

Lines changed: 102 additions & 26 deletions
@@ -2,7 +2,8 @@

 **Feature**: INITIAL-10.md — Agentic Layer
 **Status**: Ready for Implementation
-**Confidence Score**: 7.5/10
+**Confidence Score**: 8.0/10
+**Last Updated**: 2026-02-01 (Post Phase-9 RAG Review)

 ---

@@ -65,7 +66,7 @@ This is the "Brain" layer that orchestrates tools from INITIAL-9 (RAG), Phase 5
 why: "Official PydanticAI docs - main reference"

 - url: https://ai.pydantic.dev/agents/
-why: "Agent constructor, result_type, system_prompt, run/run_stream methods"
+why: "Agent constructor, output_type, system_prompt, run/run_stream methods"

 - url: https://ai.pydantic.dev/tools/
 why: "@agent.tool decorator, RunContext, deps_type, tool parameters"
@@ -165,13 +166,14 @@ examples/agents/
 ### Known Gotchas & Library Quirks

 ```python
-# CRITICAL: PydanticAI model identifier format
-# Use "anthropic:claude-sonnet-4-20250514" NOT "claude-sonnet-4-20250514"
-agent = Agent(model="anthropic:claude-sonnet-4-20250514")
+# CRITICAL: PydanticAI model identifier format (updated Jan 2026)
+# Use "anthropic:claude-sonnet-4-5" NOT "claude-sonnet-4-5"
+# For production, pin specific version: "anthropic:claude-sonnet-4-5-20250929"
+agent = Agent(model="anthropic:claude-sonnet-4-5")

 # CRITICAL: deps_type must match RunContext generic parameter
 agent = Agent(
-    model="anthropic:claude-sonnet-4-20250514",
+    model="anthropic:claude-sonnet-4-5",
     deps_type=AgentDeps,  # Your dependency dataclass
 )

@@ -502,9 +504,12 @@ class WSEvent(BaseModel):
 ```yaml
 MODIFY: pyproject.toml
 ADD to dependencies:
-- "pydantic-ai>=0.1.0" # PydanticAI agent framework
-- "anthropic>=0.40.0" # Anthropic SDK for Claude
+- "pydantic-ai>=1.48.0" # PydanticAI agent framework (v1 stable, API guaranteed)
+- "anthropic>=0.50.0" # Anthropic SDK for Claude
 - "websockets>=13.0" # WebSocket support (already in uvicorn[standard])
+
+NOTE: PydanticAI v1.0 was released Sept 2025 with API stability guarantee.
+Current version is 1.48.0 (Jan 2026). Do NOT use 0.x versions.
 ```

 ### Task 2: Add Agent Settings to config.py
@@ -514,7 +519,7 @@ MODIFY: app/core/config.py
 ADD after RAG settings:

 # Agent LLM Configuration
-agent_default_model: str = "anthropic:claude-sonnet-4-20250514"
+agent_default_model: str = "anthropic:claude-sonnet-4-5"
 agent_fallback_model: str = "openai:gpt-4o"
 agent_temperature: float = 0.1
 agent_max_tokens: int = 4096
@@ -596,28 +601,41 @@ INCLUDE:
 CREATE: app/features/agents/tools/registry_tools.py
 TOOLS:
 - list_runs(ctx, filters) -> list[RunSummary]
+# Wraps: RegistryService.list_runs(db, page, page_size, model_type, status, store_id, product_id)
 - compare_runs(ctx, run_id_a, run_id_b) -> CompareResult
+# Wraps: RegistryService.compare_runs(db, run_id_a, run_id_b)
 - create_alias(ctx, alias_name, run_id) -> AliasResult
+# Wraps: RegistryService.create_alias(db, AliasCreate(...))
+# REQUIRES HUMAN APPROVAL
 - archive_run(ctx, run_id) -> ArchiveResult
+# Wraps: RegistryService.update_run(db, run_id, RunUpdate(status=RunStatus.ARCHIVED))
+# NOTE: No direct archive method - use update_run with ARCHIVED status
+# REQUIRES HUMAN APPROVAL

 CREATE: app/features/agents/tools/backtesting_tools.py
 TOOLS:
 - run_backtest(ctx, model_type, config, store_id, product_id, n_splits) -> BacktestResult
+# Wraps: BacktestingService.run_backtest(db, store_id, product_id, start_date, end_date, config)

 CREATE: app/features/agents/tools/forecasting_tools.py
 TOOLS:
 - list_models(ctx) -> list[ModelInfo]
+# Returns available model types: naive, seasonal_naive, moving_average, lightgbm (if enabled)

 CREATE: app/features/agents/tools/rag_tools.py
 TOOLS:
-- retrieve_context(ctx, query, top_k) -> list[RetrievedChunk]
+- retrieve_context(ctx, query, top_k) -> list[ChunkResult]
+# Wraps: RAGService.retrieve(db, RetrieveRequest(query=query, top_k=top_k))
+# NOTE: RAG service uses retrieve() not retrieve_context()
 - format_citation(ctx, chunk) -> Citation
+# Transforms ChunkResult to Citation schema

 CRITICAL for all tools:
 - Use @agent.tool decorator (not @agent.tool_plain) for db access
 - First param is RunContext[AgentDeps]
-- Detailed docstrings for LLM schema
+- Detailed docstrings for LLM schema (Google/numpy style supported)
 - Structured logging with timing
+- Match actual service method signatures from Phase 5-9 implementations
 ```

 ### Task 8: Create Agent Definitions
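Task 7 marks `create_alias` and `archive_run` as REQUIRES HUMAN APPROVAL, and `.env.example` lists the same tools in `AGENT_REQUIRE_APPROVAL` with a 60-minute `AGENT_APPROVAL_TIMEOUT_MINUTES`. How the project enforces this is not shown in this diff. Below is a framework-agnostic sketch, assuming a dispatcher sits between the agent's tool calls and the services; `ApprovalGate`, `PendingAction`, and the status dictionaries are hypothetical.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Callable


@dataclass
class PendingAction:
    """A tool call held back until a human approves or rejects it."""

    action_id: str
    tool_name: str
    arguments: dict[str, Any]
    expires_at: datetime


@dataclass
class ApprovalGate:
    """Hypothetical gate: runs safe tools directly, parks sensitive ones."""

    require_approval: frozenset[str]
    timeout: timedelta = timedelta(minutes=60)  # AGENT_APPROVAL_TIMEOUT_MINUTES
    pending: dict[str, tuple[PendingAction, Callable[..., Any]]] = field(default_factory=dict)

    def call(self, tool_name: str, fn: Callable[..., Any], **kwargs: Any) -> dict[str, Any]:
        # Tools not listed in require_approval execute immediately.
        if tool_name not in self.require_approval:
            return {"status": "executed", "result": fn(**kwargs)}
        action = PendingAction(
            action_id=uuid.uuid4().hex,
            tool_name=tool_name,
            arguments=kwargs,
            expires_at=datetime.now(timezone.utc) + self.timeout,
        )
        self.pending[action.action_id] = (action, fn)
        return {"status": "pending_approval", "action_id": action.action_id}

    def resolve(self, action_id: str, approved: bool, now: datetime | None = None) -> dict[str, Any]:
        # pop() removes the action whatever the outcome; unknown ids raise KeyError.
        action, fn = self.pending.pop(action_id)
        now = now or datetime.now(timezone.utc)
        if now > action.expires_at:
            return {"status": "expired"}
        if not approved:
            return {"status": "rejected"}
        return {"status": "executed", "result": fn(**action.arguments)}
```

In this design the agent receives `pending_approval` plus an id instead of a result, and a later decision (presumably routed through `POST /agents/sessions/{session_id}/approve`) either runs or discards the stored call.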
@@ -736,11 +754,58 @@ ADD websocket: app.add_api_websocket_route("/agents/stream", websocket_stream)
 ```yaml
 CREATE: app/features/agents/tests/conftest.py
 FIXTURES:
-- db_session: Async session with cleanup
+- db_session: Async session with cleanup (follow registry/tests/conftest.py pattern)
 - client: AsyncClient with db override
-- mock_anthropic: Mock Anthropic API responses
-- sample_experiment_request: Test request
-- sample_rag_request: Test request
+- mock_pydantic_ai_agent: Mock PydanticAI Agent (see pattern below)
+- sample_experiment_request: ExperimentRequest fixture
+- sample_rag_request: RAGQueryRequest fixture
+- sample_agent_session: AgentSession ORM fixture
+
+MOCK PATTERN (following rag/tests/conftest.py mock_embedding_service):
+```
+
+```python
+@pytest.fixture
+def mock_pydantic_ai_agent():
+    """Mock PydanticAI Agent for unit tests without LLM calls.
+
+    Follows the mock_embedding_service pattern from RAG tests.
+    Returns deterministic responses without API calls.
+    """
+    from unittest.mock import AsyncMock, MagicMock
+    from app.features.agents.schemas import ExperimentReport, RunSummary
+
+    # Create mock structured output
+    mock_report = ExperimentReport(
+        objective="Test objective",
+        methodology="Tested naive and seasonal_naive models",
+        experiments_run=2,
+        best_run=RunSummary(
+            run_id="test123",
+            model_type="seasonal_naive",
+            config={"season_length": 7},
+            metrics={"mae": 5.0, "smape": 10.0},
+        ),
+        baseline_comparison=None,
+        recommendation="Deploy seasonal_naive model",
+        approval_required=False,
+    )
+
+    # Mock result object
+    mock_result = MagicMock()
+    mock_result.output = mock_report
+    mock_result.usage.return_value = MagicMock(
+        input_tokens=100,
+        output_tokens=50,
+    )
+    mock_result.messages = []
+
+    # Mock agent
+    agent = MagicMock()
+    agent.run = AsyncMock(return_value=mock_result)
+    agent.run_stream = AsyncMock()
+
+    return agent
 ```

 ### Task 14: Create Unit Tests
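The `mock_pydantic_ai_agent` fixture above sets `agent.run_stream = AsyncMock()` and `mock_result.messages = []`. In PydanticAI v1, `run_stream()` is entered as `async with agent.run_stream(...) as result:`, and calling an `AsyncMock` returns a coroutine, which does not support `async with`; run results also expose history through the `all_messages()` and `new_messages()` methods rather than a `messages` attribute. A standard-library sketch of a mock that covers both call styles follows. `FakeReport` is a hypothetical stand-in for the project's `ExperimentReport`, and the streamed result's `get_output()` coroutine is assumed from the v1 API, so it is worth checking against the pinned version.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from unittest.mock import AsyncMock, MagicMock


@dataclass
class FakeReport:
    """Hypothetical stand-in for app.features.agents.schemas.ExperimentReport."""

    recommendation: str
    approval_required: bool


def make_mock_agent(report: FakeReport) -> MagicMock:
    """Build a MagicMock mimicking the Agent surface used by the tests."""
    result = MagicMock()
    result.output = report
    result.usage.return_value = MagicMock(input_tokens=100, output_tokens=50)
    # History is read via methods, so configure their return values.
    result.all_messages.return_value = []
    result.new_messages.return_value = []

    agent = MagicMock()
    # `await agent.run(...)` resolves to the canned result.
    agent.run = AsyncMock(return_value=result)

    # `async with agent.run_stream(...) as stream:` needs an async context
    # manager; MagicMock supports __aenter__/__aexit__ since Python 3.8.
    stream = MagicMock()
    stream.get_output = AsyncMock(return_value=report)
    agent.run_stream.return_value.__aenter__.return_value = stream
    return agent


async def exercise(agent: MagicMock) -> tuple[str, str]:
    """Drive both call styles the way a service under test might."""
    run_result = await agent.run("Compare naive vs seasonal_naive")
    async with agent.run_stream("Same question, streamed") as stream:
        streamed = await stream.get_output()
    return run_result.output.recommendation, streamed.recommendation
```

Keeping `run_stream` a plain `MagicMock` (not `AsyncMock`) is the key design choice: the call itself must be synchronous and return an object whose `__aenter__` is awaitable.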
@@ -792,9 +857,11 @@ MODIFY: .env.example
 ADD:
 # Agent Configuration
 ANTHROPIC_API_KEY=sk-ant-...
-AGENT_DEFAULT_MODEL=anthropic:claude-sonnet-4-20250514
+AGENT_DEFAULT_MODEL=anthropic:claude-sonnet-4-5
+AGENT_FALLBACK_MODEL=openai:gpt-4o
 AGENT_MAX_TOOL_CALLS=10
 AGENT_TIMEOUT_SECONDS=120
+AGENT_TEMPERATURE=0.1
 ```

 ---
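The Known Gotchas block and this `.env.example` hunk both insist on the `provider:model-name` form (`anthropic:claude-sonnet-4-5`, not bare `claude-sonnet-4-5`) and suggest pinning a dated snapshot such as `claude-sonnet-4-5-20250929` in production. A small startup check can reject bare or misspelled identifiers before they reach PydanticAI. This is a sketch under stated assumptions, not project code: `KNOWN_PROVIDERS` holds only the four prefixes named in `.env.example` (PydanticAI supports more), and the date-suffix test is a heuristic.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Prefixes listed in .env.example; not an exhaustive list of PydanticAI providers.
KNOWN_PROVIDERS = frozenset({"anthropic", "openai", "google-gla", "google-vertex"})

# Heuristic: a pinned snapshot ends in an 8-digit date such as -20250929.
_DATED_SUFFIX = re.compile(r"-\d{8}$")


class ModelId(NamedTuple):
    provider: str
    model: str
    pinned: bool


def parse_model_id(identifier: str) -> ModelId:
    """Split "provider:model-name" and reject bare model names."""
    provider, sep, model = identifier.partition(":")
    if not sep or not provider or not model:
        raise ValueError(
            f"{identifier!r} is missing a provider prefix; "
            "use e.g. 'anthropic:claude-sonnet-4-5'"
        )
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}")
    return ModelId(provider, model, bool(_DATED_SUFFIX.search(model)))
```

Calling `parse_model_id(settings.agent_default_model)` once at application startup would surface a bare `claude-sonnet-4-5` as a clear `ValueError` instead of a provider error on the first agent run; `pinned` can back a production-only warning.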
@@ -899,22 +966,31 @@ python examples/agents/websocket_client.py

 ---

-## Confidence Score: 7.5/10
+## Confidence Score: 8.0/10

 **Strengths:**
-- PydanticAI has excellent documentation
-- Clear FastAPI integration patterns
-- Existing service patterns to follow
-- Tool integrations with existing modules
+- PydanticAI v1.x provides API stability guarantee (released Sept 2025)
+- Clear FastAPI integration patterns with excellent documentation
+- Existing service patterns from Registry/RAG/Backtesting to follow
+- Tool integrations with existing modules well-defined
+- Mock patterns established in RAG tests (mock_embedding_service)

 **Risks:**
-- PydanticAI is relatively new (versioning may change)
 - WebSocket streaming with tools is complex
-- LLM rate limits may affect tests
+- LLM rate limits may affect integration tests
 - Message history serialization edge cases
+- Tool execution ordering in multi-step workflows

 **Mitigations:**
-- Pin PydanticAI version in pyproject.toml
-- Comprehensive mocking for unit tests
-- Rate-limited integration tests
+- Pin PydanticAI version >=1.48.0 in pyproject.toml
+- Comprehensive mocking following RAG test patterns
+- Rate-limited integration tests with retry logic
 - JSONB for flexible message storage
+- Timeout handling with asyncio.wait_for
+
+**Changes Since Initial Review (2026-02-01):**
+- Updated PydanticAI from 0.1.0 to 1.48.0 (v1 stable)
+- Updated Claude model identifier to claude-sonnet-4-5 format
+- Added service method mapping notes to Task 7
+- Added mock_pydantic_ai_agent fixture pattern
+- Verified tool wrappers match actual service APIs
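The Mitigations list pairs `asyncio.wait_for` timeouts with retry logic, and `.env.example` exposes `AGENT_TIMEOUT_SECONDS`, `AGENT_RETRY_ATTEMPTS`, and `AGENT_RETRY_DELAY_SECONDS`. One way to combine them is sketched below. `run_with_timeout_and_retry` is a hypothetical helper; applying the timeout per attempt (rather than as one overall deadline), retrying only on timeouts, and using a fixed delay are design assumptions, not the project's documented behaviour.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_timeout_and_retry(
    make_call: Callable[[], Awaitable[T]],
    timeout_seconds: float = 120.0,    # AGENT_TIMEOUT_SECONDS
    retry_attempts: int = 3,           # AGENT_RETRY_ATTEMPTS
    retry_delay_seconds: float = 1.0,  # AGENT_RETRY_DELAY_SECONDS
) -> T:
    """Await make_call() with a per-attempt timeout, retrying on timeout.

    make_call must build a fresh awaitable on every attempt, because a
    coroutine cancelled by wait_for cannot be awaited a second time.
    """
    last_error: Exception | None = None
    for attempt in range(1, retry_attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
        except asyncio.TimeoutError as exc:
            last_error = exc
            if attempt < retry_attempts:
                await asyncio.sleep(retry_delay_seconds)
    raise TimeoutError(f"gave up after {retry_attempts} attempts") from last_error
```

A caller would wrap the agent call in a factory, for example `await run_with_timeout_and_retry(lambda: agent.run(prompt, deps=deps))`, so that each retry starts a new run instead of re-awaiting a cancelled one.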

README.md

Lines changed: 82 additions & 0 deletions
@@ -9,6 +9,7 @@ Portfolio-grade end-to-end retail demand forecasting system.
 - **Serving Layer**: Typed FastAPI endpoints (Pydantic v2 validation)
 - **Model Registry**: Run configs, metrics, artifacts, and data windows for reproducibility
 - **RAG Knowledge Base**: Postgres pgvector embeddings + evidence-grounded answers with citations
+- **Agentic Layer**: PydanticAI agents for autonomous experimentation and evidence-grounded Q&A with human-in-the-loop approval

 ## Quick Start

@@ -120,6 +121,8 @@ app/
 │ ├── forecasting/ # Model training, prediction, persistence
 │ ├── backtesting/ # Time-series CV, metrics, baseline comparisons
 │ ├── registry/ # Model run tracking, artifacts, deployment aliases
+│ ├── rag/ # pgvector embeddings, semantic search, citations
+│ ├── agents/ # PydanticAI agents (experiment, RAG assistant)
 │ ├── dimensions/ # Store/product discovery for LLM tool-calling
 │ ├── analytics/ # KPI aggregations and drilldown analysis
 │ └── jobs/ # Async-ready task orchestration
@@ -507,6 +510,85 @@ curl -X POST http://localhost:8123/rag/retrieve \
 - Markdown and OpenAPI chunking strategies
 - Configurable embedding dimensions

+### Agentic Layer
+
+- `POST /agents/sessions` - Create a new agent session
+- `GET /agents/sessions/{session_id}` - Get session status and details
+- `POST /agents/sessions/{session_id}/chat` - Send a message to the agent
+- `POST /agents/sessions/{session_id}/approve` - Approve or reject a pending action
+- `DELETE /agents/sessions/{session_id}` - Close a session
+- `WS /agents/stream` - WebSocket streaming endpoint for real-time responses
+
+**Agent Types:**
+
+1. **Experiment Orchestrator** (`agent_type: "experiment"`):
+   - Autonomous model experimentation workflow
+   - Runs backtests and compares configurations
+   - Recommends best model with human-in-the-loop approval
+
+2. **RAG Assistant** (`agent_type: "rag_assistant"`):
+   - Evidence-grounded documentation Q&A
+   - Citation-backed responses with confidence scoring
+   - "Insufficient evidence" detection to prevent hallucination
+
+**Example Create Session Request:**
+```bash
+curl -X POST http://localhost:8123/agents/sessions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "agent_type": "rag_assistant",
+    "initial_context": null
+  }'
+```
+
+**Example Chat Request:**
+```bash
+curl -X POST http://localhost:8123/agents/sessions/{session_id}/chat \
+  -H "Content-Type: application/json" \
+  -d '{
+    "message": "How does backtesting prevent data leakage?"
+  }'
+```
+
+**Features:**
+- PydanticAI v1.48.0 for structured, type-safe agent orchestration
+- Session management with PostgreSQL JSONB message history
+- Human-in-the-loop approval for sensitive actions (create_alias, archive_run)
+- WebSocket streaming for real-time token delivery
+- Token usage tracking and tool call auditing
+
+**Configuration:**
+```bash
+# Agent LLM Configuration
+# Model format: "provider:model-name" (e.g., anthropic:claude-sonnet-4-5)
+AGENT_DEFAULT_MODEL=anthropic:claude-sonnet-4-5
+AGENT_FALLBACK_MODEL=openai:gpt-4o
+AGENT_TEMPERATURE=0.1
+AGENT_MAX_TOKENS=4096
+
+# API Keys (set based on your chosen provider)
+ANTHROPIC_API_KEY=sk-ant-your-key
+# OPENAI_API_KEY=sk-your-key
+# GOOGLE_API_KEY=your-google-api-key # For Gemini models
+
+# Execution Configuration
+AGENT_MAX_TOOL_CALLS=10
+AGENT_TIMEOUT_SECONDS=120
+AGENT_RETRY_ATTEMPTS=3
+AGENT_RETRY_DELAY_SECONDS=1.0
+
+# Session Configuration
+AGENT_SESSION_TTL_MINUTES=120
+AGENT_MAX_SESSIONS_PER_USER=5
+
+# Human-in-the-loop Configuration (JSON array format)
+AGENT_REQUIRE_APPROVAL=["create_alias","archive_run"]
+AGENT_APPROVAL_TIMEOUT_MINUTES=60
+
+# Streaming Configuration
+AGENT_ENABLE_STREAMING=true
+```
+
 ### Error Responses (RFC 7807)

 All error responses follow RFC 7807 Problem Details format with `Content-Type: application/problem+json`:
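For scripts that prefer Python over curl, the two Agentic Layer calls documented in the README hunk can be built with the standard library's `urllib.request`. This sketch only constructs the requests: the URLs and JSON bodies mirror the curl examples, while `build_create_session` and `build_chat` are hypothetical helper names. The response shapes (for example, where the new `session_id` appears) are not documented in this diff, so no response parsing is attempted.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "http://localhost:8123"  # Host and port used in the README curl examples


def build_create_session(agent_type: str = "rag_assistant") -> urllib.request.Request:
    """POST /agents/sessions with the body from the curl example."""
    body = json.dumps({"agent_type": agent_type, "initial_context": None}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/agents/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_chat(session_id: str, message: str) -> urllib.request.Request:
    """POST /agents/sessions/{session_id}/chat with a single user message."""
    body = json.dumps({"message": message}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/agents/sessions/{session_id}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running server, `with urllib.request.urlopen(build_create_session()) as resp: payload = json.load(resp)` sends the request; error responses arrive as `urllib.error.HTTPError` carrying the RFC 7807 `application/problem+json` body.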

alembic/env.py

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@
 from app.core.database import Base

 # Import all models for Alembic autogenerate detection
+from app.features.agents import models as agents_models  # noqa: F401
 from app.features.data_platform import models as data_platform_models  # noqa: F401
 from app.features.jobs import models as jobs_models  # noqa: F401
 from app.features.rag import models as rag_models  # noqa: F401
