Merged
Commits
41 commits
adc3f61
Set and clear DynamoPrefixContext for workflow KV optimization
dnandakumar-nv Jan 23, 2026
ffd3d28
Add tests for DynamoPrefixContext integration in Runner
dnandakumar-nv Jan 23, 2026
c35228a
Merge branch 'develop' into dynamic-inference-headers
dnandakumar-nv Jan 23, 2026
360288d
Add design doc for prediction trie inference routing
dnandakumar-nv Jan 23, 2026
80df59f
docs: add prediction trie implementation plan
dnandakumar-nv Jan 23, 2026
64bfc21
feat(profiler): add prediction trie data models
dnandakumar-nv Jan 24, 2026
ee00b66
feat(profiler): add MetricsAccumulator for prediction trie
dnandakumar-nv Jan 24, 2026
325afac
feat(profiler): add PredictionTrieBuilder
dnandakumar-nv Jan 24, 2026
932f674
feat(profiler): add PredictionTrieLookup
dnandakumar-nv Jan 24, 2026
d62fc1e
feat(profiler): add prediction trie serialization
dnandakumar-nv Jan 24, 2026
f916957
feat(llm): add LLMCallTracker for runtime prediction lookups
dnandakumar-nv Jan 24, 2026
381f0af
feat(profiler): integrate prediction trie generation
dnandakumar-nv Jan 24, 2026
ee166c8
feat(llm): add prediction header injection to Dynamo client
dnandakumar-nv Jan 24, 2026
7b8931c
feat(llm): add prediction_trie_path config to DynamoModelConfig
dnandakumar-nv Jan 24, 2026
52fb243
test(profiler): add end-to-end prediction trie test
dnandakumar-nv Jan 24, 2026
6d36b20
docs: add runtime prediction trie integration design
dnandakumar-nv Jan 24, 2026
1e5d370
docs: add runtime prediction trie implementation plan
dnandakumar-nv Jan 24, 2026
6137fb7
feat(context): add function_path_stack ContextVar to ContextState
dnandakumar-nv Jan 24, 2026
b91d1f2
feat(context): update push_active_function to track function path stack
dnandakumar-nv Jan 24, 2026
d4f02f2
feat(step_manager): increment LLM call tracker on LLM_START events
dnandakumar-nv Jan 24, 2026
fa7830d
Add dynamic prediction hook for runtime trie lookups
dnandakumar-nv Jan 24, 2026
bd40ae0
fix(test): include prediction_trie_path in dynamo field names test
dnandakumar-nv Jan 24, 2026
2ec7d7c
Add prediction_lookup parameter to create_httpx_client_with_dynamo_hooks
dnandakumar-nv Jan 24, 2026
3a7e42e
Add end-to-end integration test for runtime prediction trie
dnandakumar-nv Jan 24, 2026
912245a
docs: add prediction trie example config design
dnandakumar-nv Jan 24, 2026
3c6f65d
feat(examples): add prediction trie example configs and docs
dnandakumar-nv Jan 24, 2026
4f09ca3
Refactor header injection with dynamic prediction logic
dnandakumar-nv Jan 24, 2026
a1ab80c
Refactor Dynamo prefix handling to centralize logic.
dnandakumar-nv Jan 24, 2026
2fc6a09
Refactor DynamoPrefixContext for depth-aware prefix handling
dnandakumar-nv Jan 24, 2026
37ad4a4
Refactor DynamoPrefixContext for depth-aware prefix handling
dnandakumar-nv Jan 24, 2026
e4f7893
Merge branch 'develop' into dynamic-inference-headers
dnandakumar-nv Jan 24, 2026
4cbd4e6
Refactor DynamoPrefixContext for depth-aware prefix handling
dnandakumar-nv Jan 24, 2026
a611564
Merge remote-tracking branch 'origin/dynamic-inference-headers' into …
dnandakumar-nv Jan 24, 2026
fa79197
Remove DynamoPrefixContext handling in Runner class
dnandakumar-nv Jan 24, 2026
5bec15d
Merge remote-tracking branch 'upstream/develop' into dynamic-inferenc…
dnandakumar-nv Jan 26, 2026
2cba23c
Add Apache 2.0 license headers to source and test files
dnandakumar-nv Jan 26, 2026
7d2c087
Add "Trie(s)" to accepted vocabulary list
dnandakumar-nv Jan 26, 2026
727f564
Update README and test files for clarity and consistency
dnandakumar-nv Jan 26, 2026
74c191d
Fix formatting of `job_id` in README_PREDICTION_TRIE.md
dnandakumar-nv Jan 26, 2026
c412dc8
Add Apache 2.0 license headers to test files
dnandakumar-nv Jan 26, 2026
22327b9
Refactor imports for PredictionTrieLookup across modules
dnandakumar-nv Jan 26, 2026
1 change: 1 addition & 0 deletions ci/vale/styles/config/vocabularies/nat/accept.txt
@@ -179,6 +179,7 @@ Tavily
[Tt]imestamp(s?)
[Tt]okenization
[Tt]okenizer(s?)
[Tt]rie(s?)
triages
[Uu]ncomment(ed)?
[Uu]nencrypted
@@ -0,0 +1,172 @@
<!--
Copyright (c) 2025-2026, NVIDIA CORPORATION

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- path-check-skip-file -->

# Prediction Trie Optimization for Dynamo

Use profiled execution data to inject accurate per-call prediction headers instead of static guesses.

## Overview

The prediction trie enables **dynamic header injection** for Dynamo's KV-aware routing. Instead of using static values like `prefix_total_requests=10` for every call, the trie provides accurate predictions based on:
- **Function path**: Where in the agent hierarchy the call originates (e.g., `["react_workflow", "react_agent"]`)
- **Call index**: Which LLM call this is within the current function (1st, 2nd, 3rd, etc.)

This allows Dynamo's Thompson Sampling router to make better worker assignment decisions.

## Quick Start

### Phase 1: Build the Prediction Trie

Run profiling to collect execution data and build the trie:

```bash
nat eval --config_file configs/profile_rethinking_full_test.yml
```

**Output location:**
```
outputs/dynamo_evals/rethinking_full_test_for_profiling/<job_id>/prediction_trie.json
```

### Phase 2: Run with Predictions

1. **Update the trie path** in `configs/run_with_prediction_trie.yml`:
```yaml
prediction_trie_path: ./examples/dynamo_integration/react_benchmark_agent/outputs/dynamo_evals/rethinking_full_test_for_profiling/<YOUR_JOB_ID>/prediction_trie.json
```

2. **Run with dynamic predictions:**
```bash
nat eval --config_file configs/run_with_prediction_trie.yml
```

## How It Works

### During Profiling (Phase 1)

The profiler collects data for each LLM call:
- Function path at time of call
- Call index within the parent function
- Output tokens generated
- Time until the next LLM call
- Remaining LLM calls in the workflow

This data is aggregated into a trie structure with statistical summaries (mean, p50, p90, etc.) at each node.
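
As a rough illustration of that aggregation, the sketch below records output-token samples along a function path and reduces them to summary statistics. The `TrieNode`, `add_sample`, and `summarize` names are hypothetical and are not taken from the profiler's actual data models; only output tokens are tracked here to keep the example short.

```python
from dataclasses import dataclass, field
from statistics import mean, quantiles


@dataclass
class TrieNode:
    """Hypothetical node: one per function name, samples keyed by call index."""
    children: dict = field(default_factory=dict)
    output_tokens: dict = field(default_factory=dict)  # call_index -> [samples]


def add_sample(root: TrieNode, path: list[str], call_index: int, tokens: int) -> None:
    """Record one observed LLM call at the root and at every node along its path."""
    node = root
    node.output_tokens.setdefault(call_index, []).append(tokens)
    for name in path:
        node = node.children.setdefault(name, TrieNode())
        node.output_tokens.setdefault(call_index, []).append(tokens)


def summarize(samples: list[int]) -> dict:
    """Reduce raw samples to summary statistics (mean, p90, count)."""
    if len(samples) == 1:
        p90 = float(samples[0])
    else:
        # quantiles(n=10) returns nine cut points; the last one is the 90th percentile.
        p90 = quantiles(samples, n=10, method="inclusive")[-1]
    return {"mean": mean(samples), "p90": p90, "count": len(samples)}
```

Recording each sample at every ancestor node is one way to make the more general fallback levels available without a second pass over the data.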

### During Execution (Phase 2)

For each LLM request:
1. Read the current function path from context
2. Read the call index from the LLM call tracker
3. Look up the prediction in the trie
4. Inject headers into the HTTP request

### Fallback Chain

If an exact match isn't found, the trie lookup falls back:
1. Exact path + exact call index (most specific)
2. Exact path + any call index
3. Partial path + exact call index
4. Root aggregated stats (most general)

This ensures predictions are always available, even for novel execution paths.
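
The chain above can be sketched as follows. This is an illustrative sketch, not the `PredictionTrieLookup` implementation: it assumes a flattened dictionary keyed by path tuples, with hypothetical `by_index` and `any` fields, and it interprets "partial path" as the longest matching prefix of the function path.

```python
from typing import Optional


def lookup(trie: dict, path: list[str], call_index: int) -> Optional[dict]:
    """Return the most specific stats available for (path, call_index)."""
    key = tuple(path)
    node = trie.get(key)

    # 1. Exact path + exact call index
    if node and call_index in node["by_index"]:
        return node["by_index"][call_index]

    # 2. Exact path + any call index
    if node and node.get("any") is not None:
        return node["any"]

    # 3. Partial path + exact call index (assumed: longest prefix first)
    for end in range(len(key) - 1, 0, -1):
        prefix = trie.get(key[:end])
        if prefix and call_index in prefix["by_index"]:
            return prefix["by_index"][call_index]

    # 4. Root aggregated stats
    root = trie.get(())
    return root["any"] if root else None
```

With a root entry present, the function returns stats for any path, including paths that never appeared during profiling.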

## Headers Injected

| Header | Source | Description |
|--------|--------|-------------|
| `x-nat-remaining-llm-calls` | `prediction.remaining_calls.mean` | Expected remaining LLM calls in workflow |
| `x-nat-interarrival-ms` | `prediction.interarrival_ms.mean` | Expected milliseconds until next call |
| `x-nat-expected-output-tokens` | `prediction.output_tokens.p90` | Expected output tokens (90th percentile) |
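
The mapping in the table can be written as a small helper. The sketch below assumes the prediction is available as a nested dictionary with the field names from the Source column and that values are rounded to integers before being sent; both the `prediction_headers` name and the rounding are assumptions for illustration.

```python
def prediction_headers(prediction: dict) -> dict[str, str]:
    """Build the three prediction headers from a looked-up prediction."""
    return {
        "x-nat-remaining-llm-calls": str(round(prediction["remaining_calls"]["mean"])),
        "x-nat-interarrival-ms": str(round(prediction["interarrival_ms"]["mean"])),
        # The 90th percentile is used for output tokens, per the table above.
        "x-nat-expected-output-tokens": str(round(prediction["output_tokens"]["p90"])),
    }
```

The resulting dictionary can be merged into the outgoing request headers by whatever HTTP client hook is in use.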

## Comparing Results

To measure the impact of the prediction trie versus static headers:

1. **Run with static headers** (baseline):
```bash
nat eval --config_file configs/eval_config_rethinking_full_test.yml
```

2. **Run with prediction trie**:
```bash
nat eval --config_file configs/run_with_prediction_trie.yml
```

3. **Compare metrics**:
- `avg_llm_latency`: Lower is better
- `avg_workflow_runtime`: Lower is better
- Look for improvements in KV cache hit rates in Dynamo logs

## Configuration Reference

### Profiler Configuration (Phase 1)

Enable trie building in the profiler section:

```yaml
profiler:
  prediction_trie:
    enable: true
    output_filename: prediction_trie.json  # default
```

### LLM Configuration (Phase 2)

Add the trie path to your Dynamo LLM config:

```yaml
llms:
  dynamo_llm:
    _type: dynamo
    prefix_template: "react-benchmark-{uuid}"

    # Static fallbacks (used if trie lookup fails)
    prefix_total_requests: 10
    prefix_osl: MEDIUM
    prefix_iat: MEDIUM

    # Dynamic predictions from profiled data
    prediction_trie_path: /path/to/prediction_trie.json
```

## Troubleshooting

### "Prediction trie file not found"

The trie file doesn't exist at the configured path. Check:
- Did Phase 1 profiling complete successfully?
- Is the `job_id` in the path correct?
- If the path is relative, does it resolve correctly from the directory where you run the command?
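
A quick way to work through this checklist is to resolve the path and try to parse the file outside of `nat`. The `check_trie_file` helper below is a hypothetical standalone check; it only confirms that the file exists and contains valid JSON, and it makes no assumptions about the trie schema.

```python
import json
from pathlib import Path


def check_trie_file(path_str: str) -> dict:
    """Resolve the configured path against the current directory and load it."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"Prediction trie not found at {path} (current directory: {Path.cwd()})"
        )
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

The error message prints the fully resolved path, which makes a wrong `job_id` or a mismatched working directory easy to spot.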

### "No prediction found for path"

This is expected. It means no exact match exists for the current function path and call index, so the lookup is using a more general prediction from the fallback chain.

### Headers not being injected

Ensure:
- `prefix_template` is set (required for Dynamo hooks)
- `prediction_trie_path` points to a valid trie file
- You're using the `dynamo` LLM type

## Files

| File | Purpose |
|------|---------|
| `configs/profile_rethinking_full_test.yml` | Phase 1: Profile and build trie |
| `configs/run_with_prediction_trie.yml` | Phase 2: Run with dynamic predictions |
@@ -223,6 +223,11 @@ eval:
      concurrency_spike_analysis:
        enable: true
        spike_threshold: 24  # Alert when concurrent functions >= 24
      # Build prediction trie for dynamic Dynamo header injection
      # Output: prediction_trie.json in the output directory
      # Use with run_with_prediction_trie.yml for optimized routing
      prediction_trie:
        enable: true

  evaluators:
    tool_selection_quality: