feat(ast): introduce Global Analyzer agent and rank aggregation for multi-metric optimization by Aidaarka · Pull Request #3 · amazon-science/CriSPO

Aidaarka · 2026-05-18T18:24:09Z

Overview

This pull request adds a Global Analyzer module and rank aggregation to the Trainer pipeline, extending the Automatic Suffix Tuning (AST) process to multi-metric optimization.

While the original CriSPO architecture excels at single-prompt local feedback via the Critique module, it lacks a mechanism to evaluate cross-candidate trade-offs during multi-metric optimization (e.g., balancing ROUGE vs. AlignScore). This PR addresses this limitation. The new Analyzer explicitly evaluates the top-k candidates simultaneously, diagnoses metric conflicts, and generates a global guidance string to stabilize the meta-optimizer and prevent task drift.

Key Changes in the Diff

1. Core Trainer Enhancements (`crispo/trainer/trainer.py`)

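Rank Aggregation: Added logic within the evaluate method to compute a mean rank (rank_score) across metrics when using a MetricDict. Because the score averages a prompt's rank over every metric, a prompt that leads on one metric but ranks poorly on the others receives a worse aggregate score, which penalizes overfitting to a single metric.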
Global Analyzer Flow: Injected the update_analyzer execution block inside the main fit loop. It isolates the top-performing prompt records of the current step and processes them to extract cross-metric trade-offs.
Meta-Prompt Injection: Updated the fill_in_meta_prompt method to accept the analyzer_analysis string, appending this global strategy to the instructions sent to the optimizer LLM.
New Parameters: Added analyzer_prompt, analyzer_top_k, and analyzer_num_examples to Trainer.fit() to allow flexible control over the analysis phase.
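
The rank aggregation and top-k selection described above can be sketched roughly as follows. This is an illustration, not the code in the diff: `mean_rank` and `top_k` are hypothetical helper names, and the sketch assumes that higher metric values are better, that a lower mean rank is better, and that ties are broken by insertion order.

```python
def mean_rank(candidates: dict[str, dict[str, float]]) -> dict[str, float]:
    """Return each candidate's mean rank across all metrics (1 = best).

    Assumption: every candidate reports the same metrics and higher is better.
    """
    names = list(candidates)
    metrics = list(next(iter(candidates.values())))
    ranks: dict[str, list[int]] = {name: [] for name in names}
    for metric in metrics:
        # Sort candidates from best to worst on this metric (stable for ties).
        ordered = sorted(names, key=lambda n: candidates[n][metric], reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return {name: sum(r) / len(r) for name, r in ranks.items()}


def top_k(rank_scores: dict[str, float], k: int) -> list[str]:
    """Pick the k candidates with the lowest (best) mean rank."""
    return sorted(rank_scores, key=rank_scores.get)[:k]


# Illustrative scores only; these are not results from the PR.
scores = {
    "prompt_a": {"rouge": 0.48, "align_score": 0.82},
    "prompt_b": {"rouge": 0.40, "align_score": 0.80},
    "prompt_c": {"rouge": 0.50, "align_score": 0.30},
}
rank_scores = mean_rank(scores)
best = top_k(rank_scores, k=2)
```

In this toy data, `prompt_c` has the best ROUGE but the worst AlignScore, so its mean rank (2.0) falls behind the balanced `prompt_a` (1.5).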

2. Analyzer Interfaces (`crispo/task/analyzer.py`)

Introduced the AnalyzerPrompt abstract base class, establishing a strict interface (fill and parse methods) for structured global feedback extraction.
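
A minimal sketch of what a fill/parse interface of this kind might look like. The class names, argument types, and record layout below are assumptions made for illustration; the actual signatures in `crispo/task/analyzer.py` may differ.

```python
from abc import ABC, abstractmethod


class AnalyzerPromptSketch(ABC):
    """Hypothetical stand-in for AnalyzerPrompt; signatures are assumed."""

    @abstractmethod
    def fill(self, records: list[dict]) -> str:
        """Render the top-k prompt records into the text sent to the analyzer LLM."""

    @abstractmethod
    def parse(self, completion: str) -> str:
        """Extract the global guidance string from the LLM completion."""


class EchoAnalyzerPrompt(AnalyzerPromptSketch):
    """Toy subclass: lists each candidate with its scores and returns the reply as-is."""

    def fill(self, records: list[dict]) -> str:
        lines = [f"{r['prompt']!r}: {r['scores']}" for r in records]
        header = "Compare these candidates and explain the metric trade-offs:"
        return header + "\n" + "\n".join(lines)

    def parse(self, completion: str) -> str:
        return completion.strip()
```

Declaring both methods abstract means a subclass that forgets either one cannot be instantiated, which is one way to enforce the strict interface mentioned above.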

3. Task-Specific Analyzer Prompts

AST Suffix Tuning (crispo/task/ast/analyzer_suffix.py): Implemented SuffixAnalyzerPrompt, which instructs the LLM to identify recurring failure patterns and to wrap its actionable suggestions in <analysis> XML tags so that they can be parsed reliably.
Summarization Critique (experiments/summarization/ast/critique/analyzer_prompt.py): Implemented a domain-specific analyzer prompt tailored for abstractive summarization trade-offs.

Impact & Benefits

Mitigates Task Drift: By explicitly telling the meta-optimizer why certain trade-offs occur, the Analyzer is intended to keep the search from oscillating between conflicting metrics.
Faster Convergence: Expected to reduce the number of iterative search steps required to discover Pareto-optimal prompts in multi-objective scenarios.

Testing & Validation

Tested the Trainer.fit loop locally with the analyzer_prompt argument enabled.
Validated the rank aggregation logic for multiple metrics.
Verified that the fallback parsing handles completions in which the LLM omits the <analysis> XML tags.
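
The PR description does not spell out how the fallback behaves. One plausible approach, sketched here under that caveat, is to prefer the tagged span and otherwise fall back to the whole completion with any stray tag removed. `parse_analysis` is a hypothetical helper name, not a function from the diff.

```python
import re

# Matches the first well-formed <analysis>...</analysis> span, across newlines.
_ANALYSIS_RE = re.compile(r"<analysis>(.*?)</analysis>", re.DOTALL | re.IGNORECASE)


def parse_analysis(completion: str) -> str:
    """Return the text inside <analysis> tags, or the cleaned completion if absent."""
    match = _ANALYSIS_RE.search(completion)
    if match:
        return match.group(1).strip()
    # Fallback (assumed behavior): no complete tag pair was found, so drop any
    # stray opening or closing tag and use the remaining text as the guidance.
    return re.sub(r"</?analysis>", "", completion, flags=re.IGNORECASE).strip()
```

With this sketch, a completion that opens the tag but never closes it still yields usable guidance instead of raising an error.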

Aidaarka added 2 commits May 17, 2026 21:31

feat: introduce Global Analyzer agent and rank aggregation for multi-metric optimization (e6d6b82)

feat(ast): introduce Global Analyzer agent and rank aggregation for multi-metric optimization (0bcb7d8)

Aidaarka wants to merge 2 commits into amazon-science:main from Aidaarka:feature/global-analyzer-ast
