
feat(ast): introduce Global Analyzer agent and rank aggregation for multi-metric optimization #3

Open
Aidaarka wants to merge 2 commits into amazon-science:main from Aidaarka:feature/global-analyzer-ast



@Aidaarka

Overview

This Pull Request adds a Global Analyzer module and rank aggregation to the Trainer pipeline used for Automatic Suffix Tuning (AST).

The original CriSPO architecture provides local, single-prompt feedback through its Critique module, but it has no mechanism for comparing trade-offs across candidates during multi-metric optimization (e.g., balancing ROUGE against AlignScore). This PR fills that gap: the new Analyzer evaluates the top-k candidates jointly, diagnoses conflicts between metrics, and produces a global guidance string that is passed to the meta-optimizer to stabilize the search and prevent task drift.
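The analyzer step described above can be sketched in a few lines. This is a hypothetical illustration, not the PR's code: `Candidate` and `analyzer_step` are invented names, a single score stands in for the aggregated metrics, and the LLM call is replaced by a plain callable so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """A scored prompt candidate (hypothetical record type)."""
    prompt: str
    score: float  # higher is better in this sketch


def analyzer_step(
    candidates: List[Candidate],
    analyze: Callable[[List[Candidate]], str],
    base_meta_prompt: str,
    top_k: int = 3,
) -> str:
    """Select the top-k candidates, ask the analyzer for global guidance,
    and append that guidance to the meta-prompt for the optimizer LLM."""
    top = sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]
    guidance = analyze(top)  # an LLM call in a real pipeline; a stub here
    if not guidance:
        return base_meta_prompt  # no analysis: leave the meta-prompt unchanged
    return f"{base_meta_prompt}\n\nGlobal analysis of top candidates:\n{guidance}"


# Usage with a stub analyzer in place of an LLM call.
cands = [Candidate("p1", 0.4), Candidate("p2", 0.9), Candidate("p3", 0.7)]
stub = lambda top: "Top prompts: " + ", ".join(c.prompt for c in top)
print(analyzer_step(cands, stub, "Write a better suffix.", top_k=2))
```

Keeping the analyzer behind a callable makes the injection logic testable without an LLM; when the analyzer returns nothing, the meta-prompt passes through unchanged.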

Key Changes in the Diff

1. Core Trainer Enhancements (crispo/trainer/trainer.py)

  • Rank Aggregation: Added logic in the evaluate method that computes a mean rank (rank_score) across metrics when a MetricDict is used. Because each metric is ranked separately before averaging, a prompt that ranks near the top on one metric but near the bottom on others receives a worse mean rank than a prompt that ranks consistently well, which penalizes overfitting to a single metric.
  • Global Analyzer Flow: Added an update_analyzer block to the main fit loop. At each step it selects the top-performing prompt records and passes them to the analyzer to extract cross-metric trade-offs.
  • Meta-Prompt Injection: Updated the fill_in_meta_prompt method to accept an analyzer_analysis string and append this global guidance to the instructions sent to the optimizer LLM.
  • New Parameters: Added analyzer_prompt, analyzer_top_k, and analyzer_num_examples to Trainer.fit() to control the analysis phase.
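To make the rank-aggregation idea concrete, here is a minimal sketch of mean-rank scoring. It assumes every metric is "higher is better", that tied values share the average of their positions, and that a lower mean rank is better. The function name `mean_rank_scores` and its input layout are hypothetical; the actual rank_score computation in trainer.py may use a different direction or tie handling.

```python
from typing import Dict


def mean_rank_scores(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Compute a mean rank per prompt across several metrics.

    `scores` maps prompt id -> {metric name -> value}, higher being better
    for every metric. Rank 1 is best; ties share the average rank.
    A lower mean rank is better.
    """
    prompts = list(scores)
    metrics = list(next(iter(scores.values())))
    totals = {p: 0.0 for p in prompts}
    for m in metrics:
        # Sort prompts by this metric, best first.
        ordered = sorted(prompts, key=lambda p: scores[p][m], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j over a run of tied values starting at position i.
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]][m] == scores[ordered[i]][m]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
            for p in ordered[i:j + 1]:
                totals[p] += avg_rank
            i = j + 1
    return {p: totals[p] / len(metrics) for p in prompts}


scores = {
    "balanced":   {"rouge": 0.45, "align": 0.85},
    "rouge_only": {"rouge": 0.50, "align": 0.40},
    "align_only": {"rouge": 0.20, "align": 0.90},
    "weak":       {"rouge": 0.30, "align": 0.60},
}
print(mean_rank_scores(scores))
# → {'balanced': 2.0, 'rouge_only': 2.5, 'align_only': 2.5, 'weak': 3.0}
```

Ranking before averaging makes metrics on different scales (such as ROUGE and AlignScore) comparable without hand-tuned weights. In the example, each single-metric specialist wins one metric but finishes last on the other, so the balanced prompt ends up with the best mean rank.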

2. Analyzer Interfaces (crispo/task/analyzer.py)

  • Introduced the AnalyzerPrompt abstract base class, which defines a strict interface (fill and parse methods) for extracting structured global feedback.
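The kind of interface this implies can be sketched with Python's `abc` module. The class name `AnalyzerPromptSketch`, the method signatures (a list of record dicts in, a string out), and the `EchoAnalyzerPrompt` subclass are assumptions for illustration; the actual AnalyzerPrompt in crispo/task/analyzer.py may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class AnalyzerPromptSketch(ABC):
    """Hypothetical stand-in for the AnalyzerPrompt interface."""

    @abstractmethod
    def fill(self, records: List[Dict[str, Any]]) -> str:
        """Render the top-k prompt records into the analyzer's input text."""

    @abstractmethod
    def parse(self, completion: str) -> str:
        """Extract the global guidance string from the LLM's completion."""


class EchoAnalyzerPrompt(AnalyzerPromptSketch):
    """Trivial subclass that only shows how the interface is implemented."""

    def fill(self, records: List[Dict[str, Any]]) -> str:
        lines = [f"- {r['prompt']} (score={r['score']})" for r in records]
        return "Analyze these candidates:\n" + "\n".join(lines)

    def parse(self, completion: str) -> str:
        return completion.strip()


analyzer = EchoAnalyzerPrompt()
print(analyzer.fill([{"prompt": "Be concise.", "score": 0.8}]))
```

Because both methods are abstract, instantiating the base class directly raises `TypeError`, so every task-specific analyzer must implement both `fill` and `parse`.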

3. Task-Specific Analyzer Prompts

  • AST Suffix Tuning (crispo/task/ast/analyzer_suffix.py): Implemented SuffixAnalyzerPrompt, which instructs the LLM to identify recurring failure patterns and to wrap its actionable suggestions in <analysis> XML tags so they can be parsed reliably.
  • Summarization Critique (experiments/summarization/ast/critique/analyzer_prompt.py): Implemented an analyzer prompt tailored to the trade-offs of abstractive summarization.
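A suffix-analyzer prompt of this kind, together with tag parsing that falls back gracefully, might look like the sketch below. The template wording, the record layout (`suffix`, `metrics`), and the function names are hypothetical; `num_examples` is assumed to play the role of analyzer_num_examples. The fallback of returning the whole stripped completion when tags are missing is one plausible policy, not necessarily the one in the PR.

```python
import re
from typing import Dict, List

# Hypothetical instruction text; the real SuffixAnalyzerPrompt wording differs.
TEMPLATE = (
    "Below are the best-scoring prompt suffixes with their per-metric scores.\n"
    "{examples}\n"
    "Identify recurring failure patterns and trade-offs between metrics, then "
    "give actionable suggestions inside <analysis></analysis> tags."
)

ANALYSIS_RE = re.compile(r"<analysis>(.*?)</analysis>", re.DOTALL)


def fill_suffix_analyzer(records: List[Dict], num_examples: int = 3) -> str:
    """Render at most `num_examples` records into the analyzer prompt."""
    lines = []
    for r in records[:num_examples]:
        metric_str = ", ".join(f"{name}={value:.2f}" for name, value in r["metrics"].items())
        lines.append(f"Suffix: {r['suffix']} | {metric_str}")
    return TEMPLATE.format(examples="\n".join(lines))


def parse_analysis(completion: str) -> str:
    """Return the text inside <analysis> tags, or the whole stripped
    completion when the model omitted the tags."""
    match = ANALYSIS_RE.search(completion)
    if match:
        return match.group(1).strip()
    return completion.strip()


records = [{"suffix": "Answer in one sentence.", "metrics": {"rouge": 0.41, "align": 0.87}}]
print(fill_suffix_analyzer(records))
print(parse_analysis("Thoughts... <analysis>Keep entities; cut hedging.</analysis>"))
print(parse_analysis("Keep entities; cut hedging."))
```

Using a non-greedy, `DOTALL` pattern lets the analysis span multiple lines while stopping at the first closing tag.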

Impact & Benefits

  • Mitigates Task Drift: By explicitly instructing the meta-optimizer on why certain trade-offs occur, the system avoids oscillating between conflicting metrics.
  • Faster Convergence: Aims to reduce the number of search steps needed to find Pareto-optimal prompts in multi-objective scenarios.

Testing & Validation

  • Tested the Trainer.fit loop locally with the analyzer_prompt argument enabled.
  • Validated the rank aggregation logic for multiple metrics.
  • Verified that the fallback parsing handles cases where the LLM omits the <analysis> XML tags from its output.
