fix(benchmarks): guard against empty choices and message=None in LLM eval calls#219

Closed
qizwiz wants to merge 652 commits into EverMind-AI:main from qizwiz:fix/guard-empty-llm-response

qizwiz commented May 18, 2026

What

Add guards at three LLM evaluation call sites in the EvoAgentBench domain evaluators before accessing choices[0].message.content.

Why

client.chat.completions.create() can return two empty-response shapes:

  1. choices = [] — on content-policy rejections, rate-limit errors, or provider failures
  2. choices[0].message = None — e.g. Gemini 2.5 Flash (via OpenAI-compatible endpoint) returns HTTP 200 with finish_reason: PROHIBITED_CONTENT and message=None

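Both crash the evaluator: an empty choices list raises IndexError on choices[0], and message=None raises AttributeError on .content. The existing try/except blocks catch these and report only a generic error such as "LLM evaluation failed: list index out of range", which makes failed benchmark runs hard to diagnose.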
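
The two failure shapes can be reproduced without calling a provider. The sketch below uses `SimpleNamespace` stand-ins rather than real SDK response objects; the stub objects and the helper names `unguarded_content` and `failure_type` are illustrative assumptions, not code from the repository.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the two response shapes described above.
# Real responses would come from client.chat.completions.create().
empty_choices = SimpleNamespace(choices=[])
filtered = SimpleNamespace(
    choices=[SimpleNamespace(message=None, finish_reason="PROHIBITED_CONTENT")]
)


def unguarded_content(resp):
    """Mirror the pre-fix access pattern with no guard."""
    return resp.choices[0].message.content


def failure_type(resp):
    """Return the name of the exception raised by the unguarded access."""
    try:
        unguarded_content(resp)
    except (IndexError, AttributeError) as exc:
        return type(exc).__name__
    return None


print(failure_type(empty_choices))  # IndexError
print(failure_type(filtered))       # AttributeError
```

Neither exception message mentions filtering or an empty response, which is why the generic "LLM evaluation failed" wrapper gives so little to go on.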

Files changed

| File | Fix |
| --- | --- |
| `benchmarks/EvoAgentBench/src/domains/information_retrieval/judge.py` | Guard before `resp.choices[0].message.content or ""` |
| `benchmarks/EvoAgentBench/src/domains/knowledge_work/evaluate.py` | Guard before `resp.choices[0].message.content` |
| `benchmarks/EvoAgentBench/src/domains/reasoning/evaluate.py` | Guard before `response.choices[0].message.content` |

# Before
eval_text = resp.choices[0].message.content

# After
if not resp.choices or resp.choices[0].message is None:
    raise ValueError("LLM returned empty or filtered response")
eval_text = resp.choices[0].message.content
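
Since the same guard appears at three call sites, one possible variation is a shared helper that also distinguishes the two cases in its error message. This is a minimal sketch under assumptions: `extract_content` is a hypothetical name not present in the PR, and `resp` is only assumed to expose `choices`, `message`, and `finish_reason` the way an OpenAI-compatible response does.

```python
from types import SimpleNamespace


def extract_content(resp):
    """Return choices[0].message.content, or raise a descriptive ValueError.

    Hypothetical helper (not part of the PR). `resp` is assumed to have the
    shape of an OpenAI-compatible chat completion response.
    """
    if not resp.choices:
        raise ValueError("LLM returned no choices (empty response)")
    choice = resp.choices[0]
    if choice.message is None:
        # finish_reason is read defensively in case a provider omits it.
        reason = getattr(choice, "finish_reason", None)
        raise ValueError(f"LLM returned message=None (finish_reason={reason!r})")
    # content itself may be None on some responses; normalise to "".
    return choice.message.content or ""


# Usage with stub objects standing in for real responses.
ok = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="hello"),
                             finish_reason="stop")]
)
print(extract_content(ok))  # hello
```

Including `finish_reason` in the message surfaces values such as `PROHIBITED_CONTENT` directly in the benchmark log, at the cost of a slightly longer guard than the two-line version above.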

Corpus context

Detected by pact (llm_response_unguarded mode), a Z3-verified static analyzer for LLM crash vectors. The analyzer has flagged this pattern in 13.8k violations across 800+ repos.

hui zhang and others added 30 commits December 25, 2025 13:35
Feat/profilev2

See merge request npc-work/aic/ai/evermemos-opensource!61
Feat/update demo

See merge request npc-work/aic/ai/evermemos-opensource!62
Use self-deployed embedding and rerank APIs by default

See merge request npc-work/aic/ai/evermemos-opensource!64
vLLM Rerank API adopts an instruction-tuned approach

See merge request npc-work/aic/ai/evermemos-opensource!65
shallyan and others added 27 commits April 16, 2026 14:11
Update EverMemOS: optimize search perf, improve skill search
chore: add devcontainer configuration and development tooling
Fix/unify port in demo and docs to the default 1995
Update the GitHub asset URLs for all banner images to ensure they point to the correct and current locations. This includes fixing a typo in a section title from "LAI Wearable" to "AI Wearable".
Update the asset IDs for the banner GIFs in the README to point to the correct new assets.
…ind-AI#186)

* feat: add game of throne demo and claude code plugin use cases

Add two new use cases to the repository:
1. Game of Thrones Story Memory Demo - A full-stack web application demonstrating EverMem's memory capabilities through a side-by-side comparison interface for book Q&A
2. Claude Code Plugin - A memory plugin for Claude Code that automatically stores and retrieves context from past coding sessions

The demo includes React frontend, Express backend, Docker configurations, and novel loading scripts. The plugin provides hooks for automatic memory injection and search capabilities.

* docs: update readme links to use relative paths for use cases
Update the banner image URL in the README file to point to the new asset location.
…erMind-AI#192)

server.ts used 8001 but the EverMemOS server default is 1995.
This broke local demo runs unless EVERMEMOS_URL was manually set.

Fixes EverMind-AI#28

Co-authored-by: pazyork <pazyorkcc@gmail.com>
Skip tool call/response msg in profile generation
* feat(use-cases): add OpenHer persona engine with EverMemOS integration

OpenHer is an AI Being engine that creates personas with emergent
personality, emotional thermodynamics, and long-term memory.

EverMemOS integration:
- 4D relationship vector (depth, valence, trust, foresight)
  expands neural network perception from 8D to 12D
- Async two-stage memory retrieval (fire on Turn N,
  collect on Turn N+1) with 500ms timeout + graceful fallback
- Semi-emergent relationship EMA blending EverMemOS priors
  with LLM-judged deltas per turn
- Fire-and-forget turn storage via asyncio.create_task

Includes:
- README with architecture diagrams and integration walkthrough
- Runnable demo with simulation mode (no EverMemOS needed)
- Core integration code: mixin, types, context features
- .env.example with placeholder values

Repo: https://github.com/kellyvv/OpenHer

* docs: rewrite README — storytelling style, emotion-first

* docs: English README, storytelling style, no emoji

* rename: openher-persona-engine → openher

---------

Co-authored-by: kellyzxiaowei <129767595+kellyzxiaowei@users.noreply.github.com>
* chore: rename project from evermemos to EverCore

This commit renames the project directory and updates all internal references from "evermemos" to "EverCore". The changes include:
- Renaming the main directory from `methods/evermemos` to `methods/EverCore`
- Updating all import paths and module references
- Maintaining the same code structure and functionality
- Adding new configuration files (.vscode/settings.json, .pylintrc, pyrightconfig.json)
- Updating Dockerfile and project metadata

* docs: update references from evermemos to EverCore

Update documentation files to reflect the renaming of the 'evermemos' directory to 'EverCore'. This includes fixing clone commands, directory paths, and documentation links across multiple files to ensure consistency and correct navigation for users.

* chore: rename EverMemOS to EverCore across codebase

This is a project-wide rebranding from EverMemOS to EverCore. The changes include:
- Update project name in source files, documentation, and configuration
- Rename API references, environment variables, and service names
- Modify demo descriptions and benchmark configurations
- Update URLs and citations to reflect new project identity

All functionality remains identical; only naming has changed to align with the new project branding.

* docs: update README with EverCore focus and restructured TOC

- Add line break before Table of Contents for better visual separation
- Rewrite project description to highlight EverCore as the central component
- Reorder directory tree to prioritize benchmarks and methods over use-cases
- Update use-cases list with more examples and clarify they are templates
- Improve flow from Quick Start to use-cases to benchmarks

* docs: update README with clearer methods description and benchmarks

Add benchmark numbers directly in the method summaries for better visibility.
Clarify introductory text to emphasize choice and composition of methods.

* docs: fix markdown formatting in README table of contents

Adjust whitespace and line breaks to ensure proper rendering of the collapsible table of contents section.
…d-AI#204)

- Replace specific EverMemBench-Dynamic badge with general EverMind-AI HuggingFace badge
- Remove redundant License badge
- Change "Methods" section heading to "Architecture Methods"
- Update sub-section headings from h4 (####) to h3 (###) for better hierarchy
…rMind-AI#208)

* docs: restructure README and add AGENTS.md for better navigation

- Reorder sections to emphasize architecture methods and use cases
- Move use cases section before quick start for better flow
- Rename "Methods" to "Architecture Methods" for clarity
- Add AGENTS.md with quick commands and key entry points
- Update section headers to improve document hierarchy
- Maintain all existing content while improving organization

* docs: add community and contribution files

* docs: reorder README directory tree for logical grouping

* docs: move community files to .github/ and update references

* ci: change deploy workflow trigger from feature branch to main
* docs: restructure README and add subdirectory guides

Move the directory tree from the main README to new dedicated README files for each top-level folder (use-cases, methods, benchmarks). Add detailed introductions and tables to guide users to the appropriate subprojects. This improves navigation and provides clear entry points for different use cases.

* docs: expand showcase section with new projects and links

Add six new project entries to the README showcase, each with a banner image, description, and code/plugin link. Also update an existing benchmark entry to include a dataset link. This enhances the repository's demonstration of real-world applications and available resources.
* docs(readme): update project links and formatting

* docs(use-cases): enhance README with visual catalogue of demos

Expand the use cases section from a simple table to a detailed visual catalogue with project banners, descriptions, and links. This improves user engagement and provides a better showcase of community integrations and demos.

* docs: update READMEs and add validation for use-case links
…eval calls

client.chat.completions.create() can return choices=[] on content-policy
rejections or provider errors, and choices[0].message=None on filtered
responses (e.g. Gemini PROHIBITED_CONTENT via OpenAI-compatible endpoint).
Both crash with IndexError/AttributeError. The existing try/except blocks
catch these as generic 'LLM evaluation failed' errors, making them hard
to diagnose. Explicit guards surface the root cause clearly.