fix(benchmarks): guard against empty choices and message=None in LLM eval calls #219
Closed
qizwiz wants to merge 652 commits into
Feat/profilev2 See merge request npc-work/aic/ai/evermemos-opensource!61
Feat/update demo See merge request npc-work/aic/ai/evermemos-opensource!62
Use self-deployed embedding and rerank APIs by default See merge request npc-work/aic/ai/evermemos-opensource!64
vLLM Rerank API adopts an instruction-tuned approach See merge request npc-work/aic/ai/evermemos-opensource!65
Update EverMemOS: optimize search perf, improve skill search
chore: add devcontainer configuration and development tooling
Fix/unify port in demo and docs to the default 1995
Update the GitHub asset URLs for all banner images to ensure they point to the correct and current locations. This includes fixing a typo in a section title from "LAI Wearable" to "AI Wearable".
Update the asset IDs for the banner GIFs in the README to point to the correct new assets.
…ind-AI#186)
* feat: add game of throne demo and claude code plugin use cases
  Add two new use cases to the repository:
  1. Game of Thrones Story Memory Demo - A full-stack web application demonstrating EverMem's memory capabilities through a side-by-side comparison interface for book Q&A
  2. Claude Code Plugin - A memory plugin for Claude Code that automatically stores and retrieves context from past coding sessions
  The demo includes React frontend, Express backend, Docker configurations, and novel loading scripts. The plugin provides hooks for automatic memory injection and search capabilities.
* docs: update readme links to use relative paths for use cases
Update the banner image URL in the README file to point to the new asset location.
…erMind-AI#192)
server.ts used 8001 but the EverMemOS server default is 1995. This broke local demo runs unless EVERMEMOS_URL was manually set. Fixes EverMind-AI#28
Co-authored-by: pazyork <pazyorkcc@gmail.com>
Skip tool call/response msg in profile generation
* feat(use-cases): add OpenHer persona engine with EverMemOS integration
  OpenHer is an AI Being engine that creates personas with emergent personality, emotional thermodynamics, and long-term memory.
  EverMemOS integration:
  - 4D relationship vector (depth, valence, trust, foresight) expands neural network perception from 8D to 12D
  - Async two-stage memory retrieval (fire on Turn N, collect on Turn N+1) with 500ms timeout + graceful fallback
  - Semi-emergent relationship EMA blending EverMemOS priors with LLM-judged deltas per turn
  - Fire-and-forget turn storage via asyncio.create_task
  Includes:
  - README with architecture diagrams and integration walkthrough
  - Runnable demo with simulation mode (no EverMemOS needed)
  - Core integration code: mixin, types, context features
  - .env.example with placeholder values
  Repo: https://github.com/kellyvv/OpenHer
* docs: rewrite README, storytelling style, emotion-first
* docs: English README, storytelling style, no emoji
* rename: openher-persona-engine → openher
---------
Co-authored-by: kellyzxiaowei <129767595+kellyzxiaowei@users.noreply.github.com>
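The two-stage retrieval described in the OpenHer commit (fire on Turn N, collect on Turn N+1, 500ms timeout, graceful fallback) can be sketched with plain `asyncio`. This is a hypothetical illustration, not OpenHer's code: the class `TwoStageRetriever` and the `search` callable are assumed names, and the real integration may be structured differently.

```python
import asyncio
from typing import Any, Callable, Coroutine, Optional


class TwoStageRetriever:
    """Hypothetical sketch: start a memory search on turn N, collect it on turn N+1."""

    def __init__(
        self,
        search: Callable[[str], Coroutine[Any, Any, str]],
        timeout_s: float = 0.5,
    ) -> None:
        self._search = search        # assumed async memory-search function (hypothetical)
        self._timeout_s = timeout_s  # 0.5 s mirrors the 500ms timeout in the commit message
        self._pending: Optional["asyncio.Task[str]"] = None

    def fire(self, query: str) -> None:
        """Turn N: schedule the search in the background without awaiting it."""
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()  # drop a stale search that was never collected
        self._pending = asyncio.create_task(self._search(query))

    async def collect(self, fallback: str = "") -> str:
        """Turn N+1: wait at most timeout_s for the earlier search, else use the fallback."""
        task, self._pending = self._pending, None
        if task is None:
            return fallback  # nothing was fired on the previous turn
        try:
            # wait_for cancels the task if the timeout expires.
            return await asyncio.wait_for(task, timeout=self._timeout_s)
        except asyncio.TimeoutError:
            return fallback  # search was too slow for this turn
        except Exception:
            return fallback  # graceful fallback if the search itself failed
```

A caller would `fire()` right after sending the turn-N reply and `await collect()` before building the turn-N+1 prompt. This assumes both turns run on the same long-lived event loop, since the background task lives on that loop between turns.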
* chore: rename project from evermemos to EverCore
  This commit renames the project directory and updates all internal references from "evermemos" to "EverCore". The changes include:
  - Renaming the main directory from `methods/evermemos` to `methods/EverCore`
  - Updating all import paths and module references
  - Maintaining the same code structure and functionality
  - Adding new configuration files (.vscode/settings.json, .pylintrc, pyrightconfig.json)
  - Updating Dockerfile and project metadata
* docs: update references from evermemos to EverCore
  Update documentation files to reflect the renaming of the 'evermemos' directory to 'EverCore'. This includes fixing clone commands, directory paths, and documentation links across multiple files to ensure consistency and correct navigation for users.
* chore: rename EverMemOS to EverCore across codebase
  This is a project-wide rebranding from EverMemOS to EverCore. The changes include:
  - Update project name in source files, documentation, and configuration
  - Rename API references, environment variables, and service names
  - Modify demo descriptions and benchmark configurations
  - Update URLs and citations to reflect new project identity
  All functionality remains identical; only naming has changed to align with the new project branding.
* docs: update README with EverCore focus and restructured TOC
  - Add line break before Table of Contents for better visual separation
  - Rewrite project description to highlight EverCore as the central component
  - Reorder directory tree to prioritize benchmarks and methods over use-cases
  - Update use-cases list with more examples and clarify they are templates
  - Improve flow from Quick Start to use-cases to benchmarks
* docs: update README with clearer methods description and benchmarks
  Add benchmark numbers directly in the method summaries for better visibility. Clarify introductory text to emphasize choice and composition of methods.
* docs: fix markdown formatting in README table of contents Adjust whitespace and line breaks to ensure proper rendering of the collapsible table of contents section.
…d-AI#204)
- Replace specific EverMemBench-Dynamic badge with general EverMind-AI HuggingFace badge
- Remove redundant License badge
- Change "Methods" section heading to "Architecture Methods"
- Update sub-section headings from h4 (####) to h3 (###) for better hierarchy
…rMind-AI#208)
* docs: restructure README and add AGENTS.md for better navigation
  - Reorder sections to emphasize architecture methods and use cases
  - Move use cases section before quick start for better flow
  - Rename "Methods" to "Architecture Methods" for clarity
  - Add AGENTS.md with quick commands and key entry points
  - Update section headers to improve document hierarchy
  - Maintain all existing content while improving organization
* docs: add community and contribution files
* docs: reorder README directory tree for logical grouping
* docs: move community files to .github/ and update references
* ci: change deploy workflow trigger from feature branch to main
* docs: restructure README and add subdirectory guides
  Move the directory tree from the main README to new dedicated README files for each top-level folder (use-cases, methods, benchmarks). Add detailed introductions and tables to guide users to the appropriate subprojects. This improves navigation and provides clear entry points for different use cases.
* docs: expand showcase section with new projects and links
  Add six new project entries to the README showcase, each with a banner image, description, and code/plugin link. Also update an existing benchmark entry to include a dataset link. This enhances the repository's demonstration of real-world applications and available resources.
* docs(readme): update project links and formatting
* docs(use-cases): enhance README with visual catalogue of demos
  Expand the use cases section from a simple table to a detailed visual catalogue with project banners, descriptions, and links. This improves user engagement and provides a better showcase of community integrations and demos.
* docs: update READMEs and add validation for use-case links
…eval calls
client.chat.completions.create() can return choices=[] on content-policy rejections or provider errors, and choices[0].message=None on filtered responses (e.g. Gemini PROHIBITED_CONTENT via OpenAI-compatible endpoint). Both crash with IndexError/AttributeError. The existing try/except blocks catch these as generic 'LLM evaluation failed' errors, making them hard to diagnose. Explicit guards surface the root cause clearly.
What
Add guards at three LLM evaluation call sites in the EvoAgentBench domain evaluators before accessing `choices[0].message.content`.
Why
`client.chat.completions.create()` can return two empty-response shapes:
- `choices = []`: on content-policy rejections, rate-limit errors, or provider failures
- `choices[0].message = None`: e.g. Gemini 2.5 Flash (via an OpenAI-compatible endpoint) returns HTTP 200 with `finish_reason: PROHIBITED_CONTENT` and `message=None`
Both crash, with `IndexError` and `AttributeError` respectively. The existing `try`/`except` blocks catch these as generic errors such as `"LLM evaluation failed: list index out of range"`, which hides the actual cause and makes benchmark runs hard to diagnose.
Files changed
| File | Access pattern |
|---|---|
| `benchmarks/EvoAgentBench/src/domains/information_retrieval/judge.py` | `resp.choices[0].message.content or ""` |
| `benchmarks/EvoAgentBench/src/domains/knowledge_work/evaluate.py` | `resp.choices[0].message.content` |
| `benchmarks/EvoAgentBench/src/domains/reasoning/evaluate.py` | `response.choices[0].message.content` |
Corpus context
Detected by pact (`llm_response_unguarded` mode), a Z3-verified static analyzer for LLM crash vectors. This pattern accounts for 13.8k violations across 800+ repos.
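A minimal sketch of the guard described under What and Why, assuming an OpenAI-style response object with `choices`, `message`, `content`, and `finish_reason` attributes. The helper `extract_content` and the exception class `EmptyLLMResponseError` are hypothetical names, not taken from the PR's diff; the actual guards may word and handle the failure differently.

```python
from types import SimpleNamespace
from typing import Any


class EmptyLLMResponseError(ValueError):
    """Hypothetical error raised when a chat completion has no usable message."""


def extract_content(resp: Any) -> str:
    """Return choices[0].message.content, failing loudly on empty-response shapes."""
    # Shape 1: choices is [] (or missing/None), so choices[0] would raise IndexError.
    choices = getattr(resp, "choices", None)
    if not choices:
        raise EmptyLLMResponseError("LLM returned no choices")

    choice = choices[0]
    # Shape 2: message is None (e.g. a filtered response), so .content would raise AttributeError.
    message = getattr(choice, "message", None)
    if message is None:
        reason = getattr(choice, "finish_reason", None)
        raise EmptyLLMResponseError(f"LLM returned message=None (finish_reason={reason!r})")

    # content itself may be None; fall back to "" like the judge.py call site.
    return message.content or ""


if __name__ == "__main__":
    # Simulate the Gemini PROHIBITED_CONTENT shape with a stand-in object.
    filtered = SimpleNamespace(
        choices=[SimpleNamespace(message=None, finish_reason="PROHIBITED_CONTENT")]
    )
    try:
        extract_content(filtered)
    except EmptyLLMResponseError as exc:
        # prints: LLM evaluation failed: LLM returned message=None (finish_reason='PROHIBITED_CONTENT')
        print(f"LLM evaluation failed: {exc}")
```

Because the helper raises inside the existing `try`/`except`, the logged error would presumably name the real cause (for example `finish_reason='PROHIBITED_CONTENT'`) instead of `list index out of range`. Returning `""` when `content` is `None` mirrors the `or ""` fallback already used in `judge.py`.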