fix(eval): expose full graph/vector context for RAGAS and update mocks for async API by jmponcebe · Pull Request #3 · jmponcebe/PharmaGraphRAG

jmponcebe · 2026-04-21T09:40:03Z

Summary

Fix RAGAS evaluation so it scores the real context the LLM saw, not placeholder snippets, and migrate the metrics module to the RAGAS 0.4.x async API.

Background

/query was returning sources with snippets like "Knowledge graph data for {drug}" (a fixed string) and vector texts truncated to 200 chars. The runner fed those snippets to RAGAS, so Faithfulness scored around 0.12 on classic mode even when the LLM answer was fully grounded in the real graph/vector context.

Changes

API: QueryResponse now includes graph_context and vector_context with the full text the LLM received. sources[] is kept for UI compatibility.
Runner: _call_classic prefers the new full context fields; falls back to sources[].snippet for backward compatibility.
Metrics: migrated to RAGAS 0.4.x using SingleTurnSample + single_turn_ascore wrapped in asyncio.run. LLM via LangchainLLMWrapper(ChatOpenAI(...)) against Gemini's OpenAI-compatible endpoint. Embeddings via GoogleGenerativeAIEmbeddings (the OpenAI-compat endpoint returns 501 for embeddings). Explicit AnswerSimilarity(embeddings=...) for AnswerCorrectness. max_tokens raised to 8192 so Faithfulness and AnswerCorrectness don't truncate.
Tests: mocks updated to AsyncMock on single_turn_ascore; added one case that verifies _call_classic prefers full contexts over snippets.
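
The runner's preference order described above can be sketched roughly as follows. This is a minimal illustration and not the actual `_call_classic` code: the helper name `extract_contexts` is hypothetical, and the response is assumed to be the parsed JSON of `/query` as a plain dict with the `graph_context`, `vector_context` and `sources[].snippet` keys named in this PR.

```python
from typing import Any


def extract_contexts(response: dict[str, Any]) -> list[str]:
    """Return the context strings to score, preferring the full text.

    Hypothetical helper: the real logic lives in _call_classic.
    """
    # Prefer the full context fields when the API populated them.
    candidates = [response.get("graph_context"), response.get("vector_context")]
    full = [text for text in candidates if text]
    if full:
        return full
    # Older API responses: fall back to the per-source snippets.
    return [
        src["snippet"]
        for src in response.get("sources", [])
        if src.get("snippet")
    ]
```

With a response that carries both kinds of fields, the snippets are ignored; with an older response that only has `sources`, the non-empty snippets are used instead.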

Validation

Ran --limit 3 against local API on this branch:

Metric	Classic before	Classic after
Faithfulness	0.12	0.94
Context Precision	0.22	0.83

All 43 tests in test_evaluation.py and 18 tests in test_api.py pass.

…s for async API
- api/main.py + api/models.py: add graph_context and vector_context fields to QueryResponse so evaluation runners can score the actual text the LLM saw instead of placeholder snippets like 'Knowledge graph data for {drug}'
- evaluation/runner.py: prefer full graph_context/vector_context with graceful fallback to sources[].snippet for backward compatibility
- evaluation/metrics.py: migrate to RAGAS 0.4.x async API using single_turn_ascore + SingleTurnSample, LangchainLLMWrapper for the Gemini OpenAI-compatible endpoint, and GoogleGenerativeAIEmbeddings for embeddings (OpenAI-compat returns 501 for embeddings)
- metrics.py: max_tokens raised to 8192 to avoid truncation in Faithfulness and AnswerCorrectness multi-statement prompts
- metrics.py: explicit AnswerSimilarity(embeddings=...) wiring for AnswerCorrectness
- tests/test_evaluation.py: update mocks to AsyncMock on single_turn_ascore, add test for full context priority in _call_classic

Copilot

Pull request overview

This PR fixes RAGAS evaluation so metrics are computed against the actual graph/vector context provided to the LLM (instead of placeholder/truncated snippets), and updates the evaluation metrics wrapper to the RAGAS 0.4.x async scoring API.

Changes:

Expose full graph_context / vector_context in POST /query responses and prefer them in the evaluation runner (with backward-compatible fallback to sources[].snippet).
Migrate metric scoring to RAGAS 0.4.x using SingleTurnSample + single_turn_ascore via asyncio.run.
Update evaluation tests to mock async metric scoring and add coverage for the “prefer full contexts” behavior.
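
As an illustration of the async scoring pattern and how it can be mocked, here is a small sketch using only the standard library. The function name `score_sample` is hypothetical, the sample is a plain dict standing in for a RAGAS `SingleTurnSample`, and the metric is a mock rather than a real RAGAS metric, so no LLM is called.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def score_sample(metric, sample) -> float:
    """Run an async single-turn metric from synchronous code.

    Hypothetical wrapper; `metric` only needs an async
    `single_turn_ascore(sample)` method.
    """
    # asyncio.run creates an event loop, awaits the coroutine, then closes the loop.
    return float(asyncio.run(metric.single_turn_ascore(sample)))


# In a test, the metric's async method is replaced by an AsyncMock
# that returns a fixed score when awaited.
fake_metric = MagicMock()
fake_metric.single_turn_ascore = AsyncMock(return_value=0.94)

# Stand-in for a SingleTurnSample (assumption: a dict is enough for the mock).
sample = {"user_input": "q", "response": "a", "retrieved_contexts": ["ctx"]}

score = score_sample(fake_metric, sample)
fake_metric.single_turn_ascore.assert_awaited_once_with(sample)
```

Note that `asyncio.run` raises a `RuntimeError` when called from a thread that already has a running event loop, so a wrapper like this suits a synchronous CLI runner but not code that is itself awaited.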

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

File	Description
`tests/test_evaluation.py`	Updates mocks to `AsyncMock` for async metric API; adds test ensuring runner prefers full contexts over snippets.
`src/pharmagraphrag/evaluation/runner.py`	Prefers `graph_context`/`vector_context` in `_call_classic`, falling back to snippets for older API responses.
`src/pharmagraphrag/evaluation/metrics.py`	Migrates scoring to RAGAS 0.4.x async API; switches evaluator LLM wrapper to LangChain; updates embeddings to use Gemini native embeddings.
`src/pharmagraphrag/api/models.py`	Extends `QueryResponse` schema with full `graph_context`/`vector_context` fields.
`src/pharmagraphrag/api/main.py`	Populates `graph_context`/`vector_context` in `/query` response from retrieved context.

Copilot · 2026-04-21T09:43:44Z

+    from langchain_openai import ChatOpenAI
+    from ragas.llms import LangchainLLMWrapper

    api_key = os.environ.get("GEMINI_API_KEY", "")
    if not api_key:
        raise ValueError("GEMINI_API_KEY env var is required for RAGAS evaluation")

-    client = OpenAI(
+    chat = ChatOpenAI(
+        model=model,
        api_key=api_key,
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
+        max_tokens=8192,
+        temperature=0.0,
    )
-    return llm_factory(model, provider="openai", client=client)
+    return LangchainLLMWrapper(chat)


_get_evaluator_llm() now imports langchain_openai.ChatOpenAI, but langchain-openai is not listed in pyproject.toml dependencies. A fresh install will fail at runtime when running evaluation. Add langchain-openai (and ensure compatible openai dependency) to the project dependencies or switch to an evaluator LLM wrapper that’s already included in the dependency set.
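
Alongside declaring the dependency, one way to surface this kind of problem early is a small pre-flight check before evaluation starts. The sketch below is only an illustration: `missing_modules` and `require_modules` are hypothetical helpers that are not part of this PR, and the check only handles top-level module names such as `langchain_openai`.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_modules(names: list[str]) -> None:
    """Raise a clear error up front instead of failing mid-evaluation."""
    missing = missing_modules(names)
    if missing:
        raise ImportError(
            "Evaluation dependencies not installed: " + ", ".join(missing)
        )
```

A runner could call `require_modules(["langchain_openai", "ragas"])` at startup, so a fresh install fails with one readable message instead of an import error deep inside metric setup.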

Copilot · 2026-04-21T09:43:44Z

            has_graph_context=result.context.has_graph,
            has_vector_context=result.context.has_vector,
            sources=sources,
+            graph_context=result.context.graph_context or "",
+            vector_context=result.context.vector_context or "",


QueryResponse now always includes graph_context/vector_context strings. These can be very large (especially vector_context), increasing response payload size/latency for every /query call even when the UI only needs sources. Consider gating these fields behind an opt-in request flag (e.g. include_full_context: bool = False) or only populating them when running in evaluation/debug mode.

Copilot · 2026-04-21T09:43:45Z

+    graph_context: str = Field(
+        "", description="Full graph context passed to the LLM (for evaluation/debugging)."
+    )
+    vector_context: str = Field(
+        "", description="Full vector context passed to the LLM (for evaluation/debugging)."
+    )


Adding full graph_context/vector_context to the response schema is useful for evaluation, but since these fields may contain large texts, it would help API consumers if the schema clearly indicates when they are populated (or if they’re optional/omitted by default). Consider making them optional (None) unless explicitly requested, or adding an explicit include_full_context request parameter to avoid unintentionally large responses.

Addresses PR review feedback:
- Add include_full_context flag (default False) to QueryRequest so large graph/vector context is opt-in
- Make graph_context/vector_context optional (None by default) in QueryResponse
- Update /query endpoint to respect the flag
- Update evaluation runner to set include_full_context=True when calling classic
- Add langchain-openai dependency (used by RAGAS metrics via OpenAI-compatible endpoint)
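
The opt-in behavior from this follow-up commit can be pictured with the sketch below. It uses standard-library dataclasses as stand-ins for the project's Pydantic models; the class names, the `question` and `answer` fields, and the `build_response` helper are all hypothetical, and only `include_full_context`, `graph_context` and `vector_context` come from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryRequestSketch:
    question: str  # hypothetical field name
    include_full_context: bool = False  # opt-in, default False


@dataclass
class QueryResponseSketch:
    answer: str  # hypothetical field name
    graph_context: Optional[str] = None  # None unless explicitly requested
    vector_context: Optional[str] = None


def build_response(
    req: QueryRequestSketch,
    answer: str,
    graph_text: Optional[str],
    vector_text: Optional[str],
) -> QueryResponseSketch:
    """Attach the full context only when the caller asked for it."""
    if not req.include_full_context:
        # UI callers get the small payload; context fields stay None.
        return QueryResponseSketch(answer=answer)
    return QueryResponseSketch(
        answer=answer,
        graph_context=graph_text or "",
        vector_context=vector_text or "",
    )
```

Under this shape, a regular UI request leaves both fields as `None`, while the evaluation runner sets `include_full_context=True` and receives the text to score.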

Uses dorny/paths-filter to detect changes in docker/, pyproject.toml, uv.lock, requirements.txt or ci.yml. PRs that only touch Python code will skip the ~6-10 min Docker build. To force a build, include '[docker]' in the commit message.

Copilot AI review requested due to automatic review settings April 21, 2026 09:40

Copilot started reviewing on behalf of jmponcebe April 21, 2026 09:40

Copilot AI reviewed Apr 21, 2026

jmponcebe added 2 commits April 21, 2026 11:54

jmponcebe merged commit 8abd9f4 into main Apr 21, 2026
4 checks passed

jmponcebe deleted the fix/eval-context-and-ragas-async branch April 21, 2026 10:22

jmponcebe merged 3 commits into main from fix/eval-context-and-ragas-async