A 100-question, six-dimension long-conversation memory benchmark for Chinese healthcare AI. Sivon reference score: mean 92/100 (2026-05-27).
benchmark retrieval memory chinese-nlp healthcare-ai long-context llm-evaluation glp-1 longmemeval perimenopause sivon
Updated May 27, 2026 - JavaScript