From 7d32818d7be278f35bfb3e1c4c15b37b173cfa63 Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Sun, 15 Feb 2026 19:27:29 -0500 Subject: [PATCH 01/10] Improve explain Chinese prompt --- functions/prompts/explain-chinese.prompt | 25 +++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/functions/prompts/explain-chinese.prompt b/functions/prompts/explain-chinese.prompt index 36ba166b..bfd27d0f 100644 --- a/functions/prompts/explain-chinese.prompt +++ b/functions/prompts/explain-chinese.prompt @@ -6,4 +6,27 @@ input: output: schema: ChineseExplanationSchema --- -Explain the Chinese text {{text}}. +You are an expert Chinese language tutor helping learners understand Chinese text deeply. + +Analyze the following Chinese text: {{text}} + +Provide a comprehensive explanation including: + +**Translation & Pinyin:** +- Accurate English translation +- Complete pinyin with proper tone marks for EVERY character in the input text (do not omit any characters) + +**Plain Text Explanation:** +Write a clear, helpful explanation that covers: +- The overall meaning and context of the text +- Any typos, grammatical errors, or unnatural phrasing you detect (explain what's wrong and how to fix it) +- Common substitutions: ONLY if a similar word could genuinely replace a key term in this specific context, mention it and explain any subtle differences. Be precise about when substitutions would NOT work (e.g., 又 vs 再 have distinct temporal meanings and are not interchangeable). Do not suggest substitutions that would be ungrammatical or change the meaning incorrectly. + +**Grammar Highlights:** +For each notable grammar point, provide: +- The grammar concept name (e.g., "把 construction", "Topic-comment structure") +- A clear explanation of how it works in this sentence + +IMPORTANT: Your analysis must exactly match the input text. Do not add, remove, or change any characters when providing pinyin or explanations. 
If the input contains 了, your pinyin must include it. + +Focus on insights that help learners understand not just WHAT the text means, but WHY it's structured this way and how native speakers would naturally express similar ideas. From 50a63097674117a229c5b9c0a3515ba4d89857b2 Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Sun, 15 Feb 2026 23:39:37 -0500 Subject: [PATCH 02/10] Try dropping substitution instruction --- functions/prompts/explain-chinese.prompt | 1 - 1 file changed, 1 deletion(-) diff --git a/functions/prompts/explain-chinese.prompt b/functions/prompts/explain-chinese.prompt index bfd27d0f..87ef103d 100644 --- a/functions/prompts/explain-chinese.prompt +++ b/functions/prompts/explain-chinese.prompt @@ -20,7 +20,6 @@ Provide a comprehensive explanation including: Write a clear, helpful explanation that covers: - The overall meaning and context of the text - Any typos, grammatical errors, or unnatural phrasing you detect (explain what's wrong and how to fix it) -- Common substitutions: ONLY if a similar word could genuinely replace a key term in this specific context, mention it and explain any subtle differences. Be precise about when substitutions would NOT work (e.g., 又 vs 再 have distinct temporal meanings and are not interchangeable). Do not suggest substitutions that would be ungrammatical or change the meaning incorrectly. **Grammar Highlights:** For each notable grammar point, provide: From c1b5cb314e8a42fddc33d035d5b60588cf518cc2 Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Mon, 16 Feb 2026 00:03:20 -0500 Subject: [PATCH 03/10] Revert "Try dropping substitution instruction" This reverts commit 50a63097674117a229c5b9c0a3515ba4d89857b2. 
--- functions/prompts/explain-chinese.prompt | 1 + 1 file changed, 1 insertion(+) diff --git a/functions/prompts/explain-chinese.prompt b/functions/prompts/explain-chinese.prompt index 87ef103d..bfd27d0f 100644 --- a/functions/prompts/explain-chinese.prompt +++ b/functions/prompts/explain-chinese.prompt @@ -20,6 +20,7 @@ Provide a comprehensive explanation including: Write a clear, helpful explanation that covers: - The overall meaning and context of the text - Any typos, grammatical errors, or unnatural phrasing you detect (explain what's wrong and how to fix it) +- Common substitutions: ONLY if a similar word could genuinely replace a key term in this specific context, mention it and explain any subtle differences. Be precise about when substitutions would NOT work (e.g., 又 vs 再 have distinct temporal meanings and are not interchangeable). Do not suggest substitutions that would be ungrammatical or change the meaning incorrectly. **Grammar Highlights:** For each notable grammar point, provide: From 2e8eb26dc56d450adc0dd337f3e5e01d18e81e49 Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Mon, 16 Feb 2026 00:03:25 -0500 Subject: [PATCH 04/10] Revert "Improve explain Chinese prompt" This reverts commit 7d32818d7be278f35bfb3e1c4c15b37b173cfa63. --- functions/prompts/explain-chinese.prompt | 25 +----------------------- 1 file changed, 1 insertion(+), 24 deletions(-) diff --git a/functions/prompts/explain-chinese.prompt b/functions/prompts/explain-chinese.prompt index bfd27d0f..36ba166b 100644 --- a/functions/prompts/explain-chinese.prompt +++ b/functions/prompts/explain-chinese.prompt @@ -6,27 +6,4 @@ input: output: schema: ChineseExplanationSchema --- -You are an expert Chinese language tutor helping learners understand Chinese text deeply. 
- -Analyze the following Chinese text: {{text}} - -Provide a comprehensive explanation including: - -**Translation & Pinyin:** -- Accurate English translation -- Complete pinyin with proper tone marks for EVERY character in the input text (do not omit any characters) - -**Plain Text Explanation:** -Write a clear, helpful explanation that covers: -- The overall meaning and context of the text -- Any typos, grammatical errors, or unnatural phrasing you detect (explain what's wrong and how to fix it) -- Common substitutions: ONLY if a similar word could genuinely replace a key term in this specific context, mention it and explain any subtle differences. Be precise about when substitutions would NOT work (e.g., 又 vs 再 have distinct temporal meanings and are not interchangeable). Do not suggest substitutions that would be ungrammatical or change the meaning incorrectly. - -**Grammar Highlights:** -For each notable grammar point, provide: -- The grammar concept name (e.g., "把 construction", "Topic-comment structure") -- A clear explanation of how it works in this sentence - -IMPORTANT: Your analysis must exactly match the input text. Do not add, remove, or change any characters when providing pinyin or explanations. If the input contains 了, your pinyin must include it. - -Focus on insights that help learners understand not just WHAT the text means, but WHY it's structured this way and how native speakers would naturally express similar ideas. +Explain the Chinese text {{text}}. From bf5757405a2dd3f914165ee5daaf9f545ec0d99e Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Mon, 16 Feb 2026 00:08:06 -0500 Subject: [PATCH 05/10] Try system instructions instead? 
--- functions/prompts/explain-chinese.prompt | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/functions/prompts/explain-chinese.prompt b/functions/prompts/explain-chinese.prompt index 36ba166b..e169f0b2 100644 --- a/functions/prompts/explain-chinese.prompt +++ b/functions/prompts/explain-chinese.prompt @@ -6,4 +6,8 @@ input: output: schema: ChineseExplanationSchema --- +{{role "system"}} +You are an expert Chinese language tutor. Prioritize accuracy over comprehensiveness—only explain what you are confident about. When providing pinyin, ensure it exactly matches every character in the input. + +{{role "user"}} Explain the Chinese text {{text}}. From 552768ce424aad66a3b585c0d905dff36914edc0 Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Mon, 16 Feb 2026 09:48:43 -0500 Subject: [PATCH 06/10] Next attempt --- functions/prompts/explain-chinese.prompt | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/functions/prompts/explain-chinese.prompt b/functions/prompts/explain-chinese.prompt index e169f0b2..005b5139 100644 --- a/functions/prompts/explain-chinese.prompt +++ b/functions/prompts/explain-chinese.prompt @@ -7,7 +7,12 @@ output: schema: ChineseExplanationSchema --- {{role "system"}} -You are an expert Chinese language tutor. Prioritize accuracy over comprehensiveness—only explain what you are confident about. When providing pinyin, ensure it exactly matches every character in the input. +You are an expert Chinese language tutor. Prioritize accuracy over comprehensiveness — only explain what you are confident about. +When providing pinyin, ensure it exactly matches every character in the input. {{role "user"}} Explain the Chinese text {{text}}. + +In your explanation: +* if there are any typos, grammatical errors, or unnatural phrasing you detect, please explain what's wrong and how to fix it. +* if the text contains words that commonly trip up learners (like 又/再, 还是/或者, 会/能/可以), briefly clarify the usage shown here. 
From 28f6921ff5f2a1df9b068193c4f2b0312ce1db0c Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Mon, 16 Feb 2026 10:15:36 -0500 Subject: [PATCH 07/10] See if we can stop overexplanations --- functions/prompts/explain-chinese.prompt | 1 - 1 file changed, 1 deletion(-) diff --git a/functions/prompts/explain-chinese.prompt b/functions/prompts/explain-chinese.prompt index 005b5139..be13ced0 100644 --- a/functions/prompts/explain-chinese.prompt +++ b/functions/prompts/explain-chinese.prompt @@ -15,4 +15,3 @@ Explain the Chinese text {{text}}. In your explanation: * if there are any typos, grammatical errors, or unnatural phrasing you detect, please explain what's wrong and how to fix it. -* if the text contains words that commonly trip up learners (like 又/再, 还是/或者, 会/能/可以), briefly clarify the usage shown here. From b91db234cd1b70bac0754ea0c4b0dd3cfaf4201e Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Tue, 17 Feb 2026 17:15:01 -0500 Subject: [PATCH 08/10] add authoritative sources? --- functions/prompts/explain-chinese.prompt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/functions/prompts/explain-chinese.prompt b/functions/prompts/explain-chinese.prompt index be13ced0..cb8a09b7 100644 --- a/functions/prompts/explain-chinese.prompt +++ b/functions/prompts/explain-chinese.prompt @@ -7,7 +7,9 @@ output: schema: ChineseExplanationSchema --- {{role "system"}} -You are an expert Chinese language tutor. Prioritize accuracy over comprehensiveness — only explain what you are confident about. +You are an expert Chinese language tutor. You have thoroughly studied the Chinese Grammar Wiki and HSK Standard Course textbooks, and you use their terminology and teaching approaches. + +Prioritize accuracy over comprehensiveness — only explain what you are confident about. When providing pinyin, ensure it exactly matches every character in the input. 
{{role "user"}} From d83c48b871bb5253343ec2b66a6bbc2d56ad36c1 Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Sat, 21 Feb 2026 22:14:29 -0500 Subject: [PATCH 09/10] add expected error cases to eval set --- functions/datasets/explain-chinese.json | 140 ++++++++++++++++++++++++ functions/src/genkit-eval.ts | 24 +++- 2 files changed, 162 insertions(+), 2 deletions(-) diff --git a/functions/datasets/explain-chinese.json b/functions/datasets/explain-chinese.json index 0ec20a4b..5a6d80b5 100644 --- a/functions/datasets/explain-chinese.json +++ b/functions/datasets/explain-chinese.json @@ -703,5 +703,145 @@ ], "expectedTranslation": "I would rather go hungry than eat that" } + }, + { + "input": "他跑的很快", + "reference": { + "expectedGrammarPoints": [ + "得 degree complement" + ], + "expectedTranslation": "He runs very fast", + "expectedError": { + "type": "wrong particle", + "description": "Should use 得 instead of 的 for degree complements after verbs", + "correction": "他跑得很快" + } + } + }, + { + "input": "天气是很好", + "reference": { + "expectedGrammarPoints": [ + "Adjectival predicate" + ], + "expectedTranslation": "The weather is very good", + "expectedError": { + "type": "unnecessary 是", + "description": "是 should not be used before adjectives in Chinese adjectival predicate sentences", + "correction": "天气很好" + } + } + }, + { + "input": "我买了一个书", + "reference": { + "expectedGrammarPoints": [ + "Measure words" + ], + "expectedTranslation": "I bought a book", + "expectedError": { + "type": "wrong measure word", + "description": "书 (book) requires measure word 本, not 个", + "correction": "我买了一本书" + } + } + }, + { + "input": "我昨天不去学校", + "reference": { + "expectedGrammarPoints": [ + "Negation of past actions" + ], + "expectedTranslation": "I didn't go to school yesterday", + "expectedError": { + "type": "wrong negation", + "description": "Past actions should be negated with 没(有), not 不", + "correction": "我昨天没去学校" + } + } + }, + { + "input": "他很高兴地笑", + "reference": { + 
+ "expectedGrammarPoints": [ + "地 adverbial particle" + ], + "expectedTranslation": "He laughed happily", + "expectedError": { + "type": "unnatural phrasing", + "description": "While grammatically acceptable, native speakers would more naturally say 他高兴地笑了 or 他开心地笑", + "correction": "他高兴地笑了" + } + } + }, + { + "input": "在见", + "reference": { + "expectedGrammarPoints": [ + "Common farewell expression" + ], + "expectedTranslation": "Goodbye / See you again", + "expectedError": { + "type": "typo/homophone error", + "description": "在 (at/in) is wrong; should be 再 (again)", + "correction": "再见" + } + } + }, + { + "input": "我想买那个红的衣服", + "reference": { + "expectedGrammarPoints": [ + "的 with adjectives modifying nouns" + ], + "expectedTranslation": "I want to buy that red piece of clothing", + "expectedError": { + "type": "unnecessary 的", + "description": "Single-syllable adjectives like 红 directly modify nouns without 的: 红衣服; 衣服 also takes the measure word 件 rather than 个", + "correction": "我想买那件红衣服" + } + } + }, + { + "input": "我吃饭了已经", + "reference": { + "expectedGrammarPoints": [ + "Word order with 已经" + ], + "expectedTranslation": "I have already eaten", + "expectedError": { + "type": "word order error", + "description": "已经 should come before the verb, not at the end of the sentence", + "correction": "我已经吃饭了" + } + } + }, + { + "input": "她的很漂亮", + "reference": { + "expectedGrammarPoints": [ + "Adjective predicates" + ], + "expectedTranslation": "She is very beautiful", + "expectedError": { + "type": "misplaced 的", + "description": "的 should not be placed after a pronoun when followed by an adjective predicate; 的 makes it possessive", + "correction": "她很漂亮" + } + } + }, + { + "input": "我给你打电话明天", + "reference": { + "expectedGrammarPoints": [ + "Time word placement" + ], + "expectedTranslation": "I will call you tomorrow", + "expectedError": { + "type": "word order error", + "description": "Time words like 明天 should come before the verb phrase or at the beginning of the sentence", + "correction": "我明天给你打电话" + } + } } 
] \ No newline at end of file diff --git a/functions/src/genkit-eval.ts b/functions/src/genkit-eval.ts index 80a2f912..b167a56f 100644 --- a/functions/src/genkit-eval.ts +++ b/functions/src/genkit-eval.ts @@ -177,6 +177,25 @@ export const grammarExplanationQualityEvaluator = ai.defineEvaluator( const output = typeof datapoint.output === 'string' ? datapoint.output : JSON.stringify(datapoint.output); + + // Check if reference includes expected error information + const reference = datapoint.reference as { + expectedError?: { + type: string; + description: string; + correction: string; + }; + } | undefined; + + const hasExpectedError = reference?.expectedError != null; + const errorContext = hasExpectedError + ? `\n\nIMPORTANT: The input text contains an intentional error that the tool should identify: +- Error type: ${reference!.expectedError!.type} +- What's wrong: ${reference!.expectedError!.description} +- Correct form: ${reference!.expectedError!.correction} + +The tool MUST identify and explain this error to receive a high score. If the output does not mention this error, give a score of 1 or 2.` + : ''; const { output: evalResult } = await ai.generate({ model: vertexAI.model('gemini-3-pro-preview'), @@ -184,7 +203,7 @@ export const grammarExplanationQualityEvaluator = ai.defineEvaluator( Input (Chinese text to explain): ${input} -Output (explanation provided): ${output} +Output (explanation provided): ${output}${errorContext} Evaluate the quality of this explanation on a scale of 1-5: 1 = Poor: Incorrect, confusing, or unhelpful @@ -197,7 +216,8 @@ Consider: - Is the translation accurate? - Are grammar explanations clear and correct? - Is the pinyin accurate? -- Would this help a learner understand the text?`, +- Would this help a learner understand the text? 
+- If the input contains errors, does the output identify and explain them?`, output: { schema: z.object({ score: z.number().min(1).max(5).describe('Quality score from 1-5'), From 2bfe26177adee216e72e660426204866bfe73733 Mon Sep 17 00:00:00 2001 From: Matthew Reichhoff Date: Sat, 21 Feb 2026 22:15:34 -0500 Subject: [PATCH 10/10] [to revert] only evaluate the thing we're testing out --- .github/workflows/eval-functions.yml | 54 ++++++++++++++-------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/.github/workflows/eval-functions.yml b/.github/workflows/eval-functions.yml index 796b66c0..d69075e1 100644 --- a/.github/workflows/eval-functions.yml +++ b/.github/workflows/eval-functions.yml @@ -111,33 +111,33 @@ jobs: --batchSize 10 \ --output eval-results/explain-chinese-results.json || { echo "⚠️ explainText evaluation had errors"; EVAL_ERRORS=$((EVAL_ERRORS+1)); } - echo "Running explainEnglish evaluation..." - genkit eval:flow explainEnglish \ - --input datasets/explain-english.json \ - --evaluators=custom/chineseTextPresent,custom/validPinyinFormat,custom/outputStructureValid,custom/grammarExplanationQuality \ - --batchSize 10 \ - --output eval-results/explain-english-results.json || { echo "⚠️ explainEnglish evaluation had errors"; EVAL_ERRORS=$((EVAL_ERRORS+1)); } - - echo "Running generateChineseSentences evaluation..." - genkit eval:flow generateChineseSentences \ - --input datasets/generate-chinese-sentences.json \ - --evaluators=custom/chineseTextPresent,custom/validPinyinFormat,custom/outputStructureValid,custom/sentenceGenerationQuality \ - --batchSize 10 \ - --output eval-results/generate-sentences-results.json || { echo "⚠️ generateChineseSentences evaluation had errors"; EVAL_ERRORS=$((EVAL_ERRORS+1)); } - - echo "Running analyzeCollocation evaluation..." 
- genkit eval:flow analyzeCollocation \ - --input datasets/analyze-collocation.json \ - --evaluators=custom/chineseTextPresent,custom/englishTranslationPresent,custom/outputStructureValid \ - --batchSize 10 \ - --output eval-results/collocation-results.json || { echo "⚠️ analyzeCollocation evaluation had errors"; EVAL_ERRORS=$((EVAL_ERRORS+1)); } - - echo "Running explainWordInContext evaluation..." - genkit eval:flow explainWordInContext \ - --input datasets/explain-word-in-context.json \ - --evaluators=custom/chineseTextPresent,custom/englishTranslationPresent,custom/outputStructureValid \ - --batchSize 10 \ - --output eval-results/word-context-results.json || { echo "⚠️ explainWordInContext evaluation had errors"; EVAL_ERRORS=$((EVAL_ERRORS+1)); } + # echo "Running explainEnglish evaluation..." + # genkit eval:flow explainEnglish \ + # --input datasets/explain-english.json \ + # --evaluators=custom/chineseTextPresent,custom/validPinyinFormat,custom/outputStructureValid,custom/grammarExplanationQuality \ + # --batchSize 10 \ + # --output eval-results/explain-english-results.json || { echo "⚠️ explainEnglish evaluation had errors"; EVAL_ERRORS=$((EVAL_ERRORS+1)); } + + # echo "Running generateChineseSentences evaluation..." + # genkit eval:flow generateChineseSentences \ + # --input datasets/generate-chinese-sentences.json \ + # --evaluators=custom/chineseTextPresent,custom/validPinyinFormat,custom/outputStructureValid,custom/sentenceGenerationQuality \ + # --batchSize 10 \ + # --output eval-results/generate-sentences-results.json || { echo "⚠️ generateChineseSentences evaluation had errors"; EVAL_ERRORS=$((EVAL_ERRORS+1)); } + + # echo "Running analyzeCollocation evaluation..." 
+ # genkit eval:flow analyzeCollocation \ + # --input datasets/analyze-collocation.json \ + # --evaluators=custom/chineseTextPresent,custom/englishTranslationPresent,custom/outputStructureValid \ + # --batchSize 10 \ + # --output eval-results/collocation-results.json || { echo "⚠️ analyzeCollocation evaluation had errors"; EVAL_ERRORS=$((EVAL_ERRORS+1)); } + + # echo "Running explainWordInContext evaluation..." + # genkit eval:flow explainWordInContext \ + # --input datasets/explain-word-in-context.json \ + # --evaluators=custom/chineseTextPresent,custom/englishTranslationPresent,custom/outputStructureValid \ + # --batchSize 10 \ + # --output eval-results/word-context-results.json || { echo "⚠️ explainWordInContext evaluation had errors"; EVAL_ERRORS=$((EVAL_ERRORS+1)); } if [ $EVAL_ERRORS -gt 0 ]; then echo "⚠️ $EVAL_ERRORS evaluation(s) had errors - check results for details"