Actually upgrade gemini #141

Merged
mreichhoff merged 3 commits into main from update-gemini-for-real on May 17, 2026

Conversation

@mreichhoff
Owner

Turns out the .prompt file takes precedence, whoops.

We stay on 2.5 for analyze image, both for cost reasons and because of some odd errors with 3.
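
For anyone skimming, "takes precedence" here can be pictured roughly like the sketch below. This is an illustration only, not the API of any real prompt library: `PromptFile`, `resolveModel`, and the model strings are hypothetical placeholders. The assumption is that a model pinned in a .prompt file wins over whatever default is set in code, so bumping only the code default changes nothing.

```typescript
// Hypothetical shape: the only field we care about is a model pinned in the .prompt file.
interface PromptFile {
  model?: string;
}

// Hypothetical resolver: the .prompt value wins; the code default is only a fallback.
function resolveModel(prompt: PromptFile, codeDefault: string): string {
  return prompt.model ?? codeDefault;
}

// Placeholder model names, not real identifiers.
const stillOld = resolveModel({ model: "model-2.5" }, "model-3"); // "model-2.5"
const upgraded = resolveModel({}, "model-3"); // "model-3"
```

Under that assumption, the upgrade has to be made in the .prompt file itself, or the pin has to be removed there.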

@github-actions

🧪 AI Evaluation Results

collocation

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 4/4 (100%) | 100.0% |
| englishTranslationPresent | ✅ 4/4 (100%) | 100.0% |
| outputStructureValid | ✅ 4/4 (100%) | 100.0% |

explain chinese

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | 🟡 69/70 (99%) | 98.6% |
| validPinyinFormat | ✅ 70/70 (100%) | 100.0% |
| grammarExplanationQuality | ❌ 0/70 (0%) | NaN% |
| outputStructureValid | ✅ 70/70 (100%) | 100.0% |

explain english

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 75/75 (100%) | 100.0% |
| validPinyinFormat | ✅ 75/75 (100%) | 100.0% |
| grammarExplanationQuality | ❌ 0/75 (0%) | NaN% |
| outputStructureValid | ✅ 75/75 (100%) | 100.0% |

generate sentences

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 5/5 (100%) | 100.0% |
| validPinyinFormat | ✅ 5/5 (100%) | 100.0% |
| sentenceGenerationQuality | ❌ 0/5 (0%) | NaN% |
| outputStructureValid | ✅ 5/5 (100%) | 100.0% |

word context

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 5/5 (100%) | 100.0% |
| englishTranslationPresent | ✅ 5/5 (100%) | 100.0% |
| outputStructureValid | ✅ 5/5 (100%) | 100.0% |

📦 Download full results
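
A note on reading the `0/N (0%)` rows with `NaN%`: one plausible explanation (an assumption, not something confirmed by the workflow's code) is that the evaluator returned no numeric scores at all, so the average was computed over an empty list. The sketch below uses hypothetical names (`EvalResult`, `summarize`) to show how a pass rate and average score like the ones in these tables could be derived, and why an empty score list prints as `NaN%` in JavaScript or TypeScript.

```typescript
// Hypothetical shape of one evaluator result; field names are assumptions.
interface EvalResult {
  passed: boolean;
  score?: number; // 0..1; absent when the evaluator produced no numeric score
}

interface Summary {
  passRate: string;
  avgScore: string;
}

function summarize(results: EvalResult[]): Summary {
  const total = results.length;
  const passed = results.filter((r) => r.passed).length;

  // Keep only the results that actually carry a numeric score.
  const scores = results
    .map((r) => r.score)
    .filter((s): s is number => typeof s === "number");

  // With no scores this is 0 / 0, which is NaN in JavaScript rather than an error.
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;

  const pct = total === 0 ? 0 : Math.round((passed / total) * 100);
  return {
    passRate: `${passed}/${total} (${pct}%)`,
    avgScore: `${(mean * 100).toFixed(1)}%`,
  };
}

// 70 results where the evaluator never produced a score:
const broken = summarize(Array.from({ length: 70 }, () => ({ passed: false })));
// → { passRate: "0/70 (0%)", avgScore: "NaN%" }
```

If that is what happened, a `NaN%` row points at the evaluator failing to run or score, rather than at the model output being bad.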

@github-actions

🧪 AI Evaluation Results

collocation

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 4/4 (100%) | 100.0% |
| englishTranslationPresent | ✅ 4/4 (100%) | 100.0% |
| outputStructureValid | ✅ 4/4 (100%) | 100.0% |

explain chinese

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 70/70 (100%) | 100.0% |
| validPinyinFormat | ✅ 70/70 (100%) | 100.0% |
| grammarExplanationQuality | ✅ 70/70 (100%) | 99.7% |
| outputStructureValid | ✅ 70/70 (100%) | 100.0% |

explain english

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 75/75 (100%) | 100.0% |
| validPinyinFormat | ✅ 75/75 (100%) | 100.0% |
| grammarExplanationQuality | 🟡 74/75 (99%) | 99.2% |
| outputStructureValid | ✅ 75/75 (100%) | 100.0% |

generate sentences

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 5/5 (100%) | 100.0% |
| validPinyinFormat | ✅ 5/5 (100%) | 100.0% |
| sentenceGenerationQuality | ❌ 0/5 (0%) | NaN% |
| outputStructureValid | ✅ 5/5 (100%) | 100.0% |

word context

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 5/5 (100%) | 100.0% |
| englishTranslationPresent | ✅ 5/5 (100%) | 100.0% |
| outputStructureValid | ✅ 5/5 (100%) | 100.0% |

📦 Download full results

@github-actions

🧪 AI Evaluation Results

collocation

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 4/4 (100%) | 100.0% |
| englishTranslationPresent | ✅ 4/4 (100%) | 100.0% |
| outputStructureValid | ✅ 4/4 (100%) | 100.0% |

explain chinese

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 70/70 (100%) | 100.0% |
| validPinyinFormat | ✅ 70/70 (100%) | 100.0% |
| grammarExplanationQuality | ✅ 70/70 (100%) | 100.0% |
| outputStructureValid | ✅ 70/70 (100%) | 100.0% |

explain english

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 75/75 (100%) | 100.0% |
| validPinyinFormat | ✅ 75/75 (100%) | 100.0% |
| grammarExplanationQuality | ✅ 75/75 (100%) | 99.7% |
| outputStructureValid | ✅ 75/75 (100%) | 100.0% |

generate sentences

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 5/5 (100%) | 100.0% |
| validPinyinFormat | ✅ 5/5 (100%) | 100.0% |
| sentenceGenerationQuality | ✅ 5/5 (100%) | 100.0% |
| outputStructureValid | ✅ 5/5 (100%) | 100.0% |

word context

| Evaluator | Pass Rate | Avg Score |
| --- | --- | --- |
| chineseTextPresent | ✅ 5/5 (100%) | 100.0% |
| englishTranslationPresent | ✅ 5/5 (100%) | 100.0% |
| outputStructureValid | ✅ 5/5 (100%) | 100.0% |

📦 Download full results

@mreichhoff merged commit 4fca524 into main on May 17, 2026
1 check passed
@mreichhoff deleted the update-gemini-for-real branch on May 17, 2026 03:00