Comprehensive examples demonstrating vibecheck's capabilities.
Test your model across 10+ languages with the same evaluation.
This example tests a model's ability to describe making a peanut butter and jelly sandwich in multiple languages:
# examples/multilingual-pbj.yaml
metadata:
  name: multilingual-pbj
  model: meta-llama/llama-4-maverick
  system_prompt: "You are a translator. Respond both in the language the question is asked as well as English."
evals:
  - prompt: "Describe how to make a peanut butter and jelly sandwich."
    checks:
      - match: "*bread*"
      - llm_judge:
          criteria: "Does this accurately describe how to make a peanut butter and jelly sandwich in English"
      - min_tokens: 20
      - max_tokens: 300
  - prompt: "Décrivez comment faire un sandwich au beurre d'arachide et à la confiture."
    checks:
      - match: "*pain*"
      - llm_judge:
          criteria: "Does this accurately describe how to make a peanut butter and jelly sandwich in French"
      - min_tokens: 20
      - max_tokens: 300

vibe check -f examples/multilingual-pbj.yaml

- Pattern Matching: Validates language-specific keywords (e.g., "bread" in English, "pain" in French)
- LLM Judge: Ensures the response is accurate in the target language
- Token Constraints: Enforces appropriate response length
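Adding another language follows the same pattern. As a minimal sketch, an entry like the one below could be appended under `evals:` in the file above. The Spanish prompt, the `*pan*` keyword, and the judge criteria are illustrative assumptions and are not part of the shipped example.

```yaml
evals:
  # Hypothetical third entry (Spanish); not part of the shipped example.
  - prompt: "Describe cómo hacer un sándwich de mantequilla de maní y mermelada."
    checks:
      - match: "*pan*"   # "pan" is Spanish for "bread"
      - llm_judge:
          criteria: "Does this accurately describe how to make a peanut butter and jelly sandwich in Spanish"
      - min_tokens: 20
      - max_tokens: 300
```

A short keyword such as `*pan*` is a loose filter, so the `llm_judge` check is likely to carry most of the weight for languages where the keyword is only a few letters long.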
Validate MCP (Model Context Protocol) tool calling with external services. This example shows how to test Linear MCP integration using secrets and variables to securely configure the MCP server.
Obtain your Linear API key from your Linear workspace settings: go to Settings → API → Personal API Keys and create a new API key.
Set your Linear API key as a secret (sensitive, write-only):
vibe set secret linear.apiKey "your-linear-api-key-here"

Set your Linear project ID and team name as variables:

vibe set var linear.projectId "your-project-id"
vibe set var linear.projectTeam "your-team-name"

Run the Linear MCP evaluation (the suite is preloaded):

vibe check linear-mcp

The evaluation tests three scenarios:
- Listing Issues: Retrieves recent issues from your Linear workspace
- Issue Details: Gets details on a specific Linear todo item
- Creating Items: Creates a new todo item in Linear
This example showcases:
- Secret Management: API keys are stored securely and never exposed
- Variable Substitution: Configuration values can be updated without modifying YAML
- MCP Integration: Tests tool calling and external service integration
- Runtime Resolution: Secrets and vars are resolved when the evaluation runs
Secrets and vars are resolved at runtime when the evaluation runs, so you can update them without modifying your YAML files.
Combine multiple check types for comprehensive testing.
This example demonstrates combining semantic similarity, LLM judges, pattern matching, and token constraints:
# examples/hello-world.yaml
evals:
  - prompt: How are you today?
    checks:
      - semantic:
          expected: "I'm doing well, thank you for asking"
          threshold: 0.7
      - llm_judge:
          criteria: "Is this a friendly and appropriate response to 'How are you today?'"
      - min_tokens: 10
      - max_tokens: 100
  - prompt: What is 2+2?
    checks:
      - or:
          - match: "*4*"
          - match: "*four*"
      - llm_judge:
          criteria: "Is this a correct mathematical answer to 2+2?"
      - min_tokens: 1
      - max_tokens: 20

vibe check -f examples/hello-world.yaml

This example showcases:
- Semantic Similarity: Validates meaning rather than exact wording
- LLM Judge: Subjective quality assessment
- Token Constraints: Ensures concise responses
- OR Logic: Accepts multiple valid answer formats
- Pattern Matching: Simple text validation
When to use each check type:
- match: When you need specific keywords or phrases
- semantic: When meaning matters more than exact wording
- llm_judge: For subjective quality or complex criteria
- min_tokens/max_tokens: To enforce response length
- or: When multiple valid formats exist
- not_match: To exclude unwanted content
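Of the check types listed here, `not_match` is the only one the examples on this page do not show. A minimal sketch, assuming it accepts the same wildcard pattern syntax as `match` (that syntax is an assumption, so confirm the exact form against the check reference before relying on it):

```yaml
evals:
  - prompt: How are you today?
    checks:
      - llm_judge:
          criteria: "Is this a friendly and appropriate response to 'How are you today?'"
      # Assumed syntax, mirroring `match`: the check fails if the
      # response contains the phrase "As an AI".
      - not_match: "*As an AI*"
```

Pairing an exclusion with a positive check keeps the evaluation from passing on a response that merely avoids the unwanted phrase.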
Additional examples are available in the examples/ directory:
- hello-world.yaml: Basic checks and getting started
- finance.yaml: Financial knowledge evaluation
- healthcare.yaml: Medical knowledge evaluation
- lang.yaml: Multilingual capabilities
- politics.yaml: Political knowledge evaluation
- sports.yaml: Sports knowledge evaluation
- strawberry.yaml: Reasoning capabilities
# Set your API key
export VIBECHECK_API_KEY=your-api-key
# Build and run all examples
npm run build
npm run test:examples

See Example Tests README for more details on automated example testing.