Featured Examples

Comprehensive examples demonstrating vibecheck's capabilities.

Multilingual Testing
MCP Tool Integration
Advanced Evaluation Patterns

🌍 Multilingual Testing

Test your model across 10+ languages with the same evaluation.

Example: Multilingual PB&J Instructions

This example tests a model's ability to describe making a peanut butter and jelly sandwich in multiple languages:

```yaml
# examples/multilingual-pbj.yaml
metadata:
  name: multilingual-pbj
  model: meta-llama/llama-4-maverick
  system_prompt: "You are a translator. Respond both in the language the question is asked in and in English."

evals:
  - prompt: "Describe how to make a peanut butter and jelly sandwich."
    checks:
      - match: "*bread*"
      - llm_judge:
          criteria: "Does this accurately describe how to make a peanut butter and jelly sandwich in English?"
      - min_tokens: 20
      - max_tokens: 300

  - prompt: "Décrivez comment faire un sandwich au beurre d'arachide et à la confiture."
    checks:
      - match: "*pain*"
      - llm_judge:
          criteria: "Does this accurately describe how to make a peanut butter and jelly sandwich in French?"
      - min_tokens: 20
      - max_tokens: 300
```

Running the Example

```bash
vibe check -f examples/multilingual-pbj.yaml
```

Key Features

Pattern Matching: Validates language-specific keywords (e.g., "bread" in English, "pain" in French)
LLM Judge: Ensures the response is accurate in the target language
Token Constraints: Enforces appropriate response length
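To make the keyword and length checks concrete, here is a minimal sketch of what a glob such as `match: "*pain*"` combined with a 20 to 300 token range could mean for one response. It uses Python's `fnmatch` module. The names `KEYWORDS`, `glob_match`, `token_count`, and `check_language` are hypothetical, and both case-insensitive matching and whitespace-based token counting are assumptions made for the sketch, not documented vibecheck behavior.

```python
from fnmatch import fnmatchcase

# Hypothetical per-language keyword globs, mirroring the YAML example above.
KEYWORDS = {"en": "*bread*", "fr": "*pain*"}


def glob_match(response: str, pattern: str) -> bool:
    # Assumption: case-insensitive matching, so both sides are lowercased.
    # fnmatchcase avoids OS-dependent case folding; "*" also spans newlines.
    return fnmatchcase(response.lower(), pattern.lower())


def token_count(response: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # A real tokenizer will usually report a different count.
    return len(response.split())


def check_language(lang: str, response: str, lo: int = 20, hi: int = 300) -> bool:
    """Apply the keyword check and the length range for one language."""
    return glob_match(response, KEYWORDS[lang]) and lo <= token_count(response) <= hi


french = "Prenez deux tranches de pain. " * 5  # 25 words
print(check_language("fr", french))  # True
print(check_language("en", french))  # False: no "bread"
print(check_language("en", "Bread."))  # False: keyword found, but too short
```

Substring globs are loose: `*pain*` also matches "painful", which is one reason the YAML pairs each keyword check with an LLM judge.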

🔧 MCP Tool Integration

Validate MCP (Model Context Protocol) tool calling with external services. This example shows how to test Linear MCP integration using secrets and variables to securely configure the MCP server.

Step 1: Get Your Linear API Key

Create a personal API key in your Linear workspace: navigate to Settings → API → Personal API Keys and create a new key.

Step 2: Set Up the Secret

Set your Linear API key as a secret (sensitive, write-only):

```bash
vibe set secret linear.apiKey "your-linear-api-key-here"
```

Step 3: Set Up Variables

Set your Linear project ID and team name as variables:

```bash
vibe set var linear.projectId "your-project-id"
vibe set var linear.projectTeam "your-team-name"
```

Step 4: Run the Evaluation

Run the Linear MCP evaluation (the suite is preloaded):

```bash
vibe check linear-mcp
```

What It Tests

The evaluation tests three scenarios:

Listing Issues: Retrieves recent issues from your Linear workspace
Issue Details: Gets details on a specific Linear todo item
Creating Items: Creates a new todo item in Linear

Key Features

Secret Management: API keys are stored as write-only secrets, so they cannot be read back once set
Variable Substitution: Configuration values can be updated without modifying YAML
MCP Integration: Tests tool calling and external service integration
Runtime Resolution: Secrets and vars are resolved when the evaluation runs

Secrets and vars are resolved at runtime when the evaluation runs, so you can update them without modifying your YAML files.
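The idea behind runtime resolution can be illustrated with a small sketch: the template stays fixed, placeholders are looked up in the stores only when the evaluation runs, and secret values are masked before anything is printed. Everything here is hypothetical: the in-memory `SECRETS` and `VARS` dictionaries stand in for whatever `vibe set secret` and `vibe set var` actually write to, and the `{{var:...}}` / `{{secret:...}}` placeholder syntax is invented for illustration, since vibecheck's real reference syntax is not shown in this document.

```python
import re

# Hypothetical stand-ins for the stores written by `vibe set secret` and `vibe set var`.
SECRETS: dict[str, str] = {}
VARS: dict[str, str] = {}

# Hypothetical placeholder syntax such as "{{var:linear.projectId}}".
# vibecheck's actual reference syntax is not shown in this document.
PLACEHOLDER = re.compile(r"\{\{(secret|var):([\w.]+)\}\}")


def resolve(template: str) -> str:
    """Replace placeholders with whatever the stores hold at call time."""

    def lookup(m: re.Match) -> str:
        kind, key = m.group(1), m.group(2)
        store = SECRETS if kind == "secret" else VARS
        if key not in store:
            raise KeyError(f"{kind} not set: {key}")
        return store[key]

    return PLACEHOLDER.sub(lookup, template)


def redact(text: str) -> str:
    """Mask every non-empty secret value before text is logged or displayed."""
    for value in SECRETS.values():
        if value:
            text = text.replace(value, "***")
    return text


# The template never changes; only the stored values do.
template = "team={{var:linear.projectTeam}} auth={{secret:linear.apiKey}}"
VARS["linear.projectTeam"] = "core"
SECRETS["linear.apiKey"] = "lin_example"
print(redact(resolve(template)))  # team=core auth=***

VARS["linear.projectTeam"] = "infra"  # analogous to rerunning `vibe set var ...`
print(redact(resolve(template)))  # team=infra auth=***
```

Raising on an unset key, rather than silently substituting an empty string, makes a missing `vibe set` step fail loudly at run time.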

🧠 Advanced Evaluation Patterns

Combine multiple check types for comprehensive testing.

Example: Mixed Check Types

This example demonstrates combining semantic similarity, LLM judges, pattern matching, and token constraints:

```yaml
# examples/hello-world.yaml
evals:
  - prompt: How are you today?
    checks:
      - semantic:
          expected: "I'm doing well, thank you for asking"
          threshold: 0.7
      - llm_judge:
          criteria: "Is this a friendly and appropriate response to 'How are you today?'"
      - min_tokens: 10
      - max_tokens: 100

  - prompt: What is 2+2?
    checks:
      - or:
          - match: "*4*"
          - match: "*four*"
      - llm_judge:
          criteria: "Is this a correct mathematical answer to 2+2?"
      - min_tokens: 1
      - max_tokens: 20
```

Running the Example

```bash
vibe check -f examples/hello-world.yaml
```

Check Type Combinations

This example showcases:

Semantic Similarity: Validates meaning rather than exact wording
LLM Judge: Subjective quality assessment
Token Constraints: Ensures concise responses
OR Logic: Accepts multiple valid answer formats
Pattern Matching: Simple text validation
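A semantic check compares a response with an expected answer and passes when their similarity reaches a threshold (0.7 in the YAML above). A common way to do this is cosine similarity between embedding vectors. The sketch below assumes that approach but substitutes simple word-count vectors for a real embedding model, so it only demonstrates the threshold comparison; it is not how vibecheck computes similarity. The function names `embed`, `cosine`, and `semantic_check` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: lowercase word counts.
    # Unlike real embeddings, this only rewards shared vocabulary.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_check(response: str, expected: str, threshold: float = 0.7) -> bool:
    """Pass when similarity to the expected answer reaches the threshold."""
    return cosine(embed(response), embed(expected)) >= threshold


expected = "I'm doing well, thank you for asking"
print(round(cosine(embed("Doing well, thank you for asking."), embed(expected)), 3))  # 0.926
print(semantic_check("Doing well, thank you for asking.", expected))  # True
print(semantic_check("The capital of France is Paris.", expected))  # False
```

A paraphrase with different vocabulary, such as "I'm great, thanks", scores low here even though its meaning is close; that gap is what a real embedding model is meant to close.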

When to Use Each Check Type

match: When you need specific keywords or phrases
semantic: When meaning matters more than exact wording
llm_judge: For subjective quality or complex criteria
min_tokens/max_tokens: To enforce response length
or: When multiple valid formats exist
not_match: To exclude unwanted content
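These check types compose: `or` wraps other checks and `not_match` inverts a pattern. The following is a minimal, hypothetical evaluator for the deterministic types, written to show how such a check list could be interpreted. It assumes that an eval passes only when every check in its list passes, that matching is case-insensitive, and that tokens can be approximated by whitespace-separated words; `semantic` and `llm_judge` are left out because they need a model. The names `run_check` and `run_eval` are illustrative, not vibecheck APIs.

```python
from fnmatch import fnmatchcase
from typing import Any

# One check = a single-key mapping, like one item under `checks:` in the YAML.
Check = dict[str, Any]


def run_check(check: Check, response: str) -> bool:
    [(kind, arg)] = check.items()
    text = response.lower()  # assumption: case-insensitive matching
    words = len(response.split())  # assumption: words approximate tokens
    if kind == "match":
        return fnmatchcase(text, arg.lower())
    if kind == "not_match":
        return not fnmatchcase(text, arg.lower())
    if kind == "min_tokens":
        return words >= arg
    if kind == "max_tokens":
        return words <= arg
    if kind == "or":
        # Passes as soon as any nested check passes.
        return any(run_check(child, response) for child in arg)
    raise ValueError(f"check type not handled in this sketch: {kind}")


def run_eval(checks: list[Check], response: str) -> bool:
    # Assumption: every top-level check must pass.
    return all(run_check(c, response) for c in checks)


# The deterministic checks from the 2+2 example, plus a hypothetical not_match.
checks = [
    {"or": [{"match": "*4*"}, {"match": "*four*"}]},
    {"min_tokens": 1},
    {"max_tokens": 20},
    {"not_match": "*i don't know*"},
]
print(run_eval(checks, "2 + 2 = 4"))  # True
print(run_eval(checks, "Four."))  # True
print(run_eval(checks, "I don't know, maybe 4?"))  # False
print(run_eval(checks, "Five"))  # False
```

Because `or` takes a list of ordinary checks, nesting falls out of the recursion for free: an `or` inside an `or` needs no special handling.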

More Examples

Additional examples are available in the examples/ directory:

hello-world.yaml: Basic checks and getting started
finance.yaml: Financial knowledge evaluation
healthcare.yaml: Medical knowledge evaluation
lang.yaml: Multilingual capabilities
politics.yaml: Political knowledge evaluation
sports.yaml: Sports knowledge evaluation
strawberry.yaml: Reasoning capabilities

Running All Examples

```bash
# Set your API key
export VIBECHECK_API_KEY=your-api-key

# Build and run all examples
npm run build
npm run test:examples
```

See Example Tests README for more details on automated example testing.

