
Add Ability to Generate Test Cases for GEPA Based on User Instruction #293

Draft
auschoi96 wants to merge 4 commits into databricks-solutions:main from auschoi96:main


@auschoi96
Collaborator

Summary

After some initial testing and internal discussion, we decided that more test cases are needed, especially as resources are updated. For example, if the Zerobus SDK ships breaking changes, we want the skill to capture them. Or, if a user wants only serverless compute to be used, they can run this to make sure the skill prioritizes serverless.

This generates new test_cases in ground_truth.yaml, which should make GEPA's optimization more accurate.
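The PR does not show how generated cases are written back to ground_truth.yaml. As a minimal sketch, assuming the file has already been loaded into a list of mappings and that each test case carries a unique "id" key (a hypothetical schema, not confirmed by the PR), new cases can be appended without overwriting or duplicating existing ones:

```python
from typing import Any, Dict, List

# Hypothetical shape of one ground-truth test case: a mapping with a unique "id".
TestCase = Dict[str, Any]


def merge_test_cases(existing: List[TestCase], generated: List[TestCase]) -> List[TestCase]:
    """Return the existing cases followed by generated cases whose "id" is new.

    Assumption: every test case has a unique "id" key (hypothetical schema).
    """
    seen = {case["id"] for case in existing}
    merged = list(existing)  # copy so the caller's list is not mutated
    for case in generated:
        if case["id"] not in seen:
            merged.append(case)
            seen.add(case["id"])
    return merged


if __name__ == "__main__":
    current = [{"id": "zerobus_basic_ingest"}]
    new = [{"id": "zerobus_basic_ingest"}, {"id": "zerobus_serverless_only"}]
    print([case["id"] for case in merge_test_cases(current, new)])
```

Keeping existing entries first and skipping duplicate ids means rerunning generation with the same instructions would not keep growing the file; whether the actual script behaves this way is not stated in the PR.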

What's in the PR

This PR addresses the following:

  1. Addition of a --focus flag that users can pass multiple times; each focus instruction is used to generate new test cases
  2. Changes to the judges for more targeted evaluation
  3. Changes to the weights that determine what GEPA considers important
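The PR does not include the parsing or scoring code, so the following is only a sketch of the two ideas in items 1 and 3, assuming an argparse-based CLI. A repeatable flag maps naturally onto argparse's action="append", which collects every --focus value into a list. For the weights, a weighted mean of per-judge scores is one common way to combine judges; the judge names and weight values below are hypothetical and are not the ones used in this PR.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser covering only the skill name and the repeatable --focus flag.
    parser = argparse.ArgumentParser(prog="optimize.py")
    parser.add_argument("skill", help="Name of the skill to optimize")
    parser.add_argument(
        "--focus",
        action="append",  # each occurrence appends one instruction to a list
        default=None,
        help="Instruction used to generate extra test cases; may be repeated",
    )
    return parser


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    # Weighted mean of judge scores; judges without a weight are ignored.
    used = [name for name in scores if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no weighted judges to combine")
    return sum(scores[name] * weights[name] for name in used) / total_weight


if __name__ == "__main__":
    args = build_parser().parse_args([
        "databricks-zerobus-ingest",
        "--focus", "ensure the latest databricks-sdk is being used like 0.97.0",
        "--focus", "ensure compatibility across runtimes",
    ])
    focus_instructions = args.focus or []  # None when --focus is never passed
    print(focus_instructions)

    # Hypothetical judges and weights, for illustration only.
    print(weighted_score(
        {"correctness": 1.0, "focus_adherence": 0.5},
        {"correctness": 2.0, "focus_adherence": 1.0},
    ))
```

Using default=None and normalizing with "or []" keeps "flag never passed" distinguishable from "flag passed", while still giving downstream code a plain list to iterate over.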

Test Plan

You can run the following commands to test the new flags and optimizations. You will need to set the correct environment variables as described in .test/README.md:

uv run python .test/scripts/optimize.py databricks-zerobus-ingest --reflection-lm databricks/gepa-fallbacks --judge-model databricks/gepa-fallbacks --preset quick --agent-eval --mlflow-experiment "/Users/austin.choi@databricks.com/GenAI/mlflow updates/AC updates dc-assistant-agent_experiment"

This is an example of using --focus to generate additional test cases, which aid the optimization:

uv run python .test/scripts/optimize.py databricks-zerobus-ingest --reflection-lm databricks/gepa-fallbacks --judge-model databricks/gepa-fallbacks --preset quick --agent-eval --mlflow-experiment "/Users/austin.choi@databricks.com/GenAI/mlflow updates/AC updates dc-assistant-agent_experiment" --focus "ensure the latest databricks-sdk is being used like 0.97.0" --focus "ensure the latest databricks zerobus sdk is being used like 1.1.0" --focus "ensure compatibility across runtimes"

Test plan

  • Ensured new test cases are generated
  • Ensured the new judges are being used
  • Ensured the CLI commands above work
