Multi-module Java 17 + Gradle (Groovy DSL) framework for reproducible Java CodeFixes benchmarking.
- bench-core: domain model, config, fingerprints, stage contracts
- bench-io: JSONL + Zstd streaming IO, blob store, resume state index
- bench-dataset: dataset generation + masking stage
- bench-retrieval: Lucene BM25 index/retrieval + leakage filter
- bench-augmentation: prompt block construction
- bench-llm: OpenAI-compatible generation client
- bench-metrics: exact/edit/compile metrics
- bench-runner: E2E orchestration, batching, resume/reuse/shuffle
- bench-cli: Spring Boot + picocli CLI commands
bench run --config <path> [--run-id <RUN_ID>] [--resume] [--start-from-stage <STAGE>] [--reuse-from-run <RUN_ID>]
bench inspect-run --run-id <RUN_ID> [--output-dir runs]
bench validate-config --config <path>
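The --start-from-stage option implies a fixed stage order. The sketch below shows one way to split stages into "rerun" and "reuse" sets. It is hypothetical: the enum names are borrowed from the per-stage output directories under runs/<run_id>/ and may not match the exact <STAGE> values the CLI accepts. It also assumes that stages before the chosen one are taken from existing outputs rather than recomputed.

```java
import java.util.EnumSet;

public class StagePlanSketch {
    // Hypothetical stage enum: declaration order mirrors the pipeline
    // (dataset -> masking -> retrieving -> augmentation -> generation -> metrics).
    enum Stage { DATASET, MASKING, RETRIEVING, AUGMENTATION, GENERATION, METRICS }

    /** Stages that would execute when starting from {@code from}, inclusive. */
    static EnumSet<Stage> stagesToRun(Stage from) {
        // EnumSet.range is inclusive on both ends and follows declaration order.
        return EnumSet.range(from, Stage.METRICS);
    }

    /** Stages whose previous outputs would be reused (assumption, see lead-in). */
    static EnumSet<Stage> stagesToReuse(Stage from) {
        return EnumSet.complementOf(stagesToRun(from));
    }

    public static void main(String[] args) {
        Stage from = Stage.GENERATION;
        System.out.println("run:   " + stagesToRun(from));
        System.out.println("reuse: " + stagesToReuse(from));
    }
}
```

Using an EnumSet keeps the ordering in one place (the enum declaration), so adding a stage only requires inserting it at the right position.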
This repository already contains a fixed Java project:
examples/fixed-project
It also ships ready-to-use configs:
- examples/run_config.fixed.yaml (normal run with a local LLM endpoint)
- examples/run_config.fixed.no_llm.yaml (run without an available LLM; still produces numeric artifacts)
If the Gradle wrapper is not present yet, replace ./gradlew with gradle in the commands below.
Run without an LLM (bash):
./gradlew :bench-cli:bootRun --args='validate-config --config examples/run_config.fixed.no_llm.yaml'
RUN_ID=fixed-no-llm-001
./gradlew :bench-cli:bootRun --args="run --config examples/run_config.fixed.no_llm.yaml --run-id ${RUN_ID}"
./gradlew :bench-cli:bootRun --args="inspect-run --run-id ${RUN_ID}"

Dataset size:
zstd -dc runs/${RUN_ID}/dataset/raw_samples*.jsonl.zst | wc -l

Retrieval leakage counters:
for f in runs/${RUN_ID}/retrieving/part-*.jsonl.zst; do zstd -dc "$f"; done | \
jq -s '{
samples: length,
leakage_detected: map(select(.leakage_detected == true)) | length,
leakage_filtered_total: (map(.leakage_filtered_count) | add // 0)
}'

Generation status (how many records have candidates and how many contain errors):
for f in runs/${RUN_ID}/generation/part-*.jsonl.zst; do zstd -dc "$f"; done | \
jq -s '{
samples: length,
with_candidates: map(select((.candidates | length) > 0)) | length,
with_errors: map(select((.errors | length) > 0)) | length
}'

Main metrics:
for f in runs/${RUN_ID}/metrics/part-*.jsonl.zst; do zstd -dc "$f"; done | \
jq -s '{
rows: length,
exact_match_rate: (map(if .metrics.exact_match then 1 else 0 end) | (add // 0) / (length | if . == 0 then 1 else . end)),
mean_edit_similarity: (map(.metrics.edit_similarity) | (add // 0) / (length | if . == 0 then 1 else . end)),
compile_pass_rate: (map(if .metrics.compiles then 1 else 0 end) | (add // 0) / (length | if . == 0 then 1 else . end))
}'

If a local vLLM server (or any other OpenAI-compatible server) is available at http://127.0.0.1:8000:
RUN_ID=fixed-llm-001
./gradlew :bench-cli:bootRun --args="run --config examples/run_config.fixed.yaml --run-id ${RUN_ID}"

Then reuse the same metrics commands above to get real quality numbers.
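For readers who post-process metrics in Java rather than jq, the sketch below reproduces the three rates from the "Main metrics" filter, with the same guard of dividing by 1 when there are no rows. The MetricsRow record is a hypothetical in-memory view whose fields follow the JSON keys the jq filter reads (metrics.exact_match, metrics.edit_similarity, metrics.compiles). Decompressing and parsing the .jsonl.zst files is left out.

```java
import java.util.List;

public class MetricsSummarySketch {
    // Hypothetical view of one metrics row (not the project's actual model class).
    record MetricsRow(boolean exactMatch, double editSimilarity, boolean compiles) {}

    record Summary(int rows, double exactMatchRate, double meanEditSimilarity, double compilePassRate) {}

    static Summary summarize(List<MetricsRow> rows) {
        // Same zero guard as the jq filter: divide by 1 when there are no rows.
        int denom = rows.isEmpty() ? 1 : rows.size();
        long exact = rows.stream().filter(MetricsRow::exactMatch).count();
        long compiled = rows.stream().filter(MetricsRow::compiles).count();
        double editSum = rows.stream().mapToDouble(MetricsRow::editSimilarity).sum();
        return new Summary(rows.size(),
                (double) exact / denom,
                editSum / denom,
                (double) compiled / denom);
    }

    public static void main(String[] args) {
        List<MetricsRow> rows = List.of(
                new MetricsRow(true, 1.0, true),
                new MetricsRow(false, 0.5, true),
                new MetricsRow(false, 0.0, false),
                new MetricsRow(false, 0.5, false));
        System.out.println(summarize(rows));
    }
}
```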
PowerShell equivalent of the no-LLM run:
$RUN_ID = "fixed-no-llm-001"
./gradlew :bench-cli:bootRun --args='validate-config --config examples/run_config.fixed.no_llm.yaml'
./gradlew :bench-cli:bootRun --args="run --config examples/run_config.fixed.no_llm.yaml --run-id $RUN_ID"
./gradlew :bench-cli:bootRun --args="inspect-run --run-id $RUN_ID"

For each run, outputs are written under runs/<run_id>/:
- run_manifest.json
- run_config.yaml
- dataset/raw_samples.jsonl.zst
- masking/part-*.jsonl.zst
- retrieving/part-*.jsonl.zst
- augmentation/part-*.jsonl.zst
- generation/part-*.jsonl.zst
- metrics/part-*.jsonl.zst
- blobs/sha256/...
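Each stage record carries sample_id, stage_config_fingerprint and input_fingerprint, which is what makes resume and reuse possible. The sketch below shows the general idea under stated assumptions. Fingerprints here are SHA-256 hex digests of some canonical text. The blobs/sha256 directory suggests SHA-256 is used for blobs, but the fingerprint algorithm and what exactly gets hashed are assumptions. A stored record counts as reusable only when both fingerprints match the current run. StageRecordKey, sha256Hex and canReuse are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class FingerprintSketch {
    // Hypothetical view of the identifying fields every stage record carries.
    record StageRecordKey(String sampleId, String stageConfigFingerprint, String inputFingerprint) {}

    /** Lowercase hex SHA-256 of the UTF-8 bytes of some canonical text (assumed scheme). */
    static String sha256Hex(String canonicalText) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(canonicalText.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform must provide SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** A stored record is reusable only if both its config and input fingerprints still match. */
    static boolean canReuse(StageRecordKey stored, String currentConfigFp, String currentInputFp) {
        return stored.stageConfigFingerprint().equals(currentConfigFp)
                && stored.inputFingerprint().equals(currentInputFp);
    }

    public static void main(String[] args) {
        String configFp = sha256Hex("hypothetical stage config A");
        String inputFp = sha256Hex("hypothetical upstream record for sample-1");
        StageRecordKey stored = new StageRecordKey("sample-1", configFp, inputFp);

        System.out.println(canReuse(stored, configFp, inputFp));
        // Changing the stage config invalidates the stored record.
        System.out.println(canReuse(stored, sha256Hex("hypothetical stage config B"), inputFp));
    }
}
```

Keying on both fingerprints means a record is invalidated by a change to its own stage config or to any upstream output. The latter holds because the upstream change alters input_fingerprint.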
For configuration examples, see:
- examples/run_config.yaml
- examples/run_config.fixed.yaml
- examples/run_config.fixed.no_llm.yaml
- The implementation assumes the Gradle wrapper (./gradlew) for compile metrics in project/module mode.
- Stage outputs include sample_id, stage_config_fingerprint, and input_fingerprint for resume/reuse.
- Relative paths inside YAML (dataset.project_path, execution.output_dir) are resolved relative to the config file location.
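Resolving paths relative to the config file, rather than the working directory, makes a config portable across invocation directories. A minimal sketch of that rule with java.nio.file follows. The example values passed in are illustrative and not copied from the shipped configs. Passing absolute paths through unchanged is an assumption, since the note above only covers relative paths.

```java
import java.nio.file.Path;

public class ConfigPathSketch {
    /** Resolves a path value from the YAML against the directory containing the config file. */
    static Path resolveAgainstConfig(Path configFile, String yamlValue) {
        Path value = Path.of(yamlValue);
        if (value.isAbsolute()) {
            return value.normalize(); // assumption: absolute paths are used as given
        }
        Path baseDir = configFile.toAbsolutePath().getParent();
        return baseDir.resolve(value).normalize(); // normalize() collapses "." and ".."
    }

    public static void main(String[] args) {
        Path config = Path.of("examples/run_config.fixed.yaml");
        // Illustrative values for dataset.project_path and execution.output_dir.
        System.out.println(resolveAgainstConfig(config, "fixed-project"));
        System.out.println(resolveAgainstConfig(config, "../runs"));
    }
}
```

Run from the repository root, the two calls would resolve to <repo>/examples/fixed-project and <repo>/runs, regardless of the directory bench is launched from once the config path itself is known.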