Hi, to keep up with current standards I have updated this repository. The code now uses current Python syntax, the potentially outdated EM metric has been removed, and the repository's forward-looking ideas and logic for the EX metric have been kept. I also added support for properly indented SQL (the previous TXT-based input format did not handle standard indentation well). The new version of the repository is available at: test-suite-sql-eval_2026
Notes:
- The computation method in this repository may differ slightly from the original version. In some cases scores come out higher by around 0.x, but overall the evaluation is stable and scores do not fluctuate widely.
- Users can set the maximum execution time for SQL queries. The default is 30 seconds and can be adjusted as needed. (Previously, scores fluctuated because some SQL queries completed successfully on some runs but not others: database caching behavior made their execution times inconsistent.)
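
For readers wondering how a per-query time limit like the one above can be enforced, here is a minimal sketch. It assumes the databases are SQLite files accessed through Python's standard `sqlite3` module. The helper name `run_with_timeout` and its parameters are hypothetical, chosen for illustration, and this is not necessarily how the repository implements its limit.

```python
import sqlite3
import time


def run_with_timeout(conn, sql, timeout_s=30.0):
    """Run `sql` on `conn`; return its rows, or None if it exceeds `timeout_s`.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout_s

    # SQLite invokes this callback every 1000 virtual-machine instructions;
    # a non-zero return value aborts the statement that is running.
    def over_deadline():
        return 1 if time.monotonic() > deadline else 0

    conn.set_progress_handler(over_deadline, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # An aborted query surfaces as OperationalError. Treat it as a
        # timeout only if the deadline has passed; re-raise other errors
        # (for example, invalid SQL).
        if time.monotonic() > deadline:
            return None
        raise
    finally:
        # Remove the handler so later queries on this connection are unaffected.
        conn.set_progress_handler(None, 0)
```

With a helper like this, a prediction whose query returns `None` could simply be counted as a failed execution, so a slow query cannot stall the whole evaluation.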