Merged
33 changes: 12 additions & 21 deletions README.md
@@ -42,31 +42,27 @@ The package doesn't have the dataset, it is stored on our [HuggingFace page](htt

## Latest News 📣

* [2025/12] Evaluation class converted to a function; see the [new `evaluate(...)` function](./llmsql/evaluation/evaluate.py#evaluate)
* [2026/03] Added support for API inference, for now only for OpenAI-compatible APIs; see the [`inference_api()` function](./llmsql/inference/inference_api.py#inference_api)

* A new version of the page is live at [`https://llmsql.github.io/llmsql-benchmark/`](https://llmsql.github.io/llmsql-benchmark/)
* [2026/03] The page now contains the first version of the [leaderboard](https://llmsql.github.io/llmsql-benchmark/#:~:text=%F0%9F%93%8A%20Leaderboard%20%E2%80%94%20Execution%20Accuracy%20%28EX)!

* vLLM inference now supports chat templates, see [`inference_vllm(...)`](./llmsql/inference/inference_vllm.py#inference_vllm).
* Transformers inference now supports custom chat templates via the `chat_template` argument, see [`inference_transformers(...)`](./llmsql/inference/inference_transformers.py#inference_transformers)
* [2026/02] The new LLMSQL 2.0 version is out now! See the [dataset](https://huggingface.co/datasets/llmsql-bench/llmsql-2.0). The support is already added with the `version` parameter to each `inference` function.

* More stable and deterministic inference with the [`inference_vllm(...)`](./llmsql/inference/inference_vllm.py#inference_vllm) function, achieved by setting [certain environment variables](./llmsql/inference/inference_vllm.py)
* [2025/12] Evaluation class converted to a function; see the [new `evaluate(...)` function](./llmsql/evaluation/evaluate.py#evaluate)

* `padding_side` argument added to the [`inference_transformers(...)`](./llmsql/inference/inference_transformers.py#inference_transformers) function, defaulting to `left`.


## Usage Recommendations

Modern LLMs are already strong at **producing SQL queries without finetuning**.
Modern LLMs are already strong at producing SQL queries without finetuning.
We therefore recommend that most users:

1. **Run inference** directly on the full benchmark:
model_or_model_name_or_path="Qwen/Qwen2.5-1.5B-Instruct",
output_file="path_to_your_outputs.jsonl",
- Use [`llmsql.inference_transformers`](./llmsql/inference/inference_transformers.py) (the function for transformers inference) for generation of SQL predictions with your model. If you want to do vllm based inference, use [`llmsql.inference_vllm`](./llmsql/inference/inference_vllm.py). Works both with HF model id, e.g. `Qwen/Qwen2.5-1.5B-Instruct` and model instance passed directly, e.g. `inference_transformers(model_or_model_name_or_path=model, ...)`
- Use [`llmsql.inference_transformers`](./llmsql/inference/inference_transformers.py) (the function for transformers inference) to generate SQL predictions with your model. If you want vLLM-based inference, use [`llmsql.inference_vllm`](./llmsql/inference/inference_vllm.py). Works both with an HF model id, e.g. `Qwen/Qwen2.5-1.5B-Instruct`, and a model instance passed directly, e.g. `inference_transformers(model_or_model_name_or_path=model, ...)`. API inference is also supported, see [`inference_api()`](./llmsql/inference/inference_api.py#inference_api)
- Evaluate results against the benchmark with the [`llmsql.evaluate`](./llmsql/evaluation/evaluator.py) function.

2. **Optional finetuning**:
- For research or domain adaptation, we provide finetuning version for HF models. Use [Finetune Ready](https://huggingface.co/datasets/llmsql-bench/llmsql-benchmark-finetune-ready) dataset from HuggingFace.
- For research or domain adaptation, we provide finetuning version for HF models. Use [Finetune Ready](https://huggingface.co/collections/llmsql-bench/fine-tune-ready-versions-of-the-llmsql-benchmark) datasets from HuggingFace.
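The inference functions write predictions as JSON Lines, one record per question — the same `{"question_id": ..., "completion": ...}` format shown in `examples/test_output_api.jsonl`. A minimal, self-contained sketch of writing and reading that format (the file name and records here are illustrative, not produced by the library):

```python
import json
from pathlib import Path

# Illustrative records in the benchmark's prediction format (hypothetical data).
records = [
    {"question_id": 1, "completion": 'SELECT "Nationality" FROM "Table" WHERE "Player" = \'Terrence Ross\';'},
    {"question_id": 5, "completion": 'SELECT "Circuit" FROM "Table" WHERE "Round" = \'Assen\';'},
]

path = Path("preds_demo.jsonl")
# One JSON object per line, as the inference functions produce.
path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")

# Read the file back keyed by question_id, e.g. for spot-checking before evaluation.
preds = {}
with path.open(encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        preds[row["question_id"]] = row["completion"]

print(sorted(preds))  # [1, 5]
```

This round-trip is handy for sanity-checking predictions before handing the file to the evaluator.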

> [!Tip]
> You can find additional manuals in the README files of each folder ([Inference Readme](./llmsql/inference/README.md), [Evaluation Readme](./llmsql/evaluation/README.md))
@@ -80,7 +76,7 @@ We therefore recommend that most users:
```

llmsql/
├── evaluation/ # Scripts for downloading DB + evaluating predictions
├── evaluation/ # Scripts for evaluation
└── inference/ # Generate SQL queries with your LLM
```

@@ -159,10 +155,12 @@ print(report)
```


For more examples, check the [examples folder](./examples/).

## Prompt Template

The prompt defines explicit constraints on the generated output.
The model is instructed to output only a valid SQL `SELECT` query, to use a fixed table name (`"Table"`) **(which will be replaced with the actual table name during evaluation)**, to quote all table and column names, and to restrict generation to the specified SQL functions, condition operators, and keywords.
The prompt defines explicit constraints on the generated output.
The model is instructed to output only a valid SQL `SELECT` query, to use a fixed table name (`"Table"`) **(which will be replaced with the actual table name during evaluation)**, to quote all table and column names, and to restrict generation to the specified SQL functions, condition operators, and keywords.
The full prompt specification is provided in the prompt template.

Below is an example of the **5-shot prompt template** used during inference.
@@ -224,13 +222,6 @@ Implementations of 0-shot, 1-shot, and 5-shot prompt templates are available her
👉 [link-to-file](./llmsql/prompts/prompts.py)
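As described above, generated queries target the fixed placeholder name `"Table"`, which is swapped for the real table name during evaluation. A standalone sketch of that substitution step — the function name and the example table name are hypothetical, and the benchmark's evaluator implements its own version:

```python
def substitute_table(completion: str, actual_table: str) -> str:
    """Replace the fixed '"Table"' placeholder with the real, quoted table name.

    Illustrative only: assumes the model followed the prompt's constraints
    (a single SELECT query with all identifiers double-quoted).
    """
    if not completion.lstrip().upper().startswith("SELECT"):
        raise ValueError("expected a single SQL SELECT query")
    return completion.replace('"Table"', f'"{actual_table}"')

sql = 'SELECT "Nationality" FROM "Table" WHERE "Player" = \'Terrence Ross\';'
print(substitute_table(sql, "table_1_10015132_16"))
# SELECT "Nationality" FROM "table_1_10015132_16" WHERE "Player" = 'Terrence Ross';
```

Because identifiers are always double-quoted, a plain string replacement of `"Table"` is unambiguous.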



## Suggested Workflow

* **Primary**: Run inference on all questions with vllm or transformers → Evaluate with `evaluate()`.
* **Secondary (optional)**: Fine-tune on `train/val` → Test on `test_questions.jsonl`. You can find the datasets here [HF Finetune Ready](https://huggingface.co/datasets/llmsql-bench/llmsql-benchmark-finetune-ready).


## Contributing

Check out our [open issues](https://github.com/LLMSQL/llmsql-benchmark/issues), fork this repo and feel free to submit pull requests!
6 changes: 3 additions & 3 deletions docs/_templates/index.html
@@ -113,15 +113,15 @@ <h3>1️⃣ Installation</h3>
<h3>2️⃣ Inference from CLI</h3>

<p><strong>vLLM Backend (Recommended)</strong></p>
<pre><code>llmsql inference --method vllm \
<pre><code>llmsql inference vllm \
--model-name Qwen/Qwen2.5-1.5B-Instruct \
--output-file outputs/preds.jsonl \
--batch-size 8 \
--num_fewshots 5 \
--temperature 0.0</code></pre>

<p><strong>Transformers Backend</strong></p>
<pre><code>llmsql inference --method transformers \
<pre><code>llmsql inference transformers \
--model-or-model-name-or-path Qwen/Qwen2.5-1.5B-Instruct \
--output-file outputs/preds.jsonl \
--batch-size 8 \
@@ -163,7 +163,7 @@ <h2 id="citation">📄 Citation</h2>
<pre><code>@inproceedings{llmsql_bench,
title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{'n}, Jan},
booktitle={2025 IEEE ICувцDMW},
booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
year={2025},
organization={IEEE}
}
6 changes: 6 additions & 0 deletions docs/docs/inference.rst
@@ -14,6 +14,12 @@ Inference API Reference

---

.. automodule:: llmsql.inference.inference_api
:members:
:undoc-members:

---

.. raw:: html

<div style="text-align:center; margin-top:2rem; color:#666;">
35 changes: 35 additions & 0 deletions docs/docs/usage.rst
@@ -77,6 +77,41 @@ Using vllm backend.
print(report)


Using an OpenAI-compatible API.

.. code-block:: python

from llmsql import inference_api
from dotenv import load_dotenv
import os
load_dotenv()

# Run inference (will take some time)
results = inference_api(
model_name="gpt-5-mini",
base_url="https://api.openai.com/v1/",
api_key=os.environ["OPENAI_API_KEY"],
api_kwargs={
"response_format": {
"type": "text"
},
"verbosity": "medium",
"reasoning_effort": "medium",
"store": False
},
requests_per_minute=100,
output_file="test_output_api.jsonl",
limit=50,
num_fewshots=5,
seed=42,
version="2.0"
)

# Evaluate the results
evaluator = LLMSQLEvaluator()
report = evaluator.evaluate(outputs_path="test_output_api.jsonl")
print(report)
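The `requests_per_minute` argument throttles the client-side request rate. A minimal sketch of what such a limiter implies — roughly one request every `60 / requests_per_minute` seconds. This is an illustration of the idea, not `inference_api`'s actual implementation:

```python
import time

class MinuteRateLimiter:
    """Spacing-based limiter: allows at most `requests_per_minute` calls
    per minute by enforcing a minimum interval between consecutive calls."""

    def __init__(self, requests_per_minute: int):
        self.min_interval = 60.0 / requests_per_minute
        self._last = float("-inf")  # first call never waits

    def wait(self) -> None:
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

# With requests_per_minute=6000 the minimum spacing is 10 ms,
# so three calls take at least ~20 ms in total.
limiter = MinuteRateLimiter(6000)
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # in real use, the API request would follow each wait
elapsed = time.monotonic() - start
```

Spacing requests evenly like this keeps bursts under provider rate limits without needing a full token-bucket implementation.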

---

.. raw:: html
132 changes: 132 additions & 0 deletions examples/inference_api.ipynb
@@ -0,0 +1,132 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "5409b21a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from llmsql import inference_api\n",
"from dotenv import load_dotenv\n",
"import os\n",
"load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "581e9c25",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2026-03-04 08:10:34,504 [INFO] llmsql-bench: Removing existing path: llmsql_workdir/questions.jsonl\n",
"2026-03-04 08:10:34,506 [INFO] llmsql-bench: Downloading questions.jsonl from Hugging Face Hub...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a71443d8f32840838ba484eadf26d9d0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"questions.jsonl: 0%| | 0.00/18.3M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2026-03-04 08:10:35,608 [INFO] llmsql-bench: Downloaded questions.jsonl to: llmsql_workdir/questions.jsonl\n",
"2026-03-04 08:10:35,608 [INFO] llmsql-bench: Removing existing path: llmsql_workdir/tables.jsonl\n",
"2026-03-04 08:10:35,611 [INFO] llmsql-bench: Downloading tables.jsonl from Hugging Face Hub...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "62ec9ecc8d8b48f7a835019688ee1894",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tables.jsonl: 0%| | 0.00/45.3M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2026-03-04 08:10:36,293 [INFO] llmsql-bench: Downloaded tables.jsonl to: llmsql_workdir/tables.jsonl\n",
"Generating: 100%|██████████| 50/50 [00:33<00:00, 1.48it/s]\n",
"2026-03-04 08:11:11,394 [INFO] llmsql-bench: Generation completed. 50 results saved to test_output_api.jsonl\n"
]
}
],
"source": [
"results = inference_api(\n",
" model_name=\"gpt-5-mini\",\n",
" base_url=\"https://api.openai.com/v1/\",\n",
" api_key=os.environ[\"OPENAI_API_KEY\"],\n",
" api_kwargs={\n",
" \"response_format\": {\n",
" \"type\": \"text\"\n",
" },\n",
" \"verbosity\": \"medium\",\n",
" \"reasoning_effort\": \"medium\",\n",
" \"store\": False\n",
" },\n",
" requests_per_minute=100,\n",
" output_file=\"test_output_api.jsonl\",\n",
" limit=50,\n",
" num_fewshots = 5,\n",
" seed=42,\n",
" version=\"2.0\"\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "llmsql-benchmark-3.11 (3.11.13)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
50 changes: 50 additions & 0 deletions examples/test_output_api.jsonl
@@ -0,0 +1,50 @@
{"question_id": 15, "completion": "SELECT \"Launched\" FROM \"Table\" WHERE \"Laid down\" = \"September 1, 1964\";"}
{"question_id": 48, "completion": "SELECT \"Tone (Latvian notation: /~/ - level, /^/ - broken)\" FROM \"Table\" WHERE \"Translation\" = \"Gen.Sing. plague\";"}
{"question_id": 4, "completion": "SELECT COUNT(\"School/Club Team\") FROM \"Table\" WHERE \"Player\" = \"Jalen Rose\";"}
{"question_id": 21, "completion": "SELECT \"US Air Force equivalent\" FROM \"Table\" WHERE \"Commonwealth equivalent\" = \"Pilot Officer\";"}
{"question_id": 31, "completion": "SELECT COUNT(\"Total W–L\") FROM \"Table\" WHERE \"Doubles W–L\" = \"11–11\";"}
{"question_id": 42, "completion": "SELECT \"BTs retail price (regulated)\" FROM \"Table\" WHERE \"Tariff code\" = \"ff0 PRS\";"}
{"question_id": 7, "completion": "SELECT \"Date\" FROM \"Table\" WHERE \"Circuit\" = \"Misano\";"}
{"question_id": 28, "completion": "SELECT \"Rank Each wrestlers total number of days as champion are ranked highest to lowest; wrestlers with the same number mean that they are tied for that certain rank.\" FROM \"Table\" WHERE \"Wrestler\" = \"Go Shiozaki\";"}
{"question_id": 36, "completion": "SELECT \"Frequency\" FROM \"Table\" WHERE \"Market/Rank\" = \"Burlington - Plattsburgh , Vermont - New York /143\";"}
{"question_id": 23, "completion": "SELECT \"Rank in Spanish\" FROM \"Table\" WHERE \"Rank in English\" = \"Major\";"}
{"question_id": 11, "completion": "SELECT COUNT(DISTINCT \"Nationality\") FROM \"Table\" WHERE \"NHL team\" = \"New Jersey Devils\";"}
{"question_id": 47, "completion": "SELECT \"BTs retail price (regulated)\" FROM \"Table\" WHERE \"Tariff code\" = \"g10\";"}
{"question_id": 12, "completion": "SELECT \"Pick\" FROM \"Table\" WHERE \"Player\" = \"Dorain Anneck\";"}
{"question_id": 16, "completion": "SELECT \"#\" FROM \"Table\" WHERE \"Commissioned\" = \"December 18, 1965\";"}
{"question_id": 27, "completion": "SELECT \"Combined days\" FROM \"Table\" WHERE \"Wrestler\" = \"Go Shiozaki\";"}
{"question_id": 32, "completion": "SELECT COUNT(\"Singles W–L\") FROM \"Table\" WHERE \"Doubles W–L\" = \"11–14\";"}
{"question_id": 22, "completion": "SELECT \"Commonwealth equivalent\" FROM \"Table\" WHERE \"US Air Force equivalent\" = \"Major General\";"}
{"question_id": 43, "completion": "SELECT \"Approx premium\" FROM \"Table\" WHERE \"Tariff code\" = \"g9\";"}
{"question_id": 49, "completion": "SELECT MIN(\"Radius (R ☉ )\") FROM \"Table\";"}
{"question_id": 34, "completion": "SELECT MAX(\"Ties played\") FROM \"Table\" WHERE \"Player\" = \"Josip Palada Category:Articles with hCards\";"}
{"question_id": 39, "completion": "SELECT \"Format\" FROM \"Table\" WHERE \"Branding\" = \"1290 WKBK W281AU 104.1\";"}
{"question_id": 6, "completion": "SELECT \"No\" FROM \"Table\" WHERE \"Race winner\" = \"Kevin Curtain\";"}
{"question_id": 18, "completion": "SELECT \"Laid down\" FROM \"Table\" WHERE \"Commissioned\" = \"October 29, 1965\";"}
{"question_id": 13, "completion": "SELECT \"Nationality\" FROM \"Table\" WHERE \"NHL team\" = \"Vancouver Canucks\";"}
{"question_id": 38, "completion": "SELECT \"Branding\" FROM \"Table\" WHERE \"Calls\" = \"WRKO\";"}
{"question_id": 2, "completion": "SELECT \"School/Club Team\" FROM \"Table\" WHERE \"Years in Toronto\" = \"1995-96\";"}
{"question_id": 29, "completion": "SELECT \"Province\" FROM \"Table\" WHERE \"Electorate\" = \"Grey and Bell\";"}
{"question_id": 44, "completion": "SELECT COUNT(\"Tariff code\") FROM \"Table\" WHERE \"BTs retail price (regulated)\" = \"2p/min or inclusive\";"}
{"question_id": 26, "completion": "SELECT \"Rank Each wrestlers total number of days as champion are ranked highest to lowest; wrestlers with the same number mean that they are tied for that certain rank.\" FROM \"Table\" WHERE \"Wrestler\" = \"Bryan Danielson\";"}
{"question_id": 33, "completion": "SELECT \"Total W–L\" FROM \"Table\" WHERE \"Player\" = \"Boro Jovanović Category:Articles with hCards\";"}
{"question_id": 37, "completion": "SELECT \"Branding\" FROM \"Table\" WHERE \"Group owner\" = \"Qantam of Cape Cod, LLC\";"}
{"question_id": 8, "completion": "SELECT COUNT(DISTINCT \"Position\") FROM \"Table\" WHERE \"College/junior/club team\" = \"Sherbrooke Faucons (QMJHL)\";"}
{"question_id": 3, "completion": "SELECT \"School/Club Team\" FROM \"Table\" WHERE \"Years in Toronto\" = \"2003-06\";"}
{"question_id": 24, "completion": "SELECT \"Wrestler\" FROM \"Table\" WHERE \"# of reigns\" = 2;"}
{"question_id": 14, "completion": "SELECT \"Pick\" FROM \"Table\" WHERE \"College/junior/club team\" = \"Springfield Olympics (NEJHL)\";"}
{"question_id": 45, "completion": "SELECT COUNT(\"Tariff code\") FROM \"Table\" WHERE \"BTs retail price (regulated)\" = \"2.553p/min\";"}
{"question_id": 30, "completion": "SELECT \"Province\" FROM \"Table\" WHERE \"Electorate\" = \"Bay of Islands\";"}
{"question_id": 25, "completion": "SELECT MIN(\"# of reigns\") FROM \"Table\";"}
{"question_id": 19, "completion": "SELECT \"Commonwealth equivalent\" FROM \"Table\" WHERE \"Rank in Spanish\" = \"Coronel\";"}
{"question_id": 40, "completion": "SELECT \"Market/Rank\" FROM \"Table\" WHERE \"Calls\" = \"WCRN\";"}
{"question_id": 35, "completion": "SELECT SUM(\"Ties played\") FROM \"Table\" WHERE \"Total W–L\" = \"38–24\";"}
{"question_id": 50, "completion": "SELECT \"Spectral type\" FROM \"Table\" WHERE \"Star (Pismis24-#)\" = \"1SW\";"}
{"question_id": 20, "completion": "SELECT \"Rank in Spanish\" FROM \"Table\" WHERE \"Rank in English\" = \"Group Captain\";"}
{"question_id": 1, "completion": "SELECT \"Nationality\" FROM \"Table\" WHERE \"Player\" = \"Terrence Ross\";"}
{"question_id": 5, "completion": "SELECT \"Circuit\" FROM \"Table\" WHERE \"Round\" = \"Assen\";"}
{"question_id": 10, "completion": "SELECT COUNT(DISTINCT \"College/junior/club team\") FROM \"Table\" WHERE \"NHL team\" = \"Washington Capitals\";"}
{"question_id": 46, "completion": "SELECT \"Prefixes\" FROM \"Table\" WHERE \"Scheme\" = \"Pence per minute, fixed at all times\" AND \"Approx premium\" = \"3p/min\";"}
{"question_id": 17, "completion": "SELECT \"#\" FROM \"Table\" WHERE \"Commissioned\" = \"September 30, 1967\";"}
{"question_id": 9, "completion": "SELECT \"Nationality\" FROM \"Table\" WHERE \"College/junior/club team\" = \"Thunder Bay Flyers (USHL)\";"}
{"question_id": 41, "completion": "SELECT \"Frequency\" FROM \"Table\" WHERE \"Calls\" = \"WEGP\";"}
7 changes: 6 additions & 1 deletion llmsql/__init__.py
@@ -26,7 +26,12 @@ def __getattr__(name: str):  # type: ignore
from .inference.inference_transformers import inference_transformers

return inference_transformers
elif name == "inference_api":
from .inference.inference_api import inference_api

return inference_api

raise AttributeError(f"module {__name__} has no attribute {name!r}")


__all__ = ["evaluate", "inference_vllm", "inference_transformers"]
__all__ = ["evaluate", "inference_vllm", "inference_transformers", "inference_api"]
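The `__getattr__` hook in this diff is the PEP 562 module-level lazy-import pattern: heavy backends (transformers, vllm, the API client) are imported only when the corresponding name is first accessed, so a bare `import llmsql` stays cheap. A standalone sketch of the mechanism, using a toy attribute in place of a real submodule import:

```python
import types

def make_lazy_module(name: str) -> types.ModuleType:
    """Build a module whose missing attributes are resolved on first access,
    mirroring the PEP 562 __getattr__ hook used in llmsql/__init__.py."""
    mod = types.ModuleType(name)

    def __getattr__(attr: str):
        if attr == "heavy_value":
            # Stand-in for a deferred `from .inference.inference_api import ...`.
            return 42
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    # Python consults the module's __getattr__ only after normal lookup fails.
    mod.__getattr__ = __getattr__
    return mod

demo = make_lazy_module("demo")
print(demo.heavy_value)  # 42
```

Keeping `inference_api` in `__all__` while resolving it lazily means tab completion and `from llmsql import inference_api` both work without paying the import cost up front.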