Evaluation API Reference¶
The evaluate() function allows you to benchmark Text-to-SQL model outputs on the LLMSQL benchmark.
LLMSQL Evaluation Module¶
Provides the evaluate() function to benchmark Text-to-SQL model outputs on the LLMSQL benchmark.
See the documentation for full usage details.
llmsql.evaluation.evaluate.evaluate(outputs, *, workdir_path: str | None = 'llmsql_workdir', questions_path: str | None = None, db_path: str | None = None, save_report: str | None = None, show_mismatches: bool = True, max_mismatches: int = 5) → dict¶
Evaluate predicted SQL queries against the LLMSQL benchmark.
Parameters:
outputs – Either a JSONL file path or a list of dicts.
workdir_path – Directory for auto-downloads (ignored if all paths are provided).
questions_path – Manual path to the benchmark questions JSONL.
db_path – Manual path to the SQLite benchmark DB.
save_report – Optional manual save path; if None, a report path is auto-generated.
show_mismatches – Print mismatches while evaluating.
max_mismatches – Maximum number of mismatches to print.

Returns:
Metrics and mismatches.

Return type:
dict
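Example
A minimal usage sketch (the prediction file name is illustrative; any JSONL produced by the inference functions below works as input):
from llmsql.evaluation import evaluate

# Score a JSONL file of predictions. Benchmark questions and the SQLite DB
# are auto-downloaded into workdir_path when no manual paths are given.
report = evaluate(
    "outputs/preds_transformers.jsonl",
    show_mismatches=True,
    max_mismatches=5,
)
print(report)  # dict of metrics and mismatches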
—
LLMSQL package Documentation¶
Welcome to the LLMSQL documentation! This guide covers everything you need to use the project, from running inference to evaluating Text-to-SQL models.
—
Inference API Reference¶
LLMSQL Transformers Inference Function¶
This module provides a single function, inference_transformers(), that performs text-to-SQL generation using large language models via the Transformers backend.
Example
import torch

from llmsql.inference import inference_transformers

results = inference_transformers(
    model_or_model_name_or_path="Qwen/Qwen2.5-1.5B-Instruct",
    output_file="outputs/preds_transformers.jsonl",
    questions_path="data/questions.jsonl",
    tables_path="data/tables.jsonl",
    num_fewshots=5,
    batch_size=8,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,  # enable sampling so temperature=0.7 takes effect
    dtype=torch.bfloat16,  # dtype is a direct parameter and overrides model_kwargs entries
)
Notes
This function uses the HuggingFace Transformers backend and may produce slightly different outputs than the vLLM backend, even with the same inputs, due to differences in implementation and numerical precision.
llmsql.inference.inference_transformers.inference_transformers(model_or_model_name_or_path: str | AutoModelForCausalLM, tokenizer_or_name: str | Any | None = None, *, trust_remote_code: bool = True, dtype: torch.dtype = torch.float16, device_map: str | dict[str, int] | None = 'auto', hf_token: str | None = None, model_kwargs: dict[str, Any] | None = None, tokenizer_kwargs: dict[str, Any] | None = None, chat_template: str | None = None, max_new_tokens: int = 256, temperature: float = 0.0, do_sample: bool = False, top_p: float = 1.0, top_k: int = 50, generation_kwargs: dict[str, Any] | None = None, output_file: str = 'llm_sql_predictions.jsonl', questions_path: str | None = None, tables_path: str | None = None, workdir_path: str = 'llmsql_workdir', num_fewshots: int = 5, batch_size: int = 8, seed: int = 42) → list[dict[str, str]]¶
Run inference with a causal model (Transformers backend) on the LLMSQL benchmark.
Parameters:
model_or_model_name_or_path – Model object or HF model name/path.
tokenizer_or_name – Tokenizer object or HF tokenizer name/path.

# Model Loading
trust_remote_code – Whether to trust remote code (default: True).
dtype – Torch dtype for the model (default: torch.float16).
device_map – Device placement strategy (default: "auto").
hf_token – Hugging Face authentication token.
model_kwargs – Additional arguments for AutoModelForCausalLM.from_pretrained(). Note: 'dtype', 'device_map', 'trust_remote_code', and 'token' are handled separately and will override values given here.

# Tokenizer Loading
tokenizer_kwargs – Additional arguments for AutoTokenizer.from_pretrained(); 'padding_side' defaults to "left". Note: 'trust_remote_code' and 'token' are handled separately and will override values given here.

# Prompt & Chat
chat_template – Optional chat template to apply before tokenization.

# Generation
max_new_tokens – Maximum number of tokens to generate per sequence.
temperature – Sampling temperature (0.0 = greedy).
do_sample – Whether to use sampling instead of greedy decoding.
top_p – Nucleus sampling parameter.
top_k – Top-k sampling parameter.
generation_kwargs – Additional arguments for model.generate(). Note: 'max_new_tokens', 'temperature', 'do_sample', 'top_p', and 'top_k' are handled separately.

# Benchmark
output_file – Output JSONL file path for completions.
questions_path – Path to the benchmark questions JSONL.
tables_path – Path to the benchmark tables JSONL.
workdir_path – Working directory path.
num_fewshots – Number of few-shot examples (0, 1, or 5).
batch_size – Batch size for inference.
seed – Random seed for reproducibility.

Returns:
List of generated SQL results with metadata.
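Because model_or_model_name_or_path also accepts an AutoModelForCausalLM instance, a preloaded model and tokenizer can be passed directly. A sketch under that assumption (the repetition_penalty entry is an illustrative generate() argument, not required):
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmsql.inference import inference_transformers

# Load once and reuse the same objects across multiple benchmark runs.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

results = inference_transformers(
    model,
    tokenizer,
    output_file="outputs/preds_transformers.jsonl",
    num_fewshots=0,  # zero-shot run
    generation_kwargs={"repetition_penalty": 1.1},  # extra generate() argument
)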
—
LLMSQL vLLM Inference Function¶
This module provides a single function, inference_vllm(), that performs text-to-SQL generation using large language models via the vLLM backend.
Example
from llmsql.inference import inference_vllm

results = inference_vllm(
    model_name="Qwen/Qwen2.5-1.5B-Instruct",
    output_file="outputs/predictions.jsonl",
    questions_path="data/questions.jsonl",
    tables_path="data/tables.jsonl",
    num_fewshots=5,
    batch_size=8,
    max_new_tokens=256,
    temperature=0.7,
    tensor_parallel_size=1,
)
Notes
This function uses the vLLM backend. Outputs may differ from the Transformers backend due to differences in implementation, batching, and numerical precision.
llmsql.inference.inference_vllm.inference_vllm(model_name: str, *, trust_remote_code: bool = True, tensor_parallel_size: int = 1, hf_token: str | None = None, llm_kwargs: dict[str, Any] | None = None, use_chat_template: bool = True, max_new_tokens: int = 256, temperature: float = 1.0, do_sample: bool = True, sampling_kwargs: dict[str, Any] | None = None, output_file: str = 'llm_sql_predictions.jsonl', questions_path: str | None = None, tables_path: str | None = None, workdir_path: str = 'llmsql_workdir', num_fewshots: int = 5, batch_size: int = 8, seed: int = 42) → list[dict[str, str]]¶
Run SQL generation using vLLM.
Parameters:
model_name – Hugging Face model name or path.

# Model Loading
trust_remote_code – Whether to trust remote code (default: True).
tensor_parallel_size – Number of GPUs for tensor parallelism (default: 1).
hf_token – Hugging Face authentication token.
llm_kwargs – Additional arguments for vllm.LLM(). Note: 'model', 'tokenizer', 'tensor_parallel_size', and 'trust_remote_code' are handled separately and will override values given here.

# Prompt
use_chat_template – Whether to apply the model's chat template when building prompts (default: True).

# Generation
max_new_tokens – Maximum number of tokens to generate per sequence.
temperature – Sampling temperature (0.0 = greedy).
do_sample – Whether to use sampling instead of greedy decoding.
sampling_kwargs – Additional arguments for vllm.SamplingParams(). Note: 'temperature' and 'max_tokens' are handled separately and will override values given here.

# Benchmark
output_file – Path to write outputs (will be overwritten).
questions_path – Path to questions.jsonl (auto-downloads if missing).
tables_path – Path to tables.jsonl (auto-downloads if missing).
workdir_path – Directory to store downloaded data.
num_fewshots – Number of few-shot examples (0, 1, or 5).
batch_size – Number of questions per generation batch.
seed – Random seed for reproducibility.

Returns:
List of dicts containing question_id and generated completion.
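Because evaluate() accepts either a JSONL path or a list of dicts, and inference_vllm() returns a list of dicts with question_id and completion, the two functions can be chained into a full benchmark run. A sketch (assuming the returned records match the schema evaluate() expects):
from llmsql.evaluation import evaluate
from llmsql.inference import inference_vllm

# Generate predictions, then score them in one pass.
predictions = inference_vllm(
    model_name="Qwen/Qwen2.5-1.5B-Instruct",
    output_file="outputs/predictions.jsonl",
    num_fewshots=5,
    temperature=0.0,  # greedy decoding for a deterministic run
    do_sample=False,
)

report = evaluate(predictions)  # accepts a list of dicts or a JSONL path
print(report)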
—