NL2SQL Robustness Study

Overview

This project investigates the robustness of Natural Language to SQL (NL2SQL) models against schema obfuscation. We analyze how removing semantic names from tables and columns affects the exact match accuracy of LLMs.
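To make the idea of schema obfuscation concrete, here is a minimal sketch of one way to replace semantic names with meaningless identifiers. It is not the code in obfuscator.py: the function name `obfuscate_schema`, the simplified schema shape (table name to list of column names), and the `t0` / `t0_c1` naming scheme are all assumptions made for illustration.

```python
import random


def obfuscate_schema(schema, seed=0):
    """Replace table and column names with meaningless identifiers.

    `schema` maps table name -> list of column names. This is a
    hypothetical, simplified shape; Spider's tables.json is more detailed.
    Returns (obfuscated_schema, mapping), where mapping is original -> new.
    """
    rng = random.Random(seed)  # seeded so the mapping is reproducible
    table_ids = list(range(len(schema)))
    rng.shuffle(table_ids)

    mapping = {}
    obfuscated = {}
    for (table, columns), t_id in zip(schema.items(), table_ids):
        new_table = f"t{t_id}"
        mapping[table] = new_table

        col_ids = list(range(len(columns)))
        rng.shuffle(col_ids)
        new_cols = []
        for col, c_id in zip(columns, col_ids):
            new_col = f"{new_table}_c{c_id}"
            mapping[f"{table}.{col}"] = new_col  # keyed as table.column
            new_cols.append(new_col)
        obfuscated[new_table] = new_cols
    return obfuscated, mapping


schema = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "venue"],
}
obfuscated, mapping = obfuscate_schema(schema, seed=42)
```

Keeping the mapping alongside the obfuscated schema makes it possible to translate gold queries into the obfuscated vocabulary, or predictions back into the original one.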

Project Structure

src/: Python source code for the pipeline.
- run_pipeline.py: Main entry point.
- obfuscator.py: Logic for scrambling schema names.
- schema_utils.py: Helpers for loading Spider schemas.
- evaluator.py: Evaluation script (Exact Match).
- llm_client.py: Interfaces for OpenAI and Simulated models.
data/spider_obfuscated/: The generated obfuscated schemas (tables_random.json) and mappings.
results/: Output logs and prediction text files.
docs/: Project documentation (walkthrough.md, task.md).

Evaluation Scope

This project focuses on Exact Match Accuracy.
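As a rough illustration of the metric, the sketch below computes a string-level exact match after light normalization. It is a simplification and an assumption on our part: the helper names `normalize_sql` and `exact_match_accuracy` are hypothetical, and the evaluator in evaluator.py may compare queries differently (for example, component by component rather than as whole strings).

```python
import re


def normalize_sql(sql):
    """Collapse whitespace, drop a trailing semicolon, and lowercase.

    Note: lowercasing also affects string literals, which is acceptable
    for a sketch but too coarse for a real evaluator.
    """
    sql = re.sub(r"\s+", " ", sql).strip().rstrip(";").strip()
    return sql.lower()


def exact_match_accuracy(predictions, golds):
    """Fraction of predictions whose normalized text equals the gold query."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(
        normalize_sql(p) == normalize_sql(g) for p, g in zip(predictions, golds)
    )
    return hits / len(golds)


predictions = ["SELECT name FROM singer;", "select count(*) from concert"]
golds = ["select  name\nfrom singer", "SELECT count(*) FROM stadium"]
accuracy = exact_match_accuracy(predictions, golds)
```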

Execution Accuracy: Supported, but it requires the full Spider SQLite databases. To enable it:
1. Download the database folder from the Spider dataset page.
2. Place it at data/spider/database/.
3. The pipeline will automatically detect it and run execution accuracy.
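The detection and comparison steps could look roughly like the sketch below. This is an assumption about the general approach, not the pipeline's actual code: `databases_available` and `execution_match` are hypothetical names, and comparing result rows as multisets (ignoring row order) is one common choice among several.

```python
import sqlite3
from collections import Counter
from pathlib import Path


def databases_available(root="data/spider/database"):
    """Return True if the Spider database folder exists at the expected path."""
    return Path(root).is_dir()


def execution_match(conn, pred_sql, gold_sql):
    """Run both queries and compare their result rows, ignoring row order."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


# Small in-memory database standing in for a Spider SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bo", 25)])

same_result = execution_match(
    conn,
    "SELECT name FROM singer WHERE age > 26",
    "SELECT name FROM singer WHERE age >= 30",
)
```

Unlike exact match, this check accepts queries that are written differently but return the same rows.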

Setup

Install dependencies:
```
pip install -r src/requirements.txt
```
(Optional) Set your OpenAI API key in src/.env if you are running real inference.
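Since the key is optional, a client can fall back to a simulated model when none is configured. The sketch below shows one way to make that choice. It is an assumption for illustration: `choose_backend` is a hypothetical helper, it reads only the process environment, and how llm_client.py actually loads src/.env is not specified here.

```python
import os


def choose_backend(env=None):
    """Pick "openai" when OPENAI_API_KEY is set and non-empty, else "simulated".

    `env` defaults to the process environment; passing a dict makes the
    function easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return "openai" if key else "simulated"


backend = choose_backend({"OPENAI_API_KEY": ""})
```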

NLTK Setup: The evaluation modules require NLTK tokenizers. Run this once:

python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"

Reproducing Findings (With OpenAI API Key)

To reproduce the full set of results (clean, obfuscated, ambiguous):

Set the API Key: Running the experiments requires an OpenAI API key. In Windows PowerShell:

$env:OPENAI_API_KEY = "<your-openai-api-key>"

Note: Use your own key and keep it out of version control.

Generate Data Variants: Create the obfuscated and ambiguous schema files:
```
python generate_variants.py
```
Run All Experiments: Execute the batch script to run all 6 conditions (Clean/Obfuscated/Ambiguous x Desc/NoDesc):
```
python run_all_experiments.py
```
Note: This runs 50 examples per condition (300 requests total). To run the full dev set, edit the script to set limit = None.
View Results: The script produces results/summary_table.md containing the accuracy table.
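The six conditions are the cross product of three schema variants and two description settings. The sketch below enumerates them and checks the request count. It does not reproduce run_all_experiments.py: the names `SCHEMA_VARIANTS`, `DESCRIPTION_MODES`, `build_conditions`, and `total_requests` are hypothetical, and only the `limit` value of 50 (or None for the full dev set) comes from the text above.

```python
from itertools import product

SCHEMA_VARIANTS = ["clean", "obfuscated", "ambiguous"]
DESCRIPTION_MODES = ["desc", "nodesc"]


def build_conditions(limit=50):
    """Enumerate every (schema variant, description mode) pair."""
    return [
        {"schema": schema, "descriptions": mode, "limit": limit}
        for schema, mode in product(SCHEMA_VARIANTS, DESCRIPTION_MODES)
    ]


def total_requests(conditions):
    """Sum the per-condition limits; None means the full dev set is used,
    so the total is not known from the limits alone."""
    if any(c["limit"] is None for c in conditions):
        return None
    return sum(c["limit"] for c in conditions)


conditions = build_conditions(limit=50)
requests_needed = total_requests(conditions)
```

With the default limit, this gives 6 conditions and 6 x 50 = 300 requests, matching the note above.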

Results

See results/summary_table.md for the latest experiment accuracy metrics. Detailed logs are available in results/pred_*.txt.
