This project investigates the robustness of Natural Language to SQL (NL2SQL) models against schema obfuscation. We analyze how removing semantic names from tables and columns affects the exact match accuracy of LLMs.
- `src/`: Python source code for the pipeline.
  - `run_pipeline.py`: Main entry point.
  - `obfuscator.py`: Logic for scrambling schema names.
  - `schema_utils.py`: Helpers for loading Spider schemas.
  - `evaluator.py`: Evaluation script (Exact Match).
  - `llm_client.py`: Interfaces for OpenAI and Simulated models.
- `data/spider_obfuscated/`: The generated obfuscated schemas (`tables_random.json`) and mappings.
- `results/`: Output logs and prediction text files.
- `docs/`: Project documentation (`walkthrough.md`, `task.md`).
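To make the idea of schema obfuscation concrete, here is a minimal sketch of how table and column names could be scrambled. It is not the code in `obfuscator.py`. It assumes the Spider `tables.json` layout (`table_names_original`, and `column_names_original` as `[table_index, name]` pairs), and the function name `obfuscate_schema`, the `t_`/`c_` prefixes, and the mapping layout are hypothetical choices made for illustration.

```python
import random
import string


def obfuscate_schema(schema: dict, seed: int = 0) -> tuple:
    """Replace table and column names with meaningless identifiers.

    Returns (obfuscated_schema, mapping), where mapping translates each new
    name back to the original one. Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)  # a fixed seed keeps the variant reproducible
    used = set()

    def fresh_name(prefix: str) -> str:
        # Draw random 6-letter suffixes until an unused identifier appears.
        while True:
            name = prefix + "".join(rng.choices(string.ascii_lowercase, k=6))
            if name not in used:
                used.add(name)
                return name

    mapping = {"tables": {}, "columns": {}}
    tables = schema["table_names_original"]

    new_tables = []
    for original in tables:
        alias = fresh_name("t_")
        mapping["tables"][alias] = original
        new_tables.append(alias)

    new_columns = []
    for table_idx, original in schema["column_names_original"]:
        if table_idx < 0:
            # Spider stores the wildcard "*" with table index -1; keep it.
            new_columns.append([table_idx, original])
            continue
        alias = fresh_name("c_")
        mapping["columns"][alias] = f"{tables[table_idx]}.{original}"
        new_columns.append([table_idx, alias])

    obfuscated = dict(schema)  # shallow copy; the input is left untouched
    obfuscated["table_names_original"] = new_tables
    obfuscated["column_names_original"] = new_columns
    # Overwrite the human-readable variants too, so no semantics leak through.
    if "table_names" in schema:
        obfuscated["table_names"] = list(new_tables)
    if "column_names" in schema:
        obfuscated["column_names"] = [list(col) for col in new_columns]
    return obfuscated, mapping
```

Keeping the mapping next to the obfuscated schema makes it possible to translate a predicted query back to the original names before evaluation.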
This project focuses on Exact Match Accuracy.
- Execution Accuracy: Supported but requires the full Spider SQLite databases.
  - Download the `database` folder from the Spider dataset page.
  - Place it at `data/spider/database/`.
  - The pipeline will automatically detect it and run execution accuracy.
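The difference between the two metrics can be sketched as follows. This is a simplified illustration, not the logic in `evaluator.py`: `exact_match` here compares normalized strings, whereas Spider's official exact match compares parsed query components, and the helper names are hypothetical.

```python
import re
import sqlite3


def normalize_sql(query: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing semicolon.

    Note: lowercasing also affects string literals, which is acceptable
    for a sketch but too coarse for a real evaluator.
    """
    return re.sub(r"\s+", " ", query.strip().rstrip(";")).strip().lower()


def exact_match(pred: str, gold: str) -> bool:
    # String-level comparison only; aliases or reordered clauses will differ.
    return normalize_sql(pred) == normalize_sql(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """True when both queries return the same rows, ignoring row order."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    # Sort by repr so rows with mixed types or NULLs can still be ordered.
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)
```

Two queries that differ only in table aliases fail `exact_match` but pass `execution_match`, which is why execution accuracy needs the actual SQLite databases while exact match does not.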
- Install dependencies:
pip install -r src/requirements.txt
- (Optional) Set up your OpenAI key in `src/.env` if running real inference.
- NLTK Setup: The evaluation modules require NLTK tokenizers. Run this once:
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"
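Because the OpenAI key is optional, the pipeline presumably has to choose between real and simulated inference at startup. One way this could look is sketched below. The function names and the fallback rule are assumptions for illustration, not the behavior of `llm_client.py`.

```python
import os


def select_backend(env=None) -> str:
    """Return "openai" when a key is configured, otherwise "simulated".

    `env` defaults to the process environment; passing a dict makes the
    function easy to test. Hypothetical helper, not part of the project.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return "openai" if key else "simulated"


def redact(key: str) -> str:
    """Shorten a key for log output so the full secret is never printed."""
    return key[:3] + "..." + key[-4:] if len(key) > 8 else "***"
```

Reading the key from the environment rather than from source files keeps it out of logs and version control.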
To reproduce the full set of results (clean, obfuscated, ambiguous):
- Set the API Key: Real inference requires an OpenAI API key. Set your own key in Windows PowerShell:
$env:OPENAI_API_KEY = "<your-openai-api-key>"
Note: The key is only needed for real inference. Do not commit it to the repository.
- Generate Data Variants: Create the obfuscated and ambiguous schema files:
python generate_variants.py
- Run All Experiments: Execute the batch script to run all 6 conditions (Clean/Obfuscated/Ambiguous x Desc/NoDesc):
python run_all_experiments.py
Note: This runs 50 examples per condition (300 requests total). To run the full dev set, edit the script to set `limit = None`.
- View Results: The script produces `results/summary_table.md`, which contains the accuracy table.
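The 6 conditions and the 300-request total follow from crossing three schema variants with two description modes at 50 examples each. A rough sketch of that grid is shown below. The constant and function names are hypothetical and are not taken from `run_all_experiments.py`.

```python
from itertools import product

# Assumed labels for the two experimental factors.
SCHEMA_VARIANTS = ("clean", "obfuscated", "ambiguous")
DESCRIPTION_MODES = ("desc", "nodesc")


def build_conditions(limit=50):
    """List every schema-variant x description-mode pair with its example cap.

    limit=None means "run the full dev set" for each condition.
    """
    return [
        {"schema": schema, "descriptions": mode, "limit": limit}
        for schema, mode in product(SCHEMA_VARIANTS, DESCRIPTION_MODES)
    ]


def total_requests(conditions):
    """Total request count, or None when any condition is uncapped."""
    limits = [condition["limit"] for condition in conditions]
    return None if None in limits else sum(limits)
```

With the default cap, `build_conditions()` yields 3 x 2 = 6 conditions and `total_requests` gives 6 x 50 = 300. With `limit=None` the total depends on the size of the dev set, so the sketch returns `None`.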
See results/summary_table.md for the latest experiment accuracy metrics.
Detailed logs are available in results/pred_*.txt.
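For readers who want to rebuild a comparable table from their own runs, a summary could be rendered from per-condition counts roughly like this. The column layout is an assumption and may not match the actual `results/summary_table.md`. Both function names are hypothetical.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of correct predictions; 0.0 when nothing was evaluated."""
    return correct / total if total else 0.0


def render_summary(rows) -> str:
    """Render (condition_name, correct, total) tuples as a markdown table."""
    lines = [
        "| Condition | Correct | Total | Exact Match |",
        "|---|---|---|---|",
    ]
    for name, correct, total in rows:
        score = f"{accuracy(correct, total):.1%}"
        lines.append(f"| {name} | {correct} | {total} | {score} |")
    return "\n".join(lines)
```

Reporting the raw counts alongside the percentage makes it clear how many examples each condition was evaluated on.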