
Task 4: Extend evaluate_internal with 4-bit loading, add HF Hub publish script, and Streamlit remote-inference UI#5

Merged

brej-29 merged 5 commits into main from cosine/internal-eval-4bit-hf-publish-streamlit on Jan 11, 2026

Conversation

@brej-29 (Owner) commented Jan 10, 2026

This PR implements the Task 4 features: 4-bit loading in internal evaluation, a Hugging Face Hub publish workflow for adapters, and a Streamlit UI that performs remote inference via the Hugging Face Inference API. It adds tests, docs, and lightweight UI scaffolding to enable an end-to-end workflow without loading large models locally.

What’s included

Part A — evaluate_internal.py 4-bit support and smoke mode

  • Extend scripts/evaluate_internal.py with CLI flags:
    --load_in_4bit, --bnb_4bit_quant_type, --bnb_4bit_compute_dtype, --bnb_4bit_use_double_quant
  • Mirror evaluate_spider_external.py logic: when load_in_4bit is enabled, build a transformers.BitsAndBytesConfig and pass it as quantization_config to from_pretrained. For CPU compatibility, disable 4-bit when CUDA is unavailable and fall back gracefully with a warning.
  • Keep adapter loading unchanged (PEFT adapter_dir).
  • Add a clear log line indicating whether 4-bit was enabled or skipped.
  • Add a lightweight smoke mode: python scripts/evaluate_internal.py --smoke loads a tiny sample (e.g., 5 examples) and exits with code 0. In CPU-only environments this mode automatically falls back to mock behavior to avoid heavy model loading.
  • Update tests to cover 4-bit and smoke flags (test_eval_cli_args updated accordingly).
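The CPU-fallback logic described above can be sketched as follows. The helper name `build_model_kwargs` and its defaults are illustrative, not the script's actual API; the `BitsAndBytesConfig` usage mirrors the standard transformers pattern:

```python
def build_model_kwargs(load_in_4bit, quant_type="nf4", compute_dtype="bfloat16",
                       double_quant=True, cuda_available=False):
    """Sketch of the PR's fallback: disable 4-bit on CPU-only hosts with a warning."""
    if load_in_4bit and not cuda_available:
        print("WARNING: 4-bit requested but CUDA is unavailable; "
              "falling back to full precision")
        load_in_4bit = False
    kwargs = {}
    if load_in_4bit:
        # Imported lazily so CPU-only hosts never touch these libraries.
        import torch
        from transformers import BitsAndBytesConfig
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type=quant_type,
            bnb_4bit_compute_dtype=getattr(torch, compute_dtype),
            bnb_4bit_use_double_quant=double_quant,
        )
    return kwargs
```

The resulting dict would be splatted into `from_pretrained(model_id, **kwargs)`; on a CPU-only host it stays empty, so adapter loading via PEFT is unaffected.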

Part B — Hugging Face Hub publish workflow

  • Add new script scripts/publish_to_hub.py to publish adapter artifacts to HF Hub.
    • CLI:
      --repo_id (required)
      --adapter_dir (default outputs/adapters)
      --private (bool, default False)
      --commit_message (default: Add QLoRA adapter artifacts)
      --include_metrics (optional path to a metrics JSON file)
    • Uses huggingface_hub.HfApi to create repo if missing and upload_folder for the adapters.
    • Ensures a README.md model card is present in adapter_dir with metadata:
      description of the adapter, training dataset, usage notes, safety, and metrics if provided.
    • Fails gracefully with a clear message if HF token is missing.
  • README and docs updated to describe how to publish to HF Hub and remote inference notes.

Part C — Streamlit UI for remote HF Inference

  • Add UI at app/streamlit_app.py that runs on Streamlit Community Cloud and calls remote inference via huggingface_hub.InferenceClient.
    • UI inputs: Schema (DDL) and Question (NL). Button: Generate SQL. Output shows SQL in a code block with a copy option and an optional Show prompt expander.
    • InferenceClient construction priority:
      1. If st.secrets["HF_INFERENCE_BASE_URL"] is set, use InferenceClient(base_url=..., api_key=HF_TOKEN).
      2. Else use InferenceClient(model=HF_MODEL_ID, api_key=HF_TOKEN, provider=HF_PROVIDER).
    • Secrets expected: HF_TOKEN, HF_MODEL_ID, optional HF_INFERENCE_BASE_URL, HF_PROVIDER.
    • Lightweight: app does not import torch/transformers; includes timeouts and user-friendly errors.
  • Add .streamlit/secrets.toml.example with placeholders for HF_TOKEN, HF_MODEL_ID, HF_INFERENCE_BASE_URL, HF_PROVIDER; ensure gitignore ignores secrets.toml.
  • Update requirements.txt to include streamlit and huggingface_hub.
  • Documentation in README.md and context.md explains how to run Streamlit locally and on Streamlit Cloud, and how remote inference works.
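The construction priority above can be expressed as a small helper (a sketch; `make_client_kwargs` is illustrative, and `st.secrets` is treated as a plain mapping):

```python
def make_client_kwargs(secrets):
    """Resolve InferenceClient kwargs from Streamlit secrets: base_url wins over model."""
    if secrets.get("HF_INFERENCE_BASE_URL"):
        # Priority 1: a dedicated endpoint / custom base URL.
        return {"base_url": secrets["HF_INFERENCE_BASE_URL"],
                "api_key": secrets["HF_TOKEN"]}
    # Priority 2: serverless inference against a model id, optionally via a provider.
    kwargs = {"model": secrets["HF_MODEL_ID"], "api_key": secrets["HF_TOKEN"]}
    if secrets.get("HF_PROVIDER"):
        kwargs["provider"] = secrets["HF_PROVIDER"]
    return kwargs


# Usage inside the app (assumes huggingface_hub and streamlit are installed):
#   from huggingface_hub import InferenceClient
#   client = InferenceClient(**make_client_kwargs(st.secrets))
```

Keeping this resolution in one function keeps the app import-light: `torch`/`transformers` are never imported, matching the PR's constraint.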

Project-wide improvements

  • src/text2sql/infer.py updated to accept new 4-bit knobs (bnb_4bit_quant_type, bnb_4bit_use_double_quant) and to log quantization settings clearly; supports loading with 4-bit quantization when requested.
  • docs/evaluation.md updated to reflect 4-bit args and smoke mode behavior.
  • context.md updated to reflect Task 4 extensions, including details about 4-bit eval, HF Hub publishing, and Streamlit remote UI.
  • Added smoke-friendly test coverage ensuring CLI args parsing supports new flags and smoke mode.
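The new CLI surface can be approximated with argparse; the defaults and choices shown here are assumptions for illustration, not necessarily the script's exact values:

```python
import argparse


def build_eval_parser():
    # Sketch of evaluate_internal.py's extended flags (defaults are illustrative).
    p = argparse.ArgumentParser(description="Internal evaluation (sketch)")
    p.add_argument("--load_in_4bit", action="store_true")
    p.add_argument("--bnb_4bit_quant_type", default="nf4", choices=["nf4", "fp4"])
    p.add_argument("--bnb_4bit_compute_dtype", default="bfloat16")
    p.add_argument("--bnb_4bit_use_double_quant", action="store_true")
    p.add_argument("--smoke", action="store_true",
                   help="tiny-sample run that exits with code 0")
    return p
```

A test like the updated test_eval_cli_args would then just parse a flag combination and assert on the namespace.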

How to use (quick references)

  • 4-bit evaluation with CPU fallback (smoke):
    python scripts/evaluate_internal.py --smoke --val_path data/processed/val.jsonl --out_dir reports/
  • Publish adapter artifacts to HF Hub:
    python scripts/publish_to_hub.py --repo_id your-username/analytics-copilot-text2sql-mistral7b-qlora --adapter_dir outputs/adapters --private
  • Run Streamlit UI locally:
    cp .streamlit/secrets.toml.example .streamlit/secrets.toml
    streamlit run app/streamlit_app.py
  • Remote inference notes: UI uses HF InferenceClient; Streamlit Cloud does not load models locally; if serverless inference is insufficient for large models, consider Inference Endpoints.

Notes on tests and quality gates

  • pytest -q should pass locally with new tests added for 4-bit args and smoke.
  • python -m compileall . should succeed (no syntax regressions).
  • The publish_to_hub.py script is robust to missing HF tokens and repository creation errors, with clear error messages.
  • The UI app is lightweight and does not import heavy ML libraries; it relies on HF Inference for generation.

This PR delivers the end-to-end workflow for 4-bit evaluation, HF Hub publishing, and a remote-inference Streamlit UI, aligned with backward-compatible defaults and robust logging/diagnostics.


This pull request was co-created with Cosine Genie

Original Task: analytics-copilot-text2sql/40b8o5133snj
Author: Brejesh Balakrishnan

brej-29 and others added 5 commits January 10, 2026 17:38
…nd 4-bit quant support with smoke tests; update docs and tests

Co-authored-by: Cosine <agent@cosine.sh>
…nused import, adjust exception typing, improve readme generation quoting)

Co-authored-by: Cosine <agent@cosine.sh>
…tions and auto README; include HuggingFace deploy docs; add tests

Co-authored-by: Cosine <agent@cosine.sh>
…or client creation to use it, and add smoke script and tests

Co-authored-by: Cosine <agent@cosine.sh>
@brej-29 brej-29 merged commit 687ca67 into main Jan 11, 2026
1 check passed