
Task 4: Extend evaluate_internal with 4-bit loading, add HF Hub publish script, and Streamlit remote-inference UI#5

Merged

brej-29 merged 5 commits into main from cosine/internal-eval-4bit-hf-publish-streamlit on Jan 11, 2026

Conversation

@brej-29 (Owner) commented Jan 10, 2026

This PR implements the Task 4 features: 4-bit loading in internal evaluation, a Hugging Face Hub publish workflow for adapters, and a Streamlit UI that performs remote inference via the Hugging Face Inference API. It adds tests, docs, and lightweight UI scaffolding to enable an end-to-end workflow without loading large models locally.

What’s included

Part A — evaluate_internal.py 4-bit support and smoke mode

  • Extend scripts/evaluate_internal.py with CLI flags:
    --load_in_4bit, --bnb_4bit_quant_type, --bnb_4bit_compute_dtype, --bnb_4bit_use_double_quant
  • Mirror evaluate_spider_external.py logic: when load_in_4bit is enabled, build a transformers.BitsAndBytesConfig and pass it as quantization_config to from_pretrained. For CPU compatibility, disable 4-bit when CUDA is unavailable and fall back gracefully with a warning.
  • Keep adapter loading unchanged (PEFT adapter_dir).
  • Add a clear log line indicating whether 4-bit was enabled or skipped.
  • Add a lightweight smoke mode: python scripts/evaluate_internal.py --smoke loads a tiny sample (e.g., 5 examples) and exits with code 0. In CPU-only environments this mode automatically falls back to mock behavior to avoid heavy model loading.
  • Update tests to cover 4-bit and smoke flags (test_eval_cli_args updated accordingly).
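The CPU-fallback logic described above can be sketched as follows. The helper name `build_model_kwargs` and its defaults are illustrative, not the script's actual API; the `BitsAndBytesConfig` usage mirrors the standard transformers pattern:

```python
def build_model_kwargs(load_in_4bit, quant_type="nf4", compute_dtype="bfloat16",
                       double_quant=True, cuda_available=False):
    """Sketch of the PR's fallback: disable 4-bit on CPU-only hosts with a warning."""
    if load_in_4bit and not cuda_available:
        print("WARNING: 4-bit requested but CUDA is unavailable; "
              "falling back to full precision")
        load_in_4bit = False
    kwargs = {}
    if load_in_4bit:
        # Imported lazily so CPU-only hosts never touch these libraries.
        import torch
        from transformers import BitsAndBytesConfig
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type=quant_type,
            bnb_4bit_compute_dtype=getattr(torch, compute_dtype),
            bnb_4bit_use_double_quant=double_quant,
        )
    return kwargs
```

The resulting dict would be splatted into `from_pretrained(model_id, **kwargs)`; on a CPU-only host it stays empty, so adapter loading via PEFT is unaffected.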

Part B — Hugging Face Hub publish workflow

  • Add new script scripts/publish_to_hub.py to publish adapter artifacts to HF Hub.
    • CLI:
      --repo_id (required)
      --adapter_dir (default outputs/adapters)
      --private (bool, default False)
      --commit_message (default: Add QLoRA adapter artifacts)
      --include_metrics (optional path to a metrics JSON file)
    • Uses huggingface_hub.HfApi to create repo if missing and upload_folder for the adapters.
    • Ensures a README.md model card is present in adapter_dir with metadata:
      description of the adapter, training dataset, usage notes, safety, and metrics if provided.
    • Fails gracefully with a clear message if HF token is missing.
  • README and docs updated to describe how to publish to HF Hub and remote inference notes.

Part C — Streamlit UI for remote HF Inference

  • Add UI at app/streamlit_app.py that runs on Streamlit Community Cloud and calls remote inference via huggingface_hub.InferenceClient.
    • UI inputs: Schema (DDL) and Question (NL). Button: Generate SQL. Output shows SQL in a code block with a copy option and an optional Show prompt expander.
    • InferenceClient construction priority:
      1. If st.secrets["HF_INFERENCE_BASE_URL"] is set, use InferenceClient(base_url=..., api_key=HF_TOKEN).
      2. Else use InferenceClient(model=HF_MODEL_ID, api_key=HF_TOKEN, provider=HF_PROVIDER).
    • Secrets expected: HF_TOKEN, HF_MODEL_ID, optional HF_INFERENCE_BASE_URL, HF_PROVIDER.
    • Lightweight: app does not import torch/transformers; includes timeouts and user-friendly errors.
  • Add .streamlit/secrets.toml.example with placeholders for HF_TOKEN, HF_MODEL_ID, HF_INFERENCE_BASE_URL, HF_PROVIDER; ensure gitignore ignores secrets.toml.
  • Update requirements.txt to include streamlit and huggingface_hub.
  • Documentation in README.md and context.md explains how to run Streamlit locally and on Streamlit Cloud, and how remote inference works.
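The construction priority above can be expressed as a small helper (a sketch; `make_client_kwargs` is illustrative, and `st.secrets` is treated as a plain mapping):

```python
def make_client_kwargs(secrets):
    """Resolve InferenceClient kwargs from Streamlit secrets: base_url wins over model."""
    if secrets.get("HF_INFERENCE_BASE_URL"):
        # Priority 1: a dedicated endpoint / custom base URL.
        return {"base_url": secrets["HF_INFERENCE_BASE_URL"],
                "api_key": secrets["HF_TOKEN"]}
    # Priority 2: serverless inference against a model id, optionally via a provider.
    kwargs = {"model": secrets["HF_MODEL_ID"], "api_key": secrets["HF_TOKEN"]}
    if secrets.get("HF_PROVIDER"):
        kwargs["provider"] = secrets["HF_PROVIDER"]
    return kwargs


# Usage inside the app (assumes huggingface_hub and streamlit are installed):
#   from huggingface_hub import InferenceClient
#   client = InferenceClient(**make_client_kwargs(st.secrets))
```

Keeping this resolution in one function keeps the app import-light: `torch`/`transformers` are never imported, matching the PR's constraint.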

Project-wide improvements

  • src/text2sql/infer.py updated to accept new 4-bit knobs (bnb_4bit_quant_type, bnb_4bit_use_double_quant) and to log quantization settings clearly; supports loading with 4-bit quantization when requested.
  • docs/evaluation.md updated to reflect 4-bit args and smoke mode behavior.
  • context.md updated to reflect Task 4 extensions, including details about 4-bit eval, HF Hub publishing, and Streamlit remote UI.
  • Added smoke-friendly test coverage ensuring CLI args parsing supports new flags and smoke mode.
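The new CLI surface can be approximated with argparse; the defaults and choices shown here are assumptions for illustration, not necessarily the script's exact values:

```python
import argparse


def build_eval_parser():
    # Sketch of evaluate_internal.py's extended flags (defaults are illustrative).
    p = argparse.ArgumentParser(description="Internal evaluation (sketch)")
    p.add_argument("--load_in_4bit", action="store_true")
    p.add_argument("--bnb_4bit_quant_type", default="nf4", choices=["nf4", "fp4"])
    p.add_argument("--bnb_4bit_compute_dtype", default="bfloat16")
    p.add_argument("--bnb_4bit_use_double_quant", action="store_true")
    p.add_argument("--smoke", action="store_true",
                   help="tiny-sample run that exits with code 0")
    return p
```

A test like the updated test_eval_cli_args would then just parse a flag combination and assert on the namespace.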

How to use (quick references)

  • 4-bit evaluation with CPU fallback (smoke):
    python scripts/evaluate_internal.py --smoke --val_path data/processed/val.jsonl --out_dir reports/
  • Publish adapter artifacts to HF Hub:
    python scripts/publish_to_hub.py --repo_id your-username/analytics-copilot-text2sql-mistral7b-qlora --adapter_dir outputs/adapters --private
  • Run Streamlit UI locally:
    cp .streamlit/secrets.toml.example .streamlit/secrets.toml
    streamlit run app/streamlit_app.py
  • Remote inference notes: UI uses HF InferenceClient; Streamlit Cloud does not load models locally; if serverless inference is insufficient for large models, consider Inference Endpoints.

Notes on tests and quality gates

  • pytest -q should pass locally with new tests added for 4-bit args and smoke.
  • python -m compileall . should succeed (no syntax regressions).
  • The publish_to_hub.py script is robust to missing HF tokens and repository creation errors, with clear error messages.
  • The UI app is lightweight and does not import heavy ML libraries; it relies on HF Inference for generation.

This PR delivers the end-to-end workflow for 4-bit evaluation, HF Hub publishing, and a remote-inference Streamlit UI, aligned with backward-compatible defaults and robust logging/diagnostics.


This pull request was co-created with Cosine Genie

Original Task: analytics-copilot-text2sql/40b8o5133snj
Author: Brejesh Balakrishnan

brej-29 and others added 5 commits January 10, 2026 17:38
…nd 4-bit quant support with smoke tests; update docs and tests

Co-authored-by: Cosine <agent@cosine.sh>
…nused import, adjust exception typing, improve readme generation quoting)

Co-authored-by: Cosine <agent@cosine.sh>
…tions and auto README; include HuggingFace deploy docs; add tests

Co-authored-by: Cosine <agent@cosine.sh>
…or client creation to use it, and add smoke script and tests

Co-authored-by: Cosine <agent@cosine.sh>
@brej-29 brej-29 merged commit 687ca67 into main Jan 11, 2026
1 check passed