Experiment on fine-tuning a Qwen model for data extraction with dynamic forms and short, natural speech.
For this kind of task, LLMs with prompt engineering are usually enough. For small setups, a small language model (SLM) fine-tuned for extraction can be cheaper and easier to self-host. This experiment was done to see how well the idea translates to real use and to other domains later on.
- Frontend: React (Vite, Tailwind) chat UI, mostly vibe-coded, with some pieces (e.g. WebSocket client, config) reused from earlier projects for a cleaner setup. Form fields are defined in the sidebar; text is sent via chat over a WebSocket (MessagePack) to the API.
- Backend: Redis pub/sub with FastAPI as gateway; API on port 4000, WebSocket at `/ws` (see the gateway sketch after this list).
- Extraction service: Loads Qwen2.5-3B + LoRA adapter from `artifacts/lora_formfill`, runs inference on extraction tasks from Redis, publishes results. Uses snake_case for field keys; stops generation at the first complete JSON object.
- Training service: LoRA fine-tuning (PEFT, bitsandbytes 4-bit) on Qwen; saves adapter to `artifacts/lora_formfill`. Optional; uncomment in `docker-compose.yml` and run when (re)training.
- STT service: Speech-to-text via Redis; optional for the extraction flow.
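As a sketch of how the gateway bridges WebSocket clients and Redis pub/sub, assuming hypothetical channel names (`extract:requests`, `extract:results`) and a simplified one-request-one-result loop (the actual service likely matches results to sessions):

```python
import msgpack
import redis.asyncio as redis
from fastapi import FastAPI, WebSocket

app = FastAPI()
rdb = redis.Redis(host="redis", port=6379)

@app.websocket("/ws")
async def ws_endpoint(ws: WebSocket):
    await ws.accept()
    pubsub = rdb.pubsub()
    await pubsub.subscribe("extract:results")  # hypothetical channel name
    while True:
        # Forward the client's MessagePack-encoded message to the extraction service.
        data = msgpack.unpackb(await ws.receive_bytes())
        await rdb.publish("extract:requests", msgpack.packb(data))
        # Relay the next extraction result back to the client (simplified pairing).
        msg = await pubsub.get_message(ignore_subscribe_messages=True, timeout=30)
        if msg:
            await ws.send_bytes(msg["data"])
```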
Session state (form fields + chat history) is stored in Redis and restored on reload; a clear-chat button resets the conversation.
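A minimal sketch of that round-trip, assuming one JSON blob per session under a hypothetical `session:<id>` key (the real key layout and client may differ):

```python
import json
import redis  # assumption: sync client for brevity; the service may use redis.asyncio

r = redis.Redis(host="redis", port=6379, decode_responses=True)

def save_session(session_id: str, fields: dict, history: list) -> None:
    # Persist form fields and chat history as one JSON blob per session.
    r.set(f"session:{session_id}", json.dumps({"fields": fields, "history": history}))

def load_session(session_id: str) -> dict:
    # Restore on reload; empty defaults if the session does not exist yet.
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else {"fields": {}, "history": []}
```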
Requires Docker (and NVIDIA Container Toolkit if using GPU for extraction/training).
```sh
docker compose up --build
```

- API: http://localhost:4000
- Frontend: http://localhost:3000 (WebSocket target `ws://localhost:4000`)
Place the LoRA adapter under `./artifacts/lora_formfill` (e.g. by running the training service once); otherwise extraction returns empty forms.
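Loading the adapter on top of the 4-bit base model follows the standard transformers/PEFT pattern; a sketch, where the exact model ID and quantization settings are assumptions:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE = "Qwen/Qwen2.5-3B-Instruct"  # assumption: instruct variant
ADAPTER = "artifacts/lora_formfill"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)  # raises if the adapter directory is absent
model.eval()
```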
- Chat UI with configurable form fields (left panel), message list, and filled-form blocks with edit/save.
- WebSocket API: `chat` (extraction), `save_form`, `transcribe` (STT); session GET/PUT for persistence.
- Extraction pipeline: prompt + 4-bit base model + LoRA, JSON output, snake_case normalization for field names.
- Training pipeline: dataset from CSV, same prompt format, LoRA training, adapter written to `artifacts/lora_formfill` (see the sketch after this list).
- Backend logging (API and extraction service) to stdout and log files; no prompt/model output sent to the frontend.
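A condensed sketch of that training pipeline; the hyperparameters, CSV path, and `text` column name are illustrative, not the repo's actual values:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE = "Qwen/Qwen2.5-3B-Instruct"  # assumption: instruct variant
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(BASE)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Assumes each CSV row carries a fully formatted prompt+target in a `text` column.
ds = load_dataset("csv", data_files="data/train.csv")["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="artifacts/lora_formfill", num_train_epochs=3,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
model.save_pretrained("artifacts/lora_formfill")  # writes only the LoRA adapter
```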
Notes:
- Stopping criteria: The model often produced valid JSON, but as a small model it frequently continued with extra dialogue (e.g. follow-up turns). Generation is therefore stopped after the first complete top-level JSON object: a simple brace-matching stack detects the closing `}`, and only that first object is kept and parsed (a sketch follows below).
- STT: Speech-to-text wiring exists (channel, payloads) but is not fully implemented yet.
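The extraction step looks roughly like this; a sketch that also skips braces inside string literals, while the service's actual version may be simpler:

```python
import json
from typing import Optional

def first_json_object(text: str) -> Optional[dict]:
    """Return the first complete top-level JSON object in `text`, else None."""
    depth, start = 0, None
    in_string = escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Inside a string literal: honor backslash escapes, watch for the closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(text[start : i + 1])
                except json.JSONDecodeError:
                    return None
    return None

# e.g. first_json_object('{"name": "Ada"} Anything else I can help with?')
# -> {'name': 'Ada'}  (the trailing dialogue is discarded)
```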
Benchmarking (manual and otherwise) indicates that the fine-tuned model performs better than the base Qwen model on this extraction task. Results are good, with the usual caveats: occasional bugs and clear room for improvement (data quality, prompt tweaks, more training). The stack is runnable end to end (chat, forms, session persistence, training pipeline) and works as a small experimental baseline to build on or adapt for other domains.
