Fix Dockerfile for successful builds, improve /chat response quality, and add dev workflow tooling#8
Open
sriharip123 wants to merge 3 commits intogrctest:mainfrom
Open
Fix Dockerfile for successful builds, improve /chat response quality, and add dev workflow tooling#8sriharip123 wants to merge 3 commits intogrctest:mainfrom
sriharip123 wants to merge 3 commits intogrctest:mainfrom
Conversation
- Fix apt package issues (remove software-properties-common, lsb-release) - Patch const-correctness error in ggml-bitnet-mad.cpp for clang - Use pre-built GGUF model to skip broken HF-to-GGUF conversion - Add docker-compose.yml for easy container management - Add Postman collection covering all 19 API endpoints
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR makes the FastAPI-BitNet project buildable and usable out of the box with Docker, fixes the /chat endpoint to return clean responses, and adds developer convenience tooling.
Changes
Dockerfile fixes (81eb61b)
Removed unavailable packages (software-properties-common, lsb-release) for the python:3.10 base image
Switched to Debian repo clang package instead of the LLVM install script
Added sed patch for const-correctness error in ggml-bitnet-mad.cpp (clang is stricter than gcc)
Switched to pre-built GGUF model from microsoft/bitnet-b1.58-2B-4T-gguf to avoid broken HF-to-GGUF conversion (BitNetForCausalLM not supported by the converter)
Added docker-compose.yml for single-command builds
Added Postman collection covering all 19 API endpoints
Chat endpoint fix (d407312)
Switched from /completion to /v1/chat/completions so llama-server applies the correct LLaMA 3 chat template automatically
Added _clean_response() post-processor to strip repetitive patterns (Question:, Input:, (no answer), etc.)
Reduced default n_predict from 256 to 128 for cleaner output
Developer workflow (33e3d31)
Added volume mounts for main.py and lib/ in docker-compose.yml so code changes don't require a full rebuild
Added --reload flag to uvicorn for automatic hot-reloading on file changes