Checklist
Motivation
There are several areas for improvement in onboarding to and developing SpecForge.
- The current setup steps aren't clearly documented, which may slow down onboarding. A markdown guide would help newcomers to SpecForge as well as newcomers to remote development with .devcontainer.
- The dev Dockerfile can also be improved to support the remote dev experience.
  - Using the sglang image as the base leads to many duplicated package lookups. Even though pip install reports "Requirement already satisfied", the time spent traversing and resolving the package list can be avoided with a cached Docker layer.
  - The Dockerfile should configure the virtual environment properly. The SpecForge virtual environment isn't on the PATH variable, so a command like pip calls the global pip (confusingly, python and pip3 point to the virtual environment executables).
  - The Dockerfile should install SpecForge-specific dependencies itself. The virtual environment activated in the step above seems to inherit directly from SGLang, but SpecForge has its own dependencies to install. Baking dependency installation into the Dockerfile lets builds reuse caches on the remote machine and reduces build time.
- The steps to execute training examples are not clear. At least two preparatory steps are necessary after connecting to the remote container (once it is built properly and the venv is activated) before an example can be run:
  - Generate a dataset, e.g.
    python scripts/prepare_data.py --dataset sharegpt
  - Log in to Hugging Face:
    huggingface-cli login
  - Then execute the example:
    bash examples/run_llama3.1_8b_eagle3_online.sh
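The PATH issue and the run steps above can be sketched as a small helper script. This is only an illustration, not part of the PR: the function names (in_venv, check_venv, run_example) and the venv directory passed to them are hypothetical, and the three commands inside run_example are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the venv location are assumptions, not SpecForge APIs.

# in_venv CMD VENV_DIR
# Succeeds (exit 0) when CMD resolves to an executable under VENV_DIR/bin.
in_venv() {
  local cmd="$1" venv="$2" resolved
  resolved="$(command -v "$cmd" 2>/dev/null)" || return 1
  case "$resolved" in
    "$venv"/bin/*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_venv VENV_DIR
# Prints, for python, pip and pip3, whether each resolves inside the venv.
check_venv() {
  local venv="$1" cmd
  for cmd in python pip pip3; do
    if in_venv "$cmd" "$venv"; then
      echo "$cmd: inside venv"
    else
      echo "$cmd: outside venv"
    fi
  done
}

# run_example
# Runs the steps from the list above, stopping at the first failure.
# Must be called from the SpecForge repo root; the login step is interactive.
run_example() {
  python scripts/prepare_data.py --dataset sharegpt || return 1
  huggingface-cli login || return 1
  bash examples/run_llama3.1_8b_eagle3_online.sh
}
```

After sourcing the script, calling check_venv with the venv directory shows whether pip is still the global one; run_example is only meaningful once all three commands resolve inside the venv.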
Therefore
- .devcontainer/Dockerfile should contain virtual environment setup commands.
Related resources
A newcomer's onboarding and dev experience when cloning the SpecForge repo to the remote machine and then using Dev Containers: Open Folder in a Container.
Note:
Dev Containers: Clone repo in a container does not work because of a bug reporting that the Docker API version is 1.41. It may also complicate the git credentials setup.
Dev Containers: Attach to a Running Container: the git repo does not exist in the container, and neither do the git credentials. The environment is also not set up for SpecForge, so dependencies need to be installed manually (although the Dockerfile provides build caches).