BitDistill

TODO

Fix the slow training speed on NVIDIA GPUs (all experiments were run on AMD MI300X).
Support more model families (currently only the Qwen series is supported).

Dataset

Please organize the dataset in the format used by LLaMA-Factory.
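
As a rough illustration, the sketch below writes one Alpaca-style record plus a matching `dataset_info.json` entry, which is one of the layouts LLaMA-Factory accepts. The directory `./data_example`, the dataset name `my_task`, and the sample record are all hypothetical, and this README does not say which LLaMA-Factory variant the training scripts expect, so check the scripts before relying on this layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dataset location and name; adjust to your own layout.
data_dir="./data_example"
mkdir -p "$data_dir"

# Alpaca-style records: a JSON array of instruction/input/output objects.
cat > "$data_dir/my_task.json" <<'EOF'
[
  {
    "instruction": "Classify the sentiment of the sentence as positive or negative.",
    "input": "The movie was a delight from start to finish.",
    "output": "positive"
  }
]
EOF

# Registry entry that maps the dataset name to its file.
cat > "$data_dir/dataset_info.json" <<'EOF'
{
  "my_task": {
    "file_name": "my_task.json"
  }
}
EOF

echo "Wrote $data_dir/my_task.json and $data_dir/dataset_info.json"
```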

Docker

AMD: rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.5.1
NVIDIA: yushuiwx/rl:v2.0.2
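
A minimal sketch of how a container might be started from either image. The image tags are the ones listed above; the helper names (`image_for`, `run_command_for`), the `/workspace` mount, and the `docker run` flags are assumptions based on common ROCm and NVIDIA container usage, not settings taken from this repository (see `local_docker.sh` for the project's own invocation). The script only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Image tags come from the README; everything else here is illustrative.
image_for() {
  case "$1" in
    amd)    echo "rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.5.1" ;;
    nvidia) echo "yushuiwx/rl:v2.0.2" ;;
    *)      echo "unknown vendor: $1" >&2; return 1 ;;
  esac
}

# Build (but do not execute) a docker run command for the given vendor.
run_command_for() {
  local image
  image="$(image_for "$1")" || return 1
  case "$1" in
    # ROCm containers need the kfd and dri devices exposed.
    amd)    echo "docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host -v \"\$PWD\":/workspace -w /workspace $image" ;;
    # NVIDIA containers use the --gpus flag (requires the NVIDIA Container Toolkit).
    nvidia) echo "docker run -it --gpus all --ipc=host -v \"\$PWD\":/workspace -w /workspace $image" ;;
  esac
}

run_command_for amd
run_command_for nvidia
```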

Environment setup

AMD

bash mi300_setup.sh

NVIDIA

bash setup.sh

Training Commands

For the Qwen3 series experiments, run:

bash qwen3-exp.sh

To train the deepseekdistill FP16 baseline on a downstream task:
- $lr: learning rate
- $model: Qwen model name
- $gpu: GPU index used for training

bash ds-exp-run-sft-baseline.sh $lr $model $gpu
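
For example, a small learning-rate sweep over the baseline script could look like the sketch below. The model name, GPU index, and learning rates are placeholders rather than settings from the repository, and `baseline_command` is a hypothetical helper. The loop prints each command by default and only executes it when `RUN=1` is set, which assumes you are in the repository root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values; replace with your own model name and GPU index.
model="your-qwen-model-name"
gpu="0"
learning_rates=(1e-5 2e-5 5e-5)

# Compose the baseline command; argument order is $lr $model $gpu.
baseline_command() {
  echo "bash ds-exp-run-sft-baseline.sh $1 $model $gpu"
}

for lr in "${learning_rates[@]}"; do
  baseline_command "$lr"
  # Dry run unless RUN=1 is exported.
  if [ "${RUN:-0}" = "1" ]; then
    bash ds-exp-run-sft-baseline.sh "$lr" "$model" "$gpu"
  fi
done
```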

To train deepseekdistill with BitDistill on a downstream task, using the FP16 baseline as the teacher ($lr, $model, and $gpu are as above):
- $teacher: local path to the FP16 teacher model in Hugging Face format
- $beta: loss weight for logits distillation
- $minilmweight: loss weight for MiniLM v2 distillation
- $distilllayer: the layer to which MiniLM v2 distillation is applied

bash ds-exp-run-sft-bitdistill.sh $lr $model $gpu $teacher $beta $minilmweight $distilllayer
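
Because the script takes seven positional arguments, it is easy to pass them in the wrong order. The sketch below is a hypothetical wrapper (`bitdistill_command` is not part of the repository) that checks the argument count and prints the resulting command. The values in the example call are placeholders, not recommended hyperparameters.

```shell
#!/usr/bin/env bash
# Print the BitDistill command for the seven positional arguments, in README order.
bitdistill_command() {
  if [ "$#" -ne 7 ]; then
    echo "usage: bitdistill_command LR MODEL GPU TEACHER BETA MINILMWEIGHT DISTILLLAYER" >&2
    return 2
  fi
  local lr="$1" model="$2" gpu="$3" teacher="$4"
  local beta="$5" minilmweight="$6" distilllayer="$7"
  echo "bash ds-exp-run-sft-bitdistill.sh $lr $model $gpu $teacher $beta $minilmweight $distilllayer"
}

# Placeholder values only; substitute your own settings and teacher checkpoint path.
bitdistill_command 2e-5 your-qwen-model-name 0 /path/to/fp16-teacher 1.0 1.0 1
```

The teacher path would normally point at the checkpoint produced by the FP16 baseline run above.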

BitNet Model Test Demo

./test-ds-model/test.sh
