
Added distilkit tasks #27

Open

ParamThakkar123 wants to merge 4 commits into main from add/distilkit

Conversation

Contributor

@ParamThakkar123 ParamThakkar123 commented Mar 27, 2026

Added a new task for DistillKit (https://github.com/arcee-ai/DistillKit), a flexible toolkit for knowledge distillation of large language models.

Changes

  • Created `distilkit-distillation/` directory
  • Added `task.yaml` with the task configuration, including setup steps to clone and install DistillKit and run parameters for the distillation
  • Added `train.py` script that generates a DistillKit config.yaml based on parameters and executes the distillation
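A minimal sketch of the config-generation step that `train.py` performs, under stated assumptions: the function and field names below are illustrative, not DistillKit's actual schema or this PR's code.

```python
# Sketch of mapping task parameters onto a DistillKit-style config.yaml.
# Field names are illustrative; DistillKit's real schema may differ.
import yaml


def build_distill_config(params: dict) -> dict:
    """Map TransformerLab task parameters onto a config dict."""
    return {
        "student_model": params["model"],
        "teacher_dataset": params["train_dataset_repo"],
        "training": {
            "epochs": params.get("epochs", 1),
            "batch_size": params.get("batch_size", 8),
        },
    }


def write_config(params: dict, path: str = "config.yaml") -> None:
    # Serialize to the config.yaml that the distillation run consumes.
    with open(path, "w") as f:
        yaml.safe_dump(build_distill_config(params), f, sort_keys=False)
```

After writing the config, the script would hand it to the DistillKit CLI and stream the run's output into TransformerLab's job log.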

Features

  • Supports offline distillation using pre-captured teacher outputs
  • Configurable student model, teacher dataset, loss functions, and training arguments
  • Integrates with TransformerLab's job tracking, logging, and artifact saving
  • Uses WandB for optional logging

Parameters

  • `model`: Student model (e.g., Qwen/Qwen3-8B)
  • `train_dataset_repo`: HF repo for teacher dataset (e.g., arcee-ai/Qwen3-235B-Logits-Packed-8192)
  • Various training and compression settings (vocab_size, k, exact_k, epochs, batch size, etc.)
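For illustration, a parameter block along these lines could appear in `task.yaml`; the keys and default values here are hypothetical, not copied from the PR.

```yaml
# Hypothetical excerpt of task.yaml parameters (names/defaults are illustrative)
parameters:
  model: Qwen/Qwen3-8B                                        # student model
  train_dataset_repo: arcee-ai/Qwen3-235B-Logits-Packed-8192  # pre-captured teacher logits
  vocab_size: 151936   # student vocabulary size
  k: 64                # top-k teacher logits kept per token
  exact_k: true
  epochs: 1
  batch_size: 8
```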

How to Test

  1. In TransformerLab, select the `distilkit-distillation` task
  2. Configure parameters (use defaults for quick test)
  3. Run the task (requires A100 GPU, HF_TOKEN secret)
  4. Monitor progress and check saved model/artifacts

Note: For local testing, ensure DistillKit is installed per its docs and that you have access to the specified HF datasets.

@deep1401 deep1401 requested review from Copilot and deep1401 and removed request for Copilot April 17, 2026 15:41
Member

@deep1401 deep1401 left a comment


This fails for me. I got an A100 on RunPod, ran it, and I see this error:

```
  File "/usr/local/bin/distillkit", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1485, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1406, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1269, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 824, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.ssh/distillkit/distillkit/main.py", line 393, in main
    do_distill(config)
  File "/root/.ssh/distillkit/distillkit/main.py", line 306, in do_distill
    tokenizer = load_tokenizer(config)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.ssh/distillkit/distillkit/main.py", line 288, in load_tokenizer
    return transformers.AutoTokenizer.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 693, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 530, in get_tokenizer_config
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 278, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 512, in cached_files
    raise e
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 422, in cached_files
    hf_hub_download(
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 88, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 997, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1216, in _hf_hub_download_to_cache_dir
    _download_to_tmp_and_move(
  File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1828, in _download_to_tmp_and_move
    with incomplete_path.open("ab") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 5] Input/output error
```
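For what it's worth, `[Errno 5] Input/output error` while `hf_hub_download` appends to a cache file is usually a disk-level failure on the volume backing the Hugging Face cache (full or unreliable mount) rather than a DistillKit bug. A quick check, assuming the default cache location (the `/workspace` path below is only an example):

```shell
# Check free space on the volume backing the HF cache (default ~/.cache/huggingface)
df -h "${HF_HOME:-$HOME/.cache/huggingface}" 2>/dev/null || df -h "$HOME"

# If that volume is full or on a flaky mount, point the cache at a larger
# local disk and re-run the task (path is an example):
export HF_HOME=/workspace/hf-cache
mkdir -p "$HF_HOME"
```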

