
Update domino example #975

Closed

hwchen2017 wants to merge 338 commits into master from hongwei_domino_example

Conversation

@hwchen2017
Contributor

  1. Fix the performance issue of the previous example
  2. Update documentation

MerHS and others added 30 commits September 21, 2022 07:22
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
This PR reorganizes and refactors the DeepSpeed huggingface inference examples.

Changes in this PR:
- Remove Transformers folder
- Add README(s)
  - pip install -r requirements.txt
  - Code example
  - Point to benchmarking and other resources in DeepSpeed repo
- Normalize all names, i.e. test-[model_name].py
- Add T5 translation task (English to French)
- Add huggingface pipeline() object and refactor test-gptj.py
- Create folders for different types of ML tasks (text-generation, fill-mask, etc.)
- Add BERT fill-mask example
- Update queries to something more sensible
- TODO: Add test-bloom to text-generation folder
- Fix typos in code comments

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
This PR adds a bloom inference example (bigscience/bloom-3b) and a corresponding helper Pipeline class meant to mimic the functionality and API of the huggingface pipelines. This class was added to handle bloom meta tensors and checkpoint loading in a more organized way that closely matches the existing examples.

This PR also cleans up extra whitespace across the inference examples.

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Add presharded model support to BLOOM example

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
This PR adds device awareness to the BLOOM Pipeline utility class, expanding the set of supported devices and handling the case where the DeepSpeed init_inference API isn't used.
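The device handling described above can be illustrated with a minimal sketch. The helper name and logic below are hypothetical, not the PR's actual code; they only show the kind of fallback a pipeline utility needs when it must run with or without DeepSpeed managing the device.

```python
# Hypothetical sketch of device resolution for an inference pipeline utility.
# When deepspeed.init_inference() isn't used, the class itself must decide
# which device to place the model on, based on launcher-provided rank info.
def resolve_device(local_rank: int, cuda_available: bool) -> str:
    """Map a launcher-provided local rank to a torch device string."""
    if cuda_available and local_rank >= 0:
        return f"cuda:{local_rank}"  # one GPU per process under a launcher
    if cuda_available:
        return "cuda:0"              # single-process run, default GPU
    return "cpu"                     # no CUDA: fall back to CPU
```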

* Fix inference-test.py script and update README

* change init arg name to model to match HF

* revert to model_name

* Update README to point to proper header
Co-authored-by: Cheng Li <pistasable@gmail.com>
This PR sets replace_with_kernel_inject=True in the BERT fill-mask inference example.
* initial commit

* update random-ltd

* add vit

* vision transformer

* update-name

* saving without randomltd

* update naming

* update for dynamic train

* update for dynamic train

* checking kernel implementation

* check kernel acc

* update json

* fix for cifar randomltd

* vit-finetuning

* refactor

* refactor

* refactor

* update readme

* update readme

* update readme

* update readme

* move to bash

* training log

* training log

* clean and update gpt

* output

* rename dir

* cleanup

* fix

* fix

Co-authored-by: xiaoxiawu <xiaoxiawu@microsoft.com>
Co-authored-by: xiaoxiawu <xiaoxiawu>
Co-authored-by: molly-smith <mosm@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Add bandwidth and throughput test to inference-test

* Print per token latency

* Remove 'dict' replace method option

* Update inference/huggingface/text-generation/inference-test.py

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* Refactor with Mike's suggestion

* Accidentally removed printing output

* Create num_bytes variable

---------

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
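The bandwidth, throughput, and per-token-latency additions above can be sketched in a few lines. This is an illustrative reconstruction, not the PR's code; the `num_bytes` mapping follows the "Create num_bytes variable" commit, and the bandwidth formula assumes every parameter is read once per generated token.

```python
# Illustrative metrics for a text-generation benchmark (not the PR's code).
# num_bytes maps a dtype name to its bytes per element.
num_bytes = {"fp32": 4, "fp16": 2, "int8": 1}

def inference_metrics(total_seconds, new_tokens, num_params, dtype="fp16"):
    """Return (per-token latency, throughput, approx. bandwidth)."""
    per_token_latency = total_seconds / new_tokens        # seconds per token
    throughput = new_tokens / total_seconds               # tokens per second
    # Rough memory-bandwidth bound: each parameter is streamed from memory
    # once per generated token.
    bandwidth = num_params * num_bytes[dtype] * new_tokens / total_seconds
    return per_token_latency, throughput, bandwidth
```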
* data efficiency example update

* data efficiency update

This PR adds a DeepSpeed Stable Diffusion example using the prompthero/midjourney-v4-diffusion model.

This PR updates how the enable_cuda_graph param is set depending on the world_size, i.e. CUDA graphs should only be enabled when world_size == 1.
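The world-size guard described above amounts to a one-line condition; the helper below is a hedged sketch of that gating, not the example's literal code.

```python
# Sketch of the enable_cuda_graph gating described above: in this example,
# CUDA graph capture is only enabled for single-GPU runs (world_size == 1).
def cuda_graph_enabled(world_size: int) -> bool:
    return world_size == 1
```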
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Molly Smith <112220543+molly-smith@users.noreply.github.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Shuaiwen Leon Song <124002815+leonsongmsft@users.noreply.github.com>
Co-authored-by: Xiaoxia (Shirley) Wu <94406484+xiaoxiawu-microsoft@users.noreply.github.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
zhangsmallshark and others added 25 commits November 7, 2024 13:27
* add domino

* use transformer from deepspeed

* clean args

* mega opt

* add opt & timer

* add opt

* fix loss

* folder name

* Change argument in pretrain script

* Add readme for domino

* Update readme for domino

* Fixing usage issues

* update dataset

* megatron dependencies

* path

* Update README.md

* remove imports

* update import

* Update README.md

* Minor example script changes

* train bash

* require

* Update README.md

---------

Co-authored-by: chengming-zhang <chengming.zhang@anl.gov>
Co-authored-by: Zheyu SHEN <zyshen@umd.edu>
Co-authored-by: root <root@ecehpavw1202b.umd.edu>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
* add benchmarking for offloading states

* fix api names
* Add label_smoothing while calculating step2 DPO loss in DeepSpeed-Chat.

* Add training scripts for step2 DPO in DeepSpeed-Chat.

* Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat.

* Update training scripts of step2 DPO in DeepSpeed-Chat.

* Follow upstream fixes.

* Update README.md for Step2 DPO finetuning.

* Add opt 350M training log demo for step 2 dpo finetuning in DeepSpeed-Chat.

* Address the formatting issue in step2 dpo finetuning in DeepSpeed-Chat.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
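The label_smoothing addition to the step2 DPO loss can be sketched with the standard conservative-DPO formulation: interpolate between the loss for the stated preference and the loss for the flipped preference. This is a minimal scalar sketch of that formulation, not necessarily the exact DeepSpeed-Chat implementation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected,
             beta=0.1, label_smoothing=0.0):
    """Label-smoothed DPO loss on summed log-probs (scalar sketch).

    logits = beta * ((log pi_w - log ref_w) - (log pi_l - log ref_l));
    label_smoothing=0 recovers plain DPO, -log sigmoid(logits).
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Conservative DPO: mix the preferred label with its flipped counterpart.
    return (-(1.0 - label_smoothing) * math.log(sigmoid(logits))
            - label_smoothing * math.log(sigmoid(-logits)))
```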
Signed-off-by: Logan Adams <loadams@microsoft.com>
* Update weights_only due to change in default in torch>=2.6

Signed-off-by: Logan Adams <loadams@microsoft.com>

* formatting

Signed-off-by: Logan Adams <loadams@microsoft.com>

---------

Signed-off-by: Logan Adams <loadams@microsoft.com>
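For context on the weights_only commit above: torch 2.6 flipped the default of torch.load's `weights_only` argument from False to True, so checkpoints that pickle arbitrary Python objects must now opt out explicitly. The helper below is illustrative only (the examples simply pass `weights_only=False` inline); it shows a version-gated way to build the kwargs.

```python
# Illustrative helper (not the PR's code): decide which extra kwargs
# torch.load needs to keep pre-2.6 behavior, given a torch version string.
def torch_load_kwargs(torch_version: str) -> dict:
    # Take only "major.minor"; ignores suffixes like "2.6.0+cu121".
    major, minor = (int(p) for p in torch_version.split(".")[:2])
    # torch>=2.6 defaults weights_only=True, breaking full-pickle checkpoints.
    return {"weights_only": False} if (major, minor) >= (2, 6) else {}
```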
* moved example from DeepSpeed PR #7104 to this repo

* Update training/data_efficiency/variable_batch_size_and_lr/README.md

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Update training/data_efficiency/variable_batch_size_and_lr/README.md

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* replaced T by S for sequence length

* replaced T by S for sequence length

* replaced T by S for sequence length

* more detailed explanation

* --pipeline-num-stages is now a command line argument

* cleaner syntax

* Update training/data_efficiency/variable_batch_size_and_lr/README.md

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
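The `--pipeline-num-stages` flag mentioned in the commit list above can be exposed with a few lines of argparse. The flag name comes from the commit; the default value and help text here are guesses for illustration.

```python
import argparse

# Sketch of exposing the pipeline stage count as a command-line flag.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--pipeline-num-stages", type=int, default=1,
    help="number of pipeline-parallel stages (1 = no pipelining)")

# Example invocation; argparse maps dashes to underscores in attribute names.
args = parser.parse_args(["--pipeline-num-stages", "4"])
```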
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Co-authored-by: Hongwei Chen <hongweichen@ftqtmec25000002.taxzvufipdhelhupulxcbvr15f.ux.internal.cloudapp.net>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
* import files for deepcompile benchmark

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* add figures

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* add figures

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* update document

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* fix links to images

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* add images

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* specify deepspeed version

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

---------

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* update description of versions for deepcompile

* Update to match specific tag name

Signed-off-by: Logan Adams <loadams@microsoft.com>

---------

Signed-off-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
* update description of versions for deepcompile

* fix deepcompile benchmark script

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* fix benchmark for z1

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* add options for deepcompile bench

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

---------

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* update tp example

Signed-off-by: inkcherry <mingzhi.liu@intel.com>

* update

Signed-off-by: inkcherry <mingzhi.liu@intel.com>

* add length bench file

Signed-off-by: inkcherry <mingzhi.liu@intel.com>

---------

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
@hwchen2017 hwchen2017 requested a review from loadams June 8, 2025 20:10
@hwchen2017 hwchen2017 requested a review from tjruwase as a code owner June 8, 2025 20:10
@hwchen2017 hwchen2017 closed this Jun 8, 2025
@hwchen2017 hwchen2017 force-pushed the hongwei_domino_example branch from c806b91 to 9478a6f Compare June 8, 2025 20:22
@hwchen2017 hwchen2017 deleted the hongwei_domino_example branch June 12, 2025 18:12
