
Update domino example #975

Closed

hwchen2017 wants to merge 338 commits into master from hongwei_domino_example

Conversation

@hwchen2017
Contributor

  1. Fix the performance issue of the previous example
  2. Update documentation

MerHS and others added 30 commits September 21, 2022 07:22
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
This PR reorganizes and refactors the DeepSpeed huggingface inference examples.

Changes in this PR:
- Remove Transformers folder
- Add README(s)
  - pip install -r requirements.txt
  - Code example
  - Point to benchmarking and other resources in DeepSpeed repo
- Normalize all names, i.e. test-[model_name].py
- Add T5 translation task (English to French)
- Add huggingface pipeline() object and refactor test-gptj.py
- Create folders for different types of ML tasks (text-generation, fill-mask, etc.)
- Add BERT fill-mask example
- Update queries to something more sensible
- TODO: Add test-bloom to text-generation folder
- Fix typos in code comments

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
This PR adds a bloom inference example (bigscience/bloom-3b) and a corresponding helper Pipeline class meant to mimic the functionality and API of the huggingface pipelines. This class was added to handle bloom meta tensors and checkpoint loading in a more organized way that closely matches the existing examples.

This PR also cleans up extra whitespace across the inference examples.

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Add presharded model support to BLOOM example

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
This PR adds device awareness to the BLOOM Pipeline utility class, expanding the set of supported devices and handling the case where the DeepSpeed init_inference API isn't used.
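The device handling described above can be illustrated with a minimal sketch. The helper name and logic below are hypothetical, not the PR's actual code; they only show the kind of fallback a pipeline utility needs when it must run with or without DeepSpeed managing the device.

```python
# Hypothetical sketch of device resolution for an inference pipeline utility.
# When deepspeed.init_inference() isn't used, the class itself must decide
# which device to place the model on, based on launcher-provided rank info.
def resolve_device(local_rank: int, cuda_available: bool) -> str:
    """Map a launcher-provided local rank to a torch device string."""
    if cuda_available and local_rank >= 0:
        return f"cuda:{local_rank}"  # one GPU per process under a launcher
    if cuda_available:
        return "cuda:0"              # single-process run, default GPU
    return "cpu"                     # no CUDA: fall back to CPU
```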

* Fix inference-test.py script and update README

* change init arg name to model to match HF

* revert to model_name

* Update README to point to proper header
Co-authored-by: Cheng Li <pistasable@gmail.com>
This PR sets replace_with_kernel_inject=True in the BERT fill-mask inference example.
* initial commit

* update random-ltd

* add vit

* vision transformer

* update-name

* saving without randomltd

* update naming

* update for dynamic train

* update for dynamic train

* checking kernel implementation

* check kernel acc

* update json

* fix for cifar randomltd

* vit-finetuning

* refactor

* refactor

* refactor

* update readme

* update readme

* update readme

* update readme

* move to bash

* training log

* training log

* clean and update gpt

* output

* rename dir

* cleanup

* fix

* fix

Co-authored-by: xiaoxiawu <xiaoxiawu@microsoft.com>
Co-authored-by: xiaoxiawu <xiaoxiawu>
Co-authored-by: molly-smith <mosm@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Add bandwidth and throughput test to inference-test

* Print per token latency

* Remove 'dict' replace method option

* Update inference/huggingface/text-generation/inference-test.py

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>

* Refactor with Mike's suggestion

* Accidentally removed printing output

* Create num_bytes variable

---------

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
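The bandwidth, throughput, and per-token-latency additions above can be sketched in a few lines. This is an illustrative reconstruction, not the PR's code; the `num_bytes` mapping follows the "Create num_bytes variable" commit, and the bandwidth formula assumes every parameter is read once per generated token.

```python
# Illustrative metrics for a text-generation benchmark (not the PR's code).
# num_bytes maps a dtype name to its bytes per element.
num_bytes = {"fp32": 4, "fp16": 2, "int8": 1}

def inference_metrics(total_seconds, new_tokens, num_params, dtype="fp16"):
    """Return (per-token latency, throughput, approx. bandwidth)."""
    per_token_latency = total_seconds / new_tokens        # seconds per token
    throughput = new_tokens / total_seconds               # tokens per second
    # Rough memory-bandwidth bound: each parameter is streamed from memory
    # once per generated token.
    bandwidth = num_params * num_bytes[dtype] * new_tokens / total_seconds
    return per_token_latency, throughput, bandwidth
```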
* data efficiency example update

* data efficiency update

This PR adds a DeepSpeed Stable Diffusion example using the prompthero/midjourney-v4-diffusion model.

This PR updates how the enable_cuda_graph param is set depending on the world_size, i.e. CUDA graphs should only be enabled when world_size == 1.
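The world-size guard described above amounts to a one-line condition; the helper below is a hedged sketch of that gating, not the example's literal code.

```python
# Sketch of the enable_cuda_graph gating described above: in this example,
# CUDA graph capture is only enabled for single-GPU runs (world_size == 1).
def cuda_graph_enabled(world_size: int) -> bool:
    return world_size == 1
```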
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Molly Smith <112220543+molly-smith@users.noreply.github.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Shuaiwen Leon Song <124002815+leonsongmsft@users.noreply.github.com>
Co-authored-by: Xiaoxia (Shirley) Wu <94406484+xiaoxiawu-microsoft@users.noreply.github.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
zhangsmallshark and others added 25 commits November 7, 2024 13:27
* add domino

* use transformer from deepspeed

* clean args

* mega opt

* add opt & timer

* add opt

* fix loss

* folder name

* Change argument in pretrain script

* Add readme for domino

* Update readme for domino

* Fixing usage issues

* update dataset

* megatron dependencies

* path

* Update README.md

* remove imports

* update import

* Update README.md

* Minor example script changes

* train bash

* require

* Update README.md

---------

Co-authored-by: chengming-zhang <chengming.zhang@anl.gov>
Co-authored-by: Zheyu SHEN <zyshen@umd.edu>
Co-authored-by: root <root@ecehpavw1202b.umd.edu>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
* add benchmarking for offloading states

* fix api names
* Add label_smoothing while calculating step2 DPO loss in DeepSpeed-Chat.

* Add training scripts for step2 DPO in DeepSpeed-Chat.

* Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat.

* Update training scripts of step2 DPO in DeepSpeed-Chat.

* Follow upstream fixes.

* Update README.md for Step2 DPO finetuning.

* Add opt 350M training log demo for step 2 dpo finetuning in DeepSpeed-Chat.

* Address the formatting issue in step2 dpo finetuning in DeepSpeed-Chat.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
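The label_smoothing addition to the step2 DPO loss can be sketched with the standard conservative-DPO formulation: interpolate between the loss for the stated preference and the loss for the flipped preference. This is a minimal scalar sketch of that formulation, not necessarily the exact DeepSpeed-Chat implementation.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected,
             beta=0.1, label_smoothing=0.0):
    """Label-smoothed DPO loss on summed log-probs (scalar sketch).

    logits = beta * ((log pi_w - log ref_w) - (log pi_l - log ref_l));
    label_smoothing=0 recovers plain DPO, -log sigmoid(logits).
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Conservative DPO: mix the preferred label with its flipped counterpart.
    return (-(1.0 - label_smoothing) * math.log(sigmoid(logits))
            - label_smoothing * math.log(sigmoid(-logits)))
```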
Signed-off-by: Logan Adams <loadams@microsoft.com>
* Update weights_only due to change in default in torch>=2.6

Signed-off-by: Logan Adams <loadams@microsoft.com>

* formatting

Signed-off-by: Logan Adams <loadams@microsoft.com>

---------

Signed-off-by: Logan Adams <loadams@microsoft.com>
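For context on the weights_only commit above: torch 2.6 flipped the default of torch.load's `weights_only` argument from False to True, so checkpoints that pickle arbitrary Python objects must now opt out explicitly. The helper below is illustrative only (the examples simply pass `weights_only=False` inline); it shows a version-gated way to build the kwargs.

```python
# Illustrative helper (not the PR's code): decide which extra kwargs
# torch.load needs to keep pre-2.6 behavior, given a torch version string.
def torch_load_kwargs(torch_version: str) -> dict:
    # Take only "major.minor"; ignores suffixes like "2.6.0+cu121".
    major, minor = (int(p) for p in torch_version.split(".")[:2])
    # torch>=2.6 defaults weights_only=True, breaking full-pickle checkpoints.
    return {"weights_only": False} if (major, minor) >= (2, 6) else {}
```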
* moved example from DeepSpeed PR #7104 to this repo

* Update training/data_efficiency/variable_batch_size_and_lr/README.md

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Update training/data_efficiency/variable_batch_size_and_lr/README.md

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* replaced T by S for sequence length

* replaced T by S for sequence length

* replaced T by S for sequence length

* more detailed explanation

* --pipeline-num-stages is now a command line argument

* cleaner syntax

* Update training/data_efficiency/variable_batch_size_and_lr/README.md

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
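The `--pipeline-num-stages` flag mentioned in the commit list above can be exposed with a few lines of argparse. The flag name comes from the commit; the default value and help text here are guesses for illustration.

```python
import argparse

# Sketch of exposing the pipeline stage count as a command-line flag.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--pipeline-num-stages", type=int, default=1,
    help="number of pipeline-parallel stages (1 = no pipelining)")

# Example invocation; argparse maps dashes to underscores in attribute names.
args = parser.parse_args(["--pipeline-num-stages", "4"])
```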
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Co-authored-by: Hongwei Chen <hongweichen@ftqtmec25000002.taxzvufipdhelhupulxcbvr15f.ux.internal.cloudapp.net>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
* import files for deepcompile benchmark

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* add figures

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* add figures

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* update document

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* fix links to images

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* add images

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* specify deepspeed version

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

---------

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* update description of versions for deepcompile

* Update to match specific tag name

Signed-off-by: Logan Adams <loadams@microsoft.com>

---------

Signed-off-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
* update description of versions for deepcompile

* fix deepcompile benchmark script

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* fix benchmark for z1

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

* add options for deepcompile bench

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>

---------

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* update tp example

Signed-off-by: inkcherry <mingzhi.liu@intel.com>

* update

Signed-off-by: inkcherry <mingzhi.liu@intel.com>

* add length bench file

Signed-off-by: inkcherry <mingzhi.liu@intel.com>

---------

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
@hwchen2017 hwchen2017 requested a review from loadams June 8, 2025 20:10
@hwchen2017 hwchen2017 requested a review from tjruwase as a code owner June 8, 2025 20:10
@hwchen2017 hwchen2017 closed this Jun 8, 2025
@hwchen2017 hwchen2017 force-pushed the hongwei_domino_example branch from c806b91 to 9478a6f Compare June 8, 2025 20:22
@hwchen2017 hwchen2017 deleted the hongwei_domino_example branch June 12, 2025 18:12
