Closed
hwchen2017 (Contributor) commented Jun 8, 2025
- Fix the performance issue of the previous example
- Update documentation
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
This PR reorganizes and refactors the DeepSpeed huggingface inference examples. Changes in this PR:
- Remove Transformers folder
- Add README(s)
- pip install requirements.txt
- Code example
- Point to benchmarking and other resources in DeepSpeed repo
- Normalize all names i.e. test-[model_name].py
- Add T5 translation task (English to French)
- Add huggingface pipeline() object and refactor test-gptj.py
- Create folders for different types of ML tasks (text-generation, fill-mask, etc)
- Add BERT fill-mask example
- Update queries to something more sensible
- TODO: Add test-bloom to text-generation folder
- Fix typos in code comments
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
This PR adds a bloom inference example (bigscience/bloom-3b) and a corresponding helper Pipeline class meant to mimic the functionality and API of the huggingface pipelines. This class was added in order to comprehend bloom meta tensors and checkpoint loading in a more organized way, that closely matched the existing examples. This PR also cleans up extra whitespace across the inference examples. Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Add presharded model support to BLOOM example Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
This PR adds device comprehension to the BLOOM Pipeline utility class to expand support for devices and also support the case where the DeepSpeed init_inference API isn't used.
Co-authored-by: Cheng Li <pistasable@gmail.com>
This PR sets replace_with_kernel_inject=True in the BERT fill-mask inference example.
* initial commit
* update random-ltd
* add vit
* vision transformer
* update-name
* saving without randomltd
* update naming
* update for dynamic train
* update for dynamic train
* checking kernel implementation
* check kernel acc
* update json
* fix for cifar randomltd
* vit-finetuning
* refactor
* refactor
* refactor
* update readme
* update readme
* update readme
* update readme
* move to bash
* training log
* training log
* clean and update gpt
* output
* rename dir
* cleanup
* fix
* fix
Co-authored-by: xiaoxiawu <xiaoxiawu@microsoft.com>
Co-authored-by: xiaoxiawu <xiaoxiawu>
Co-authored-by: molly-smith <mosm@microsoft.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Add bandwidth and throughput test to inference-test
* Print per token latency
* Remove 'dict' replace method option
* Update inference/huggingface/text-generation/inference-test.py Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
* Refactor with Mike's suggestion
* Accidentally removed printing output
* Create num_bytes variable
---------
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
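The per-token latency and bandwidth figures mentioned in this commit can be illustrated with a small sketch. The function names and the formula are assumptions for illustration (bandwidth is estimated by assuming every weight is read once per generated token); only the `num_bytes` name comes from the commit message, and the actual `inference-test.py` may compute its statistics differently.

```python
def per_token_latency(total_seconds: float, new_tokens: int) -> float:
    """Average time spent generating one token, in seconds."""
    if new_tokens <= 0:
        raise ValueError("new_tokens must be positive")
    return total_seconds / new_tokens


def weight_bandwidth_gb_s(num_parameters: int, num_bytes: int,
                          token_latency_s: float) -> float:
    """Estimated memory bandwidth in GB/s.

    Assumption: each generated token reads every model weight exactly once,
    so bytes moved per token = num_parameters * num_bytes.
    """
    if token_latency_s <= 0:
        raise ValueError("token_latency_s must be positive")
    return num_parameters * num_bytes / token_latency_s / 1e9


if __name__ == "__main__":
    # Hypothetical numbers: 100 new tokens generated in 2 seconds
    # by a 3-billion-parameter model stored in fp16 (2 bytes per weight).
    latency = per_token_latency(2.0, 100)
    print(f"per-token latency: {latency * 1000:.2f} ms")
    print(f"estimated bandwidth: {weight_bandwidth_gb_s(3_000_000_000, 2, latency):.2f} GB/s")
```

Keeping `num_bytes` as an explicit argument makes the dtype assumption visible: the same latency implies half the bandwidth for int8 weights as for fp16.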
* data efficiency example update
* data efficiency update
This PR adds a DeepSpeed Stable Diffusion example using the prompthero/midjourney-v4-diffusion model.
This PR updates how the enable_cuda_graph param is set depending on the world_size i.e. CUDA graphs should only be enabled when world_size==1.
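The rule stated in this commit can be sketched as a small helper. The function name `cuda_graph_kwargs` is hypothetical; only `enable_cuda_graph` and `world_size` come from the commit message, and the example scripts in the repository may assemble their arguments differently.

```python
def cuda_graph_kwargs(world_size: int) -> dict:
    """Hypothetical helper: build the CUDA-graph keyword argument.

    Per the commit message, CUDA graphs should only be enabled
    when world_size == 1, i.e. for single-process runs.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    return {"enable_cuda_graph": world_size == 1}


if __name__ == "__main__":
    print(cuda_graph_kwargs(1))  # single process
    print(cuda_graph_kwargs(4))  # multi-process run
```

A dictionary like this could then be merged into the keyword arguments an example passes to its inference setup call.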
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Molly Smith <112220543+molly-smith@users.noreply.github.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Shuaiwen Leon Song <124002815+leonsongmsft@users.noreply.github.com>
Co-authored-by: Xiaoxia (Shirley) Wu <94406484+xiaoxiawu-microsoft@users.noreply.github.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
fix URLs
* add domino
* use transformer from deepspeed
* clean args
* mega opt
* add opt & timer
* add opt
* fix loss
* folder name
* Change argument in pretrain script
* Add readme for domino
* Update readme for domino
* Fixing usage issues
* update dataset
* megatron dependencies
* path
* Update README.md
* remove imports
* update import
* Update README.md
* Minor example script changes
* train bash
* require
* Update README.md
---------
Co-authored-by: chengming-zhang <chengming.zhang@anl.gov>
Co-authored-by: Zheyu SHEN <zyshen@umd.edu>
Co-authored-by: root <root@ecehpavw1202b.umd.edu>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
* add benchmarking for offloading states
* fix api names
* Add label_smoothing while calculating step2 DPO loss in DeepSpeed-Chat.
* Add training scripts for step2 DPO in DeepSpeed-Chat.
* Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat.
* Update training scripts of step2 DPO in DeepSpeed-Chat.
* Follow upstream fixes.
* Update README.md for Step2 DPO finetuning.
* Add opt 350M training log demo for step 2 dpo finetuning in DeepSpeed-Chat.
* Address the formatting issue in step2 dpo finetuning in DeepSpeed-Chat.
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
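For readers unfamiliar with label smoothing in a DPO loss, a minimal per-example sketch follows. It uses the commonly cited label-smoothed (conservative) DPO form; whether the DeepSpeed-Chat step2 script matches it term for term is an assumption, and the function names, argument names, and default `beta` are illustrative only.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1, label_smoothing: float = 0.0) -> float:
    """Label-smoothed DPO loss for one preference pair (illustrative sketch).

    label_smoothing is the assumed probability that the preference label
    is flipped; 0.0 recovers the standard DPO loss.
    """
    # Difference between the policy and reference log-ratios.
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp)
    return (-log_sigmoid(beta * logits) * (1.0 - label_smoothing)
            - log_sigmoid(-beta * logits) * label_smoothing)


if __name__ == "__main__":
    # Hypothetical sequence log-probabilities for one preference pair.
    print(dpo_loss(-10.0, -14.0, -11.0, -12.0))
    print(dpo_loss(-10.0, -14.0, -11.0, -12.0, label_smoothing=0.1))
```

With `label_smoothing` above zero, the loss also penalizes very large margins, which is the usual motivation for smoothing when preference labels may be noisy.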
Signed-off-by: Logan Adams <loadams@microsoft.com>
* Update weights_only due to change in default in torch>=2.6 Signed-off-by: Logan Adams <loadams@microsoft.com>
* formatting Signed-off-by: Logan Adams <loadams@microsoft.com>
---------
Signed-off-by: Logan Adams <loadams@microsoft.com>
* moved example from DeepSpeed PR #7104 to this repo
* Update training/data_efficiency/variable_batch_size_and_lr/README.md Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Update training/data_efficiency/variable_batch_size_and_lr/README.md Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* replaced T by S for sequence length
* replaced T by S for sequence length
* replaced T by S for sequence length
* more detailed explanation
* --pipeline-num-stages is now a command-line argument
* cleaner syntax
* Update training/data_efficiency/variable_batch_size_and_lr/README.md
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
Co-authored-by: Hongwei Chen <hongweichen@ftqtmec25000002.taxzvufipdhelhupulxcbvr15f.ux.internal.cloudapp.net>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
* import files for deepcompile benchmark Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* add figures Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* add figures Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* update document Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* fix links to images Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* add images Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* specify deepspeed version Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
---------
Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* update description of versions for deepcompile
* Update to match specific tag name Signed-off-by: Logan Adams <loadams@microsoft.com>
---------
Signed-off-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
* update description of versions for deepcompile
* fix deepcompile benchmark script Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* fix benchmark for z1 Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* add options for deepcompile bench Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
---------
Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
* update tp example Signed-off-by: inkcherry <mingzhi.liu@intel.com>
* update Signed-off-by: inkcherry <mingzhi.liu@intel.com>
* add length bench file Signed-off-by: inkcherry <mingzhi.liu@intel.com>
---------
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
Signed-off-by: Hongwei Chen <hongweichen@microsoft.com>
c806b91 to 9478a6f