diff --git a/02_Using_the_LUMI_web_interface/Clone_with_JupyterLab.md b/02_Using_the_LUMI_web_interface/Clone_with_JupyterLab.md deleted file mode 100644 index 50af268..0000000 --- a/02_Using_the_LUMI_web_interface/Clone_with_JupyterLab.md +++ /dev/null @@ -1,20 +0,0 @@ -# Cloning the course git repository using JupyterLab UI - -1. Open a JupyterLab session using the Jupyter app on the LUMI web interface [www.lumi.csc.fi](https://www.lumi.csc.fi) - - Follow the instructions in the second part of the exercise for this session. You can then keep using the session - for the rest of the exercise. - -2. Once you have opened JupyterLab and opened your own folder in the navigation panel to the left, your browser should present a view like this (in this case for user `lukaspre`): - - ![After starting JupyterLab and opening your own folder, the navigation panel shows an empty list and the main screen a selection of apps to use in JupyterLab.](images/step0.png) - -4. Use the highlighted button to open the UI popup for cloning a git repository: - - ![The button for cloning a git repository is in the top-left corner, just above the file search input.](images/step1.png) - -5. Enter the repository URL ( [https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop](https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop) ) and press the "Clone" button. - - ![The repository URL should be entered in the opening popup.](images/step2.png) - - This will clone the respository in a new folder "Getting_Started_with_AI_workshop" in your directory on the course project scratch filesystem. 
diff --git a/02_Using_the_LUMI_web_interface/GPT-neo-IMDB-introduction.ipynb b/02_Using_the_LUMI_web_interface/GPT-neo-IMDB-introduction.ipynb index 229c9c0..eb7e40c 100644 --- a/02_Using_the_LUMI_web_interface/GPT-neo-IMDB-introduction.ipynb +++ b/02_Using_the_LUMI_web_interface/GPT-neo-IMDB-introduction.ipynb @@ -39,7 +39,7 @@ "outputs": [], "source": [ "import os\n", - "os.environ[\"HF_HOME\"] = \"/flash/project_465002178/hf-cache\"" + "os.environ[\"HF_HOME\"] = \"/flash/project_465002757/hf-cache\"" ] }, { diff --git a/02_Using_the_LUMI_web_interface/README.md b/02_Using_the_LUMI_web_interface/README.md index a9bec70..3107cc8 100644 --- a/02_Using_the_LUMI_web_interface/README.md +++ b/02_Using_the_LUMI_web_interface/README.md @@ -7,12 +7,10 @@ In this exercise you will gain first experience with using the LUMI web interface to navigate files and directories on the LUMI supercomputer. You will also set up your own copy of the exercise repository on the system, so that you can work on them without interfering with the other course participants. 1. Log in to the LUMI web interface: https://www.lumi.csc.fi - 2. Create your own subdirectory in `/project/project_465002178/` and `/scratch/project_465002178/`. Use your username for the directory name. You can either + 2. Create your own subdirectory in `/project/project_465002757/` and `/scratch/project_465002757/`. Use your username for the directory name. You can either - Use the built-in file explorer ("Home Directory"), or - Use the login node shell app in the webinterface - 3. Clone the [exercise repository](https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop) to your folder in `/project/project_465002178/`. You can either - - use the login node shell app in the webinterface, or - - start a Jupyter lab job and use the Jupyter lab UI for cloning Git repositories, see [Clone_with_JupyterLab.md](./Clone_with_JupyterLab.md) for an illustrated step-by-step guide for this. + 3. 
Clone the [exercise repository](https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop) to your folder in `/project/project_465002757/`. You can use the login node shell app in the webinterface for that. 4. Get familiar with the exercise repository layout. 2. Start an interactive Jupyter lab job and run inference with GPT-neo. @@ -20,15 +18,16 @@ In this exercise you will learn how to reserve resources for and start an interactive job to run a Jupyter notebook via the LUMI web interface. The notebook itself introduces you to our running example of finetuning a language model using PyTorch and the training libraries provided by Huggingface. In this exercise you will not do any training, but familiarise yourself a bit with the software and the base model. 1. Start an interactive Jupyter session: Open the Jupyter app (! not "Jupyter for Courses" !) in the LUMI webinterface and set the following settings before pressing `Launch` - - Project: `project_465002178 (LUST Training ...)` + - Project: `project_465002757 (LUST Training ...)` - Reservation: Use the course reservation `AI_workshop_Day1` (there should only be one available option) - Partition: `small-g` - Number of CPU cores: `7` - Memory (GB): `16` - Time: `0:30:00` - Working directory: `/project/$PROJECT` - - Python: `pytorch (Via CSC stack, limited support available)` - - Virtual environment path: leave empty + - Python: `lumi-multitorch (PyTorch, LUMI AI Factory)` + - Module version: You can use the default here. + - Enable virtual environment: Do not select this 2. 
Wait for the session to start, then press `Connect to Jupyter` > **Note** diff --git a/02_Using_the_LUMI_web_interface/images/step0.png b/02_Using_the_LUMI_web_interface/images/step0.png deleted file mode 100644 index ac4c4ff..0000000 Binary files a/02_Using_the_LUMI_web_interface/images/step0.png and /dev/null differ diff --git a/02_Using_the_LUMI_web_interface/images/step1.png b/02_Using_the_LUMI_web_interface/images/step1.png deleted file mode 100644 index 58de26c..0000000 Binary files a/02_Using_the_LUMI_web_interface/images/step1.png and /dev/null differ diff --git a/02_Using_the_LUMI_web_interface/images/step2.png b/02_Using_the_LUMI_web_interface/images/step2.png deleted file mode 100644 index 146ab64..0000000 Binary files a/02_Using_the_LUMI_web_interface/images/step2.png and /dev/null differ diff --git a/03_Your_first_AI_training_job_on_LUMI/GPT-neo-IMDB-finetuning.py b/03_Your_first_AI_training_job_on_LUMI/GPT-neo-IMDB-finetuning.py index b5f42d9..2a66437 100644 --- a/03_Your_first_AI_training_job_on_LUMI/GPT-neo-IMDB-finetuning.py +++ b/03_Your_first_AI_training_job_on_LUMI/GPT-neo-IMDB-finetuning.py @@ -73,13 +73,17 @@ print("Loading model and tokenizer") start = time.time() tokenizer = AutoTokenizer.from_pretrained(pretrained_model, use_fast=True) - tokenizer.pad_token = tokenizer.eos_token + tokenizer.pad_token_id = 50256 # adjusting tokenizer and model # Load the actual base model from Hugging Face model = AutoModelForCausalLM.from_pretrained(pretrained_model) + # adjusting tokenizer and model + model.config.pad_token_id = 50256 + model.generation_config.pad_token_id = 50256 model.to(device) stop = time.time() print(f"Loading model and tokenizer took: {stop-start:.2f} seconds") + print ("\n" * 4) # #### Loading the IMDb data set # @@ -99,6 +103,7 @@ # Let's print one sample from the dataset. 
print("Sample from dataset") pprint(train_dataset[200]) + print ("\n" * 4) # #### Setting up the training configuration train_batch_size = 32 # This just about fits into the VRAM of a single MI250x GCD with 16-bit floats @@ -140,6 +145,7 @@ print("Length of input_ids:", len(b["input_ids"])) break print("Length of dataset (tokenized)", len(train_dataset_tokenized)) + print ("\n" * 4) # #### Training # We use the Hugging Face trainer instead of a manual training loop. @@ -156,7 +162,7 @@ trainer = Trainer( model=model, args=training_args, - tokenizer=tokenizer, + processing_class=tokenizer, data_collator=collator, train_dataset=train_dataset_tokenized, eval_dataset=validate_dataset_tokenized, @@ -167,6 +173,7 @@ print() print("Training done, you can find all the model checkpoints in", output_dir) + print ("\n" * 4) # #### Evaluating the finetuned model with torch.no_grad(): diff --git a/03_Your_first_AI_training_job_on_LUMI/README.md b/03_Your_first_AI_training_job_on_LUMI/README.md index bf736a2..62b4457 100644 --- a/03_Your_first_AI_training_job_on_LUMI/README.md +++ b/03_Your_first_AI_training_job_on_LUMI/README.md @@ -39,7 +39,7 @@ - `--model-name` (a name under which the model produced by the run will be stored; optional) - `--num-workers` (optional, is used to set the number of PyTorch dataloader processes) - Please set the paths to some destination of your choice within your `/scratch/project_465002178/` directory. + Please set the paths to some destination of your choice within your `/scratch/project_465002757/` directory. 
> **Tip** > diff --git a/03_Your_first_AI_training_job_on_LUMI/reference_solution/GPT-neo-IMDB-finetuning.py b/03_Your_first_AI_training_job_on_LUMI/reference_solution/GPT-neo-IMDB-finetuning.py index b5f42d9..2a66437 100644 --- a/03_Your_first_AI_training_job_on_LUMI/reference_solution/GPT-neo-IMDB-finetuning.py +++ b/03_Your_first_AI_training_job_on_LUMI/reference_solution/GPT-neo-IMDB-finetuning.py @@ -73,13 +73,17 @@ print("Loading model and tokenizer") start = time.time() tokenizer = AutoTokenizer.from_pretrained(pretrained_model, use_fast=True) - tokenizer.pad_token = tokenizer.eos_token + tokenizer.pad_token_id = 50256 # adjusting tokenizer and model # Load the actual base model from Hugging Face model = AutoModelForCausalLM.from_pretrained(pretrained_model) + # adjusting tokenizer and model + model.config.pad_token_id = 50256 + model.generation_config.pad_token_id = 50256 model.to(device) stop = time.time() print(f"Loading model and tokenizer took: {stop-start:.2f} seconds") + print ("\n" * 4) # #### Loading the IMDb data set # @@ -99,6 +103,7 @@ # Let's print one sample from the dataset. print("Sample from dataset") pprint(train_dataset[200]) + print ("\n" * 4) # #### Setting up the training configuration train_batch_size = 32 # This just about fits into the VRAM of a single MI250x GCD with 16-bit floats @@ -140,6 +145,7 @@ print("Length of input_ids:", len(b["input_ids"])) break print("Length of dataset (tokenized)", len(train_dataset_tokenized)) + print ("\n" * 4) # #### Training # We use the Hugging Face trainer instead of a manual training loop. 
@@ -156,7 +162,7 @@ trainer = Trainer( model=model, args=training_args, - tokenizer=tokenizer, + processing_class=tokenizer, data_collator=collator, train_dataset=train_dataset_tokenized, eval_dataset=validate_dataset_tokenized, @@ -167,6 +173,7 @@ print() print("Training done, you can find all the model checkpoints in", output_dir) + print ("\n" * 4) # #### Evaluating the finetuned model with torch.no_grad(): diff --git a/03_Your_first_AI_training_job_on_LUMI/reference_solution/resume_from_checkpoint/GPT-neo-IMDB-finetuning.py b/03_Your_first_AI_training_job_on_LUMI/reference_solution/resume_from_checkpoint/GPT-neo-IMDB-finetuning.py index 04dbee2..5db36cc 100644 --- a/03_Your_first_AI_training_job_on_LUMI/reference_solution/resume_from_checkpoint/GPT-neo-IMDB-finetuning.py +++ b/03_Your_first_AI_training_job_on_LUMI/reference_solution/resume_from_checkpoint/GPT-neo-IMDB-finetuning.py @@ -79,13 +79,17 @@ print("Loading model and tokenizer") start = time.time() tokenizer = AutoTokenizer.from_pretrained(pretrained_model, use_fast=True) - tokenizer.pad_token = tokenizer.eos_token + tokenizer.pad_token_id = 50256 # adjusting tokenizer and model # Load the actual base model from Hugging Face model = AutoModelForCausalLM.from_pretrained(pretrained_model) + # adjusting tokenizer and model + model.config.pad_token_id = 50256 + model.generation_config.pad_token_id = 50256 model.to(device) stop = time.time() print(f"Loading model and tokenizer took: {stop-start:.2f} seconds") + print ("\n" * 4) # #### Loading the IMDb data set # @@ -105,6 +109,7 @@ # Let's print one sample from the dataset. 
print("Sample from dataset") pprint(train_dataset[200]) + print ("\n" * 4) # #### Setting up the training configuration train_batch_size = 32 # This just about fits into the VRAM of a single MI250x GCD with 16-bit floats @@ -147,6 +152,7 @@ print("Length of input_ids:", len(b["input_ids"])) break print("Length of dataset (tokenized)", len(train_dataset_tokenized)) + print ("\n" * 4) # #### Training # We use the Hugging Face trainer instead of a manual training loop. @@ -163,7 +169,7 @@ trainer = Trainer( model=model, args=training_args, - tokenizer=tokenizer, + processing_class=tokenizer, data_collator=collator, train_dataset=train_dataset_tokenized, eval_dataset=validate_dataset_tokenized, @@ -174,6 +180,7 @@ print() print("Training done, you can find all the model checkpoints in", output_dir) + print ("\n" * 4) # #### Evaluating the finetuned model with torch.no_grad(): diff --git a/03_Your_first_AI_training_job_on_LUMI/reference_solution/resume_from_checkpoint/run.sh b/03_Your_first_AI_training_job_on_LUMI/reference_solution/resume_from_checkpoint/run.sh index b23c7b8..0c2a8f2 100644 --- a/03_Your_first_AI_training_job_on_LUMI/reference_solution/resume_from_checkpoint/run.sh +++ b/03_Your_first_AI_training_job_on_LUMI/reference_solution/resume_from_checkpoint/run.sh @@ -1,5 +1,5 @@ #!/bin/bash -#SBATCH --account=project_465002178 +#SBATCH --account=project_465002757 #SBATCH --reservation=AI_workshop_Day1 # comment this out if the reservation is no longer available #SBATCH --partition=small-g #SBATCH --gpus-per-node=1 @@ -10,14 +10,14 @@ # Set up the software environment # NOTE: the loaded module makes relevant filesystem locations available inside the singularity container -# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI +# (/scratch, /project, etc) # If you are interested, you can check the exact paths being mounted from -# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua +# 
/appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.1.lua module purge -module use /appl/local/containers/ai-modules -module load singularity-AI-bindings +module use /appl/local/laifs/modules +module load lumi-aif-singularity-bindings -CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif +CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r70f21m50t210-20260513_121430/lumi-multitorch-full-u24r70f21m50t210-20260513_121430.sif # Some environment variables to set up cache directories SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}" @@ -35,7 +35,7 @@ export OUTPUT_DIR=$SCRATCH/$USER/data/ export LOGGING_DIR=$SCRATCH/$USER/runs/ set -xv # print the command so that we can verify setting arguments correctly from the logs -srun singularity exec $CONTAINER \ +srun singularity run $CONTAINER \ python GPT-neo-IMDB-finetuning.py \ --model-name gpt-imdb-model \ --output-path $OUTPUT_DIR \ diff --git a/03_Your_first_AI_training_job_on_LUMI/reference_solution/run.sh b/03_Your_first_AI_training_job_on_LUMI/reference_solution/run.sh index 92c5a13..8d7120e 100644 --- a/03_Your_first_AI_training_job_on_LUMI/reference_solution/run.sh +++ b/03_Your_first_AI_training_job_on_LUMI/reference_solution/run.sh @@ -1,5 +1,5 @@ #!/bin/bash -#SBATCH --account=project_465002178 +#SBATCH --account=project_465002757 #SBATCH --reservation=AI_workshop_Day1 # comment this out if the reservation is no longer available #SBATCH --partition=small-g #SBATCH --gpus-per-node=1 @@ -10,14 +10,14 @@ # Set up the software environment # NOTE: the loaded module makes relevant filesystem locations available inside the singularity container -# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI +# (/scratch, /project, etc) # If you are interested, you can check the exact paths being mounted from -# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua +# 
/appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.1.lua module purge -module use /appl/local/containers/ai-modules -module load singularity-AI-bindings +module use /appl/local/laifs/modules +module load lumi-aif-singularity-bindings -CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif +CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r70f21m50t210-20260513_121430/lumi-multitorch-full-u24r70f21m50t210-20260513_121430.sif # Some environment variables to set up cache directories SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}" @@ -35,7 +35,7 @@ export OUTPUT_DIR=$SCRATCH/$USER/data/ export LOGGING_DIR=$SCRATCH/$USER/runs/ set -xv # print the command so that we can verify setting arguments correctly from the logs -srun singularity exec $CONTAINER \ +srun singularity run $CONTAINER \ python GPT-neo-IMDB-finetuning.py \ --model-name gpt-imdb-model \ --output-path $OUTPUT_DIR \ diff --git a/03_Your_first_AI_training_job_on_LUMI/run.sh b/03_Your_first_AI_training_job_on_LUMI/run.sh index fab3d08..9a69772 100644 --- a/03_Your_first_AI_training_job_on_LUMI/run.sh +++ b/03_Your_first_AI_training_job_on_LUMI/run.sh @@ -1,19 +1,19 @@ #!/bin/bash -#SBATCH --account=project_465002178 +#SBATCH --account=project_465002757 #SBATCH --reservation=AI_workshop_Day1 # comment this out if the reservation is no longer available #SBATCH --partition=... 
## # Set up the software environment # NOTE: the loaded module makes relevant filesystem locations available inside the singularity container -# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI +# (/scratch, /project, etc) # If you are interested, you can check the exact paths being mounted from -# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua +# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.1.lua module purge -module use /appl/local/containers/ai-modules -module load singularity-AI-bindings +module use /appl/local/laifs/modules +module load lumi-aif-singularity-bindings -CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif +CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r70f21m50t210-20260513_121430/lumi-multitorch-full-u24r70f21m50t210-20260513_121430.sif # Some environment variables to set up cache directories SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}" diff --git a/08_Scaling_to_multiple_GPUs/GPT-neo-IMDB-finetuning.py b/08_Scaling_to_multiple_GPUs/GPT-neo-IMDB-finetuning.py index ffa2817..295a3b2 100644 --- a/08_Scaling_to_multiple_GPUs/GPT-neo-IMDB-finetuning.py +++ b/08_Scaling_to_multiple_GPUs/GPT-neo-IMDB-finetuning.py @@ -75,13 +75,17 @@ print("Loading model and tokenizer") start = time.time() tokenizer = AutoTokenizer.from_pretrained(pretrained_model, use_fast=True) - tokenizer.pad_token = tokenizer.eos_token + tokenizer.pad_token_id = 50256 # adjusting tokenizer and model # Load the actual base model from Hugging Face model = AutoModelForCausalLM.from_pretrained(pretrained_model) + # adjusting tokenizer and model + model.config.pad_token_id = 50256 + model.generation_config.pad_token_id = 50256 model.to(device) stop = time.time() print(f"Loading model and tokenizer took: {stop-start:.2f} seconds") + print ("\n" * 4) # #### Loading the IMDb data set # @@ -101,6 +105,7 @@ # Let's print one sample from the dataset. 
 print("Sample from dataset") pprint(train_dataset[200]) + print ("\n" * 4) # #### Setting up the training configuration # @@ -143,6 +148,7 @@ print("Length of input_ids:", len(b["input_ids"])) break print("Length of dataset (tokenized)", len(train_dataset_tokenized)) + print ("\n" * 4) # #### Training # We use the Hugging Face trainer instead of a manual training loop. @@ -155,7 +161,7 @@ trainer = Trainer( model=model, args=training_args, - tokenizer=tokenizer, + processing_class=tokenizer, data_collator=collator, train_dataset=train_dataset_tokenized, eval_dataset=validate_dataset_tokenized, @@ -166,6 +172,7 @@ print() print("Training done, you can find all the model checkpoints in", output_dir) + print ("\n" * 4) # #### Evaluating the finetuned model with torch.no_grad(): diff --git a/08_Scaling_to_multiple_GPUs/README.md b/08_Scaling_to_multiple_GPUs/README.md index 0d8fa9b..1e88c0d 100644 --- a/08_Scaling_to_multiple_GPUs/README.md +++ b/08_Scaling_to_multiple_GPUs/README.md @@ -63,16 +63,11 @@ 3. (Optional/Bonus): Set up CPU bindings. - In order to achieve optimal CPU-GPU data transfer performance we can ensure that each script runs on the CPU cores closest to the respective GPU. - As we are using torchrun to manage the worker processes, we cannot handle these CPU bindings via slurm but must set them up in our Python training script. + In order to achieve optimal CPU-GPU data transfer performance we can ensure that each script runs on the CPU cores closest to the respective GPU. You can find a [figure showing which cores are closest to which GCD](https://docs.lumi-supercomputer.eu/assets/images/lumig-cpu-gpu-links.svg) on the [LUMI Docs LUMI-G page](https://docs.lumi-supercomputer.eu/hardware/lumig/). - 1. Edit [08_Scaling_to_multiple_GPUs/GPT-neo-IMDB-finetuning.py](GPT-neo-IMDB-finetuning.py) to set up the correct CPU-GPU bindings based on the processes rank. + 1. 
Edit [08_Scaling_to_multiple_GPUs/run.sh](run.sh) to set up the correct CPU-GPU bindings based on the process's rank. - You can find a [figure showing which cores are closest to which GCD](https://docs.lumi-supercomputer.eu/assets/images/lumig-cpu-gpu-links.svg) on the [LUMI Docs LUMI-G page](https://docs.lumi-supercomputer.eu/hardware/lumig/). - - > **Tip** - > - > Use the `psutil.Process().cpu_affinity(...)` function to set the binding from inside the Python script. + When torchrun is used, we can rely on [NUMA binding](https://docs.pytorch.org/docs/2.11/elastic/numa.html#module-torch.numa) with `--numa-binding=exclusive` to set this automatically for us. 4. (Optional/Bonus): Running without torchrun. @@ -90,8 +85,8 @@ > hostname > ``` - In this setting you could then also do the CPU bindings from the slurm batch file instead of Python, to keep the training script free of system specific setup. + In this setting you need to set the CPU-GPU bindings a bit differently. This is tricky, but you can check out the reference solution if you struggle with this part of the exercise. ## Solutions -The folder `reference_solution/` contains an example solution for this exercise parts 1, 2 and 4. `reference_solution/prints_only_from_single_process` extends this to ensure that `print` statements in the code are run only by a single process. `reference_solution/with_cpu_bindings` shows how CPU bindings can be used both from within Python (when using torchrun) and directly via SLURM (exercise part 3). +The folder `reference_solution/` contains an example solution for parts 1, 2 and 4 of this exercise. `reference_solution/prints_only_from_single_process` extends this to ensure that `print` statements in the code are run only by a single process. `reference_solution/with_cpu_bindings` shows how CPU bindings can be used both when using torchrun and directly via SLURM (exercise part 3). 
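Whichever way the CPU bindings from exercise part 3 are set up, it can help to check from inside the training script which cores each worker is actually allowed to run on. The following is a minimal sketch and not part of the course material: the helper names `format_core_list` and `describe_affinity` are hypothetical, it assumes a Linux system (where `os.sched_getaffinity` is available) and that the launcher exports `LOCAL_RANK`, as torchrun does.

```python
import os


def format_core_list(cores):
    """Collapse core ids into compact ranges, e.g. {1, 2, 3, 5} -> '1-3,5'."""
    ranges = []
    for core in sorted(cores):
        if ranges and core == ranges[-1][1] + 1:
            # Core directly follows the previous one: extend the current range.
            ranges[-1][1] = core
        else:
            # Start a new range.
            ranges.append([core, core])
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def describe_affinity():
    """Return a one-line summary of the cores this process may run on."""
    # LOCAL_RANK is assumed to be set by the launcher; '?' if it is missing.
    local_rank = os.environ.get("LOCAL_RANK", "?")
    if hasattr(os, "sched_getaffinity"):
        cores = os.sched_getaffinity(0)  # 0 means the calling process (Linux only)
    else:
        # Fallback for platforms without sched_getaffinity: assume all cores.
        cores = set(range(os.cpu_count() or 1))
    return f"local rank {local_rank}: allowed cores {format_core_list(cores)}"


if __name__ == "__main__":
    print(describe_affinity())
```

Calling `print(describe_affinity())` once per worker near the start of the training script would make it possible to compare the reported core ranges against the CPU-GCD figure in the LUMI documentation.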
diff --git a/08_Scaling_to_multiple_GPUs/reference_solution/GPT-neo-IMDB-finetuning.py b/08_Scaling_to_multiple_GPUs/reference_solution/GPT-neo-IMDB-finetuning.py index 3c2ef17..ae67574 100644 --- a/08_Scaling_to_multiple_GPUs/reference_solution/GPT-neo-IMDB-finetuning.py +++ b/08_Scaling_to_multiple_GPUs/reference_solution/GPT-neo-IMDB-finetuning.py @@ -84,13 +84,17 @@ print("Loading model and tokenizer") start = time.time() tokenizer = AutoTokenizer.from_pretrained(pretrained_model, use_fast=True) - tokenizer.pad_token = tokenizer.eos_token + tokenizer.pad_token_id = 50256 # adjusting tokenizer and model # Load the actual base model from Hugging Face model = AutoModelForCausalLM.from_pretrained(pretrained_model) + # adjusting tokenizer and model + model.config.pad_token_id = 50256 + model.generation_config.pad_token_id = 50256 model.to(device) stop = time.time() print(f"Loading model and tokenizer took: {stop-start:.2f} seconds") + print ("\n" * 4) # #### Loading the IMDb data set # @@ -110,6 +114,7 @@ # Let's print one sample from the dataset. print("Sample from dataset") pprint(train_dataset[200]) + print ("\n" * 4) # #### Setting up the training configuration global_train_batch_size = 32 # We keep the overall batch size (across all GPUs) the same as before ... @@ -153,6 +158,7 @@ print("Length of input_ids:", len(b["input_ids"])) break print("Length of dataset (tokenized)", len(train_dataset_tokenized)) + print ("\n" * 4) # #### Training # We use the Hugging Face trainer instead of a manual training loop. 
@@ -165,7 +171,7 @@ trainer = Trainer( model=model, args=training_args, - tokenizer=tokenizer, + processing_class=tokenizer, data_collator=collator, train_dataset=train_dataset_tokenized, eval_dataset=validate_dataset_tokenized, @@ -176,6 +182,7 @@ print() print("Training done, you can find all the model checkpoints in", output_dir) + print ("\n" * 4) # #### Evaluating the finetuned model with torch.no_grad(): diff --git a/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/GPT-neo-IMDB-finetuning.py b/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/GPT-neo-IMDB-finetuning.py index dd8fa66..0b88e9a 100644 --- a/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/GPT-neo-IMDB-finetuning.py +++ b/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/GPT-neo-IMDB-finetuning.py @@ -85,14 +85,18 @@ print("Loading model and tokenizer") start = time.time() tokenizer = AutoTokenizer.from_pretrained(pretrained_model, use_fast=True) - tokenizer.pad_token = tokenizer.eos_token + tokenizer.pad_token_id = 50256 # adjusting tokenizer and model # Load the actual base model from Hugging Face model = AutoModelForCausalLM.from_pretrained(pretrained_model) + # adjusting tokenizer and model + model.config.pad_token_id = 50256 + model.generation_config.pad_token_id = 50256 model.to(device) stop = time.time() if rank == 0: print(f"Loading model and tokenizer took: {stop-start:.2f} seconds") + print ("\n" * 4) # #### Loading the IMDb data set # @@ -113,6 +117,7 @@ if rank == 0: print("Sample from dataset") pprint(train_dataset[200]) + print ("\n" * 4) # #### Setting up the training configuration global_train_batch_size = 32 # We keep the overall batch size (across all GPUs) the same as before ... 
@@ -156,6 +161,7 @@ pprint(b, compact=True) print("Length of input_ids:", len(b["input_ids"])) break + print ("\n" * 4) print("Length of dataset (tokenized)", len(train_dataset_tokenized)) # #### Training @@ -169,7 +175,7 @@ trainer = Trainer( model=model, args=training_args, - tokenizer=tokenizer, + processing_class=tokenizer, data_collator=collator, train_dataset=train_dataset_tokenized, eval_dataset=validate_dataset_tokenized, @@ -181,6 +187,7 @@ if rank == 0: print() print("Training done, you can find all the model checkpoints in", output_dir) + print ("\n" * 4) # #### Evaluating the finetuned model with torch.no_grad(): diff --git a/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/run_no_torchrun.sh b/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/run_no_torchrun.sh index 67581e1..99a40ae 100644 --- a/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/run_no_torchrun.sh +++ b/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/run_no_torchrun.sh @@ -1,5 +1,5 @@ #!/bin/bash -#SBATCH --account=project_465002178 +#SBATCH --account=project_465002757 #SBATCH --reservation=AI_workshop_Day2 # comment this out if the reservation is no longer available #SBATCH --partition=standard-g #SBATCH --nodes=1 @@ -11,14 +11,14 @@ # Set up the software environment # NOTE: the loaded module makes relevant filesystem locations available inside the singularity container -# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI +# (/scratch, /project, etc) # If you are interested, you can check the exact paths being mounted from -# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua +# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.1.lua module purge -module use /appl/local/containers/ai-modules -module load singularity-AI-bindings +module use /appl/local/laifs/modules +module load 
lumi-aif-singularity-bindings -CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif +CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r70f21m50t210-20260513_121430/lumi-multitorch-full-u24r70f21m50t210-20260513_121430.sif # Some environment variables to set up cache directories SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}" @@ -46,7 +46,7 @@ export LOCAL_WORLD_SIZE=$SLURM_GPUS_PER_NODE # As opposed to the example in `run_torchrun.sh`, we can set the CPU binds directly via the slurm command, since we have # one task per GPU. In this case we do NOT need to set them from within the Python code itself. -srun singularity exec $CONTAINER \ +srun singularity run $CONTAINER \ bash -c "RANK=\$SLURM_PROCID \ LOCAL_RANK=\$SLURM_LOCALID \ python GPT-neo-IMDB-finetuning.py \ diff --git a/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/run_torchrun.sh b/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/run_torchrun.sh index 5bc4378..f4df77e 100644 --- a/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/run_torchrun.sh +++ b/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/run_torchrun.sh @@ -1,5 +1,5 @@ #!/bin/bash -#SBATCH --account=project_465002178 +#SBATCH --account=project_465002757 #SBATCH --reservation=AI_workshop_Day2 # comment this out if the reservation is no longer available #SBATCH --partition=standard-g #SBATCH --nodes=1 @@ -11,14 +11,14 @@ # Set up the software environment # NOTE: the loaded module makes relevant filesystem locations available inside the singularity container -# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI +# (/scratch, /project, etc) # If you are interested, you can check the exact paths being mounted from -# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua +# 
/appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.1.lua
 module purge
-module use /appl/local/containers/ai-modules
-module load singularity-AI-bindings
+module use /appl/local/laifs/modules
+module load lumi-aif-singularity-bindings
 
-CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
+CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r70f21m50t210-20260513_121430/lumi-multitorch-full-u24r70f21m50t210-20260513_121430.sif
 
 # Some environment variables to set up cache directories
 SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -41,7 +41,7 @@ set -xv # print the command so that we can verify setting arguments correctly fr
 
 # Since we start only one task with slurm which then starts subprocesses, we cannot use slurm to configure CPU binds.
 # Therefore we need to set them up in the Python code itself.
-srun singularity exec $CONTAINER \
+srun singularity run $CONTAINER \
     torchrun --standalone \
         --nnodes=1 \
         --nproc-per-node=${SLURM_GPUS_PER_NODE} \
diff --git a/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/slurm-9304946.out b/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/slurm-9304946.out
deleted file mode 100644
index ce48815..0000000
--- a/08_Scaling_to_multiple_GPUs/reference_solution/prints_only_from_single_process/slurm-9304946.out
+++ /dev/null
@@ -1,157 +0,0 @@
-The following modules were not unloaded:
-  (Use "module --force purge" to unload all):
-
-  1) ModuleLabel/label          6) libfabric/1.15.2.0
-  2) lumi-tools/24.05           7) craype-network-ofi
-  3) init-lumi/0.2              8) xpmem/2.8.2-1.0_5.1__g84a27a5.shasta
-  4) craype-x86-trento          9) CrayEnv
-  5) craype-accel-amd-gfx90a
-
-The following sticky modules could not be reloaded:
-
-  1) lumi-tools
-
-# Set up variables to control distributed PyTorch training
-export MASTER_ADDR=$(hostname)
-++ hostname
-+ export MASTER_ADDR=nid005527
-+ MASTER_ADDR=nid005527
-export MASTER_PORT=25900
-+ export MASTER_PORT=25900
-+ MASTER_PORT=25900
-export WORLD_SIZE=$SLURM_NPROCS
-+ export WORLD_SIZE=8
-+ WORLD_SIZE=8
-export LOCAL_WORLD_SIZE=$SLURM_GPUS_PER_NODE
-+ export LOCAL_WORLD_SIZE=8
-+ LOCAL_WORLD_SIZE=8
-
-# As opposed to the example in `run_torchrun.sh`, we can set the CPU binds directly via the slurm command, since we have
-# one task per GPU. In this case we do NOT need to set them from within the Python code itself.
-srun singularity exec $CONTAINER \
-    bash -c "RANK=\$SLURM_PROCID \
-             LOCAL_RANK=\$SLURM_LOCALID \
-             python GPT-neo-IMDB-finetuning.py \
-                --model-name $MODEL_NAME \
-                --output-path $OUTPUT_DIR \
-                --logging-path $LOGGING_DIR \
-                --num-workers ${SLURM_CPUS_PER_TASK}"
-+ srun singularity exec /project/project_465001707/containers/pytorch_transformers.sif bash -c 'RANK=$SLURM_PROCID LOCAL_RANK=$SLURM_LOCALID python GPT-neo-IMDB-finetuning.py --model-name gpt-imdb-model-multigpu-no-torchrun --output-path /scratch/project_465001707/lukaspre/data/ --logging-path /scratch/project_465001707/lukaspre/runs/ --num-workers 7'
-Using PyTorch version: 2.5.1+rocm6.2
-Rank 6 of 8 (local: 6) sees 8 devices
-Using GPU, device name: AMD Instinct MI250X
-Using PyTorch version: 2.5.1+rocm6.2
-Rank 1 of 8 (local: 1) sees 8 devices
-Using GPU, device name: AMD Instinct MI250X
-Using PyTorch version: 2.5.1+rocm6.2
-Rank 4 of 8 (local: 4) sees 8 devices
-Using GPU, device name: AMD Instinct MI250X
-Using PyTorch version: 2.5.1+rocm6.2
-Rank 7 of 8 (local: 7) sees 8 devices
-Using GPU, device name: AMD Instinct MI250X
-Using PyTorch version: 2.5.1+rocm6.2
-Rank 3 of 8 (local: 3) sees 8 devices
-Using GPU, device name: AMD Instinct MI250X
-Using PyTorch version: 2.5.1+rocm6.2
-Rank 5 of 8 (local: 5) sees 8 devices
-Using GPU, device name: AMD Instinct MI250X
-Using PyTorch version: 2.5.1+rocm6.2
-Rank 2 of 8 (local: 2) sees 8 devices
-Using GPU, device name: AMD Instinct MI250X
-Using PyTorch version: 2.5.1+rocm6.2
-Rank 0 of 8 (local: 0) sees 8 devices
-Using GPU, device name: AMD Instinct MI250X
-Loading model and tokenizer
-Loading model and tokenizer took: 21.01 seconds
-Sample from dataset
-{'label': 0,
- 'text': 'This is an action Western. James Steart leads an all star cast in '
-         'the scenic Northwest, which is filmed in great splendor. The scenery '
-         'and costumes are great. There is action and adventure. Stewart plays '
-         'a wealthy cattleman who runs afoul of a crooked government in the '
-         'old Nothwest.

The main drawback is the stereotypical '
-         'cynic that Hollywood has always made into a hero. Even when this '
-         'movie was made, the cynic was the stereotypical hero, and the one '
-         'Stewart portrays really has few saving graces. He is kind to his two '
-         'partners, and that does give him an extra dimension of credibility '
-         'and likability.

However, he is so piggish to everyone '
-         'else, it is hard to really care for him, or to accept him. He is '
-         'much like the one dimensional spaghetti Western characters (cut not '
-         'that bad).

Still, the minor characters are quite '
-         'enjoyable. Walter Brennan, Royal Dano, Harry Morgan, and others make '
-         'this worth watching.'}
- Map (num_proc=7): 0%| | 0/75000 [00:00
The main drawback is the stereotypical '
-         'cynic that Hollywood has always made into a hero. Even when this '
-         'movie was made, the cynic was the stereotypical hero, and the one '
-         'Stewart portrays really has few saving graces. He is kind to his two '
-         'partners, and that does give him an extra dimension of credibility '
-         'and likability.

However, he is so piggish to everyone '
-         'else, it is hard to really care for him, or to accept him. He is '
-         'much like the one dimensional spaghetti Western characters (cut not '
-         'that bad).

Still, the minor characters are quite '
-         'enjoyable. Walter Brennan, Royal Dano, Harry Morgan, and others make '
-         'this worth watching.'}
- Map (num_proc=7): 0%| | 0/75000 [00:00
 # Set up the software environment
 # NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
-# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
+# (/scratch, /project, etc)
 # If you are interested, you can check the exact paths being mounted from
-# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
+# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.1.lua
 module purge
-module use /appl/local/containers/ai-modules
-module load singularity-AI-bindings
+module use /appl/local/laifs/modules
+module load lumi-aif-singularity-bindings
 
-CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
+CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r70f21m50t210-20260513_121430/lumi-multitorch-full-u24r70f21m50t210-20260513_121430.sif
 
 # Some environment variables to set up cache directories
 SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"