This machine learning workshop is an introduction to the Parallel Works ACTIVATE user experience. ACTIVATE is a single control plane for cloud and on-premise high performance computing resources.
The main activities of this workshop are to:
- Start a personal cloud cluster
- Start a notebook session on the cluster
- Download a notebook from a public repository to the cluster
- Run the notebook on the cluster
- Copy files to different storage (bucket, workspace)
- Track cost in near real time
- Launch an MPI job via `script_submitter` (optional)
- Log into the platform by going to hpcmp-cloud.parallel.works.
- Change your password. Initial login can be complicated by:
  - delayed or filtered password reset messages, and
  - being unable to use a PED for MFA in certain locations.
- On the `Home` page, go to the `Compute` tile and click on the `On` button for your default cluster.
  - Cloud cluster startup takes ~2-5 minutes.
  - Please explore - but do not change - the configuration with the `i` button.
  - In particular, note the parts that make up the cluster, such as the head node you will log into and the compute partitions that provide worker nodes.
- On the ACTIVATE `Home` page, click on the `JupyterLab` workflow tile.
  - There is no need to change the fields on the workflow launch page - they autopopulate with your one running cluster.
  - Default settings include using cached software on `/pw/apps` to minimize JupyterLab startup time and bypass the need to install TensorFlow.
  - You are welcome to explore the options at some other time.
- Click on the `Execute` button.
  - You can stay on the workflow launch status page, but it's more interesting to:
    - go to your `Home` page,
    - notice your session is starting up (`Sessions` tile)...
    - ...and the workflow is running (`Workflow Runs` tile).
  - A workflow is just a series of automated steps.
  - An interactive session is a special type of workflow whose steps include the setup for sending graphics from the cloud cluster to your ACTIVATE workspace.
  - Workflows can also be purely computational (e.g. running a simulation) or even a mix of non-graphical and graphical applications.
  - Workflows are defined in an easy-to-use `.yaml` format; this is beyond the scope of the workshop.
- Click on the run number of the JupyterLab workflow (e.g. `00001`) to view the workflow progress and logs.
  - JupyterLab is ready when `Create session` has a green checkmark in the workflow viewer or there is a green light for its entry in the `Sessions` tile on ACTIVATE `Home`.
- Access your JupyterLab session on the head node of the cluster by clicking on its entry in the `Sessions` tile on ACTIVATE `Home`.
  - Often, it's convenient to use the `Open in new tab` button to place the session in its own browser tab.
- Use the JupyterLab launcher tab to start a terminal (you may need to scroll down).
- In the terminal in your JupyterLab session, please run `git clone https://github.com/parallelworks/ml-workshop` to place a copy of this repository on your cluster.
  - You should see `ml-workshop` in the file browser portion (left sidebar) of JupyterLab.
  - You are also welcome to run simple Linux terminal commands like `hostname`, `whoami`, `sinfo`, and `squeue` to verify that you are on the head node of a SLURM cluster; a sketch of these commands follows this list.
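For reference, here is a minimal sketch of the terminal steps above. All of the commands are standard; their output will vary by cluster.

```bash
# Copy the workshop repository onto the cluster
git clone https://github.com/parallelworks/ml-workshop

# Sanity checks that you are on the head node of a SLURM cluster
hostname   # prints the node's name
whoami     # prints your cluster user name
sinfo      # lists SLURM partitions and node states
squeue     # lists queued and running jobs (likely empty for now)
```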
- Start the notebook by clicking on `ml-workshop` in the JupyterLab file browser and then on `cvae_example.ipynb`.
  - The notebook stores code, output, and error messages all in the same file.
  - The error messages here are totally normal.
- Notebook cells can be run individually by selecting them and clicking on the `Play` icon (right-pointing arrowhead).
  - Or, you can go to the top menu and select `Kernel > Restart Kernel and Run All` to run all the cells.
- While running the steps of the notebook, keep in mind:
  - This small example of generative AI trains a neural network to recognize handwritten digits (0-9).
  - A citation, a summary of the job, and an example of extending this approach to a bigger science application are presented at the top of the notebook.
  - The training and visualization steps will each take a few minutes.
  - While they run, if you have opened the JupyterLab session in its own tab, you can go back to the ACTIVATE `Home` page on your original browser tab to verify that your session/workflow is still running.
  - You can monitor CPU/RAM/disk usage in near real time by selecting the `i` button on the line of your cluster in the `Compute` tile.
  - Or, you can monitor resource usage in the terminal with `htop` and similar tools; see the sketch after this list.
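Here is a small sketch of terminal-based monitoring; `htop` comes from the step above, and the other commands are common Linux utilities added for illustration.

```bash
htop        # interactive per-core CPU and RAM view (press q to quit)
free -h     # memory usage summary in human-readable units
df -h ~     # disk usage for the filesystem holding your home directory
uptime      # load averages over the last 1, 5, and 15 minutes
```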
- There are several persistent storage options integrated with your ephemeral cloud cluster.
  - `/pw/bbb` is a shared cloud bucket mounted to the cluster.
    - For simplicity, this bucket is shared among all workshop participants; you can overwrite each other's files here! E.g. rename your notebook to your username and then copy it to `/pw/bbb` (see the sketch after this list).
    - You can get short-term credentials to the bucket, along with examples for use with standard CSP CLI tools, by clicking on the `Buckets` tab on the left sidebar of your ACTIVATE `Home`, selecting the `bbb` bucket, and then clicking on the `Credentials` button in the upper right corner.
  - The home directory of your cluster is also mounted into your persistent ACTIVATE workspace. You can view the files by clicking on the `Editor` tab on the left sidebar. The `Editor` tab also opens an integrated development environment (IDE) associated with your private workspace on ACTIVATE.
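A minimal sketch of the rename-and-copy step above, assuming you cloned `ml-workshop` into your home directory; the destination filename is only an illustration.

```bash
# Copy the notebook into the shared bucket mount under a name based on
# your username so it does not collide with other participants' files
cp ~/ml-workshop/cvae_example.ipynb "/pw/bbb/${USER}_cvae_example.ipynb"

# Confirm the file landed in the shared bucket
ls -lh /pw/bbb
```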
- Go back to the ACTIVATE Home page.
- Click on the `$ Cost` menu item on the left sidebar.
  - You may need to set the group to `ml-workshop` in the ribbon/filter bar across the top of the cost dashboard.
- OpenMPI is already installed at `/pw/apps/ompi`.
- If you run `run_mpitest.sh` in this repository, it will (a rough sketch follows this list):
  - set up the system paths to access OpenMPI,
  - compile the hello world MPI source code provided here, and
  - run the code over 4 CPUs distributed across two worker nodes.
- You can check the status of this multiple-node job with `sinfo` and `squeue` in another terminal.
- You can also copy and paste the contents of `run_mpitest.sh` into the `script_submitter` workflow's launch page to run the script on the cluster as if it were a formal workflow.
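For orientation, here is a rough sketch of the kind of steps `run_mpitest.sh` performs, assuming the OpenMPI install at `/pw/apps/ompi` noted above; the source file name (`hello_mpi.c`) and the launcher flags are hypothetical, so treat the script in the repository as authoritative.

```bash
#!/bin/bash
# Sketch only - see run_mpitest.sh in this repository for the real steps.

# Put OpenMPI's compiler wrappers and libraries on the path
export PATH=/pw/apps/ompi/bin:$PATH
export LD_LIBRARY_PATH=/pw/apps/ompi/lib:$LD_LIBRARY_PATH

# Compile the provided MPI hello-world source (file name is hypothetical)
mpicc -o hello_mpi hello_mpi.c

# Launch 4 ranks spread across 2 worker nodes through SLURM
srun --nodes=2 --ntasks=4 ./hello_mpi
```

While this runs, `squeue` in another terminal should show the job occupying two nodes.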
