Switch from Colossal Launcher to torchrun #1

@KastanDay

Description

🐛 Describe the bug

Solves two key problems:

  • Too many WandB loggers per node: switch from one logger per GPU to one per node (since each process can see all 4 GPUs).
  • Makes launching easier: a single loop over the nodes in the Slurm job.
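
The one-logger-per-node point could be enforced in the launcher itself. A minimal sketch, assuming torchrun-style `LOCAL_RANK` numbering and wandb's `WANDB_MODE=disabled` switch; the default value of 3 is only a demo value for running this standalone:

```shell
# Sketch: keep one active WandB logger per node by disabling wandb on
# every local rank except 0. torchrun exports LOCAL_RANK per worker;
# the fallback of 3 is an assumption so the snippet runs standalone.
LOCAL_RANK="${LOCAL_RANK:-3}"
if [ "$LOCAL_RANK" -ne 0 ]; then
    # WANDB_MODE=disabled turns wandb logging into a no-op in the training script
    export WANDB_MODE=disabled
fi
```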

Refactor this line in the Slurm launcher:

ssh "$local_node_hostname" \
    "export DATA=$DATA; conda activate $CONDA_ENV_NAME; python $TRAIN_FILEPATH --config $CONFIG_FILEPATH --host $MAIN_HOST --port $MAIN_HOST_PORT --world_size $WORLD_SIZE --rank $localrank" &
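
With torchrun, the per-GPU python processes are spawned locally on each node, so the Slurm script needs only one command per node. A minimal sketch of the refactored loop, reusing the launcher's variable names where they exist; `NODE_LIST`, `NNODES`, the placeholder values, and the dry-run `echo` are assumptions for illustration:

```shell
# Sketch: one torchrun invocation per node replaces the per-GPU ssh loop.
# NODE_LIST/NNODES and the placeholder values are assumptions; in the real
# launcher they would come from Slurm (e.g. scontrol show hostnames).
# The command is echoed rather than ssh'd so the sketch runs anywhere.
NODE_LIST="node0 node1"
NNODES=2
MAIN_HOST=node0
MAIN_HOST_PORT=29500
TRAIN_FILEPATH=train.py
CONFIG_FILEPATH=config.py
node_rank=0
for node in $NODE_LIST; do
    echo "ssh $node torchrun --nnodes $NNODES --nproc_per_node 4" \
         "--node_rank $node_rank --master_addr $MAIN_HOST" \
         "--master_port $MAIN_HOST_PORT $TRAIN_FILEPATH --config $CONFIG_FILEPATH"
    node_rank=$((node_rank + 1))
done
```

torchrun sets `RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` for each worker, so the training script would no longer need its `--host/--port/--world_size/--rank` flags (assuming it is updated to read those environment variables).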

Environment

No response
