Got stuck when debugging a multi-process program in PyTorch #1615


## Environment data

- debugpy version: 1.8.1
- OS and version: CentOS
- Python version: 3.10.13
- Using VS Code or Visual Studio: VS Code 1.90.2

## Actual behavior

When I start a debugging session and press F10 (Step Over), the program hangs at the line that wraps the model in DDP (`ddp_model = DDP(model, device_ids=[device_id])`).

- launch.json

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Attach Test",
            "type": "debugpy",
            "request": "attach",
            "listen": {
                "port": 5678
            },
            "justMyCode": false
        }
    ]
}
```

- example.py

```python
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
from torch.nn.parallel import DistributedDataParallel as DDP

class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        self.net1 = nn.Linear(10, 10)
        self.relu = nn.ReLU()
        self.net2 = nn.Linear(10, 5)

    def forward(self, x):
        return self.net2(self.relu(self.net1(x)))

def demo_basic():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    print(f"Start running basic DDP example on rank {rank}.")

    # create model and move it to GPU with id rank
    device_id = rank % torch.cuda.device_count()
    model = ToyModel().to(device_id)

    import debugpy
    debugpy.connect(5678)
    debugpy.wait_for_client()
    debugpy.breakpoint()

    ddp_model = DDP(model, device_ids=[device_id])

    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)
    optimizer.zero_grad()
    outputs = ddp_model(torch.randn(20, 10))
    labels = torch.randn(20, 5).to(device_id)
    loss_fn(outputs, labels).backward()
    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    demo_basic()
```

- launch command:

```bash
torchrun --nproc_per_node=2 example.py
```

**Note**: This configuration used to work but recently stopped working. Debugging still works if I reverse the roles, using `connect` in launch.json and `debugpy.listen` in the code, but then each process needs its own port to avoid conflicts, which is inconvenient.
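
For reference, the workaround mentioned in the note could look roughly like the sketch below. It is only an illustration, not the exact code I used: `BASE_PORT`, `debug_port`, and `listen_for_debugger` are hypothetical names, and it assumes the script is started with `torchrun`, which sets `LOCAL_RANK` for each worker.

```python
import os

BASE_PORT = 5678  # assumed base port; same number as in the launch.json above


def debug_port(base_port: int = BASE_PORT, environ=os.environ) -> int:
    """Return a distinct debug port for this worker, derived from LOCAL_RANK."""
    # torchrun sets LOCAL_RANK for every worker; fall back to 0 when run directly.
    local_rank = int(environ.get("LOCAL_RANK", "0"))
    return base_port + local_rank


def listen_for_debugger(base_port: int = BASE_PORT) -> int:
    """Open a debugpy server on this worker's port and block until a client attaches."""
    import debugpy  # imported lazily so the helper module loads without debugpy

    port = debug_port(base_port)
    debugpy.listen(("localhost", port))
    debugpy.wait_for_client()
    return port
```

With two workers this opens ports 5678 and 5679, so launch.json needs one `connect` configuration per port, which is the inconvenience described above.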

## Expected behavior

Stepping over the DDP line should complete without the program hanging.

## Steps to reproduce:
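1. Start the "Attach Test" configuration in VS Code so that it listens on port 5678.
2. Run `torchrun --nproc_per_node=2 example.py`.
3. When execution stops at `debugpy.breakpoint()`, press F10 to step over the DDP line.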
