Description
When I ran a test with a config that had multiple OSDs on a single testnode, I got the following error:
2024-05-15T12:29:41.414 INFO:teuthology.orchestra.run.7fa8c2843ce2.stdout:Created osd(s) 0 on host '7fa8c2843ce2'
2024-05-15T12:29:42.108 DEBUG:teuthology.orchestra.run.7fa8c2843ce2:osd.0> sudo journalctl -f -n 0 -u ceph-1b2586ac-12b6-11ef-945e-d6d5f423fdc9@osd.0.service
2024-05-15T12:29:42.110 INFO:tasks.cephadm:{Remote(name='ubuntu@7fa8c2843ce2'): [], Remote(name='ubuntu@c3bfc4209056'): ['/dev/loop3'], Remote(name='ubuntu@db02dd5eef59'): ['/dev/loop0'], Remote(name='ubuntu@de36ba4bccc7'): ['/dev/loop1']}
2024-05-15T12:29:42.110 INFO:tasks.cephadm:ubuntu@7fa8c2843ce2
2024-05-15T12:29:42.110 INFO:tasks.cephadm:[]
2024-05-15T12:29:42.110 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/teuthology/teuthology/contextutil.py", line 30, in nested
vars.append(enter())
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/root/src/github.com_vallariag_ceph_0c8b425a40783ee42c035ea9fbe29647e90f007f/qa/tasks/cephadm.py", line 1072, in ceph_osds
assert devs ## FIXME ##
AssertionError
Each testnode had one loop device: https://pastebin.com/raw/8z5gj0CU (ls /dev output)
The problem happens because my job config deploys multiple OSDs on one node (osd.0 and osd.1 on the same host), while each testnode container has only one device available that can be zapped for OSD deployment.
With the ceph-devstack setup, the teuthology function get_scratch_devices() returned one device for each testnode, so the mapping (devs_by_remote) looks like this:
{Remote(name='ubuntu@7fa8c2843ce2'): ['/dev/loop2'],
Remote(name='ubuntu@c3bfc4209056'): ['/dev/loop3'],
Remote(name='ubuntu@db02dd5eef59'): ['/dev/loop0'],
Remote(name='ubuntu@de36ba4bccc7'): ['/dev/loop1']}
Because we pop the loop device from devs_by_remote when the first OSD is deployed, there are no devices left for the second OSD on the same testnode.
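For reference, here is a rough sketch of how I understand the device assignment in ceph_osds to behave (names and structure simplified for illustration, this is not the actual cephadm.py code), showing why two OSD roles on the same remote exhaust a one-element device list:

```python
# Simplified sketch, not the real qa/tasks/cephadm.py implementation.
devs_by_remote = {
    'ubuntu@7fa8c2843ce2': ['/dev/loop2'],   # osd.0 and osd.1 both land here
    'ubuntu@c3bfc4209056': ['/dev/loop3'],
    'ubuntu@db02dd5eef59': ['/dev/loop0'],
    'ubuntu@de36ba4bccc7': ['/dev/loop1'],
}

# Hypothetical role placement matching my job config.
osd_roles = [
    ('ubuntu@7fa8c2843ce2', 'osd.0'),
    ('ubuntu@7fa8c2843ce2', 'osd.1'),  # second OSD on the same testnode
]

for remote, role in osd_roles:
    devs = devs_by_remote[remote]
    assert devs          # fails for osd.1: the list is already empty
    dev = devs.pop()     # osd.0 consumes the only loop device
    print(f'deploying {role} on {remote} using {dev}')
```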
I reran my test with a 1 osd/node config and that worked (the test got through the Ceph setup okay).
As for a proper solution... does this mean we should create more loop devices per testnode in ceph-devstack?
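Something along these lines is what I had in mind, just a sketch: I haven't checked how or where ceph-devstack currently creates its single loop device, so the backing-file path, sizes, and device count below are assumptions for illustration (and this would need root inside the testnode container):

```python
# Sketch only: create N loop devices per testnode instead of one, so
# get_scratch_devices() has enough devices for multi-OSD-per-node jobs.
import subprocess

def create_loop_devices(count: int, size_gb: int = 10,
                        backing_dir: str = '/var/lib/ceph-devstack'):
    devices = []
    for i in range(count):
        backing_file = f'{backing_dir}/loop{i}.img'
        # Create a sparse backing file of the requested size.
        subprocess.run(['truncate', '-s', f'{size_gb}G', backing_file], check=True)
        # Attach it to the first free loop device and record the device name.
        result = subprocess.run(
            ['losetup', '--find', '--show', backing_file],
            check=True, capture_output=True, text=True,
        )
        devices.append(result.stdout.strip())
    return devices
```

If something like that were wired into the testnode provisioning, a job that puts osd.0 and osd.1 on the same host would have a second device to zap.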
Let me know; I'd love to pick up this issue. It would be a good gateway to understanding more of ceph-devstack.