Description
When I ran a test with a config that had multiple OSDs on a single testnode, I got the following error:
2024-05-15T12:29:41.414 INFO:teuthology.orchestra.run.7fa8c2843ce2.stdout:Created osd(s) 0 on host '7fa8c2843ce2'
2024-05-15T12:29:42.108 DEBUG:teuthology.orchestra.run.7fa8c2843ce2:osd.0> sudo journalctl -f -n 0 -u ceph-1b2586ac-12b6-11ef-945e-d6d5f423fdc9@osd.0.service
2024-05-15T12:29:42.110 INFO:tasks.cephadm:{Remote(name='ubuntu@7fa8c2843ce2'): [], Remote(name='ubuntu@c3bfc4209056'): ['/dev/loop3'], Remote(name='ubuntu@db02dd5eef59'): ['/dev/loop0'], Remote(name='ubuntu@de36ba4bccc7'): ['/dev/loop1']}
2024-05-15T12:29:42.110 INFO:tasks.cephadm:ubuntu@7fa8c2843ce2
2024-05-15T12:29:42.110 INFO:tasks.cephadm:[]
2024-05-15T12:29:42.110 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/teuthology/teuthology/contextutil.py", line 30, in nested
vars.append(enter())
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/root/src/github.com_vallariag_ceph_0c8b425a40783ee42c035ea9fbe29647e90f007f/qa/tasks/cephadm.py", line 1072, in ceph_osds
assert devs ## FIXME ##
AssertionError
Each testnode had one loop device: https://pastebin.com/raw/8z5gj0CU (ls /dev output)
The problem happens because my job config deploys multiple OSDs on one node (osd.0 and osd.1 on the same host), while each testnode container has only one device available that can be zapped for OSD deployment.
With the ceph-devstack setup, the teuthology function get_scratch_devices() returned one device for each testnode, so the mapping (devs_by_remote) looks like this:
{Remote(name='ubuntu@7fa8c2843ce2'): ['/dev/loop2'],
Remote(name='ubuntu@c3bfc4209056'): ['/dev/loop3'],
Remote(name='ubuntu@db02dd5eef59'): ['/dev/loop0'],
Remote(name='ubuntu@de36ba4bccc7'): ['/dev/loop1']}
Because we pop the loop device from devs_by_remote when the first OSD is deployed, there are no devices left for the second OSD on the same testnode.
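For reference, here is a rough sketch of how I understand the device assignment in ceph_osds to behave (names and structure simplified for illustration, this is not the actual cephadm.py code), showing why two OSD roles on the same remote exhaust a one-element device list:

```python
# Simplified sketch, not the real qa/tasks/cephadm.py implementation.
devs_by_remote = {
    'ubuntu@7fa8c2843ce2': ['/dev/loop2'],   # osd.0 and osd.1 both land here
    'ubuntu@c3bfc4209056': ['/dev/loop3'],
    'ubuntu@db02dd5eef59': ['/dev/loop0'],
    'ubuntu@de36ba4bccc7': ['/dev/loop1'],
}

# Hypothetical role placement matching my job config.
osd_roles = [
    ('ubuntu@7fa8c2843ce2', 'osd.0'),
    ('ubuntu@7fa8c2843ce2', 'osd.1'),  # second OSD on the same testnode
]

for remote, role in osd_roles:
    devs = devs_by_remote[remote]
    assert devs          # fails for osd.1: the list is already empty
    dev = devs.pop()     # osd.0 consumes the only loop device
    print(f'deploying {role} on {remote} using {dev}')
```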
I reran my test with a 1 osd/node config and that worked (the test got through the Ceph setup okay).
As for a proper solution... does this mean we should create more loop devices per testnode in ceph-devstack?
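Something along these lines is what I had in mind, just a sketch: I haven't checked how or where ceph-devstack currently creates its single loop device, so the backing-file path, sizes, and device count below are assumptions for illustration (and this would need root inside the testnode container):

```python
# Sketch only: create N loop devices per testnode instead of one, so
# get_scratch_devices() has enough devices for multi-OSD-per-node jobs.
import subprocess

def create_loop_devices(count: int, size_gb: int = 10,
                        backing_dir: str = '/var/lib/ceph-devstack'):
    devices = []
    for i in range(count):
        backing_file = f'{backing_dir}/loop{i}.img'
        # Create a sparse backing file of the requested size.
        subprocess.run(['truncate', '-s', f'{size_gb}G', backing_file], check=True)
        # Attach it to the first free loop device and record the device name.
        result = subprocess.run(
            ['losetup', '--find', '--show', backing_file],
            check=True, capture_output=True, text=True,
        )
        devices.append(result.stdout.strip())
    return devices
```

If something like that were wired into the testnode provisioning, a job that puts osd.0 and osd.1 on the same host would have a second device to zap.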
Let me know; I'd love to pick up this issue. It would be a good gateway to understanding more of ceph-devstack.