Although the method in the orch module exit_host_maintenance(self, hostname: str, force: bool = False, offline: bool = False) [1] takes the offline and force parameters, the command-line parser does not accept them:
$ ceph orch host maintenance exit --offline --force node.example.com
Invalid command: Unexpected argument '--offline'
orch host maintenance exit <hostname> : Return a host from maintenance, restarting all Ceph daemons (cephadm only)
This was discovered because putting a host into maintenance, rebooting that host, and then trying to leave maintenance ends in an uncaught exception:
$ ceph orch host maintenance exit node.example.com
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1907, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 186, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 526, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 122, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 111, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 804, in _host_maintenance_exit
raise_if_exception(completion)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 240, in raise_if_exception
e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
Re-executing the same command a second time resolves the issue:
$ ceph orch host maintenance exit node.example.com
Ceph cluster .... on node.example.com has exited maintenance mode
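
The TypeError raised by pickle.loads in the traceback above is consistent with a common Python pitfall: an exception whose __init__ requires extra positional arguments, but which passes only the message on to Exception.__init__, can be pickled but not unpickled. The sketch below reproduces that behaviour in isolation. HostError, PicklableHostError, round_trip and try_round_trip are hypothetical names invented for this illustration; they are not the cephadm classes, and whether this is the actual cause in the mgr code has not been verified here.

```python
import pickle


class HostError(Exception):
    """Hypothetical exception; NOT the actual cephadm class."""

    def __init__(self, message, hostname, addr):
        # Only `message` reaches Exception.__init__, so self.args == (message,).
        # pickle later rebuilds the object as HostError(*self.args).
        super().__init__(message)
        self.hostname = hostname
        self.addr = addr


class PicklableHostError(HostError):
    """Same exception, with __reduce__ telling pickle how to rebuild it."""

    def __reduce__(self):
        # Hand pickle every argument that __init__ requires.
        return (self.__class__, (self.args[0], self.hostname, self.addr))


def round_trip(exc):
    """Serialize and deserialize an exception, as a completion result would be."""
    return pickle.loads(pickle.dumps(exc))


def try_round_trip(exc):
    """Return the unpickled exception, or the TypeError raised while loading."""
    try:
        return round_trip(exc)
    except TypeError as e:
        return e


if __name__ == "__main__":
    args = ("host offline", "node.example.com", "192.168.192.210")
    print(repr(try_round_trip(HostError(*args))))
    print(repr(try_round_trip(PicklableHostError(*args))))
```

In this sketch the first call comes back as a TypeError about two missing positional arguments, while the second survives the round trip with hostname and addr intact. Passing all arguments through to super().__init__() would be another way to make such an exception picklable.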
Reproducer
-
set a host into maintenance
$ ceph orch host maintenance enter node.example.com
-
verify the host is reported in maintenance
$ ceph orch host ls --host_status maintenance
HOST ADDR LABELS STATUS
node.example.com 192.168.192.210 osd,mgr Maintenance
-
reboot the host
$ ssh node.example.com "sudo reboot"
-
verify the host is reported offline
$ ceph orch host ls --host_status offline
HOST ADDR LABELS STATUS
node.example.com 192.168.192.210 osd,mgr Offline
-
after the host is back online, leave maintenance mode
$ ceph orch host maintenance exit node.example.com
-
exception returned
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1907, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 186, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 526, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 122, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 111, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 812, in _host_rescan
raise_if_exception(completion)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 240, in raise_if_exception
e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
-
this also happens with other commands, such as ceph orch host rescan node.example.com
[1] https://github.com/ceph/ceph/blob/9336c9e5e8cca58aadc3271f872e586061d07198/src/pybind/mgr/cephadm/module.py#L2219