Commit c2fe916

Wait for compute service to check in
With cells v2, on initial bring up, discover hosts can't run unless all the compute nodes have checked in. The documentation says that you should run ``nova service-list --binary nova-compute`` and see all your hosts before running discover hosts. This isn't really viable in a multinode devstack because of how things are brought up in parts.

We can, however, guarantee that stack.sh will not complete before the compute node is up by waiting for the compute node to check in before completing. This happens quite late in the stack.sh run, so it shouldn't add any extra time in most runs.

Cells v1 and Xenserver don't use real hostnames in the service table (they encode complex, hostname-like data to provide more topology information than just hostnames). They are exempted from this check.

Related-Bug: #1708039
Change-Id: I32eb59b9d6c225a3e93992be3a3b9f4b251d7189
1 parent f7c2501 commit c2fe916
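
The approach in the commit message (block until a service shows up, but give up after a deadline) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code from this commit: `wait_until_nonempty` and `fake_service_list` are hypothetical names, and the marker file merely stands in for a compute node registering itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of devstack): run a command once per second
# until it prints something non-empty, or fail once the deadline passes.
wait_until_nonempty() {
    local timeout_secs=$1; shift
    local deadline=$(( SECONDS + timeout_secs ))
    local out=""
    while (( SECONDS < deadline )); do
        # A failing probe is treated the same as an empty answer.
        out=$("$@" 2>/dev/null) || true
        if [[ -n "$out" ]]; then
            echo "$out"
            return 0
        fi
        sleep 1
    done
    return 1
}

# Stand-in for the real service query: prints an ID once a marker file exists.
fake_service_list() {
    [[ -f "$MARKER" ]] && echo "42"
}

MARKER=$(mktemp -u)
# Simulate the compute service checking in two seconds from now.
( sleep 2; touch "$MARKER" ) &

ID=$(wait_until_nonempty 10 fake_service_list)
echo "service id: $ID"

wait
rm -f "$MARKER"
```

The helper returns non-zero on timeout, so a caller can print diagnostics before failing, which is the same shape as the check added in this commit.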

3 files changed

Lines changed: 49 additions & 0 deletions

functions

Lines changed: 20 additions & 0 deletions
@@ -407,6 +407,26 @@ EOF
     return $rval
 }

+function wait_for_compute {
+    local timeout=$1
+    local rval=0
+    time_start "wait_for_service"
+    timeout $timeout bash -x <<EOF || rval=$?
+        ID=""
+        while [[ "\$ID" == "" ]]; do
+            sleep 1
+            ID=\$(openstack --os-cloud devstack-admin --os-region "$REGION_NAME" compute service list --host `hostname` --service nova-compute -c ID -f value)
+        done
+EOF
+    time_stop "wait_for_service"
+    # Figure out what's happening on platforms where this doesn't work
+    if [[ "$rval" != 0 ]]; then
+        echo "Didn't find service registered by hostname after $timeout seconds"
+        openstack --os-cloud devstack-admin --os-region "$REGION_NAME" compute service list
+    fi
+    return $rval
+}
+

 # ping check
 # Uses globals ``ENABLED_SERVICES``, ``TOP_DIR``, ``MULTI_HOST``, ``PRIVATE_NETWORK``
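
The heredoc in wait_for_compute mixes two kinds of expansion: $REGION_NAME and `hostname` are expanded by the outer shell while the heredoc is being built, whereas the escaped \$ID and \$(...) are passed through for the inner bash to evaluate on every loop iteration. The sketch below shows the same distinction with illustrative values only; REGION, COUNT, and run_inner are stand-ins, not names from devstack.

```shell
#!/usr/bin/env bash
# REGION exists only in the outer shell (deliberately not exported), so the
# inner bash can only see its value if the outer shell expands it first.
REGION="RegionOne"

run_inner() {
    bash <<EOF
COUNT=0
# \$COUNT is escaped: the inner bash evaluates it on each iteration.
while [[ "\$COUNT" -lt 2 ]]; do
    COUNT=\$((COUNT + 1))
done
# $REGION is not escaped: the outer shell substitutes it before bash starts.
echo "region=$REGION count=\$COUNT"
EOF
}

OUTPUT=$(run_inner)
echo "$OUTPUT"   # prints "region=RegionOne count=2"
```

Without the backslashes, the outer shell would substitute its own (empty) value for the loop variable before the inner shell ever ran, so the loop condition could never change.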

lib/nova

Lines changed: 22 additions & 0 deletions
@@ -944,6 +944,28 @@ function start_nova_conductor {
     done
 }

+function is_nova_ready {
+    # NOTE(sdague): with cells v2 all the compute services must be up
+    # and checked into the database before discover_hosts is run. This
+    # happens in all in one installs by accident, because > 30 seconds
+    # happen between here and the script ending. However, in multinode
+    # tests this can very often not be the case. So ensure that the
+    # compute is up before we move on.
+    if is_service_enabled n-cell; then
+        # cells v1 can't complete the check below because it munges
+        # hostnames with cell information (grumble grumble).
+        return
+    fi
+    # TODO(sdague): honestly, this probably should be a plug point for
+    # an external system.
+    if [[ "$VIRT_DRIVER" == 'xenserver' ]]; then
+        # xenserver encodes information in the hostname of the compute
+        # because of the dom0/domU split. Just ignore for now.
+        return
+    fi
+    wait_for_compute 60
+}
+
 function start_nova {
     # this catches the cells v1 case early
     _set_singleconductor

stack.sh

Lines changed: 7 additions & 0 deletions
@@ -1433,6 +1433,13 @@ fi
 # Sanity checks
 # =============

+# Check that computes are all ready
+#
+# TODO(sdague): there should be some generic phase here.
+if is_service_enabled n-cpu; then
+    is_nova_ready
+fi
+
 # Check the status of running services
 service_check
