Reconfigurator: Add live test that executes expunged zones that were never in service #10081

Open
jgallagher wants to merge 14 commits into main from john/execute-expunged-live-test

@jgallagher
Contributor

This is an attempt to catch other issues like #10025. It implements the reproduction steps described there as a live test, but applied to most zone types and not just Nexus. (We skip multinode clickhouse, because those zones aren't deployed by default, and internal DNS, because the planner can't replace it without execution running first anyway.)

This took a bunch of tries to get passing, and I wouldn't be at all surprised if other kinds of flakes are still lurking here. We're not running live tests as part of CI, so I'm not sure how worried to be about this.

Includes a few bits of live test housekeeping (updates to the README and racklette serial number sets).

@jgallagher
Contributor Author

Running this test on london against a branch that did not have the fix for #10025, we see the failure we'd expect. The test times out waiting for blueprint execution to succeed:

  stderr ---
    log file: /var/tmp/test_execute_expunged_zone-78ccd669b96cb1ec-test_execute_expunged_zone.11905.0.log
    note: configured to log to "/var/tmp/test_execute_expunged_zone-78ccd669b96cb1ec-test_execute_expunged_zone.11905.0.log"
    note: using DNS from system config (typically /etc/resolv.conf)

    thread 'test_execute_expunged_zone' (2) panicked at live-tests/tests/test_execute_expunged_zone.rs:367:6:
    waited for successful execution: TimedOut(180.373621863s)
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

and the log file shows the exact error we saw in #10025:

17:21:17.796Z WARN test_execute_expunged_zone: execution had an error
    error = step failed: Ensure external networking resources: Internal Error: unexpected database error: Record not found
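For anyone reading along who hasn't looked at the test, the failure above has the shape of a poll-until-success loop that gives up after a deadline. The following is only a rough, standard-library sketch of that pattern, not the code in this PR: `wait_for`, `WaitError`, and the closure standing in for "did blueprint execution succeed?" are all hypothetical names made up for illustration.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical error type: the wait gave up after the elapsed duration.
#[derive(Debug, PartialEq)]
enum WaitError {
    TimedOut(Duration),
}

/// Calls `check` repeatedly, sleeping `interval` between attempts, until it
/// returns `Some(value)` or at least `timeout` has elapsed.
fn wait_for<T>(
    mut check: impl FnMut() -> Option<T>,
    interval: Duration,
    timeout: Duration,
) -> Result<T, WaitError> {
    let start = Instant::now();
    loop {
        // A `Some` result means the condition we are waiting for now holds.
        if let Some(value) = check() {
            return Ok(value);
        }
        // Otherwise report how long we waited once the deadline has passed.
        let elapsed = start.elapsed();
        if elapsed >= timeout {
            return Err(WaitError::TimedOut(elapsed));
        }
        sleep(interval);
    }
}

fn main() {
    // Stand-in for a condition that becomes true on the third poll.
    let mut attempts = 0;
    let result = wait_for(
        || {
            attempts += 1;
            if attempts >= 3 { Some(attempts) } else { None }
        },
        Duration::from_millis(10),
        Duration::from_secs(1),
    );
    println!("{:?}", result);

    // Stand-in for a condition that never becomes true (the failing case).
    let never: Result<(), WaitError> =
        wait_for(|| None, Duration::from_millis(10), Duration::from_millis(50));
    println!("{:?}", never);
}
```

In the failing run, the condition never becomes true because every execution attempt hits the same error, so the wait runs until its deadline and reports the elapsed time.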

Running against a branch that does have that fix (as well as #10072, which fell out of developing this test), we get a pass, but it takes a while. Most of that time is spent waiting for cockroach to be healthy again after expunging one of its nodes:

    Starting 1 test across 3 binaries (2 tests skipped)
        SLOW [> 60.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>120.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>180.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>240.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>300.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>360.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>420.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>480.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>540.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>600.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>660.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>720.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>780.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>840.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        SLOW [>900.000s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
        PASS [ 949.973s] omicron-live-tests::test_execute_expunged_zone test_execute_expunged_zone
------------
     Summary [ 949.974s] 1 test run: 1 passed (1 slow), 2 skipped
