Skip to content

feat: [MR-144] Reuse unchanged canister digests across HashTrees#10532

Open
alin-at-dfinity wants to merge 78 commits into
masterfrom
alin/DSM-144-tree_hash-stubbing
Open

feat: [MR-144] Reuse unchanged canister digests across HashTrees#10532
alin-at-dfinity wants to merge 78 commits into
masterfrom
alin/DSM-144-tree_hash-stubbing

Conversation

@alin-at-dfinity

Copy link
Copy Markdown
Contributor

Summary

hash_lazy_tree re-hashes every canister's subtree each round, even though almost all canisters are untouched. This PR lets a new HashTree reuse subtree digests from a previous one, keyed by the canister's backing Arc<CanisterState> (shared copy-on-write, replaced only on mutation).

What changed

  • Stub nodes. A new NodeKind::Stub collapses a reusable subtree to a digest-only node. A LazyFork opts in via subtree_source(); CanisterFork returns a SubtreeSource over its Arc<CanisterState>. Other forks are still materialized inline.
  • SubtreeSource. A type-erased, owned handle (Arc<dyn Any> + expander fn pointer) that keeps the source alive (no ABA on the address) and re-expands the stub for witnesses. Eq is a conservative, false-negative-only reuse gate: same source allocation and same expander.
  • Version in the expander. expand_canister::<const V: u32> bakes the certification version into the function pointer, so it isn't stored per stub and reuse can't mix versions.
  • hash_lazy_tree_with_baseline. Lockstep-joins the new tree against a baseline; matching SubtreeSources reuse the baseline digest, otherwise rebuild. Witnesses expand stubs via SubtreeSource::expand(), so the tree stays self-contained.
  • Adaptive parallelization. Forks build sequentially, then hand the remaining children to the thread pool once the observed rebuild rate (after a warmup) projects more than PARALLEL_MIN_CHILDREN rebuilds. The common reuse-heavy path stays sequential; large/from-scratch forks parallelize.

Tests

New tests/subtree.rs: stubbing, witness/absence-proof expansion from source, baseline builds (full and partial change) matching from-scratch, and reuse_is_by_identity_not_by_value (shared Arcs reuse, replaced Arcs rebuild).

Follow-up

Actually invoke hash_lazy_tree_with_baseline from production code, with the latest available HashTree. For now, it is only invoked from the benchmark and tests.

Stubbing is implemented for /canisters only; ingress /messages are left for later.

alin-at-dfinity and others added 30 commits June 18, 2026 16:16
* Use precomputed empty leaf hash instead of computing it from scratch every time.
* Pass the key to `MapTransformFork::mk_tree` by reference, don't clone it (especially since it's only used in a couple of cases).
* Simplidy CanisterFork.
Looks like it was actually dm-2 (store-crypt) and/or dm-4
(store-shared--data, more likely) suffering from read amplification, not
vda. They're probably all the same (virtual?) device underneath, which
is why reads appear to be spiking on all 3 at once. But it's likely the
shared data partition that is actually being pounded.

Reduce the read-ahead for all these devices similar to what was done for
vda, just to be safe.
…10422)

## Summary

`ic-admin get-elected-guestos-versions --json` (alias
`get-blessed-replica-versions`) stopped emitting JSON. The handler used
to print JSON via the generic `print_and_get_last_value` helper, but
#10301 rewrote it to enumerate `ReplicaVersionRecord` entries with a
plain `println!` loop and never re-wired `opts.json`. Since `--json` is
a global flag, it is still accepted but silently ignored, so the command
always prints plaintext. `get-elected-hostos-versions` had the same
plaintext-only behavior.

This broke downstream automation in `dre-airflow`, which parsed the JSON
output and started failing with `JSONDecodeError: Extra data: line 1
column 7 (char 6)` (the first version hash's leading digits parsed as an
integer).

This PR restores `--json` for both subcommands:
- with `--json`: print the elected version IDs as a JSON array
(`["<hash>", ...]`)
- without `--json`: unchanged, newline-separated version IDs

## How was this tested?

`bazel build //rs/registry/admin:ic-admin` builds successfully.

Manual shape (with `--json`):
```json
[
  "abc...def",
  "012...789"
]
```

---------

Co-authored-by: IDX GitHub Automation <infra+github-automation@dfinity.org>
Update mainnet revisions file to include the latest version released on
the mainnet.

This PR is created automatically using
[`mainnet_revisions.py`](https://github.com/dfinity/ic/blob/master/ci/src/mainnet_revisions/mainnet_revisions.py)

Co-authored-by: CI Automation <infra+github-automation@dfinity.org>
This PR adds a PocketIC operation to delete a subnet:
- the subnet's `StateMachine` is properly dropped from the PocketIC
state (including state directory removal);
- the registry is updated accordingly (including registry changes in the
registry canister if the registry canister has been bootstrapped by
PocketIC, i.e., not manually).

Note. This PR does not implement subnet deletion in DSM! Stale artifacts
(such as stale streams and hanging unbounded-wait calls) might persist.

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…0426)

Following the introduction of structured `CanisterStates` (hot vs cold
canisters), almost half of the remaining DSM overhead is not covered by
`mr_process_batch_phase_duration_seconds`, so it can only be bundled
under "other" (the difference to `mr_process_batch_duration_seconds`).

Explicitly track `read_registry()` duration (including blockmaker
metrics updates, likely insignificant); and schedule/queues garbage
collection plus (heavy) `canister_state_bytes` computation.
…gine (#10427)

This PR extends the migration canister to reject canister migration
requests in which the migrated or replaced canister is on a cloud
engine. Currently, such a canister migration is rejected because xnet
calls to a cloud engine are not allowed. This PR makes the requirement
explicit so that canister migration from/to a cloud engine keeps being
rejected once xnet calls to a cloud engine are allowed.
## Background
Currently, the price for an HTTP outcall is calculated and charged
upfront. In the future, we want to implement a "pay-as-you-go" pricing
model. This model assigns a fraction (budget) of the total attached
cycles to each participating replica (per-replica allowance).

As the request passes through the target server and the transform
function, the HTTP adapter of each replica subtracts cycles from its own
budget, based on the amount of resources it consumed (download time,
downloaded bytes, transform instructions). In the end, the amount of
remaining cycles (refund) is passed back to the replica together with
the HTTP response.

## Proposed Changes
With this PR, we let the replica include (and sign) this refund as part
of the gossiped response share.

Before, each replica only signed the response metadata which should be
equal across all responses in order to reach consensus. Now, the signed
messages may be different even across agreeing replicas, as each of them
could sign a different refund share.

Therefore, we change the structure of the aggregated HTTP response
proof. Previously, this was just the metadata of the response and a
collection of individual signatures over it. Now, it is the metadata and
a collection of individual signatures and refund receipts of each
replica. In order to verify the proof, the individual messages
consisting of both the metadata and the refund share have to be
reconstructed, such that the signature can be verified.

For that reason we additionally switch signature verification of
aggregated responses to use the new batch verification API for multiple
messages, similar to what was done in
#10345. As a side effect, this comes
with some performance improvements for payload validation:
```
canister_http_payload_verification/mixed_subnet34
                        time:   [146.72 ms 148.46 ms 150.26 ms]
                        change: [-14.887% -13.555% -12.243%] (p = 0.00 < 0.05)
                        Performance has improved.
canister_http_payload_verification/many_replicated_responses_subnet34
                        time:   [225.27 ms 227.44 ms 229.63 ms]
                        change: [-17.525% -16.435% -15.417%] (p = 0.00 < 0.05)
                        Performance has improved.
canister_http_payload_verification/many_non_replicated_responses_subnet34
                        time:   [11.124 ms 11.263 ms 11.406 ms]
                        change: [-61.100% -60.382% -59.642%] (p = 0.00 < 0.05)
                        Performance has improved.
canister_http_payload_verification/many_divergence_responses_subnet34
                        time:   [116.12 ms 117.52 ms 118.94 ms]
                        change: [-4.0609% -2.4873% -0.9067%] (p = 0.00 < 0.05)
                        Change within noise threshold.
canister_http_payload_verification/many_flexible_responses_subnet34
                        time:   [253.98 ms 256.09 ms 258.22 ms]
                        change: [-3.1565% -2.0740% -0.8815%] (p = 0.00 < 0.05)
                        Change within noise threshold.
```

Additionally, during payload validation, we verify that none of the
refund shares exceed the per-replica allowance. Note that in the current
(legacy) pricing model, nothing is refunded yet, so the allowance, and
the returned refund is always 0.

---------

Co-authored-by: IDX GitHub Automation <infra+github-automation@dfinity.org>
This PR enables the canister log memory store feature on all subnets.
It seems that Ubuntu 26 upgrade changed the way the network is
initialized by `systemd` which breaks our intermediate DHCP step. It's
hard to debug especially in combination with SELinux policies.

* Use `dhcpcd` instead of `systemd-networkd` to obtain a temporary DHCP
address. This avoids problems with new Ubuntu 26 init order
(`systemd-networkd` is started now by `systemd` early it seems and if we
launch a 2nd instance - it falls apart).
* Remove `console=tty0` from kernel boot args. It was added to support
logging on serial-port-less clouds, but the problem is that when both
`ttyX` and `ttySX` are present - the kernel makes the normal (graphical
/ ttyX) console the main one (regardless of their order) and serial
console doesn't get any recovery shell or proper logging. We can try
later to improve that somehow to get the best of the both worlds, though
it's not easy. Also see systemd/systemd#9899
about that.
* Alter the `icos_logging` so that it logs to both `stderr` and
`journald` by default to ease debugging
* Remove timeout for `init-config` service since it anyway is critical
and it makes not much sense to kill it if it hangs around longer
* Add timeouts to cloud-related HTTP calls, add more verbose logging
…10435)

## Summary
- Raises `MAX_ENVIRONMENT_VARIABLES` in
`rs/config/src/execution_environment.rs` from 20 to 32.

## Test plan
- [x] `rustfmt` clean
- [x] `cargo clippy -p ic-config` clean
- [x] `bazel build //rs/config:config`
- [x] Ran directly-affected bazel tests (depth 2); env-var specific test
(`test_environment_variable_system_api`) passes. Unrelated darwin-local
flakes were observed in stable-memory / NNS / state_manager / consensus
integration tests; none reference env vars.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Bjoern Tackmann <bjoern@dfinity.org>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This bumps our bazel version to the latest available version. Some
dependencies had to be bumped for 9.X support, but overall very little
changed.
## Root Cause

The `//rs/tests/networking:canister_http_*` tests fail in `setup` with:

```
Orchestrator rule did not appear in time.: Timed out waiting for orchestrator rule.
...
Output:  Err: Error: No such file or directory
list chain ip6 filter OUTPUT
```

`canister_http::setup` → `start_httpbin_on_uvm` →
`wait_for_orchestrator_fw_rules` polls every node for the
orchestrator-applied `ip6 filter OUTPUT` chain. In the analyzed failure
the **entire `ip6 filter` table** was missing on one node and never
appeared during the whole 60s retry window — long after all replicas
reported healthy. That's a *permanent* condition, not a slow one.

The GuestOS firewall application pipeline is:

1. The orchestrator generates the nftables ruleset and writes it to
`/run/ic-node/nftables-ruleset/nftables.conf` — but **only when its
content changes** (or on the very first check after start).
2. `reload_nftables.path` (`PathChanged=`) triggers
`reload_nftables.service`, which ran:
   - `ExecStartPre=/usr/sbin/nft flush ruleset`
- `ExecStart=systemctl reload nftables.service` (→ `nft -f
/etc/nftables.conf`)

This sequence is **neither atomic nor retried**:

- The flush runs *outside* the nft transaction. If the subsequent load
fails for any transient reason, the node is left with an **empty
ruleset**.
- The canonical transient failure is hostname resolution of `hostos` in
the template rule `ip6 saddr { hostos } ct state { new } tcp dport {
42372 } accept`, which is resolved at `nft` load time (via the
`nss_icos` NSS plugin, falling through to DNS) and can fail with `Error:
Could not resolve hostname: Temporary failure in name resolution` early
after boot.
- Nothing retries: the service is `Type=oneshot` without restart, and
the path unit only re-fires when the orchestrator rewrites the file —
which it effectively never does again on a short-lived testnet.

A single transient `nft` load failure therefore permanently disables the
firewall on that node, and the test times out waiting for a chain that
will never exist. The `canister_http_*` family is the canary because
it's the only test family that explicitly waits for this rule on
**every** node of its testnet.

PR #9857 previously narrowed this race by adding
`After=nss-lookup.target` ordering to `reload_nftables.service`, but
`After=` is pure start ordering — by the time the path unit fires, those
targets are long active, so it doesn't guarantee that resolving `hostos`
succeeds. The race window was narrowed, not closed.

## Fix

Make the firewall apply **atomic** and **retried**:

- Prepend `flush ruleset` to the orchestrator-generated nftables config
(both the replica and the boundary-node firewall templates in
`ic.json5.template`), so flushing the old ruleset and loading the new
one happen in a single `nft` transaction. A failed load now leaves the
previously active ruleset in effect instead of leaving the node without
a firewall.
- In `reload_nftables.service`, drop the non-atomic `ExecStartPre`
flush, run `nft -f /etc/nftables.conf` directly, and add
`Restart=on-failure` / `RestartSec=2` (bounded by
`StartLimitIntervalSec=300` / `StartLimitBurst=30`, i.e. ~60s of
retries). Since the load is atomic, retrying is safe. If it keeps
failing, the unit ends up in a failed state, making the problem visible
(e.g. to `//rs/tests/node:guestos_no_failed_systemd_units`).

The 6 nftables golden files are updated accordingly.

## Verification

- `bazel test //rs/orchestrator:orchestrator_test` (nftables golden
tests) passes.
- `bazel test --runs_per_test=//rs/tests/networking:canister_http_test@3
//rs/tests/networking:canister_http_test
//rs/tests/node:guestos_no_failed_systemd_units`

---
This PR was created following the steps in
`.claude/skills/fix-flaky-tests/SKILL.md`.
Updating base container image references.

Run URL: https://github.com/dfinity/ic/actions/runs/27337008974

Co-authored-by: github-merge-queue <118344674+github-merge-queue@users.noreply.github.com>
This moves two further IC-OS tools, dosfstools and mtools, out of the
Dockerfile and into the Bazel build. This is similar to what we did with
e2fsprogs. This makes the build more portable.

See also: #10369
…#10445)

## Root Cause

On PRs where testnet allocation is pinned to the local DC (#10436),
`//rs/tests/networking:canister_http_socks_test` fails with every
outcall attempt erroring as:

```
direct connect ... ConnectionRefused
and connect through socks "... Connector(ConnectError(\"tcp connect error\", ... ConnectionRefused))"
```

The *direct* refusal is intentional (the test injects an egress-reject
rule for httpbin first). The real failure is the SOCKS leg: the nested
`Connector(ConnectError(...))` means the TCP connection **to the SOCKS
proxy itself** — API boundary node port 1080 — was refused, on every
attempt for the entire run, on **both** API BNs.

A "connection refused" (RST) pins this down precisely:

- The API BN firewall uses `policy drop` with an explicit `accept` for
node IPs on port 1080, so a firewall problem would cause *timeouts*, not
RSTs.
- An RST means the BN was up, its global IPv6 was configured in the
kernel, the packet passed the firewall — and **nothing was listening on
:1080**.

`danted.conf` configured the listener as:

```
internal: enp1s0 port = 1080
```

Dante resolves an interface name to its addresses **once at startup**
and never re-binds. GuestOS receives its global IPv6 via SLAAC. If
`danted.service` starts while `enp1s0` only has its link-local address
(router advertisement not yet processed), danted binds the link-local
scope only and the global `[...]:1080` endpoint stays closed forever —
`Restart=always` never kicks in because danted keeps running happily.

This is confirmed directly by the journald logs of one of the failing
API BNs:

```
13:57:58.543  enp1s0: Gained IPv6LL
13:57:58.547  Finished systemd-networkd-wait-online.service - Wait for Network to be Online.
13:57:58.549  Reached target network-online.target - Network is Online.
13:57:58.553  Started danted.service - SOCKS (v4 and v5) proxy daemon (danted).
...
13:57:58.637  danted[1030]: info: Dante/server[1/1] v1.4.4 running
```

`systemd-networkd-wait-online` completed 4 ms after the interface gained
only its **link-local** address — `network-online.target` does not
guarantee a global SLAAC address — and danted started 6 ms later,
binding the link-local address only.

This race got amplified by DC pinning: in the failing run all nine VMs
of the testnet (5 replicas, 2 API BNs, 2 UVMs) were packed onto a single
host, slowing down boot and RA/SLAAC delivery enough to hit the race on
both API BNs at once. The bug is pre-existing on `master`; the DC
pinning only widened the window. The same fragility was previously
patched around in #4658 (`PartOf=systemd-networkd.service` to restart
danted when networkd restarts).

## Fix

Bind the wildcard address instead of an interface name:

```
internal: :: port = 1080
```

A wildcard bind does not depend on address assignment timing —
connections to the global address succeed as soon as the address exists.
Access to the SOCKS proxy remains restricted through the firewall, which
only whitelists node IPs on port 1080 (the config already noted "Allow
everyone - this is already restricted through the firewall").

`external: enp1s0` (the outgoing side) is unchanged.

## Verification

- `bazel test --runs_per_test=3
//rs/tests/networking:canister_http_socks_test` passes 3/3 (avg 186 s)
on the DC-pinned branch that previously reproduced the failure.
This commit introduces XNet reject signals that will fire when messages
are sent to/from engines that are not allowed. The current PR only
introduces the ability to read these signals, without writing them
themselves. The actual implementation of XNet with engines will come in
a follow-up, and will start sending these signals.
…tem-tests (#10436)

# What

Force Farm testnet allocation to the same DC as the GitHub runner
executing the test (which is also the DC holding the just-built images)
— for **every** system-test. This generalizes the opt-in
`.allocate_testnet_to_local_dc()` mechanism introduced in #10122.

# Why

#10122 showed that cross-DC transfers of large images (e.g. 2.6G SetupOS
images from `dm1` to `zh1` or vice versa) cause download timeouts and
flaky tests. The same applies to all system-tests, so testnets should
always be allocated in the DC where the test runs and the images live.

# How

* Replace the `allocate_testnet_to_local_dc` bool on `SystemTestGroup`
(and its builder method) with the `ALLOCATE_TESTNET_TO_LOCAL_DC`
environment variable, read by the test driver in `create_group_setup`
(accepted values: `1`/`true`/`0`/`false`).
* Set `ALLOCATE_TESTNET_TO_LOCAL_DC=1` unconditionally from the
`system_test` macro in `rs/tests/system_tests.bzl`, covering both the
plain and the `_colocate` targets. This remains a no-op when the `DC`
volatile status variable is unknown (e.g. local runs without
`NODE_NAME`).
* Drop the now-redundant `.allocate_testnet_to_local_dc()` calls from
the 7 nested system-tests.
* Replace `dep_download_url` in `rs/tests/upload_systest_dep.sh` to no
longer go via the dc_http_proxy but point directly at the DC-local bazel
cache: `https://artifacts.$cluster.dfinity.network/cas/$dep_sha256`.
* Force the `release-system-tests` job in `release-testing.yml` and the
`system-tests-benchmarks-nightly` job to run in runner group `dm1` (the
`&dind-large-setup` anchor moved to `setup-guest-os-qualification`,
whose jobs keep their current runners).

# Notes for reviewers

* Pinning all tests to the runner's DC concentrates Farm load in `dm1`
where most runners live; watch for allocation failures after rollout.
Extra charts have been added to the [Farm
Dashboard](https://grafana.dm1-idx1.dfinity.network/d/uwEFG_yGk/farm-dashboard?from=now-3h&to=now&timezone=utc&refresh=10s)
for monitoring dm1 and zh1.
* Farm hosts/UVMs now fetch deps from
`artifacts.<cluster>.dfinity.network` over HTTPS (previously plain http
via proxy-global:8080); the redirect server already returns URLs of this
form.
…ly (#10431)

The users should be able to control their own settings for the engines
they use. For now we are scoping that to only `subnet_admins` and
`replica_version_id` but in the future we might want to allow more
things.

---------

Co-authored-by: IDX GitHub Automation <infra+github-automation@dfinity.org>
Co-authored-by: pietrodimarco-dfinity <124565147+pietrodimarco-dfinity@users.noreply.github.com>
This introduces a skill to fix determinism issues, as well as two other
skills that can be used by agents and humans alike to perform IC builds
on non-DFINITY infrastructure.
In theory, at this point, we have already collected all of the logs we
need. Copying the logs here saves the last few logs before a shutdown,
so a handful of the most recent logfiles should be fine.
…10455)

## Problem

Since the migration of the XNet payload builder to HTTP/2 (#1506), a
dead or stalled pooled connection to an XNet endpoint is reused
**indefinitely**:

- Cancelling a request (e.g. due to the 5-second query timeout in
`XNetClientImpl::query`) only resets the respective h2 *stream*; it does
not affect the underlying connection.
- The connection pool only drops connections that report themselves as
closed; a stalled-but-established h2 connection never does.

A connection that ends up dead/stalled (e.g. due to packet loss or CPU
starvation while under load) therefore black-holes **all** pulls from
the respective subnet pair until the replica restarts. Under the
previous HTTP/1.1 client this could not happen, because cancelling a
request closes its connection.

### Observed impact

`//rs/tests/message_routing/xnet:xnet_slo_120_subnets_staging_test`
failed reproducibly: the synchronized startup of 120×119 TLS connections
(while replicas were CPU-starved) wedged 191 directed subnet pairs. Each
wedged pair timed out on every pull (at the 5s timeout, ~2.3/s) for the
entire run with zero recovery, pushing best-effort calls on 20 subnets
past their 30s deadline (5.7%–13.7% failed calls vs. the 5% threshold).
Per-subnet error rate was predicted by the number of wedged pairs
involving that subnet with correlation 0.997.

## Fix

Enable HTTP/2 keep-alive pings on the XNet client: 10s interval, 5s
timeout, also while idle (`pool_max_idle_per_host(1)` keeps an idle
connection pooled for up to 600s). A dead connection is now detected and
closed within ~15s — well below the 30s best-effort deadline — after
which the next query establishes a fresh connection.

Note: this requires supplying an h2 timer via
`.timer(TokioTimer::new())` in addition to the existing
`.pool_timer(...)`; without it hyper panics at runtime once keep-alive
is enabled.

As defense in depth, also set a 5s TCP connect timeout on the
`HttpConnector` underlying `TlsConnector` (both `new` and
`new_for_tests`), so a stuck handshake fails fast rather than relying
solely on the per-query timeout. This does not address the observed
failure on its own — the wedged connections had completed TCP+TLS and
stalled at the h2 layer — but it bounds connection-setup stalls.

Cost: at most ~119 connections per node → ~12 PING frames/s/node; both
endpoints are replica-controlled (hyper's h2 server ACKs pings
natively).

## Verification

On CI the `Release System Tests` job
[succeeds](https://github.com/dfinity/ic/actions/runs/27438248671/job/81105591015)
including
`//rs/tests/message_routing/xnet:xnet_slo_120_subnets_staging_test_colocate`
([BuildBuddy](https://dash.dm1-idx1.dfinity.network/invocation/2d53507f-fe7d-45a8-a006-1fecfd20666e?target=%2F%2Frs%2Ftests%2Fmessage_routing%2Fxnet%3Axnet_slo_120_subnets_staging_test_colocate&targetStatus=5)).
Manual runs (from `dm1`) succeed as well:

```
bazel test //rs/tests/message_routing/xnet:xnet_slo_120_subnets_staging_test \
  --cache_test_results=no --test_output=errors \
  --test_arg=--set-required-host-features=dc=dm1
```

- Before: FAILED — 20/120 subnets above the 5% failed-call threshold
(5.7%–13.7%).
- After: **PASSED in 745.8s — 0 failed calls on all 120 subnets** (worst
subnet: 0/56049).
- `//rs/xnet/payload_builder:payload_builder_test` and
`:payload_builder_integration` pass (the latter exercises real client
queries and would catch the missing-timer panic).

After reducing the ping frequency to a 10s interval and adding the
connect timeout (review follow-up), `bazel test
//rs/tests/message_routing/xnet:xnet_slo_120_subnets_staging_test
--cache_test_results=no` still succeeds, and the unit/integration tests
above still pass.
…istry versions are empty (#10408)

The peer manager maintains connections to all peers that are in the
subnet record in any registry version between the "oldest version in
use" and the latest locally available registry version.

The "oldest registry version in use" is determined by the CUP. In case
of a subnet creation, this CUP is a registry CUP that contains the
registry version just _before_ the subnet was created (see
[here](https://sourcegraph.com/r/github.com/dfinity/ic@8cf338f5ef95432dc61233df889d98837d780466/-/blob/rs/registry/canister/src/mutations/do_create_subnet.rs?L63)).

This means that when the peer manager attempts to make connections to
peers according to the "oldest registry version in use", the subnet
doesn't actually exist yet, which then triggers the
`empty_list_of_node_records` error. This is fine however, because it
will find the correct peers in the subsequent registry version (which is
guaranteed to exist, otherwise we would not have the registry CUP).

To improve the error signal of this metric, we change it such that it is
only incremented if _all_ registry versions do not contain transport
infos, meaning there are no peers shared with the new subnet topology.
…v4 for UVMs (#10464)

All `//rs/tests/networking:canister_http...` tests are [very
flaky](https://dash.dm1-idx1.dfinity.network/invocation/34f3e50b-0713-40e2-a983-b71373683092)
due to their `httpbin` UVMs failing to acquire an IPv4 address in time
in the `dm1` DC.

To get a bit more insight into why this is the case we log `networkctl
status enp2s0` on timeout.
@alin-at-dfinity alin-at-dfinity requested review from a team as code owners June 22, 2026 09:51

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):

  1. Update unreleased_changelog.md (if there are behavior changes, even if they are
    non-breaking).

  2. Are there BREAKING changes?

  3. Is a data migration needed?

  4. Security review?

How to Satisfy This Automatic Review

  1. Go to the bottom of the pull request page.

  2. Look for where it says this bot is requesting changes.

  3. Click the three dots to the right.

  4. Select "Dismiss review".

  5. In the text entry box, respond to each of the numbered items in the previous
    section, declare one of the following:

  • Done.

  • $REASON_WHY_NO_NEED. E.g. for unreleased_changelog.md, "No
    canister behavior changes.", or for item 2, "Existing APIs
    behave as before.".

Brief Guide to "Externally Visible" Changes

"Externally visible behavior change" is very often due to some NEW canister API.

Changes to EXISTING APIs are more likely to be "breaking".

If these changes are breaking, make sure that clients know how to migrate, how to
maintain their continuity of operations.

If your changes are behind a feature flag, then, do NOT add entrie(s) to
unreleased_changelog.md in this PR! But rather, add entrie(s) later, in the PR
that enables these changes in production.

Reference(s)

For a more comprehensive checklist, see here.

GOVERNANCE_CHECKLIST_REMINDER_DEDUP

@mbjorkqvist mbjorkqvist left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DeFi-related changes seem to be coming from a sub-optimal merge of master, but approving nonetheless!

@alin-at-dfinity alin-at-dfinity removed request for a team June 22, 2026 10:45
@alin-at-dfinity alin-at-dfinity dismissed github-actions[bot]’s stale review June 22, 2026 10:46

Does not touch any Governance code. This was simply a bad merge (or rather a git pull of changes from GitHub without an explicit git merge).

Lesson learned.

@alin-at-dfinity

Copy link
Copy Markdown
Contributor Author

FWIW, in bench_traversal with 500k canisters with ExecutionState this change has virtually no effect on building a cold HashTree (480 ms -> 490 ms).

But it improves the hot path (reusing subtree digests from an earlier HashTree) by 35%+ (to just under 300 ms).

Comment thread rs/canonical_state/tree_hash/src/lazy_tree.rs Outdated
Comment thread rs/canonical_state/tree_hash/src/lazy_tree.rs Outdated
impl PartialEq for SubtreeSource {
/// A conservative, false-negative-only reuse-gate; see the type-level note.
fn eq(&self, other: &Self) -> bool {
self.addr() == other.addr() && std::ptr::fn_addr_eq(self.expander, other.expander)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need && std::ptr::fn_addr_eq(self.expander, other.expander)? Isn't self.addr() == other.addr() sufficient?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not. We need the same data AND the same certification version. Equal function pointers implies the same certification version. (The reverse is not true, you could have two functions that produce output with the same certification version, but we don't have that and we wouldn't care anyway.)

It's described in the SubtreeSource doc comment. If that's not clear or it's too far from the spot, I can move it / duplicate it here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if

    // The bare address, used for identity comparison.
    fn addr(&self) -> *const () {
        Arc::as_ptr(&self.) as *const ()
    }

(I thought this is what self.addr() returns when writing this comment.)

Comment thread rs/canonical_state/tree_hash/src/hash_tree.rs Outdated
Comment thread rs/canonical_state/tree_hash/src/hash_tree.rs Outdated
Comment on lines +1371 to +1372
// stand-alone subtree is just `hash_lazy_tree` on that subtree's root, which
// is in turn materialized for the same reason.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash_lazy_tree is not called recursively though so not sure what this means to say

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It refers to (temporarily) building stand-alone subtrees in order to compute the root hash of stub nodes. Let me know if you have a better wording suggestion.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want to point out that we use build_tree as opposed to build_child here? If so, I'm not actually sure why we couldn't use build_child here, too.

Or you want to point out that we need to assign the root field? If so, I'd justify it by the fact that the recursive build_child/build_tree functions cannot set the root (because they are recursive).

Comment thread rs/canonical_state/tree_hash/src/hash_tree.rs
Comment thread rs/canonical_state/tree_hash/src/hash_tree.rs Outdated
let children_range = if root.kind() == NodeKind::Stub {
NodeIndexRange::default()
} else {
ht.root_labels_range.clone()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a "hack", right? we set the parent as "root" (via NodeId::empty()) in the above call to build_child so that its labels range is set as ht.root_labels_range (over-writen for every new child which is fine since splice_subtree ignores root_labels_range anyway)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that on the one hand it's just how you build an independent HashTree (you start with the root); and on the other hand, you then need to record the range of its children. This code was already here, see a couple of lines below.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was already here, see a couple of lines below.

Indeed, and I believe your change makes sense, I just wanted to confirm my understanding of using ht.root_labels_range.clone() repeatedly on a shared tree.

Comment thread rs/canonical_state/tree_hash/src/hash_tree.rs Outdated
@daniel-wong-dfinity-org-twin

Copy link
Copy Markdown
Contributor

AFAICT, Governance approval is not needed here and I'm being emailed in error. If that's not right, please, contact me in Slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.