
Allow Datil restore test to work with unequal node counts#83

Open
GTC6244 wants to merge 7 commits into keysets from
feature/node-4952-adjust-datil-restore-test-to-for-unequal-node-counts

Conversation

@GTC6244
Contributor

@GTC6244 GTC6244 commented Jan 20, 2026

No description provided.

@GTC6244 GTC6244 self-assigned this Jan 20, 2026
Copilot AI review requested due to automatic review settings January 20, 2026 01:46
Contributor

Copilot AI left a comment


Pull request overview

This PR modifies the Datil restore test to support scenarios where the number of nodes differs from the recovery party size, allowing more flexible test configurations.

Changes:

  • Added graceful handling for missing backup files and blinders when node count exceeds recovery party size
  • Modified assertions to verify successful uploads based on recovery party count rather than expecting all nodes to succeed
  • Added logging for initial node count in version upgrade tests
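The missing-file handling and count-based verification described above can be sketched as a minimal standalone example. Everything here is hypothetical (the function name, paths, and counts are illustrative, not the actual test code in backup_datil_long.rs):

```rust
use std::fs::File;
use std::path::Path;

// Hypothetical sketch: try to open each node's backup file, log and skip the
// nodes that have none, and return the number of successes so the caller can
// compare against the recovery party size instead of the total node count.
fn count_uploadable_backups(paths: &[&Path]) -> usize {
    let mut success_count = 0;
    for path in paths {
        match File::open(path) {
            Ok(_file) => {
                // in the real test, the backup tarball would be uploaded here
                success_count += 1;
            }
            Err(e) => {
                // extra nodes beyond the recovery party have no backup; continue
                eprintln!("Failed to open backup file {}: {}", path.display(), e);
            }
        }
    }
    success_count
}

fn main() {
    // simulate 3 nodes, none of which has a backup file on disk
    let paths = [
        Path::new("/nonexistent/backup-0.tar"),
        Path::new("/nonexistent/backup-1.tar"),
        Path::new("/nonexistent/backup-2.tar"),
    ];
    let success_count = count_uploadable_backups(&paths);
    // assert_eq! prints both sides on failure, unlike assert!(a == b)
    assert_eq!(success_count, 0);
}
```

The key design point, per the review below, is that the final assertion compares the success count to the recovery party size rather than expecting every node to upload.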

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • rust/lit-node/lit-node/tests/upgrades/version_upgrades.rs - Added logging statement to display initial node count for better test observability
  • rust/lit-node/lit-node/tests/integration/backup_datil_long.rs - Modified backup and blinder upload functions to handle unequal node counts, added error handling for missing files, changed assertions from individual to count-based verification
  • rust/lit-node/lit-node/tests/common/lit_actions.rs - Prefixed unused variables with underscores to suppress compiler warnings


let file = match file {
    Ok(file) => file,
    Err(e) => {
        error!("No file for: {}", e);

Copilot AI Jan 20, 2026


The error message "No file for: {}" is unclear and unhelpful. The error variable 'e' represents the file open error, not the file itself. Consider changing this to something more descriptive like "Failed to open backup file: {}" to accurately reflect what went wrong.

Suggested change
error!("No file for: {}", e);
error!("Failed to open backup file {}: {}", tar_file.display(), e);

        success_count += 1;
    }
}
assert!(success_count == recovery_party_size);

Copilot AI Jan 20, 2026


The assertion assert!(success_count == recovery_party_size) should use the equality assertion assert_eq!(success_count, recovery_party_size) for better error messages. When the assertion fails, assert_eq! will show the actual values of both sides, making debugging easier.

Suggested change
assert!(success_count == recovery_party_size);
assert_eq!(success_count, recovery_party_size);

info!("Response: {}", response);
public_address
});
};

Copilot AI Jan 20, 2026


The semicolon after the closing brace on this line is unnecessary. An 'if' statement does not require a semicolon after its closing brace unless it's part of an expression context. Remove the semicolon for cleaner code.

Suggested change
};
}

@github-actions

PASS [ 43.215s] (3/3) lit_node::test toxiproxy::perf_tests::load_with_no_latency
PASS [ 43.282s] (2/3) lit_node::test toxiproxy::perf_tests::load_with_50ms_latency_single_link
PASS [ 87.388s] (1/3) lit_node::test toxiproxy::perf_tests::load_with_50ms_latency_all_links

Base automatically changed from feature/node-4947-set-default-keyset-on-contract-deployment to keysets January 21, 2026 04:17
    client: &Client,
    validator_collection: &ValidatorCollection,
    backup_directory: &PathBuf,
    recovery_party_size: usize,
Contributor


Is this the number of key shares being restored or the number of parties being restored to? Can we restore 4 key shares to 6 parties? If so is recovery_party_size here 4 or 6?

Contributor Author


Yes and no. Each backup share (i.e., the ones downloaded by the node ops) can only go to a single new node. In this test, things are a little convoluted, since the backup parties and the node ops with backups are equivalent.

In your example, if we have 4 backup shares and 6 nodes and run this test, it will pass. But only because the network receiving the shares will kick the 2 nodes that don't have them, because those 2 nodes can't participate in the DKG. It's all relatively clever. ;-)

Contributor


But we don't have the test where the backups < recovery parties, so no one gets kicked in the existing test. Can we add this additional test as well?

Contributor Author


For Naga Prod we only have 6 backup shares, from what I understand. This means we'll need to reduce the network size, regardless of the number of backup parties.

However, I can write a test that uses 5 backup shares, 5 new nodes, and 3 recovery parties. This should work, as I believe we only need a threshold of the recovery parties to authorize the recovery to proceed. It won't be a short task though - most of these tests took Mike and Ege days to write, and they had prior knowledge of the backup system. I'm going in cold ;-)

Contributor

@DashKash54 DashKash54 left a comment


Just to confirm we already supported restoring to a larger set of nodes than the backup keys? So we're not doing any actual node or contracts code changes?

info!("Response: {}", response);
public_address
});
if downloaded_blinders.contains_key(&public_address) {
Contributor


Where is the test to simulate a 4 -> 6 restore?

Contributor Author


These are all pre-existing tests. I just made them work when the backup counts don't match the number of restore nodes.

Currently we don't have enough blinders - I'm trying to generate more tomorrow, and can update the test, but there isn't any good reason to do so. 3 -> 5 is no different than 4 -> 6 in terms of what the software does; it's just slower.

Contributor


Yeah, but we should still have a test that validates the kicking of the remaining nodes and the eventual success of the DKG, which seems to be missing right now?

Contributor Author


Yes, this test does just that, though you'd have to change the parameters. I flipped them back to "normal" since it's more efficient to run in CI.

@GTC6244
Contributor Author

GTC6244 commented Jan 21, 2026

Just to confirm we already supported restoring to a larger set of nodes than the backup keys? So we're not doing any actual node or contracts code changes?

Sort of? We restore to a larger set by uploading what we have and kicking the remaining nodes, because the DKG can't be processed by nodes without backups. This test proves that this works.
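The restore-to-a-larger-set behavior discussed in this thread can be sketched minimally. This is an illustration only; the function and node names are hypothetical, and the real kicking logic lives in the network/DKG, not in test code:

```rust
// Hypothetical sketch: when restoring onto a larger node set, only nodes
// holding a backup share can participate in the DKG; the rest are
// effectively kicked by the network.
fn split_dkg_participants<'a>(
    nodes: &[&'a str],
    share_holders: &[&str],
) -> (Vec<&'a str>, Vec<&'a str>) {
    // partition preserves order: (nodes with a share, nodes without one)
    nodes.iter().copied().partition(|node| share_holders.contains(node))
}

fn main() {
    // 4 backup shares restored onto a 6-node network
    let nodes = ["n1", "n2", "n3", "n4", "n5", "n6"];
    let share_holders = ["n1", "n2", "n3", "n4"];
    let (participants, kicked) = split_dkg_participants(&nodes, &share_holders);
    assert_eq!(participants.len(), 4); // these run the DKG
    assert_eq!(kicked.len(), 2); // these cannot participate and are kicked
}
```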
