
[FEATURE][Timpani-n] Store full task list and add workload polling loop#131

Open
jayakrishnan04b wants to merge 6 commits into eclipse-timpani:development_0.5 from jayakrishnan04b:development_0.5


@jayakrishnan04b
Contributor

Summary

Store the full task list in timpani-n Rust schedule state and add a 2-second polling loop to detect workload updates from GetSchedInfo.

Changes

  • Added TaskInfo domain type and full task-list storage in SchedInfo.
  • Added content-based schedule comparison that ignores received_at.
  • Added task_from_proto conversion at the proto/domain boundary.
  • Replaced cancellation-only wait with periodic polling.
  • Kept the current schedule running on NotReady responses.
  • Added unit tests for schedule change-detection behavior.
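The content-based comparison described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `TaskInfo` fields shown here are a hypothetical subset, and only the idea of leaving `received_at` out of `PartialEq` is taken from the description.

```rust
use std::time::Instant;

/// Hypothetical, trimmed-down task description (the real type has more fields).
#[derive(Debug, Clone, PartialEq)]
struct TaskInfo {
    name: String,
    cpu_affinity: u64,
}

/// Schedule state: the full task list plus the time it was received.
#[derive(Debug, Clone)]
struct SchedInfo {
    tasks: Vec<TaskInfo>,
    received_at: Instant,
}

// Content-only equality: `received_at` is deliberately not compared, so two
// fetches of the same workload are equal even if they arrived at different times.
impl PartialEq for SchedInfo {
    fn eq(&self, other: &Self) -> bool {
        self.tasks == other.tasks
    }
}

impl SchedInfo {
    /// True when `newer` carries a different task list than `self`.
    fn content_changed(&self, newer: &SchedInfo) -> bool {
        self != newer
    }

    /// Derived from the stored list instead of a separate counter field.
    fn task_count(&self) -> usize {
        self.tasks.len()
    }
}
```

With this shape, a schedule fetched two seconds later with identical tasks compares equal, so the polling loop would not treat it as a workload change.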

Issues

Notes

Full teardown and reinitialization for schedule replacement is intentionally left as follow-up work.

- Add TaskInfo domain type mirroring struct task_info (schedinfo.h),
  without runtime fields (pid/pidfd); derives Debug, Clone, PartialEq
- Expand SchedInfo with tasks: Vec<TaskInfo> and received_at: Instant (a
  substitute for a sequence number); drop the redundant task_count field
- Implement PartialEq for SchedInfo excluding received_at (content-only
  equality); add content_changed(), is_full_replacement(), task_count()
- Add task_from_proto() converter at the proto/domain boundary in lib.rs
- Convert and store the full task list from GetSchedInfo response
- Replace cancel.cancelled().await stub with a 2s polling loop
  that re-fetches GetSchedInfo and detects workload changes; NotReady
  keeps the current schedule running (no teardown); full teardown+reinit
  is a TODO pending the task module
- Add unit tests for SchedInfo change-detection semantics (100% coverage)
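
As a rough illustration of replacing a cancellation-only wait with periodic polling, here is a std-only analogue. It is a sketch under assumptions: the PR awaits `cancel.cancelled()`, which suggests an async cancellation token, whereas this version uses a plain `std::sync::mpsc` channel as the cancel signal. The name `run_polling_loop` and the `poll` closure are hypothetical.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Calls `poll` once per `interval` until a message arrives on `cancel`
/// or every sender has been dropped. Returns the number of polls performed.
fn run_polling_loop<F: FnMut()>(
    cancel: &mpsc::Receiver<()>,
    interval: Duration,
    mut poll: F,
) -> u32 {
    let mut polls = 0;
    loop {
        match cancel.recv_timeout(interval) {
            // Nothing arrived within the interval: time for the next poll.
            Err(RecvTimeoutError::Timeout) => {
                poll();
                polls += 1;
            }
            // Explicit cancel, or the cancelling side went away: stop polling.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return polls,
        }
    }
}
```

Passing `Duration::from_secs(2)` as `interval` would give the 2-second cadence described above. Waiting on the cancel channel with a timeout means shutdown is noticed immediately instead of after the next sleep.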

Copilot AI left a comment


Pull request overview

This PR extends Timpani-N’s runtime schedule state to store the full per-node task list (as a proto-free domain type) and replaces the shutdown-only wait with a periodic (2s) polling loop against GetSchedInfo to detect workload replacements/updates via content comparison (ignoring received_at).

Changes:

  • Introduces TaskInfo and stores the full Vec<TaskInfo> in SchedInfo, adding content-based comparison helpers and unit tests.
  • Adds a proto→domain boundary conversion (task_from_proto) and updates startup logging to use the domain task list.
  • Implements a 2-second workload polling loop that keeps the current schedule on NotReady and updates runtime schedule state on detected changes.
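
A proto→domain boundary conversion of this kind might look like the sketch below. `ProtoTask` is a hypothetical stand-in for the generated message type, and the field set (`name`, `priority`, `cpu_affinity`) is assumed for illustration. Only the function name `task_from_proto` and the `u64` affinity come from the PR text.

```rust
/// Hypothetical stand-in for the generated proto message.
#[derive(Debug, Clone)]
struct ProtoTask {
    name: String,
    priority: i32,
    cpu_affinity: u64,
}

/// Proto-free domain type kept in the schedule state.
#[derive(Debug, Clone, PartialEq)]
struct TaskInfo {
    name: String,
    priority: i32,
    cpu_affinity: u64,
}

/// Converts one wire-level task into the domain type. Keeping this in one
/// place means the rest of the code never touches generated proto types.
fn task_from_proto(p: ProtoTask) -> TaskInfo {
    TaskInfo {
        name: p.name,
        priority: p.priority,
        cpu_affinity: p.cpu_affinity,
    }
}

/// Converts a whole response's task list.
fn tasks_from_proto(tasks: Vec<ProtoTask>) -> Vec<TaskInfo> {
    tasks.into_iter().map(task_from_proto).collect()
}
```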

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| timpani_rust/timpani-n/src/lib.rs | Adds proto→domain task conversion and a periodic polling loop to detect and apply schedule changes. |
| timpani_rust/timpani-n/src/context/mod.rs | Adds TaskInfo, expands SchedInfo to store full tasks, implements content equality helpers, and adds unit tests. |


- Add task/mod.rs: BpfState, TimeTrigger, init_task_list/teardown_task_list
  implementing Linux RT scheduling setup (SCHED_DEADLINE / SCHED_FIFO),
  CPU affinity via CpuSet, and BPF fd management per task
- Add sched/mod.rs extensions: set_cpu_affinity, get_cpu_affinity, set_sched_attr,
  get_sched_attr with 15 unit tests (affinity A1-A2, schedattr B1-B3,
  init_task_list C1-C7, polling D1-D4)
- Apply code review fixes to lib.rs and context/mod.rs:
  * Replace .expect() in polling loop with recoverable warn+continue
  * Handle None sched_info in polling loop (restore from latest poll)
  * Advance received_at even when workload content is unchanged
  * Fix misleading cpu_affinity doc comment (uint64 on wire, not int64)
  * Fix TaskInfo::name doc (limit enforced by helpers, not by the type)
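
The review fixes to the polling loop listed above could be expressed as a single per-poll decision function, sketched here. All names (`PollOutcome`, `Action`, `apply_poll`) are hypothetical, `eprintln!` stands in for whatever logging macro the crate uses, and the real teardown and reinitialization step is omitted.

```rust
use std::time::Instant;

#[derive(Debug, Clone, PartialEq)]
struct TaskInfo {
    name: String,
}

#[derive(Debug, Clone)]
struct SchedInfo {
    tasks: Vec<TaskInfo>,
    received_at: Instant,
}

/// Hypothetical outcome of one GetSchedInfo poll.
enum PollOutcome {
    NotReady,
    Failed(String),
    Schedule(Vec<TaskInfo>),
}

#[derive(Debug, PartialEq)]
enum Action {
    KeptCurrent,
    Refreshed,
    Replaced,
}

fn apply_poll(current: &mut Option<SchedInfo>, outcome: PollOutcome, now: Instant) -> Action {
    match outcome {
        // NotReady keeps the current schedule running.
        PollOutcome::NotReady => Action::KeptCurrent,
        // Recoverable failure: warn and let the next tick retry (no panic).
        PollOutcome::Failed(err) => {
            eprintln!("warn: GetSchedInfo poll failed: {err}");
            Action::KeptCurrent
        }
        PollOutcome::Schedule(tasks) => {
            if let Some(cur) = current.as_mut() {
                if cur.tasks == tasks {
                    // Same content: only advance the timestamp.
                    cur.received_at = now;
                    return Action::Refreshed;
                }
            }
            // Content changed, or no schedule was stored: take the latest poll.
            *current = Some(SchedInfo { tasks, received_at: now });
            Action::Replaced
        }
    }
}
```

Returning an `Action` instead of acting inline keeps the decision testable without a live server.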
The CI coverage check was failing with 76.0% combined (below 80%).
Two root causes:

1. Untestable entry points counted in denominator:
   - timpani-n/src/main.rs: 0/11 lines (binary entry point, no unit-testable logic)
   - timpani-o/src/main.rs: 2/85 lines (server startup, cannot be unit-tested)
   - timpani-n/tests/*: ignored integration tests inflate miss count
   Excluding these from coverage: 872/1047 → 873/962 lines in scope.

2. Parallel test execution caused ptrace to miss covered lines:
   Sched tests manipulate CPU affinity on the current PID; running them
   concurrently in ptrace mode causes instrumentation to drop hits.
   Fix: add args = ["--test-threads", "1"] in tarpaulin.toml.

Result: local combined coverage 90.75% (was 76.0% in CI, 83.29% without
the args fix). Adds --config tarpaulin.toml to test_coverage.sh.
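
As a quick arithmetic check, the two line ratios quoted in this message can be converted to percentages with a small helper. The function name `coverage_percent` is hypothetical and this is not part of the PR.

```rust
/// Line coverage as a percentage of the lines in scope.
fn coverage_percent(covered: u32, total: u32) -> f64 {
    100.0 * f64::from(covered) / f64::from(total)
}

/// Formats a coverage ratio to two decimals, e.g. for a CI log line.
fn coverage_label(covered: u32, total: u32) -> String {
    format!("{:.2}%", coverage_percent(covered, total))
}
```

Here 872/1047 comes to 83.29% and 873/962 comes to 90.75%, which are the two percentages reported in the result above.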