Introduce Spectral Sentinel — an online subspace anomaly detector #827

Open
da2ce7 wants to merge 1 commit into torrust:develop from da2ce7:20260219_sentinel

Contributor

@da2ce7 da2ce7 commented Feb 19, 2026

A sentinel is not a judge. It stands watch, keeps its bearings, and reports what has changed.

That is the idea behind Spectral Sentinel. It does not decide whether a pattern is dangerous, important, or actionable. It measures structure in a stream, learns what has been ordinary so far, and returns statistical readouts when new observations depart from the learned geometry.

The "spectral" part is literal: each selected region is modelled with low-rank subspace trackers, and a second tier models the spectrum of those scores across related cells. The "sentinel" part is restraint: the crate observes, scores, and reports, but policy stays with the host.
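As a rough illustration of what scoring against a low-rank subspace can look like: one common measure is the energy of an observation that falls outside the learned basis. The sketch below assumes an orthonormal basis stored as plain vectors. `residual_energy` is a hypothetical name, and nothing here claims that Sentinel's own scores are computed this way.

```rust
/// Squared norm of the part of `x` that lies outside the span of `basis`.
/// Assumes the rows of `basis` are orthonormal and have the same length as `x`.
fn residual_energy(basis: &[Vec<f64>], x: &[f64]) -> f64 {
    let mut residual = x.to_vec();
    for direction in basis {
        // Coefficient of `x` along this basis direction.
        let coeff: f64 = direction.iter().zip(x).map(|(d, v)| d * v).sum();
        // Subtract that component from the running residual.
        for (r, d) in residual.iter_mut().zip(direction) {
            *r -= coeff * d;
        }
    }
    residual.iter().map(|r| r * r).sum()
}
```

An observation that lies entirely inside the subspace yields zero; anything orthogonal to it contributes its full squared length.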


Summary

This PR adds torrust-sentinel to the workspace: a library crate for hierarchical online subspace anomaly detection over positionally structured observation streams.

It is built on top of Mudlark. Mudlark provides the adaptive spatial substrate — regions that receive more observation volume earn finer resolution, quiet regions remain coarse. Spectral Sentinel uses that structure to decide where statistical trackers are worth maintaining. It selects significant V-Tree entries, closes them under G-tree ancestry so every selected cell has a complete ancestor chain back to the root, and scores incoming batches against learned subspace models at every selected scale.

The simplest way to think about it: Mudlark decides where the stream has shape; Sentinel measures whether the recent shape still looks like what that region has learned to expect.
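The selection step described above (pick the top-K cells, then close under ancestry) can be sketched as follows. This is a minimal illustration, not the crate's code: `NodeId`, `investment_set`, the child-to-parent map, and ranking by raw volume are all assumptions made for the example.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical node identifier; stands in for the crate's `GNodeId`.
type NodeId = u32;

/// Select the `k` highest-volume cells, then add every ancestor of each
/// selected cell so that each chain reaches the root.
fn investment_set(
    volumes: &HashMap<NodeId, u64>,
    parents: &HashMap<NodeId, NodeId>, // child -> parent; the root has no entry
    k: usize,
) -> BTreeSet<NodeId> {
    // Rank by volume (descending), breaking ties by id for determinism.
    let mut ranked: Vec<(NodeId, u64)> = volumes.iter().map(|(&id, &v)| (id, v)).collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));

    let mut selected = BTreeSet::new();
    for &(id, _) in ranked.iter().take(k) {
        // Walk upwards until the root or an already-selected ancestor.
        let mut cursor = Some(id);
        while let Some(node) = cursor {
            if !selected.insert(node) {
                break; // everything above this node is already present
            }
            cursor = parents.get(&node).copied();
        }
    }
    selected
}
```

Returning a `BTreeSet` keeps the result ordered by id, which mirrors the deterministic ordering the reports promise.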

The crate is deliberately policy-free. Reports carry raw measurements — four scoring axes (novelty, displacement, surprise, coherence), maturity, baselines, CUSUM drift accumulators, geometry, contour summaries, and health snapshots. They do not encode threat levels, recommended actions, or decisions. The host reads the measurements and decides what they mean.

The core invariant is feed-forward: every input value updates Mudlark with exactly one unit of observation volume. Anomaly scores never flow back into the spatial index. That keeps spatial adaptation driven by traffic structure, not by the detector's own conclusions. Temporal policy is host-controlled: Sentinel never applies decay automatically.
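One way to picture the feed-forward invariant is an index type that simply has no entry point for scores. The sketch below is illustrative only: `VolumeIndex`, `cell_of`, and `ingest` are hypothetical names, and the 8-bit cell granularity is an arbitrary assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the spatial index: it only accumulates volume.
#[derive(Default)]
struct VolumeIndex {
    counts: BTreeMap<u64, u64>,
}

impl VolumeIndex {
    /// Record exactly one unit of observation volume for `cell`.
    fn observe(&mut self, cell: u64) {
        *self.counts.entry(cell).or_insert(0) += 1;
    }

    /// Total volume recorded so far.
    fn total(&self) -> u64 {
        self.counts.values().sum()
    }
}

/// Hypothetical coarse cell assignment: drop the low 8 bits of the value.
fn cell_of(value: u64) -> u64 {
    value >> 8
}

/// Feed every value to the index, then score it. Scores go back to the
/// caller and are never passed into `index`.
fn ingest(index: &mut VolumeIndex, batch: &[u64], score: impl Fn(u64) -> f64) -> Vec<f64> {
    batch
        .iter()
        .map(|&value| {
            index.observe(cell_of(value));
            score(value)
        })
        .collect()
}
```

Whatever scoring function is supplied, the index ends up with one unit of volume per input value.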

A second analysis tier — coordination trackers — runs at internal G-tree nodes whose subtrees both contribute competitive cells. It scores cross-cell patterns of the four axes, so a coordinated shift that no single cell would flag still surfaces in the report.

Contents

The addition is substantial, but almost entirely self-contained within packages/sentinel.

  • A new torrust-sentinel crate (0.1.0) with a narrow public surface exposed through flat crate-root re-exports
  • SpectralSentinel<C, V, N> as the generic engine, with Sentinel128 and Sentinel64 aliases for the common domain widths
  • SentinelConfig, NoiseSchedule, and SvdStrategy for host-controlled measurement parameters, with structured ConfigError / ConfigErrors / ConfigWarning validation rather than panics
  • Public readout types for batch, cell, coordination, contour, health, maturity, geometry, baseline, score, and analysis-set summaries — ordered deterministically by GNodeId
  • Analysis-set selection over Mudlark's G-V Graph: top-K competitive V-entries plus ancestor closure into the investment set
  • Per-cell subspace trackers scoring novelty, displacement, surprise, and coherence; EWMA baselines with upper-tail clipping; CUSUM drift accumulators with a separate slow EWMA
  • Hierarchical coordination trackers scoring cross-cell score patterns at G-tree internal nodes
  • Automatic synthetic noise warm-up for every newly created tracker, plus a deferred staging area and optional background warming thread so cell creation does not block the ingest hot path
  • Brand incremental SVD for subspace evolution, with a naive thin-SVD path used both as a fallback for small/numerically sensitive cases and as a debug-mode oracle
  • Optional serde support (off by default; pulls in torrust-mudlark/serde)
  • A Criterion benchmark suite (15 bench functions across 8 groups: encoding, ingest, auxiliary, convergence, scaling, temporal, analysis)
  • 22 architecture decision records covering measurement-not-opinions, the feed-forward invariant, Mudlark integration, deterministic ordering, analysis-set recomputation, automatic noise injection, scoring geometry, decay semantics, routing, degenerate-dimension guards, test budgets, warm-up convergence, visibility, cell-creation performance, Brand SVD, deferred warm-up, generic domain parameters, investment-set terminology, and the clip-pressure / mean-centred-variance EWMA refinements
  • README, public API reference, algorithm document, and implementation notes (~4.7k lines of crate-level documentation total)
  • ~560 #[test] functions across crate-level tests (src/tests/) and integration tests (tests/), plus the README compiled as a doc-test via #[cfg(doctest)] include_str!
  • Two pedagogy integration tests (pedagogy.rs, pedagogy_advanced.rs) written to be read end-to-end as a walkthrough of the public surface
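For readers unfamiliar with the baseline and drift machinery named in the list above, here is a textbook-style sketch of an EWMA with upper-tail clipping and a one-sided CUSUM referenced to a slow EWMA. The struct names, field names, additive clip rule, and update order are assumptions for illustration; the crate's actual formulas live in its algorithm document and may differ.

```rust
/// EWMA baseline whose inputs are clipped from above before updating the mean,
/// so a single extreme score cannot drag the baseline upwards.
struct ClippedEwma {
    mean: f64,
    alpha: f64,      // smoothing factor in (0, 1]
    max_excess: f64, // inputs are capped at `mean + max_excess`
}

impl ClippedEwma {
    fn update(&mut self, x: f64) -> f64 {
        let clipped = x.min(self.mean + self.max_excess);
        self.mean += self.alpha * (clipped - self.mean);
        self.mean
    }
}

/// One-sided CUSUM: accumulates excess over a slowly moving reference level.
struct Cusum {
    sum: f64,
    slow_mean: f64,  // reference level, tracked by a slow EWMA
    slow_alpha: f64, // smoothing factor for the reference
    slack: f64,      // allowance subtracted from each step
}

impl Cusum {
    fn update(&mut self, x: f64) -> f64 {
        // Accumulate against the reference as it stood before this sample.
        self.sum = (self.sum + x - self.slow_mean - self.slack).max(0.0);
        self.slow_mean += self.slow_alpha * (x - self.slow_mean);
        self.sum
    }
}
```

The accumulator stays at zero while scores hover around the reference and grows only under sustained excess, which is what makes it useful for slow drift that no single batch would reveal.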

Changes outside sentinel

  • Cargo.toml / Cargo.lock: packages/sentinel added as a workspace member; lock file regenerated for the new dependency closure (faer, rand_distr, plus dev-only criterion and tracing-subscriber)
  • AGENTS.md: adds Sentinel's S- cross-reference prefix to the package table and ADR examples

Reviewing this

The best starting point is the public surface:

  • packages/sentinel/src/lib.rs
  • packages/sentinel/docs/api.md
  • packages/sentinel/README.md

From there, the main implementation path is src/sentinel/mod.rs for the orchestrator, src/analysis_set.rs for competitive selection and ancestor closure, src/sentinel/tracker.rs for per-cell scoring, src/sentinel/{cusum,staging,warming_thread}.rs for drift and warm-up, and src/maths/ for the SVD plumbing.

For a focused review, I would look at:

  • the public API shape and the flat crate-root re-exports
  • configuration validation, defaults, and the structured error/warning types
  • the feed-forward Mudlark integration (Δ = 1 per observation, scores never feed back)
  • report semantics, deterministic ordering, and what is and isn't part of the public surface
  • tracker warm-up, the deferred staging area, and the optional background warming thread
  • the Brand SVD fallback boundary (small d, narrow rank gaps) and the debug-mode oracle path
  • integration tests that assert invariants through the public API only

The pedagogy tests are intended to be readable end-to-end; running cargo test -p torrust-sentinel --test pedagogy -- --nocapture produces a narrated walk through the public surface.

Notes

  • Ships at 0.1.0. The public surface is intentional, but follows pre-1.0 SemVer rules until the crate reaches 1.0.
  • MSRV 1.88, inherited from the workspace.
  • No unsafe code; #![forbid(unsafe_code)] at the crate root.
  • AGPL-3.0-only, inherited from the workspace. Unlike Mudlark, no linking exception is shipped with this crate.
  • Default features: none. serde is opt-in.
  • The crate measures only. Interpretation and response remain external host policy.
  • Temporal policy is host-controlled: Sentinel never applies decay automatically.
  • Configuration prefers structured errors over panics.
  • Sentinel docs and ADRs use the S- cross-reference prefix added in this PR.
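To illustrate what "structured errors over panics" means in practice, the sketch below validates a configuration and returns every problem it finds instead of aborting on the first. `DemoConfig`, `ConfigIssue`, and both fields are hypothetical and are not the crate's `SentinelConfig` or `ConfigError` types.

```rust
/// Hypothetical validation failures, reported as data rather than panics.
#[derive(Debug, PartialEq)]
enum ConfigIssue {
    RankIsZero,
    ForgettingOutOfRange(f64),
}

/// Hypothetical configuration with two illustrative parameters.
struct DemoConfig {
    rank: usize,
    forgetting: f64,
}

impl DemoConfig {
    /// Collect every issue so the host can report them all at once.
    fn validate(&self) -> Result<(), Vec<ConfigIssue>> {
        let mut issues = Vec::new();
        if self.rank == 0 {
            issues.push(ConfigIssue::RankIsZero);
        }
        // Written as a negated range check so that NaN is rejected too.
        if !(self.forgetting > 0.0 && self.forgetting <= 1.0) {
            issues.push(ConfigIssue::ForgettingOutOfRange(self.forgetting));
        }
        if issues.is_empty() {
            Ok(())
        } else {
            Err(issues)
        }
    }
}
```

The host decides how to react to the returned list: log it, surface it to an operator, or fall back to defaults.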

@codecov

codecov Bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 91.76115% with 218 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.37%. Comparing base (843aaff) to head (bab66fa).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/sentinel/src/sentinel/mod.rs | 91.34% | 53 Missing and 21 partials ⚠️ |
| packages/sentinel/src/config.rs | 78.02% | 49 Missing ⚠️ |
| packages/sentinel/src/maths/bench_tracing.rs | 0.00% | 45 Missing ⚠️ |
| packages/sentinel/src/sentinel/tracker.rs | 96.13% | 13 Missing and 4 partials ⚠️ |
| packages/sentinel/src/maths/mod.rs | 91.25% | 13 Missing and 1 partial ⚠️ |
| packages/sentinel/src/maths/brand_svd.rs | 91.46% | 6 Missing and 1 partial ⚠️ |
| packages/sentinel/src/sentinel/staging.rs | 98.56% | 3 Missing and 3 partials ⚠️ |
| packages/sentinel/src/sentinel/warming_thread.rs | 96.05% | 3 Missing ⚠️ |
| packages/sentinel/src/analysis_set.rs | 98.33% | 2 Missing ⚠️ |
| packages/sentinel/src/maths/naive_svd.rs | 97.61% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #827      +/-   ##
===========================================
+ Coverage    68.52%   72.37%   +3.85%     
===========================================
  Files          161      175      +14     
  Lines        13111    15757    +2646     
  Branches     13111    15757    +2646     
===========================================
+ Hits          8984    11404    +2420     
- Misses        3853     4048     +195     
- Partials       274      305      +31     


@da2ce7 da2ce7 force-pushed the 20260219_sentinel branch from 98b82af to 0d4bf03 Compare March 25, 2026 05:05
@da2ce7 da2ce7 added the Needs Rebase Base Branch has Incompatibilities label Apr 23, 2026
@da2ce7 da2ce7 force-pushed the 20260219_sentinel branch from 0d4bf03 to 84defab Compare April 23, 2026 10:36
@da2ce7 da2ce7 added Needs Rebase Base Branch has Incompatibilities and removed Needs Rebase Base Branch has Incompatibilities labels Apr 23, 2026
@da2ce7 da2ce7 force-pushed the 20260219_sentinel branch from 84defab to 7d2fd0c Compare May 1, 2026 12:50
@da2ce7 da2ce7 removed the Needs Rebase Base Branch has Incompatibilities label May 1, 2026
@da2ce7 da2ce7 force-pushed the 20260219_sentinel branch from 7d2fd0c to 57ac0f5 Compare May 12, 2026 17:37
@da2ce7 da2ce7 force-pushed the 20260219_sentinel branch from 57ac0f5 to 3972d50 Compare May 12, 2026 22:39
@da2ce7 da2ce7 changed the title from "20260219 sentinel" to "Introduce Sentinel — an online subspace anomaly detector" May 12, 2026
@da2ce7 da2ce7 force-pushed the 20260219_sentinel branch from 3972d50 to afde2c7 Compare May 12, 2026 23:01
@da2ce7 da2ce7 force-pushed the 20260219_sentinel branch from afde2c7 to 022a672 Compare May 12, 2026 23:17
@da2ce7 da2ce7 marked this pull request as ready for review May 12, 2026 23:26
Copilot AI review requested due to automatic review settings May 12, 2026 23:26

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@da2ce7 da2ce7 force-pushed the 20260219_sentinel branch from 022a672 to 83c74ef Compare May 13, 2026 18:39
@da2ce7 da2ce7 changed the title from "Introduce Sentinel — an online subspace anomaly detector" to "Introduce Spectral Sentinel — an online subspace anomaly detector" May 14, 2026
@da2ce7 da2ce7 force-pushed the 20260219_sentinel branch from 83c74ef to a46db44 Compare May 15, 2026 09:44
Introduce a new sub-package, "sentinel", which observes positionally structured observation streams.