
fix: remove static PreprocessingRegistry — prevents concurrent build race conditions #935

Merged
ooples merged 6 commits into master from fix/remove-static-preprocessing-registry
Mar 7, 2026

Conversation

@ooples
Owner

@ooples ooples commented Mar 7, 2026

Summary

Fixes #931. PreprocessingRegistry<T, TInput> was a static global singleton that stored the "current" preprocessing pipeline. When multiple AiModelBuilder instances built models concurrently, they overwrote each other's pipeline, causing race conditions and cross-contamination.

  • Removed the 3 PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline assignments from AiModelBuilder.ConfigurePreprocessing() (lines 391, 435, 480). The pipeline was already stored as an instance field and flows to AiModelResult via PreprocessingInfo.
  • Removed the 3 analogous PostprocessingRegistry assignments from ConfigurePostprocessing() (same pattern, same race condition).
  • Deprecated PreprocessingRegistry<T, TInput> with [Obsolete] attribute — not deleted, in case external code references it.
  • Replaced PreprocessingRegistry usage in DocumentNeuralNetworkBase.PreprocessDocument() with a new instance-level PreprocessingTransformer property (thread-safe, no global state).
  • Added 4 integration tests verifying concurrent builds don't cross-contaminate and the deprecated registry still compiles.
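The failure mode can be shown with a deterministic interleaving rather than an actual data race: the last builder to write the static field wins, and every earlier builder silently picks up the wrong pipeline. A minimal sketch follows — in Java for portability (AiDotNet is C#), with all names hypothetical stand-ins for PreprocessingRegistry and AiModelBuilder:

```java
import java.util.function.Function;

// Hypothetical stand-in for the static PreprocessingRegistry<T, TInput>.
class Registry {
    // The problematic pattern: one global "current" pipeline shared by all builders.
    static Function<Double, Double> current;
}

// Hypothetical stand-in for AiModelBuilder.
class Builder {
    // The fix: each builder keeps its own pipeline as instance state.
    private final Function<Double, Double> pipeline;

    Builder(Function<Double, Double> pipeline) {
        this.pipeline = pipeline;
        Registry.current = pipeline; // old behavior: last writer wins globally
    }

    double buildViaRegistry(double x) { return Registry.current.apply(x); }
    double buildViaInstance(double x) { return pipeline.apply(x); }
}

public class RaceSketch {
    public static void main(String[] args) {
        Builder a = new Builder(x -> x * 2);   // builder A wants doubling
        Builder b = new Builder(x -> x + 100); // builder B overwrites the registry

        // Through the static registry, A silently uses B's pipeline:
        System.out.println(a.buildViaRegistry(1)); // 101.0 — cross-contaminated
        // Through instance state, each builder keeps its own pipeline:
        System.out.println(a.buildViaInstance(1)); // 2.0
        System.out.println(b.buildViaInstance(1)); // 101.0
    }
}
```

Under real concurrency the outcome is timing-dependent, which is why the per-instance field is the only correct home for the pipeline.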

Test plan

  • PreprocessingRegistryIntegrationTests — 4 tests all pass
    • ConcurrentBuilds_WithDifferentPreprocessing_DoNotCrossContaminate
    • BuildAsync_WithPreprocessing_Succeeds_WithoutRegistry
    • DeprecatedRegistry_StillFunctionsForBackwardCompatibility
    • SequentialBuilds_DoNotLeaveStaleRegistryState
  • LinearRegressionIntegrationTests — all 24 existing tests pass
  • dotnet build src/AiDotNet.csproj -f net10.0 — succeeds
  • dotnet build tests/AiDotNet.Tests/AiDotNetTests.csproj -f net10.0 — succeeds

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Per-model preprocessing and postprocessing transformer support for individual model builds.
  • Improvements

    • Stopped automatic global registry wiring to prevent cross-model contamination and race conditions; pipelines are now applied per-instance during build and inference.
  • Deprecations

    • Static preprocessing/postprocessing registries marked obsolete; prefer per-instance configuration via the model builder.
  • Tests

    • Integration tests added to verify isolation and deprecated-registry backward compatibility.


Remove the 3 `PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline`
assignments from AiModelBuilder.ConfigurePreprocessing() and the 3 analogous
PostprocessingRegistry assignments from ConfigurePostprocessing(). The pipeline
is already stored as an instance field on the builder and flows to AiModelResult
via PreprocessingInfo — the global static registry was unnecessary and caused
race conditions when multiple builders ran concurrently.

Changes:
- Remove 6 static registry assignments from AiModelBuilder.cs
- Add [Obsolete] attribute to PreprocessingRegistry<T, TInput> class
- Replace PreprocessingRegistry usage in DocumentNeuralNetworkBase with
  an instance-level PreprocessingTransformer property
- Add 4 integration tests verifying concurrent builds don't cross-contaminate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 7, 2026 13:47
@vercel

vercel bot commented Mar 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
aidotnet_website | Ready | Preview, Comment | Mar 7, 2026 4:45pm
aidotnet-playground-api | Ready | Preview, Comment | Mar 7, 2026 4:45pm

@coderabbitai
Contributor

coderabbitai bot commented Mar 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Replaces global static preprocessing/postprocessing registry wiring with per-instance transformers: AiModelBuilder no longer mutates static registries, DocumentNeuralNetworkBase gains instance-level PreprocessingTransformer/PostprocessingTransformer used at runtime, both registries marked [Obsolete], and integration tests assert no cross-model contamination.

Changes

Cohort / File(s) Summary
Builder edits
src/AiModelBuilder.cs
Removed writes to PreprocessingRegistry<T, TInput>.Current and PostprocessingRegistry<T, TOutput>.Current; builder now retains pipelines per-instance and wires them into models during build. BLOCKING: verify every Configure* overload consistently sets the builder fields and that no global side-effects remain.
Model instance transformers
src/Document/DocumentNeuralNetworkBase.cs
Added protected IDataTransformer<T, Tensor<T>, Tensor<T>>? PreprocessingTransformer and PostprocessingTransformer plus internal void ConfigureTransformers(...); PreprocessDocument/PostprocessOutput prefer instance transformers when set and enforce IsFitted. BLOCKING: check IsFitted checks, lifecycle, null handling, error messages, and flag any TODOs/placeholders.
Registry deprecation
src/Preprocessing/PreprocessingRegistry.cs, src/Postprocessing/PostprocessingRegistry.cs
Marked static registries with [Obsolete(...)] and updated docs to recommend builder-based configuration; runtime behavior retained for backward compatibility. BLOCKING: confirm obsoletion messages direct users and ensure retained API cannot be implicitly re-populated by other code paths.
Tests
tests/.../PreprocessingRegistryIntegrationTests.cs
Added integration tests validating concurrent/sequential builds do not configure the static registry and that deprecated registry still functions for backward compatibility. BLOCKING: ensure tests properly isolate/teardown global state, avoid flakiness, and obsolete-warning pragmas are correctly scoped.

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer/Test
    participant Builder as AiModelBuilder
    participant Model as DocumentNeuralNetworkBase
    participant Registry as Pre/PostprocessingRegistry (obsolete)

    Dev->>Builder: ConfigurePreprocessing(pre) / ConfigurePostprocessing(post)
    Builder-->>Builder: store pipelines on builder instance
    Dev->>Builder: BuildSupervisedAsync(...) / BuildProgramSynthesisInferenceOnlyResult
    Builder->>Model: instantiate model
    Builder->>Model: call ConfigureTransformers(pre, post)
    alt PreprocessingTransformer present & fitted
        Model->>Model: Preprocess using instance transformer
    else
        Model->>Model: ApplyDefaultPreprocessing
    end
    Model->>Model: Run training/inference
    alt PostprocessingTransformer present & fitted
        Model->>Model: Postprocess using instance transformer
    else
        Model->>Model: ApplyDefaultPostprocessing
    end
    Note over Registry: Registry marked obsolete and no longer written by Builder (kept readable for backward compatibility)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

Pipelines moved into each builder's care,
Threads no longer tangle in global air,
Obsolete whispers guide the slow repair,
Tests stand guard so builds remain fair,
Small changes, cleaner flows — all handled with care.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title clearly summarizes the main change: removing static PreprocessingRegistry to fix concurrent build race conditions, which is the primary objective of the PR.
Linked Issues check ✅ Passed Changes fully implement issue #931 objectives: removed static registry assignments, deprecated registry classes, introduced instance-level transformers, prevented race conditions, and added validation tests.
Out of Scope Changes check ✅ Passed All changes directly address the race condition issue; no unrelated modifications or scope creep detected across modified files.
Docstring Coverage ✅ Passed Docstring coverage is 88.24% which is sufficient. The required threshold is 80.00%.


Contributor

Copilot AI left a comment


Pull request overview

This PR addresses issue #931 by removing reliance on static/global preprocessing (and postprocessing) registries that can be overwritten during concurrent AiModelBuilder usage, aiming to prevent cross-contamination between parallel model builds.

Changes:

  • Removed PreprocessingRegistry<T,TInput>.Current = ... assignments from AiModelBuilder.ConfigurePreprocessing() overloads.
  • Marked PreprocessingRegistry<T,TInput> as [Obsolete] and added deprecation guidance in its XML docs.
  • Replaced DocumentNeuralNetworkBase preprocessing from static registry lookup to an instance-level PreprocessingTransformer property.
  • Added integration tests asserting the static preprocessing registry is not set by model building and remains usable as a deprecated API.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs Adds integration coverage for concurrent/sequential builds and deprecated registry behavior.
src/Preprocessing/PreprocessingRegistry.cs Deprecates the static preprocessing registry with [Obsolete] and updated documentation.
src/Document/DocumentNeuralNetworkBase.cs Switches document preprocessing from static registry usage to an instance-level transformer property.
src/AiModelBuilder.cs Removes global registry writes for preprocessing and postprocessing configuration.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Document/DocumentNeuralNetworkBase.cs (1)

239-249: ⚠️ Potential issue | 🟠 Major

Inconsistency: PostprocessOutput still uses static PostprocessingRegistry.

The PR objectives explicitly state removing PostprocessingRegistry assignments from ConfigurePostprocessing() to prevent the same race conditions. However, this method still reads from the static registry:

if (PostprocessingRegistry<T, Tensor<T>>.IsConfigured)
{
    return PostprocessingRegistry<T, Tensor<T>>.Transform(modelOutput);
}

This creates the same TOCTOU race condition for postprocessing that you fixed for preprocessing. For consistency, this should use an instance-level PostprocessingTransformer property analogous to PreprocessingTransformer.


Proposed fix

Add an instance-level postprocessing transformer similar to preprocessing:

+    /// <summary>
+    /// Gets or sets the instance-level postprocessing transformer for this document model.
+    /// </summary>
+    protected IDataTransformer<T, Tensor<T>, Tensor<T>>? PostprocessingTransformer { get; set; }

Then update PostprocessOutput:

     protected Tensor<T> PostprocessOutput(Tensor<T> modelOutput)
     {
-        // Priority 1: User-configured pipeline via AiModelBuilder
-        if (PostprocessingRegistry<T, Tensor<T>>.IsConfigured)
+        // Priority 1: Instance-level transformer (set explicitly on this model)
+        var transformer = PostprocessingTransformer;
+        if (transformer is not null && transformer.IsFitted)
         {
-            return PostprocessingRegistry<T, Tensor<T>>.Transform(modelOutput);
+            return transformer.Transform(modelOutput);
         }
 
         // Priority 2: Model-specific industry-standard defaults
         return ApplyDefaultPostprocessing(modelOutput);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Document/DocumentNeuralNetworkBase.cs` around lines 239 - 249,
PostprocessOutput currently reads the static PostprocessingRegistry<T,
Tensor<T>> (PostprocessingRegistry.IsConfigured / Transform) which reintroduces
the TOCTOU race you eliminated for preprocessing; replace use of the static
registry with an instance-level PostprocessingTransformer property (mirror how
PreprocessingTransformer is used), set that transformer in
ConfigurePostprocessing instead of assigning the static registry, and have
PostprocessOutput call PostprocessingTransformer.Transform(modelOutput) when
non-null (falling back to ApplyDefaultPostprocessing when null) so all
postprocessing is instance-scoped and consistent with preprocessing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 64-73: Add positive assertions to verify instance-level
preprocessing on each result: for both results (results[0] and results[1] or the
loop variable), assert that result.PreprocessingInfo is not null and that
result.PreprocessingInfo.IsFitted is true (use the null-conditional check like
result.PreprocessingInfo?.IsFitted ?? false if needed). Place these assertions
immediately after the existing NotNull checks so the test confirms preprocessing
flows via the instance-level PreprocessingInfo rather than the static
PreprocessingRegistry.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: faf8dd41-26e8-4977-ac42-41533098adcd

📥 Commits

Reviewing files that changed from the base of the PR and between 640251b and 0f885f6.

📒 Files selected for processing (4)
  • src/AiModelBuilder.cs
  • src/Document/DocumentNeuralNetworkBase.cs
  • src/Preprocessing/PreprocessingRegistry.cs
  • tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs
💤 Files with no reviewable changes (1)
  • src/AiModelBuilder.cs

…ty, docs

- Fix PreprocessingTransformer docs: clarify protected accessibility
- Add PostprocessingTransformer instance property to DocumentNeuralNetworkBase
- Replace PostprocessingRegistry static usage with instance-based transformer
- Mark PostprocessingRegistry as [Obsolete] (same treatment as PreprocessingRegistry)
- Update PreprocessingRegistry.Current docs to reflect no longer auto-set
- Fix concurrent test to use separate data loaders (thread-safety)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs (1)

72-81: ⚠️ Potential issue | 🔴 Critical

Blocking: these tests still don't prove the new instance-local preprocessing path works.

Assert.NotNull(...) plus IsConfigured == false only proves the old global path stayed empty. These tests still pass if preprocessing is silently dropped altogether. Add positive assertions on AiModelResult.PreprocessingInfo so the test fails when the fitted pipeline is not carried through the new instance-local path.

✅ Concrete assertions to add
         Assert.NotNull(results[0]);
+        Assert.NotNull(results[0].PreprocessingInfo);
+        Assert.True(results[0].PreprocessingInfo?.IsFitted ?? false);
         Assert.NotNull(results[1]);
+        Assert.NotNull(results[1].PreprocessingInfo);
+        Assert.True(results[1].PreprocessingInfo?.IsFitted ?? false);
@@
         Assert.NotNull(result);
+        Assert.NotNull(result.PreprocessingInfo);
+        Assert.True(result.PreprocessingInfo?.IsFitted ?? false);
@@
         Assert.NotNull(result1);
+        Assert.NotNull(result1.PreprocessingInfo);
+        Assert.True(result1.PreprocessingInfo?.IsFitted ?? false);
         Assert.NotNull(result2);
+        Assert.NotNull(result2.PreprocessingInfo);
+        Assert.True(result2.PreprocessingInfo?.IsFitted ?? false);

As per coding guidelines, "Tests MUST be production-quality" and "Trivial assertions: Tests that only check Assert.NotNull when they should verify actual behavior/values" are blocking issues.

Also applies to: 114-121, 185-193

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`
around lines 72 - 81, The tests currently only assert results[0]/results[1] are
non-null and that PreprocessingRegistry<double, Matrix<double>>.IsConfigured is
false; add positive assertions that the fitted preprocessing pipeline is passed
via the instance-local path by checking AiModelResult.PreprocessingInfo on each
result (e.g., results[0].PreprocessingInfo and results[1].PreprocessingInfo) is
not null and contains expected metadata/transform count or types from the
training run so the test fails if preprocessing was dropped; update the
assertions around the same test cases (including the other occurrences at the
referenced ranges) to validate concrete properties of PreprocessingInfo rather
than only NotNull.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Preprocessing/PreprocessingRegistry.cs`:
- Around line 10-16: Update the XML documentation in PreprocessingRegistry (and
any remaining remarks in this file) to remove the old singleton/default
guidance: delete or rewrite any text that claims this registry is the active
global preprocessing path or that configured preprocessing is automatically
applied to all models; instead clearly state the API is obsolete, refer callers
to AiModelBuilder.ConfigurePreprocessing() for per-builder pipelines, and
mention that preprocessing is carried forward via AiModelResult and
PreprocessingInfo rather than applied globally by PreprocessingRegistry.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eba46efd-bea3-447f-b68e-32f802c2777b

📥 Commits

Reviewing files that changed from the base of the PR and between 0f885f6 and 1f1d751.

📒 Files selected for processing (4)
  • src/Document/DocumentNeuralNetworkBase.cs
  • src/Postprocessing/PostprocessingRegistry.cs
  • src/Preprocessing/PreprocessingRegistry.cs
  • tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs

Changed PreprocessingTransformer and PostprocessingTransformer from
protected to protected internal so AiModelBuilder can set them.
BuildSupervisedInternalAsync now wires the preprocessing pipeline onto
DocumentNeuralNetworkBase models, replacing the static registry approach
that caused race conditions during concurrent builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
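The wiring pattern this commit describes — transformer properties visible to the builder without being writable by external subclasses — can be sketched as follows. Java has no `internal`, so package-private access plays that role here; all names are hypothetical stand-ins for DocumentNeuralNetworkBase and AiModelBuilder:

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch: transformers stay protected for subclasses, while a
// package-private (cf. C# internal) method lets the same-package builder set
// them without widening the inheritance surface.
abstract class DocumentModelBase {
    protected UnaryOperator<double[]> preprocessingTransformer;
    protected UnaryOperator<double[]> postprocessingTransformer;

    // Package-private: callable only by same-package code, i.e. the builder.
    void configureTransformers(UnaryOperator<double[]> pre, UnaryOperator<double[]> post) {
        this.preprocessingTransformer = pre;   // null clears prior state on reuse
        this.postprocessingTransformer = post;
    }

    public double[] preprocess(double[] input) {
        // Use the instance transformer when set; otherwise pass through (defaults).
        return preprocessingTransformer != null ? preprocessingTransformer.apply(input) : input;
    }
}

class ModelBuilder {
    private UnaryOperator<double[]> pipeline;

    ModelBuilder configurePreprocessing(UnaryOperator<double[]> p) {
        this.pipeline = p; // instance field only — no static registry write
        return this;
    }

    void wire(DocumentModelBase model) {
        model.configureTransformers(pipeline, null);
    }
}

public class WiringSketch {
    public static void main(String[] args) {
        DocumentModelBase model = new DocumentModelBase() {};
        new ModelBuilder().configurePreprocessing(in -> new double[] { in[0] * 2.0 }).wire(model);
        System.out.println(model.preprocess(new double[] { 3.0 })[0]); // 6.0
    }
}
```

The design point is the one the review raised: the builder gets a narrow, assembly-scoped setter instead of the model exposing `protected internal` mutable properties to every inheritor.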
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Document/DocumentNeuralNetworkBase.cs`:
- Around line 201-206: The code currently ignores a configured but unfitted
transformer; update the preprocessing and postprocessing methods that reference
PreprocessingTransformer and PostprocessingTransformer so that if a transformer
is attached but transformer.IsFitted is false you fail fast (throw a clear
InvalidOperationException or similar) instead of falling back to the default
pipeline; specifically, change the logic from "if (transformer is not null &&
transformer.IsFitted) return transformer.Transform(...)" to first check "if
(transformer is not null) { if (!transformer.IsFitted) throw new
InvalidOperationException(...); return transformer.Transform(...); }" for both
the preprocessing and postprocessing code paths so misconfiguration is surfaced
immediately.
- Around line 126-139: The properties PreprocessingTransformer and
PostprocessingTransformer on DocumentNeuralNetworkBase are currently declared
protected internal, exposing builder plumbing to external inheritors; change
their accessibility to protected only and add an internal wiring method (e.g.,
ConfigureTransformers) that accepts the two IDataTransformer<T, Tensor<T>,
Tensor<T>>? parameters and assigns them to PreprocessingTransformer and
PostprocessingTransformer so AiModelBuilder (same assembly) can set them without
widening the public inheritance surface.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6df55347-4489-4d7d-a5ec-22e39a96e792

📥 Commits

Reviewing files that changed from the base of the PR and between 1f1d751 and 4abcfe0.

📒 Files selected for processing (2)
  • src/AiModelBuilder.cs
  • src/Document/DocumentNeuralNetworkBase.cs

…essing

- Changed PreprocessingTransformer/PostprocessingTransformer back to protected
  and added internal ConfigureTransformers() method for AiModelBuilder wiring
- PreprocessDocument/PostprocessOutput now throw if transformer is set but not
  fitted instead of silently falling back to defaults
- Wired both preprocessing AND postprocessing onto DocumentNeuralNetworkBase
- Updated PreprocessingRegistry docs to reflect deprecated/removed status
- Added [Collection] to registry tests and try/finally cleanup
- Fixed "override" wording to "assign" in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
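The fail-fast rule from this commit — throw when a transformer is attached but not fitted, rather than silently falling back to defaults — can be sketched as below. Again in Java with hypothetical names; the real types are C# (IDataTransformer, IsFitted):

```java
// Hypothetical stand-in for IDataTransformer with an IsFitted flag.
interface Transformer {
    boolean isFitted();
    double[] transform(double[] input);
}

class DocumentPreprocessor {
    private final Transformer transformer; // null = nothing configured, use defaults

    DocumentPreprocessor(Transformer transformer) { this.transformer = transformer; }

    public double[] preprocess(double[] input) {
        if (transformer != null) {
            if (!transformer.isFitted()) {
                // Fail fast: an attached-but-unfitted transformer is a
                // misconfiguration, not a reason to fall back to defaults.
                throw new IllegalStateException(
                    "Preprocessing transformer is configured but not fitted.");
            }
            return transformer.transform(input);
        }
        return input; // default path when nothing is configured
    }
}

public class FailFastSketch {
    public static void main(String[] args) {
        Transformer unfitted = new Transformer() {
            public boolean isFitted() { return false; }
            public double[] transform(double[] in) { return in; }
        };
        try {
            new DocumentPreprocessor(unfitted).preprocess(new double[] { 1.0 });
        } catch (IllegalStateException e) {
            System.out.println("threw: " + e.getMessage());
        }
        // Null transformer falls back to the default path:
        System.out.println(new DocumentPreprocessor(null).preprocess(new double[] { 5.0 })[0]); // 5.0
    }
}
```

Surfacing the error at the first preprocess call is what makes the misconfiguration visible in tests instead of producing subtly wrong outputs.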
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
src/Preprocessing/PreprocessingRegistry.cs (1)

17-20: ⚠️ Potential issue | 🟡 Minor

Finish the obsolete-registry doc cleanup.

The class-level remarks are now correct, but Transform() still says models can safely call this API when preprocessing is missing. That leaves the old global-registry guidance in place and contradicts the new backward-compatibility-only message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Preprocessing/PreprocessingRegistry.cs` around lines 17 - 20, Update the
XML doc on PreprocessingRegistry.Transform() to remove the outdated guidance
that models can safely call this API when preprocessing is missing and instead
state that the method exists only for backward compatibility with external code
and will be removed in a future version; ensure the Transform() summary/remarks
align with the class-level remarks in PreprocessingRegistry and mention
deprecation/backward-compatibility status rather than recommending new callers
use it.
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs (2)

75-84: ⚠️ Potential issue | 🟠 Major

Assertions are insufficient to verify the test's stated goal.

The test claims to verify "DoNotCrossContaminate" but only checks:

  1. Results are not null (trivial)
  2. Static registry was not set (proves registry isn't used)

This does NOT verify that preprocessing actually worked correctly through the instance-level path. If preprocessing is completely broken (neither registry nor instance-level), this test still passes. Add assertions on PreprocessingInfo to confirm the instance-level flow:

         // Assert: both builds succeed
         Assert.NotNull(results[0]);
         Assert.NotNull(results[1]);
+
+        // Assert: preprocessing flowed through instance-level PreprocessingInfo
+        Assert.NotNull(results[0].PreprocessingInfo);
+        Assert.True(results[0].PreprocessingInfo?.IsFitted ?? false,
+            "Preprocessing should be fitted via instance-level flow");
+        Assert.NotNull(results[1].PreprocessingInfo);
+        Assert.True(results[1].PreprocessingInfo?.IsFitted ?? false,
+            "Preprocessing should be fitted via instance-level flow");

To truly verify "no cross-contamination", consider also asserting that each model's scaler type is preserved (e.g., by checking pipeline step types or transformed output characteristics).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`
around lines 75 - 84, The current assertions only check non-null results and
that PreprocessingRegistry<double, Matrix<double>>.IsConfigured is false; update
the test to also validate instance-level preprocessing by inspecting each
result's PreprocessingInfo (e.g., results[0].PreprocessingInfo and
results[1].PreprocessingInfo) to ensure they are not null and contain the
expected pipeline/step types (such as scaler/transformer type) for each model,
and add assertions that the two PreprocessingInfo instances preserve their
respective scaler/transform types (or produce expected transformed-output
characteristics) to prove preprocessing flowed instance-locally and did not
cross-contaminate.

193-200: ⚠️ Potential issue | 🟠 Major

Same trivial assertion pattern - doesn't verify preprocessing actually worked.

Both builds could produce garbage results and this test would still pass. Add assertions verifying PreprocessingInfo was properly configured for each result:

         // Assert: both succeeded
         Assert.NotNull(result1);
         Assert.NotNull(result2);
+
+        // Assert: preprocessing flowed through instance-level PreprocessingInfo
+        Assert.NotNull(result1.PreprocessingInfo);
+        Assert.True(result1.PreprocessingInfo?.IsFitted ?? false);
+        Assert.NotNull(result2.PreprocessingInfo);
+        Assert.True(result2.PreprocessingInfo?.IsFitted ?? false);

Without these assertions, the test verifies that the registry remains empty, but not that instance-level preprocessing actually functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`
around lines 193 - 200, The test only checks registry emptiness but not that
each build actually configured preprocessing; update the assertions to verify
instance-level preprocessing on result1 and result2 by asserting their
PreprocessingInfo is not null and indicates configuration (e.g.,
Assert.NotNull(result1.PreprocessingInfo);
Assert.True(result1.PreprocessingInfo.IsConfigured) or equivalent property/check
on PreprocessingInfo), and repeat for result2 so the test validates that
PreprocessingInfo was properly set on each result while
PreprocessingRegistry<double, Matrix<double>> remains unconfigured.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/AiModelBuilder.cs`:
- Around line 1871-1882: Extract the transformer wiring into a private helper
(e.g., ConfigureDocumentTransformers) that accepts the model (IFullModel<T,
TInput, TOutput>? or object) and inside checks "if (model is
Document.DocumentNeuralNetworkBase<T> documentModel)" then casts
_preprocessingPipeline/_postprocessingPipeline to IDataTransformer<T, Tensor<T>,
Tensor<T>> and calls documentModel.ConfigureTransformers(preTransformer,
postTransformer) (allowing nulls to clear prior state); call this helper from
every build path that may return a DocumentNeuralNetworkBase<T> including
BuildProgramSynthesisInferenceOnlyResult() so transformers are always
set/cleared for reused instances.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 117-124: The test currently only asserts the builder returned a
non-null result and that PreprocessingRegistry<double,
Matrix<double>>.IsConfigured is false; instead, verify preprocessing actually
ran by asserting the built model/result contains a non-null PreprocessingInfo
(e.g., result.PreprocessingInfo or result.Model.PreprocessingInfo), that
PreprocessingInfo lists the expected transforms (scaling/normalization entries),
and that the stored scale/shift parameters match the training data statistics
(means/variances) used in the test dataset; keep the original check that the
static PreprocessingRegistry<double, Matrix<double>>.IsConfigured remains false
so the registry wasn't set globally, and replace the lone Assert.NotNull(result)
with these behavioral assertions to prove preprocessing was applied and
configured.

---

Duplicate comments:
In `@src/Preprocessing/PreprocessingRegistry.cs`:
- Around line 17-20: Update the XML doc on PreprocessingRegistry.Transform() to
remove the outdated guidance that models can safely call this API when
preprocessing is missing and instead state that the method exists only for
backward compatibility with external code and will be removed in a future
version; ensure the Transform() summary/remarks align with the class-level
remarks in PreprocessingRegistry and mention deprecation/backward-compatibility
status rather than recommending new callers use it.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 75-84: The current assertions only check non-null results and that
PreprocessingRegistry<double, Matrix<double>>.IsConfigured is false; update the
test to also validate instance-level preprocessing by inspecting each result's
PreprocessingInfo (e.g., results[0].PreprocessingInfo and
results[1].PreprocessingInfo) to ensure they are not null and contain the
expected pipeline/step types (such as scaler/transformer type) for each model,
and add assertions that the two PreprocessingInfo instances preserve their
respective scaler/transform types (or produce expected transformed-output
characteristics) to prove preprocessing flowed instance-locally and did not
cross-contaminate.
- Around line 193-200: The test only checks registry emptiness but not that each
build actually configured preprocessing; update the assertions to verify
instance-level preprocessing on result1 and result2 by asserting their
PreprocessingInfo is not null and indicates configuration (e.g.,
Assert.NotNull(result1.PreprocessingInfo);
Assert.True(result1.PreprocessingInfo.IsConfigured) or equivalent property/check
on PreprocessingInfo), and repeat for result2 so the test validates that
PreprocessingInfo was properly set on each result while
PreprocessingRegistry<double, Matrix<double>> remains unconfigured.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 57bc0144-6e6a-4b17-9661-038cc33b4930

📥 Commits

Reviewing files that changed from the base of the PR and between 4abcfe0 and b2af5e7.

📒 Files selected for processing (4)
  • src/AiModelBuilder.cs
  • src/Document/DocumentNeuralNetworkBase.cs
  • src/Preprocessing/PreprocessingRegistry.cs
  • tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs

Comment thread src/AiModelBuilder.cs Outdated

Extract transformer wiring into ConfigureDocumentTransformers() helper method
that always passes current pipeline values (clearing prior state on reuse).
Called from both supervised build path and BuildProgramSynthesisInferenceOnlyResult().

Strengthen PreprocessingRegistryIntegrationTests to verify PreprocessingInfo
is non-null and fitted, not just Assert.NotNull(result).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.



Comment thread src/Document/DocumentNeuralNetworkBase.cs Outdated
Comment thread src/Document/DocumentNeuralNetworkBase.cs

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
src/AiModelBuilder.cs (1)

1250-1252: ⚠️ Potential issue | 🔴 Critical

Blocking: the streaming supervised path still bypasses the new transformer wiring.

These additions cover the regular supervised and inference-only paths, but BuildStreamingSupervisedAsync() at Line 1487 still never calls ConfigureDocumentTransformers(_model). For Document.DocumentNeuralNetworkBase<T>, that leaves the streaming path able to retain stale preprocess/postprocess transformers from a previous build, or to miss the current builder's pipelines entirely.

🔧 Proposed fix
 private async Task<AiModelResult<T, TInput, TOutput>> BuildStreamingSupervisedAsync(
     IStreamingDataLoader<T, TInput, TOutput> streamingLoader)
 {
     // Apply GPU configuration first
     ApplyGpuConfiguration();

     // Apply memory management configuration (gradient checkpointing, etc.)
     ApplyMemoryConfiguration();

     // Ensure we have a model configured
     if (_model is null)
     {
         throw new InvalidOperationException(
             "Streaming training requires a model to be configured. Use ConfigureModel() before calling BuildAsync().");
     }
+
+    ConfigureDocumentTransformers(_model);

     var numOps = MathHelper.GetNumericOperations<T>();

A regression test that reuses the same Document.DocumentNeuralNetworkBase<T> through the streaming loader path would pin this down.

As per coding guidelines: "Incomplete features: Half-implemented patterns where some code paths work but others silently do nothing" are blocking issues.

Also applies to: 1874-1877, 6294-6308
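The suggested regression test might be sketched as follows. The builder/loader surface shown here is an assumption pieced together from this thread, not the library's verified API:

```csharp
// Hypothetical regression test: reuse one document model across two
// streaming builds and verify the second build (with no preprocessing)
// clears the transformer left behind by the first.
[Fact]
public async Task StreamingBuild_ReusedDocumentModel_DoesNotKeepStaleTransformers()
{
    var documentModel = CreateDocumentModel(); // shared across both builds

    var firstBuilder = new AiModelBuilder<double, Tensor<double>, Tensor<double>>()
        .ConfigureModel(documentModel)
        .ConfigurePreprocessing(new StandardScaler<double>());
    await firstBuilder.BuildAsync(CreateStreamingLoader());

    var secondBuilder = new AiModelBuilder<double, Tensor<double>, Tensor<double>>()
        .ConfigureModel(documentModel); // no preprocessing configured
    await secondBuilder.BuildAsync(CreateStreamingLoader());

    // Assumes a test-visible accessor for the internal transformer state.
    Assert.Null(documentModel.GetPreprocessingTransformerForTest());
}
```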

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/AiModelBuilder.cs` around lines 1250-1252: the streaming supervised
path never wires the new transformers: modify BuildStreamingSupervisedAsync to
call ConfigureDocumentTransformers(_model) (same wiring used for the regular
supervised/inference-only paths) after the model is constructed and before it is
used by Document.DocumentNeuralNetworkBase<T>, ensuring _model's
preprocess/postprocess pipelines are replaced; add a regression test that reuses
the same Document.DocumentNeuralNetworkBase<T> via the streaming loader to
assert that transformers are updated (no stale transformers remain) when
BuildStreamingSupervisedAsync runs.
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs (1)

75-78: ⚠️ Potential issue | 🟠 Major

Add PreprocessingInfo assertions to verify instance-level preprocessing flow.

The concurrent builds test only verifies that results are non-null, but doesn't confirm preprocessing was actually applied through the instance-level PreprocessingInfo. Without these assertions, the test could pass even if preprocessing is completely broken.

         // Assert: both builds succeed
         Assert.NotNull(results[0]);
         Assert.NotNull(results[1]);
+
+        // Assert: preprocessing flowed through instance-level PreprocessingInfo for both
+        Assert.NotNull(results[0].PreprocessingInfo);
+        Assert.True(results[0].PreprocessingInfo?.IsFitted ?? false,
+            "First build's preprocessing should be fitted via instance-level flow");
+        Assert.NotNull(results[1].PreprocessingInfo);
+        Assert.True(results[1].PreprocessingInfo?.IsFitted ?? false,
+            "Second build's preprocessing should be fitted via instance-level flow");

Per coding guidelines, tests that only check Assert.NotNull when they should verify actual behavior are blocking issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`
around lines 75-78: update the test to assert that instance-level
preprocessing actually ran by checking the PreprocessingInfo on the produced
results instead of only asserting non-null; locate the assertions around
results[0] and results[1] in PreprocessingRegistryIntegrationTests and add
checks that each result.PreprocessingInfo (or the specific property on the
result that records applied preprocessors) is non-null and contains the expected
entries (e.g., contains the instance-level preprocessor ID/name or that the
processed output differs from the raw input), so the test verifies the
preprocessing flow end-to-end.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 198-206: The test only asserts that result1 and result2 are
non-null but never verifies that preprocessing actually ran; update the
assertions to mirror BuildAsync_WithPreprocessing_Succeeds_WithoutRegistry by
checking the PreprocessingInfo on both results (e.g., assert
result1.PreprocessingInfo and result2.PreprocessingInfo are not null and contain
the expected preprocessing details/steps), and retain the existing assertion
that PreprocessingRegistry<double, Matrix<double>>.IsConfigured is false;
reference result1, result2, and PreprocessingInfo to locate and add the new
assertions.

---

Duplicate comments:
In `@src/AiModelBuilder.cs`:
- Around line 1250-1252: The streaming supervised path never wires the new
transformers: modify BuildStreamingSupervisedAsync to call
ConfigureDocumentTransformers(_model) (same wiring used for the regular
supervised/inference-only paths) after the model is constructed and before it is
used by Document.DocumentNeuralNetworkBase<T>, ensuring _model's
preprocess/postprocess pipelines are replaced; add a regression test that reuses
the same Document.DocumentNeuralNetworkBase<T> via the streaming loader to
assert that transformers are updated (no stale transformers remain) when
BuildStreamingSupervisedAsync runs.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 75-78: Update the test to assert that instance-level preprocessing
actually ran by checking the PreprocessingInfo on the produced results instead
of only asserting non-null; locate the assertions around results[0] and
results[1] in PreprocessingRegistryIntegrationTests and add checks that each
result.PreprocessingInfo (or the specific property on the result that records
applied preprocessors) is non-null and contains the expected entries (e.g.,
contains the instance-level preprocessor ID/name or that the processed output
differs from the raw input), so the test verifies the preprocessing flow
end-to-end.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e30f40c2-bf08-454a-bc84-8a89ae6abd79

📥 Commits

Reviewing files that changed from the base of the PR and between b2af5e7 and 2001d9c.

📒 Files selected for processing (2)
  • src/AiModelBuilder.cs
  • tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs


PostprocessOutput no longer throws on unfitted transformers since
postprocessing transforms (Softmax, LabelDecoder) are often stateless.
Updated XML docs to reflect instance-level transformer priority.
Strengthen concurrent and sequential test assertions to verify
PreprocessingInfo is non-null and fitted on each result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ooples ooples merged commit 61f62e8 into master Mar 7, 2026
35 of 39 checks passed
@ooples ooples deleted the fix/remove-static-preprocessing-registry branch March 7, 2026 21:54
ooples added a commit that referenced this pull request Mar 9, 2026
…race conditions (#935)

* fix: remove static PreprocessingRegistry usage from AiModelBuilder (#931)

Remove the 3 `PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline`
assignments from AiModelBuilder.ConfigurePreprocessing() and the 3 analogous
PostprocessingRegistry assignments from ConfigurePostprocessing(). The pipeline
is already stored as an instance field on the builder and flows to AiModelResult
via PreprocessingInfo — the global static registry was unnecessary and caused
race conditions when multiple builders ran concurrently.

Changes:
- Remove 6 static registry assignments from AiModelBuilder.cs
- Add [Obsolete] attribute to PreprocessingRegistry<T, TInput> class
- Replace PreprocessingRegistry usage in DocumentNeuralNetworkBase with
  an instance-level PreprocessingTransformer property
- Add 4 integration tests verifying concurrent builds don't cross-contaminate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
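The deprecation described in the second bullet could take roughly this shape (the attribute message and member list here are illustrative, not the merged code):

```csharp
// Sketch of the soft deprecation: the type keeps compiling for external
// callers, but every use site now gets a compiler warning.
[Obsolete("PreprocessingRegistry is a process-wide static and is unsafe when " +
          "multiple AiModelBuilder instances build concurrently. Preprocessing " +
          "now flows through the builder's instance field and is exposed via " +
          "AiModelResult.PreprocessingInfo.")]
public static class PreprocessingRegistry<T, TInput>
{
    public static IDataTransformer<T, TInput, TInput>? Current { get; set; }

    public static bool IsConfigured => Current is not null;
}
```

Keeping the type with `[Obsolete]` rather than deleting it means external references still compile while signaling the migration path.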

* fix: resolve 7 pr review comments - registry deprecation, thread safety, docs

- Fix PreprocessingTransformer docs: clarify protected accessibility
- Add PostprocessingTransformer instance property to DocumentNeuralNetworkBase
- Replace PostprocessingRegistry static usage with instance-based transformer
- Mark PostprocessingRegistry as [Obsolete] (same treatment as PreprocessingRegistry)
- Update PreprocessingRegistry.Current docs to reflect no longer auto-set
- Fix concurrent test to use separate data loaders (thread-safety)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: wire preprocessing transformer onto document models in buildasync

Changed PreprocessingTransformer and PostprocessingTransformer from
protected to protected internal so AiModelBuilder can set them.
BuildSupervisedInternalAsync now wires the preprocessing pipeline onto
DocumentNeuralNetworkBase models, replacing the static registry approach
that caused race conditions during concurrent builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow transformer surface, fail fast on unfitted, wire postprocessing

- Changed PreprocessingTransformer/PostprocessingTransformer back to protected
  and added internal ConfigureTransformers() method for AiModelBuilder wiring
- PreprocessDocument/PostprocessOutput now throw if transformer is set but not
  fitted instead of silently falling back to defaults
- Wired both preprocessing AND postprocessing onto DocumentNeuralNetworkBase
- Updated PreprocessingRegistry docs to reflect deprecated/removed status
- Added [Collection] to registry tests and try/finally cleanup
- Fixed "override" wording to "assign" in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: centralize document transformer injection and strengthen test assertions

Extract transformer wiring into ConfigureDocumentTransformers() helper method
that always passes current pipeline values (clearing prior state on reuse).
Called from both supervised build path and BuildProgramSynthesisInferenceOnlyResult().

Strengthen PreprocessingRegistryIntegrationTests to verify PreprocessingInfo
is non-null and fitted, not just Assert.NotNull(result).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unfitted check from PostprocessOutput, update docs, strengthen tests

PostprocessOutput no longer throws on unfitted transformers since
postprocessing transforms (Softmax, LabelDecoder) are often stateless.
Updated XML docs to reflect instance-level transformer priority.
Strengthen concurrent and sequential test assertions to verify
PreprocessingInfo is non-null and fitted on each result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ooples added a commit that referenced this pull request Mar 10, 2026
…ining directories (#933) (#941)

* fix: prevent data leakage by fitting preprocessing only on training data

In BuildSupervisedInternalAsync(), the preprocessing pipeline (e.g.,
StandardScaler) was fitted on the ENTIRE dataset before the train/test
split. This leaked test/validation statistics (mean, std dev) into the
training pipeline, causing artificially inflated metrics.

The fix restructures the method:
1. Split data into train/val/test FIRST using DataSplitter.Split()
2. FitTransform preprocessing pipeline on training data ONLY
3. Transform (not FitTransform) validation and test data

The federated learning path is unchanged — it correctly uses all data
as training data, so FitTransform on everything remains correct.

Fixes #929

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
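The restructured ordering can be summarized in a short sketch (`DataSplitter.Split` follows the commit message; the surrounding names and signatures are assumptions):

```csharp
// Leakage-safe ordering, as described above:
// 1. Split FIRST, so test/validation statistics never reach the scaler.
var split = DataSplitter.Split(x, y, _dataSplitOptions);

// 2. Fit the preprocessing pipeline on training data ONLY.
var xTrainScaled = _preprocessingPipeline.FitTransform(split.XTrain);

// 3. Transform (never FitTransform) the held-out sets with the
//    statistics learned from training data alone.
var xValScaled  = _preprocessingPipeline.Transform(split.XVal);
var xTestScaled = _preprocessingPipeline.Transform(split.XTest);
```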

* fix: resolve 12 pr review comments - leakage tests, pragma scope, registry sync

- Scope #pragma warning disable CS8600/CS8604 with matching restore
- Restore PreprocessingRegistry.Current in all ConfigurePreprocessing overloads
- Update PreprocessingRegistry docs to reflect thread-safety and usage pattern
- Rewrite tests to actually detect leakage: verify scaler produces different
  output than all-data scaler, verify transform shape and non-identity
- Add end-to-end predict test with preprocessing
- Add test verifying no-preprocessing path still works with predict

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move fitresample after train/test split to prevent data leakage

FitResample (SMOTE, outlier removal, augmentation) was applied to the
full dataset before splitting, allowing synthetic samples derived from
test/validation data to leak into training. Now FitResample runs after
the split on training data only in both standard and AutoML paths.
Also fixed AutoML preprocessing to FitTransform on training only and
Transform on validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update registry docs and use consistent shuffle policy for splits

Updated PreprocessingRegistry docs to clearly warn about concurrent build
safety issues. Applied same time-series non-shuffle policy to the final
supervised training split that AutoML already uses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use post-preparation training data in no-preprocessing else branch

When _dataPreparationPipeline runs (resampling/outlier removal) it modifies
XTrain/yTrain. The else branch (no preprocessing) was incorrectly using
preparedX/preparedY (pre-preparation) instead of XTrain/yTrain, losing
training-only preparation results for downstream logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make data leakage tests self-sufficient without PR 928

Refactor tests that called result.Predict() (which fails due to the
feature selection bug fixed in PR #928) to verify preprocessing pipeline
state directly via PreprocessingInfo.TransformFeatures() instead.
Tests now verify the data leakage fix independently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: preserve preprocessing pipeline state during serialization (#936)

* fix: add serialization support to preprocessing pipeline fitted state

- Add [JsonProperty] attributes to private fields across all preprocessing
  transformers (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler,
  DecimalScaler, GlobalContrastScaler, LogScaler, LogMeanVarianceScaler,
  LpNormScaler, Normalizer, SimpleImputer) so Newtonsoft.Json serializes
  fitted state (means, std devs, min/max, etc.)

- Add [JsonProperty] to TransformerBase.IsFitted and ColumnIndices for
  proper deserialization of base class state

- Add [JsonConstructor] to MinMaxScaler and RobustScaler to resolve
  ambiguous constructor selection during deserialization

- Replace ValueTuple in PreprocessingPipeline._steps with a new
  PipelineStep<T, TInput> class that serializes reliably with
  Newtonsoft.Json (ValueTuples lose field names during serialization)

- Add [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] to
  PreprocessingInfo.Pipeline and TargetPipeline for polymorphic
  deserialization of pipeline step transformers

- Add 16 integration tests verifying serialization round-trips for all
  modified transformers, pipelines, and inverse transforms

Fixes #930

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
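The `PipelineStep<T, TInput>` class that replaces the ValueTuple might look roughly like this sketch (the interface name and validation details are assumptions drawn from the commit messages):

```csharp
// Serialization-friendly replacement for the (string, transformer)
// ValueTuple: Newtonsoft.Json round-trips named properties reliably,
// while ValueTuple field names are lost during serialization.
public class PipelineStep<T, TInput>
{
    [JsonConstructor]
    public PipelineStep(string name, IDataTransformer<T, TInput, TInput> transformer)
    {
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("Step name must be non-whitespace.", nameof(name));

        Name = name;
        Transformer = transformer ?? throw new ArgumentNullException(nameof(transformer));
    }

    [JsonProperty]
    public string Name { get; private set; }

    [JsonProperty]
    public IDataTransformer<T, TInput, TInput> Transformer { get; private set; }
}
```

Private setters keep the step immutable after construction while still letting Json.NET populate the properties through the `[JsonConstructor]`.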

* fix: resolve 13 pr review comments - serialization safety, tests, perf

- Remove null-forgiving operator from PipelineStep (use JsonConstructor)
- Make PipelineStep.Transformer setter private (immutable after construction)
- Fix ColumnIndices deserialization (add private setter for Json.NET)
- Fix Steps property to return cached read-only list (no alloc per call)
- Add explicit range assertions to MinMaxScaler custom range test
- Add transform equivalence check to SimpleImputer test
- Add LpNormScaler serialization round-trip test
- Add Normalizer serialization round-trip test
- Add LogMeanVarianceScaler serialization round-trip test
- TypeNameHandling.Auto is safe: SafeSerializationBinder restricts types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make pipelinestep immutable, validate constructor, add missing tests

- Made PipelineStep.Name private set to prevent mutation after construction
- PipelineStep constructor now validates name (non-whitespace) and transformer
  (non-null), failing fast on invalid serialized steps
- Added ColumnIndices round-trip test proving non-default indices survive
  serialization and untouched columns remain unchanged
- Added TargetPipeline round-trip tests proving both null and non-null target
  pipelines survive serialization with transform equivalence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent JSON duplicate serialization and remove per-member TypeNameHandling

Add [JsonIgnore] to public getter-only properties that mirror [JsonProperty]
backing fields in all transformers (StandardScaler, RobustScaler, SimpleImputer,
DecimalScaler, GlobalContrastScaler, MinMaxScaler, Normalizer, MaxAbsScaler,
LogScaler) to prevent Newtonsoft emitting both field and property.

Remove per-member TypeNameHandling.Auto from PipelineStep.Transformer and
_finalTransformer — rely on serializer-level settings with SafeSerializationBinder
instead, preventing unsafe TypeNameHandling bypass.

Add Deconstruct method to PipelineStep for backward-compatible tuple destructuring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove static PreprocessingRegistry — prevents concurrent build race conditions (#935)

* fix: remove static PreprocessingRegistry usage from AiModelBuilder (#931)

Remove the 3 `PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline`
assignments from AiModelBuilder.ConfigurePreprocessing() and the 3 analogous
PostprocessingRegistry assignments from ConfigurePostprocessing(). The pipeline
is already stored as an instance field on the builder and flows to AiModelResult
via PreprocessingInfo — the global static registry was unnecessary and caused
race conditions when multiple builders ran concurrently.

Changes:
- Remove 6 static registry assignments from AiModelBuilder.cs
- Add [Obsolete] attribute to PreprocessingRegistry<T, TInput> class
- Replace PreprocessingRegistry usage in DocumentNeuralNetworkBase with
  an instance-level PreprocessingTransformer property
- Add 4 integration tests verifying concurrent builds don't cross-contaminate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 7 pr review comments - registry deprecation, thread safety, docs

- Fix PreprocessingTransformer docs: clarify protected accessibility
- Add PostprocessingTransformer instance property to DocumentNeuralNetworkBase
- Replace PostprocessingRegistry static usage with instance-based transformer
- Mark PostprocessingRegistry as [Obsolete] (same treatment as PreprocessingRegistry)
- Update PreprocessingRegistry.Current docs to reflect no longer auto-set
- Fix concurrent test to use separate data loaders (thread-safety)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: wire preprocessing transformer onto document models in buildasync

Changed PreprocessingTransformer and PostprocessingTransformer from
protected to protected internal so AiModelBuilder can set them.
BuildSupervisedInternalAsync now wires the preprocessing pipeline onto
DocumentNeuralNetworkBase models, replacing the static registry approach
that caused race conditions during concurrent builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow transformer surface, fail fast on unfitted, wire postprocessing

- Changed PreprocessingTransformer/PostprocessingTransformer back to protected
  and added internal ConfigureTransformers() method for AiModelBuilder wiring
- PreprocessDocument/PostprocessOutput now throw if transformer is set but not
  fitted instead of silently falling back to defaults
- Wired both preprocessing AND postprocessing onto DocumentNeuralNetworkBase
- Updated PreprocessingRegistry docs to reflect deprecated/removed status
- Added [Collection] to registry tests and try/finally cleanup
- Fixed "override" wording to "assign" in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: centralize document transformer injection and strengthen test assertions

Extract transformer wiring into ConfigureDocumentTransformers() helper method
that always passes current pipeline values (clearing prior state on reuse).
Called from both supervised build path and BuildProgramSynthesisInferenceOnlyResult().

Strengthen PreprocessingRegistryIntegrationTests to verify PreprocessingInfo
is non-null and fitted, not just Assert.NotNull(result).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unfitted check from PostprocessOutput, update docs, strengthen tests

PostprocessOutput no longer throws on unfitted transformers since
postprocessing transforms (Softmax, LabelDecoder) are often stateless.
Updated XML docs to reflect instance-level transformer priority.
Strengthen concurrent and sequential test assertions to verify
PreprocessingInfo is non-null and fitted on each result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from video

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from neural network layers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: warn on empty attributions and use guarded locals in jwt attestation verifier

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: include gain layer in backward pass and parameter updates in deepfilternet

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rebuild streaming buffers after deserializing fft/hop size in neural noise reducer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from federated learning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from federated learning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: respect bootstrap flag and align estimator probabilities to ensemble label space

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from clustering modules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: throw on invalid ensemble members and cold model in adaptive random forest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify trained-state error messages in ordinal regression

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unused alpha_s and use guarded local consistently in unipc scheduler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add guard for latent variance in compute elbo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist cached gradient lookup out of per-sample loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace console.writeline with debug.writeline for checkpoint reporting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cache compressed momentum after resampling projection matrices

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: show total sharded memory independently of per-device memory in tostring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address 21 pr review comments for code quality improvements

- CLIPScore: throw on embedding/weight dimension mismatch instead of Math.Min
- Adam8BitOptimizer: remove redundant _vScales null check
- BFGSOptimizer: move _inverseHessian null check before UpdateInverseHessian call
- FTRLOptimizer: use post-update accumulator (nPlusGradSq) in both paths
- MomentumOptimizer: snapshot _velocity before serialization loop
- OptimizerBase: use RequireModel() instead of Model!
- SymbolicPhysicsLearner: Clone throws on null children like Evaluate/Format
- IncrementalPCA: fix error message to "internal state inconsistent"
- ContradictionDetector: wrap JToken.Value<bool>() in try-catch
- CriticModel: wrap JToken.Value<double>() in try-catch
- LogNormalAFT: hoist Coefficients null check outside loops (3 methods)
- WeibullAFT: hoist Coefficients null check outside loops (3 methods)
- NHiTSModel: remove redundant ContainsKey before TryGetValue
- DiffusionTools: support both 1D and 2D audio_data deserialization
- GMMDetector: validate feature count before scoring a point
- TimeSeriesForestClassifier: reject NumTrees <= 0 during training

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use guarded sigmaTs local variable consistently in dpm solver methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply feature selection in Predict() — fixes critical dimension mismatch bug (#928)

* fix: apply feature selection in AiModelResult.Predict() — fixes dimension mismatch bug

The NormalOptimizer (default) randomly selects a subset of features during
training via PrepareAndEvaluateSolution, but AiModelResult.Predict() never
applied the same feature selection to new input data. This caused the model
to receive all columns while it was trained on fewer, throwing "Number of
columns must equal length of vector" errors.

Changes:
- Add SelectedFeatureIndices (List<int>) to OptimizationStepData,
  OptimizationResult, and ModelResult to store actual column indices
- Propagate indices through optimizer flow: PrepareAndEvaluateSolution →
  UpdateBestSolution → CreateOptimizationResult
- Apply feature selection in AiModelResult.Predict() after preprocessing
  but before Model.Predict(), using OptimizerHelper.SelectFeatures()
- Add 12 integration tests verifying the full BuildAsync → Predict pipeline

Closes #927

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 7 pr review comments - feature selection, copy semantics, docs

- Remove unused LooseTolerance constant
- Fix off-by-one: use != instead of > for feature index count check
- Apply feature selection in InferenceSession.Predict() (critical gap)
- Add comprehensive docs to SelectedFeatureIndices property
- Fix WithParameters to deep-copy collections (consistent with DeepCopy)
- Add copy independence assertion to WithParameters test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: support 1D tensor feature selection and add predict integration test

SelectFeaturesTensor previously threw for 1D tensors. Now handles them
by directly selecting elements at the specified indices. Added integration
test proving Predict applies SelectedFeatureIndices correctly with a real
FeedForwardNeuralNetwork.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use identity-check instead of count heuristic for feature selection

Replaced inputSize == featureIndices.Count heuristic with proper
identity-mapping check using SequenceEqual. A reordered selection like
[2, 0, 1] has the same count but still needs reordering. Also validates
indices are in-range before calling SelectFeatures. Renamed misleading
test names and added shape assertion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: fix collinear test fixture and add permuted feature selection test

- Fixed direct-model test to use non-collinear features with correct
  formula comment (y = 2*x1 + 3*x2 + 1) and tighter assertion ranges
- Added permutation test proving [2, 0, 1] selection is not treated as
  identity — verifies the SequenceEqual check works correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add null guards to OptimizationResult Clone and WithParameters

Callers that create OptimizationResult via object initializer (bypassing
the constructor) leave CoefficientLowerBounds, CoefficientUpperBounds,
FitnessHistory, SelectedFeatures, and SelectedFeatureIndices as null.
Deep-copy calls in Clone/WithParameters would throw NullReferenceException.
Now falls back to empty defaults when properties are null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify feature selection in InferenceSequence, one-time bounds check, robust assertions

- InferenceSequence.Predict() now delegates to ApplySelectedFeaturesForPrediction
  instead of using a count-based check that skipped permuted selections
- Move bounds validation to one-time check (cached _featureIndicesValidated flag)
  instead of scanning on every Predict() call
- Assert permuted predictions actually differ from identity predictions
- Replace fragile selectedCount < 10 with valid range assertion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: assert feature selection actually occurred in regression test

Add assertion verifying optimizer selected a feature subset before testing
Predict() with full-width input, proving the test exercises the reduced-column
code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cache identity check with volatile int, set deterministic weights in test

Replace per-call SequenceEqual + Enumerable.Range allocation with cached
volatile int state (0=unknown, 1=identity, 2=needs-selection). Uses a
simple loop instead of LINQ for zero-allocation identity check.
Thread-safe via volatile read/write semantics.

Set explicit non-uniform weights [1, 2, 3] in permuted feature selection
test to ensure permutation deterministically produces different output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve first batch of pr #941 review comments (10 of 54)

- InterpretabilityDashboard: filter empty attributions instead of using
  console output, use validCount for averaging
- BaggingClassifier: validate sampleSize vs data size when bootstrap=false
- Denclue: hoist _attractors and _attractorDensities guards out of loops
- BIRCH: use Labels ?? Predict(x) instead of Labels! in FitPredict
- SelfOrganizingMap: hoist _weights and _neuronLabels guards out of loops
- CLIQUE: hoist Labels guard out of accumulation loop
- SUBCLU: hoist _trainingData and Labels guards out of loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve second batch of pr #941 review comments (20 of 54)

- GraphIsomorphismLayer: add local variable aliases for _lastMlpHiddenPreRelu
  in Forward and BackwardManual for clearer null safety
- PagedAttentionKernel: add braces to ToFloat/FromFloat if blocks, replace
  null-forgiving with explicit boxing guards
- KnowledgeDistillationTrainerBase: remove Console.WriteLine from library
  checkpoint loading code
- DiffusionMemoryManager: separate PerDeviceMemory and TotalShardedMemory
  formatting into independent checks
- MixtureOfExpertsLayer: add null guard before ApplyActivationToGraph
  instead of null-forgiving operator
- KVCache: replace null-forgiving with explicit boxing guard for Half
  conversion
- CachedMultiHeadAttention: hoist _cache null guard before Append call
- CachedGroupedQueryAttention: hoist _cache null guard before Append call
- DeformableConvolutionalLayer: remove stale comments (variables already renamed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve build errors from linter's automated null-forgiving removal

Fix syntax and compilation errors introduced by the automated linter that
removed null-forgiving operators without proper replacements:
- Fix missing closing parentheses in foreach statements (COPKMeans, AgglomerativeClustering)
- Fix duplicate variable declarations (AntColonyOptimization, CachedGroupedQueryAttention, CachedMultiHeadAttention)
- Fix dropped underscore prefixes on private field references (KVCache, ConvLSTMLayer, PrincipalNeighbourhoodAggregationLayer)
- Add validated accessor properties for nullable cache arrays (KVCache)
- Fix variable ordering issues (FloraAdapter)
- Fix nullable reference handling (VectorModel, UncertaintyCalibrationData, CrossValidationResult, AiModelResult.Uncertainty)
- Fix Debug.WriteLine() call with no arguments (KnowledgeDistillationTrainerBase)
- Add null guard for ClassLabels in PredictProbabilities (BaggingClassifier)
- Fix nullable permutation array access (ConcatenateLayer)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist null guards for lambdas and alphaTs in DPMSolverMultistepScheduler

Replace null-forgiving operators with proper null-coalescing throws
for _lambdas and _alphaTs in FirstOrderUpdate, SecondOrderUpdate,
and ThirdOrderUpdate methods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove redundant null guard in LeafSent140FederatedDatasetLoader

The testFilePath is already validated by IsNullOrWhiteSpace check
in the enclosing if statement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: capture PreprocessingInfo in local variable to prevent TOCTOU

Snapshot PreprocessingInfo once before accessing its members
to avoid potential race conditions with concurrent access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
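The TOCTOU pattern being fixed is reading a shared mutable field twice — once to null-check it and once to use it — so another thread can swap it to null in between. The standard fix is a local snapshot, sketched below in Python with hypothetical names:

```python
class ModelResult:
    def __init__(self):
        self.preprocessing_info = None  # may be replaced by another thread

    def describe_unsafe(self):
        # TOCTOU hazard: two reads of the shared field; it can become None
        # between the check and the use.
        if self.preprocessing_info is not None:
            return self.preprocessing_info["name"]
        return "none"

    def describe_safe(self):
        info = self.preprocessing_info  # snapshot once, then only use the local
        return info["name"] if info is not None else "none"
```

Single-threaded, both behave the same; under concurrent mutation only the snapshot version is guaranteed not to read a torn check/use pair.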

* fix: dispose pre-permutation tensor to prevent memory leak in ConcatenateLayer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: accumulate projector gradients from both views in BarlowTwins, SimCLR, and iBOT

BarlowTwins: capture view1 projector grads before view2 backward clears them,
then combine both views' grads for the projector update.

SimCLR: replay projector forward per-view before backward to use correct caches,
and accumulate projector gradients from both views.

iBOT: backpropagate the full objective (clsLoss1 + clsLoss2 + lambda*(patchLoss1 + patchLoss2))
instead of only clsLoss1. Add ComputeMaskedPatchGradient for MSE gradient on masked patches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct backward pass in RIFE, FILM, and RAFT video models

RIFE: use cached block input shape instead of previous block output shape
for gradient splitting to correctly match the fused input width.

FILM: fail fast on all required backward-pass caches instead of silently
skipping fusion and flow estimator backprop when caches are null.

RAFT: replace incorrect upsampleConv.Backward with bilinear UpsampleFlowBackward
that mirrors the actual forward upsampling, and hoist invariant layer guards
out of the refinement loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: strengthen DataLeakagePrevention tests with behavioral assertions

ScalerStatisticsReflectTrainingDataOnly: add positive assertion that the
pipeline transformation is not identity, confirming it was fitted on data.

StillWorksCorrectly: exercise the model with actual predictions and assert
outputs are finite numbers, not just that objects are non-null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address major PR review comments across multiple files

KNNEvaluator: hoist _trainLabels null check outside the voting loop.
JointMutualInformation: use local miScores instead of field on line 212.
ContradictionDetector: treat malformed contradictory fields as false instead
of falling through to text scan that matches JSON key names.
BFGSOptimizer: guard inverse Hessian update with curvature condition (y*s > 0)
and use validated locals instead of null-forgiving operators.
IncrementalPCA: validate all fitted state upfront in PartialFit and remove
null-forgiving on _explainedVariance.
RealGatedLinearRecurrenceLayer: hoist all cached tensor validation to top of
Backward to prevent partial gradient corruption.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address remaining major PR review comments

PreprocessingRegistry: align documentation — clarify get/set is thread-safe
but concurrent builds still overwrite each other's pipeline.

InMemoryFederatedTrainer: fail fast when HE is enabled but no parameter
indices are resolved instead of silently downgrading. Fix DP accounting:
FedAsync uses 1/totalClients per iteration, FedBuff uses bufferSize/totalClients.

AiModelBuilder: extract IsTimeSeriesTask() helper that checks both AutoML
options and model type name for chronological split decisions. Reject
row-changing data preparation in natural-client federated mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address trivial PR review comments across multiple files

DeformableConvolutionalLayer: rename misleading variable names (mwShape/mbShape
to maskWeights/maskBias, mw2/mwg/mbg to maskW/maskWeightGrad/maskBiasGrad).

S4DLayer: hoist _lastHiddenStatesReal/Imag guards out of the BPTT loop.

OnlineFeatureSelector: pass resolved running arrays into ComputeOnlineCorrelation
instead of re-checking nullable fields inside the hot loop.

SSLPretrainingPipeline: extract encoder to a local variable to avoid duplicate
null-check expressions.

SubpixelConvolutionalLayer: hoist ScalarActivation out of the per-element loop.

BaggingClassifier: replace OrderBy(random.Next()) with Fisher-Yates partial
shuffle for O(sampleSize) unbiased sampling without replacement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
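The Fisher-Yates replacement mentioned for BaggingClassifier is worth spelling out: sorting by random keys costs O(n log n) over the whole pool, while a partial Fisher-Yates shuffle does exactly `sampleSize` swaps and yields an unbiased sample without replacement. A Python sketch of that technique (function name is illustrative):

```python
import random

def sample_without_replacement(population, sample_size, rng=None):
    # Partial Fisher-Yates: after i swaps, pool[:i] is a uniform random
    # sample of size i drawn without replacement. O(sample_size) swaps.
    rng = rng or random.Random()
    pool = list(population)
    for i in range(sample_size):
        j = rng.randrange(i, len(pool))  # pick from the not-yet-fixed tail
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:sample_size]

sample = sample_without_replacement(range(100), 5, random.Random(0))
```

Stopping the shuffle after `sample_size` iterations is what makes this "partial": the tail of the pool is left in arbitrary order because it is never returned.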

* fix: hoist gradient tensor validations in HedgehogLayer.FeatureMapBackward

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address 12 pr review comments across multiple files

- FloraAdapter: fix momentum reset bug (was zeroed every step)
- KnowledgeDistillationTrainerBase: remove dead checkpoint branch
- KVCache: hoist loop-invariant lookups outside inner loops
- AiModelResult: capture stable model reference in explainer delegate
- VectorModel: guard empty coefficients, remove dead BoxOrDefault
- ConvLSTMLayer: use local lastInput instead of field reference
- PrincipalNeighbourhoodAggregationLayer: hoist invariant tensors,
  remove null-forgiving operators in fallback path
- BarlowTwins: remove redundant projector null-check
- FILM: use local-cache pattern for cached features
- DeepFilterNet: propagate gain layer backward gradient

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: net471 build errors in baggingclassifier and leafsent140loader

- Replace range syntax pool[..sampleSize] with Array.Copy for net471
  compatibility (RuntimeHelpers.GetSubArray not available)
- Add explicit null check before string.IsNullOrWhiteSpace for net471
  compatibility (missing NotNullWhen attribute)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use correct inner class name and remove null-forgiving in dashboard

Replace InterpretabilityExplanation with Explanation (the actual inner
class name) and replace null-forgiving operators with proper null guards.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address pr #941 review comments

- KDTree: use single guarded local for _data null check in Compare
- BallTree: use single guarded local for _data null check in Sort
- DiffusionMemoryManager: remove duplicate unconditional total sharded line
- InfluenceFunctionExplainer: hoist cachedGradients null check outside loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

fix: static PreprocessingRegistry causes race conditions in concurrent model building