
fix: preserve preprocessing pipeline state during serialization#936

Merged
ooples merged 4 commits into master from fix/preprocessing-serialization
Mar 7, 2026
Conversation

Owner
@ooples ooples commented Mar 7, 2026

Summary

Fixes #930 — Preprocessing pipeline fitted state (means, standard deviations, min/max values, etc.) was lost during serialization/deserialization because Newtonsoft.Json's DefaultContractResolver only serializes public properties by default, not private fields where transformers store their fitted parameters.
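The failure mode is easy to reproduce in isolation. The sketch below is illustrative only (not the library's actual scaler code); it shows how fitted state held in a private field silently vanishes on a default Newtonsoft.Json round-trip, and how `[JsonProperty]` on the field fixes it.

```csharp
using Newtonsoft.Json;

public class BrokenScaler
{
    private double _mean;  // fitted state: invisible to the default contract resolver
    public void Fit(double[] x) { _mean = 0; foreach (var v in x) _mean += v / x.Length; }
    public double Mean => _mean;  // read-only: serialized out, but never set on deserialize
}

public class FixedScaler
{
    [JsonProperty] private double _mean;  // opt the private field into (de)serialization
    public void Fit(double[] x) { _mean = 0; foreach (var v in x) _mean += v / x.Length; }
    public double Mean => _mean;
}

public static class Demo
{
    public static void Main()
    {
        var data = new[] { 1.0, 2.0, 3.0 };

        var broken = new BrokenScaler(); broken.Fit(data);
        var b2 = JsonConvert.DeserializeObject<BrokenScaler>(JsonConvert.SerializeObject(broken));
        System.Console.WriteLine(b2!.Mean);  // 0 — fitted mean lost in the round-trip

        var fs = new FixedScaler(); fs.Fit(data);
        var f2 = JsonConvert.DeserializeObject<FixedScaler>(JsonConvert.SerializeObject(fs));
        System.Console.WriteLine(f2!.Mean);  // 2 — fitted mean survives
    }
}
```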

Changes

  • Add [JsonProperty] attributes to private fields across 11 preprocessing transformers (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler, DecimalScaler, GlobalContrastScaler, LogScaler, LogMeanVarianceScaler, LpNormScaler, Normalizer, SimpleImputer) and TransformerBase
  • Add [JsonConstructor] to MinMaxScaler and RobustScaler to resolve ambiguous constructor selection during deserialization (both have two constructors)
  • Replace ValueTuple in PreprocessingPipeline._steps with a new PipelineStep<T, TInput> class — ValueTuples don't serialize reliably with Newtonsoft.Json (field names are lost)
  • Add [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] to PreprocessingInfo.Pipeline and TargetPipeline for polymorphic deserialization of pipeline step transformers
  • Add 16 integration tests verifying serialization round-trips for all modified transformers, multi-step pipelines, PreprocessingInfo, and inverse transforms after deserialization
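The changes above can be pictured with a minimal sketch of the new step type. Names follow the PR description and later review commits (validating constructor, private setters, tuple-style `Deconstruct`); the real implementation may differ, and the `IDataTransformer` stub here only stands in for the project's interface.

```csharp
using System;
using Newtonsoft.Json;

// Stand-in for the project's transformer interface (members elided).
public interface IDataTransformer<T, TInput, TOutput> { }

// Hypothetical sketch of the serializable step type that replaces the ValueTuple.
public class PipelineStep<T, TInput>
{
    [JsonProperty] public string Name { get; private set; }

    [JsonProperty] public IDataTransformer<T, TInput, TInput> Transformer { get; private set; }

    [JsonConstructor]
    public PipelineStep(string name, IDataTransformer<T, TInput, TInput> transformer)
    {
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("Step name must be non-empty.", nameof(name));
        Name = name;
        Transformer = transformer ?? throw new ArgumentNullException(nameof(transformer));
    }

    // Backward-compatible tuple-style destructuring: var (name, t) = step;
    public void Deconstruct(out string name, out IDataTransformer<T, TInput, TInput> transformer)
    {
        name = Name;
        transformer = Transformer;
    }
}
```

Unlike a ValueTuple, both members carry stable JSON names and the `[JsonConstructor]` gives Newtonsoft.Json an unambiguous way to rebuild a validated instance.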

Files changed

  • 14 source files in src/Preprocessing/ (attribute additions + PipelineStep class)
  • 1 new test file: tests/.../PreprocessingSerializationIntegrationTests.cs

Test plan

  • All 16 new serialization round-trip tests pass
  • All 58 existing PreprocessingPipelineIntegrationTests pass (no regressions)
  • All 49 existing ScalerIntegrationTests pass
  • All 24 existing LinearRegressionIntegrationTests pass
  • Build succeeds on net10.0
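A representative round-trip test looks roughly like the sketch below. It is a hedged approximation: `StandardScaler<double>`, its `Fit`/`Transform` signatures, the matrix `Rows`/`Columns`/indexer API, and the `CreateTestMatrix` helper are assumed from the PR description, and the serializer settings mirror what the PR says is used for model persistence.

```csharp
using Newtonsoft.Json;
using Xunit;

public class SerializationRoundTripSketch
{
    [Fact]
    public void StandardScaler_RoundTrip_PreservesTransform()
    {
        var settings = new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.Auto };

        var scaler = new StandardScaler<double>();
        var data = CreateTestMatrix();  // assumed test helper
        scaler.Fit(data);

        // Round-trip through JSON, then verify behavioral equivalence.
        var json = JsonConvert.SerializeObject(scaler, settings);
        var restored = JsonConvert.DeserializeObject<StandardScaler<double>>(json, settings);

        Assert.NotNull(restored);
        var expected = scaler.Transform(data);
        var actual = restored!.Transform(data);
        for (int i = 0; i < expected.Rows; i++)
            for (int j = 0; j < expected.Columns; j++)
                Assert.Equal(expected[i, j], actual[i, j], 10);  // 10 decimal places
    }
}
```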

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • JSON serialization support added across preprocessing components so fitted state, configurations, named pipeline steps, and nested pipelines can be persisted and restored without changing runtime behavior.
  • Tests

    • Comprehensive integration tests added to verify round-trip serialization/deserialization and correctness of Transform/InverseTransform after restore.

- Add [JsonProperty] attributes to private fields across all preprocessing
  transformers (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler,
  DecimalScaler, GlobalContrastScaler, LogScaler, LogMeanVarianceScaler,
  LpNormScaler, Normalizer, SimpleImputer) so Newtonsoft.Json serializes
  fitted state (means, std devs, min/max, etc.)

- Add [JsonProperty] to TransformerBase.IsFitted and ColumnIndices for
  proper deserialization of base class state

- Add [JsonConstructor] to MinMaxScaler and RobustScaler to resolve
  ambiguous constructor selection during deserialization

- Replace ValueTuple in PreprocessingPipeline._steps with a new
  PipelineStep<T, TInput> class that serializes reliably with
  Newtonsoft.Json (ValueTuples lose field names during serialization)

- Add [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] to
  PreprocessingInfo.Pipeline and TargetPipeline for polymorphic
  deserialization of pipeline step transformers

- Add 16 integration tests verifying serialization round-trips for all
  modified transformers, pipelines, and inverse transforms

Fixes #930

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 7, 2026 13:51

@coderabbitai
Contributor

coderabbitai bot commented Mar 7, 2026

Walkthrough

Adds Newtonsoft.Json attributes across preprocessing components to enable (de)serialization of fitted state, replaces tuple-backed pipeline steps with a public serializable PipelineStep<T, TInput>, tightens TransformerBase.ColumnIndices setter to private, and adds extensive integration tests verifying JSON round-trips and inverse transforms.

Changes

Cohort / File(s) Summary
Scaler JSON serialization
src/Preprocessing/Scalers/DecimalScaler.cs, src/Preprocessing/Scalers/GlobalContrastScaler.cs, src/Preprocessing/Scalers/LogMeanVarianceScaler.cs, src/Preprocessing/Scalers/LogScaler.cs, src/Preprocessing/Scalers/LpNormScaler.cs, src/Preprocessing/Scalers/MaxAbsScaler.cs, src/Preprocessing/Scalers/Normalizer.cs, src/Preprocessing/Scalers/StandardScaler.cs
Added using Newtonsoft.Json and [JsonProperty] annotations to private fields and [JsonIgnore] to public accessors where appropriate so internal scaler state is included/excluded in JSON (no algorithmic changes).
Advanced scaler serialization
src/Preprocessing/Scalers/MinMaxScaler.cs, src/Preprocessing/Scalers/RobustScaler.cs
Annotated internal state with [JsonProperty], added [JsonIgnore] to public accessors, and marked constructors with [JsonConstructor] to support deserialization while preserving public API signatures.
Base transformer / imputer serialization
src/Preprocessing/Imputers/SimpleImputer.cs, src/Preprocessing/TransformerBase.cs
Applied [JsonProperty] to internal fields/properties; tightened TransformerBase.ColumnIndices setter to private set; annotated IsFitted for serialization. No public-signature removals, but mutability changed for ColumnIndices.
Pipeline architecture / serialization
src/Preprocessing/PreprocessingPipeline.cs
Introduced public PipelineStep<T, TInput> (serializable) and replaced tuple-backed _steps with List<PipelineStep<T, TInput>>; updated Steps property, Add methods, iteration, and added [JsonProperty] annotations to enable pipeline (de)serialization.
Preprocessing metadata serialization
src/Preprocessing/PreprocessingInfo.cs
Annotated Pipeline and TargetPipeline with [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] to enable polymorphic pipeline (de)serialization.
Integration tests
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs
Added comprehensive round-trip JSON integration tests covering many scalers, SimpleImputer, pipelines, PreprocessingInfo, column indices handling, inverse-transform verification, and serializer settings to assert fidelity after deserialization.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

feature

Poem

🧩 Pipelines learn to pack and send,
Private fields no longer bend,
Steps named clearly, state retained,
Round-trip JSON — no logic strained,
Tests applaud the fix — well-aimed! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: preserve preprocessing pipeline state during serialization' directly and clearly summarizes the main change: enabling JSON serialization of preprocessing transformer state by adding JsonProperty attributes and replacing ValueTuples with a proper PipelineStep class.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Preprocessing/PreprocessingPipeline.cs`:
- Around line 35-39: The parameterless PipelineStep constructor sets Transformer
to null! solely for JSON deserialization but can be misused by callers; update
it to discourage manual construction by either marking the constructor internal
(or private) and adding the [JsonConstructor] attribute, or add a clear comment
above the constructor explaining it's only for deserialization and should not be
used directly; reference the PipelineStep() constructor and the Transformer
property so reviewers can locate and apply the change.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs`:
- Around line 239-258: The test SimpleImputer_RoundTrip_PreservesFittedState
should also verify that Transform behavior is preserved after serialization:
after fitting imputer with CreateTestMatrix(), capture a small test input (e.g.,
a matrix/row with missing values), call imputer.Transform(testInput) to get
expectedOutput, then call deserialized.Transform(testInput) to get actualOutput
and assert they are equal (element-wise) using existing test helpers or
Assert.Equal for sequences/arrays; update the test to use SimpleImputer<double>,
CreateTestMatrix(), imputer.Transform and deserialized.Transform and compare
outputs to ensure transform equivalence.
- Around line 138-166: The test
MinMaxScaler_CustomRange_RoundTrip_PreservesRange should explicitly assert that
the FeatureRangeMin and FeatureRangeMax values survive serialization: after
deserializing (variable deserialized) add assertions that
deserialized.FeatureRangeMin equals scaler.FeatureRangeMin and
deserialized.FeatureRangeMax equals scaler.FeatureRangeMax (using the same
tolerance as numeric compares if needed); keep the existing transform
comparisons but add these direct property checks to make the intent explicit
(location: inside the test after Assert.NotNull(deserialized) and before
comparing transformed matrices).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ba481d87-22fb-46b3-9ceb-35e04c5a6310

📥 Commits

Reviewing files that changed from the base of the PR and between 640251b and ae9eed6.

📒 Files selected for processing (15)
  • src/Preprocessing/Imputers/SimpleImputer.cs
  • src/Preprocessing/PreprocessingInfo.cs
  • src/Preprocessing/PreprocessingPipeline.cs
  • src/Preprocessing/Scalers/DecimalScaler.cs
  • src/Preprocessing/Scalers/GlobalContrastScaler.cs
  • src/Preprocessing/Scalers/LogMeanVarianceScaler.cs
  • src/Preprocessing/Scalers/LogScaler.cs
  • src/Preprocessing/Scalers/LpNormScaler.cs
  • src/Preprocessing/Scalers/MaxAbsScaler.cs
  • src/Preprocessing/Scalers/MinMaxScaler.cs
  • src/Preprocessing/Scalers/Normalizer.cs
  • src/Preprocessing/Scalers/RobustScaler.cs
  • src/Preprocessing/Scalers/StandardScaler.cs
  • src/Preprocessing/TransformerBase.cs
  • tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs

Comment thread src/Preprocessing/PreprocessingPipeline.cs Outdated
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses issue #930 by ensuring preprocessing transformers and pipelines preserve their fitted/configuration state across Newtonsoft.Json serialization/deserialization, aligning behavior with AiModelResult.Serialize/Deserialize() so persisted models keep producing correct predictions.

Changes:

  • Added Newtonsoft.Json annotations ([JsonProperty], plus targeted [JsonConstructor]) to persist fitted parameters/configuration stored in non-public members.
  • Reworked PreprocessingPipeline step storage from ValueTuple to a dedicated PipelineStep<T,TInput> type to improve JSON round-tripping of step name/transformer pairs.
  • Added integration tests covering JSON round-trips for multiple transformers, single/multi-step pipelines, PreprocessingInfo, and inverse transforms after deserialization.
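The ValueTuple issue the rework addresses can be shown in two lines: tuple element names are compile-time metadata, so at runtime Newtonsoft.Json only ever sees the `ValueTuple` fields `Item1`/`Item2`. (The names below are illustrative, not the pipeline's actual step contents.)

```csharp
using Newtonsoft.Json;

var step = (Name: "standardize", Transformer: "StandardScaler");
string json = JsonConvert.SerializeObject(step);

// The runtime type is ValueTuple<string, string>, whose public fields are
// Item1/Item2 — the element names "Name"/"Transformer" are gone:
System.Console.WriteLine(json);  // {"Item1":"standardize","Item2":"StandardScaler"}
```

A dedicated class with `[JsonProperty]` members keeps stable, meaningful JSON keys across the round-trip.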

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs Adds integration tests verifying preprocessing serialization round-trips with AiModelResult-like settings.
src/Preprocessing/TransformerBase.cs Marks core transformer state (IsFitted, ColumnIndices) for JSON serialization.
src/Preprocessing/PreprocessingPipeline.cs Introduces PipelineStep<T,TInput>, serializes pipeline internals, and adapts step storage to a serializable form.
src/Preprocessing/PreprocessingInfo.cs Adds JSON polymorphism handling attributes for pipeline properties.
src/Preprocessing/Scalers/StandardScaler.cs Serializes fitted parameters and configuration fields.
src/Preprocessing/Scalers/MinMaxScaler.cs Serializes fitted parameters/config and adds [JsonConstructor] for deserialization selection.
src/Preprocessing/Scalers/RobustScaler.cs Serializes fitted parameters/config and adds [JsonConstructor] for deserialization selection.
src/Preprocessing/Scalers/MaxAbsScaler.cs Serializes fitted parameters.
src/Preprocessing/Scalers/DecimalScaler.cs Serializes fitted parameters.
src/Preprocessing/Scalers/LogScaler.cs Serializes fitted parameters.
src/Preprocessing/Scalers/GlobalContrastScaler.cs Serializes fitted parameters.
src/Preprocessing/Scalers/LogMeanVarianceScaler.cs Serializes fitted parameters/config (_epsilon).
src/Preprocessing/Scalers/LpNormScaler.cs Serializes fitted parameters/config (p, norms, column count).
src/Preprocessing/Scalers/Normalizer.cs Serializes configuration (_normType).
src/Preprocessing/Imputers/SimpleImputer.cs Serializes fitted parameters/config (strategy/fill settings/statistics).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/Preprocessing/TransformerBase.cs Outdated
Comment thread src/Preprocessing/Scalers/LogMeanVarianceScaler.cs
Comment thread src/Preprocessing/PreprocessingPipeline.cs Outdated
Comment thread src/Preprocessing/PreprocessingPipeline.cs Outdated
Comment thread src/Preprocessing/PreprocessingInfo.cs
Comment thread src/Preprocessing/Scalers/LpNormScaler.cs
Comment thread src/Preprocessing/Scalers/Normalizer.cs
Comment thread src/Preprocessing/PreprocessingPipeline.cs Outdated
Comment thread src/Preprocessing/PreprocessingPipeline.cs Outdated
Comment thread src/Preprocessing/PreprocessingInfo.cs
- Remove null-forgiving operator from PipelineStep (use JsonConstructor)
- Make PipelineStep.Transformer setter private (immutable after construction)
- Fix ColumnIndices deserialization (add private setter for Json.NET)
- Fix Steps property to return cached read-only list (no alloc per call)
- Add explicit range assertions to MinMaxScaler custom range test
- Add transform equivalence check to SimpleImputer test
- Add LpNormScaler serialization round-trip test
- Add Normalizer serialization round-trip test
- Add LogMeanVarianceScaler serialization round-trip test
- TypeNameHandling.Auto is safe: SafeSerializationBinder restricts types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
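SafeSerializationBinder is referenced above but not shown in this PR. A restrictive `ISerializationBinder` along these lines is the standard way to make `TypeNameHandling.Auto` safe against type-injection payloads; this is a hypothetical allow-list sketch, not the project's actual binder.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

// Hypothetical allow-list binder; the project's SafeSerializationBinder may differ.
public class AllowListBinder : ISerializationBinder
{
    public Type BindToType(string assemblyName, string typeName)
    {
        // Refuse to resolve anything outside the library's own namespaces.
        if (!typeName.StartsWith("AiDotNet.", StringComparison.Ordinal))
            throw new JsonSerializationException($"Type '{typeName}' is not allowed.");
        return Type.GetType($"{typeName}, {assemblyName}", throwOnError: true);
    }

    public void BindToName(Type serializedType, out string assemblyName, out string typeName)
    {
        assemblyName = serializedType.Assembly.GetName().Name;
        typeName = serializedType.FullName;
    }
}

public static class SafeSettingsSketch
{
    // Pair the binder with TypeNameHandling.Auto so polymorphic pipeline steps
    // round-trip, while $type resolution stays confined to known types.
    public static JsonSerializerSettings Create() => new JsonSerializerSettings
    {
        TypeNameHandling = TypeNameHandling.Auto,
        SerializationBinder = new AllowListBinder()
    };
}
```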
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs (1)

254-285: ⚠️ Potential issue | 🟠 Major

Exercise the imputation path with missing values.

CreateTestMatrix() has no missing entries, so this round-trip only proves the deserialized imputer can pass through already-complete data. If Statistics deserialize to the wrong values, this test can still pass. Use a transform input with missing cells and compare the imputed outputs.

As per coding guidelines, "Good tests should ... assert specific expected values" and "test edge cases and error conditions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs`
around lines 254 - 285, The test SimpleImputer_RoundTrip_PreservesFittedState
currently uses CreateTestMatrix() which has no missing values, so it doesn't
exercise imputation; change the Arrange/Act to use input data that contains
missing entries (e.g., a matrix with double.NaN in some cells) before calling
imputer.Fit/Transform so the imputer actually computes Statistics, then after
deserialization assert deserialized.IsFitted and deserialized.Statistics and
verify that for the positions that were missing the imputed values from
deserialized.Transform match those from the original imputer.Transform (and
optionally assert those imputed values equal specific expected numbers), using
the same tolerance as the existing loop; reference SimpleImputer<double>.Fit,
SimpleImputer<double>.Transform and the Statistics property when updating the
test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Preprocessing/PreprocessingPipeline.cs`:
- Around line 23-24: The PipelineStep Name property is mutable and allows
callers to rename steps after addition; change the public setter to a private
setter on the Name property (e.g., public string Name { get; private set; }) so
step identity can only be set at construction/deserialization, preserving the
duplicate-name/whitespace validation enforced by Add() and preventing ambiguity
in GetStep(); keep the existing [JsonProperty] attribute so deserialization
still works and prefer private over internal visibility if adjusting access
elsewhere.
- Around line 41-45: The PipelineStep JsonConstructor currently allows Name to
become string.Empty and accepts a null Transformer, which can introduce invalid
pipeline state; update the PipelineStep(string name, IDataTransformer<T, TInput,
TInput> transformer) constructor to validate inputs and throw immediately: throw
ArgumentNullException for a null transformer and ArgumentException (or
ArgumentNullException) for a null or empty name so Name and Transformer cannot
be set to invalid values; reference the PipelineStep constructor, the Name and
Transformer properties, and the IDataTransformer<T, TInput, TInput> parameter
when making the change.

In `@src/Preprocessing/TransformerBase.cs`:
- Around line 49-50: Add a unit test that verifies non-default ColumnIndices
round-trip and that Transform preserves untouched columns: construct a
TransformerBase (or a concrete subclass used in tests) and set ColumnIndices to
a non-default array (e.g. [0,2,...]); serialize and deserialize it using the
same project serialization helpers (the code paths that exercise TransformerBase
serialization), then assert deserialized.ColumnIndices equals the original array
and call Transform on a sample input row to assert that columns not listed in
ColumnIndices remain unchanged while selected columns are processed; reference
TransformerBase.ColumnIndices and the Transform method (or the concrete subclass
used in existing tests) when locating where to add this case.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs`:
- Around line 365-399: The test only validates Pipeline round-trip; add a
non-null TargetPipeline to the PreprocessingInfo and assert its transform
behavior is preserved after serialization. Concretely: create and Fit a
target-side PreprocessingPipeline (e.g., using a scaler), set it on the
PreprocessingInfo.TargetPipeline before serializing, call TransformTargets (or
the appropriate target-transform method) on the original and deserialized info
with a test target vector, and Assert equality of resulting values (with the
same tolerance used for features). Ensure you reference the
PreprocessingInfo.TargetPipeline, PreprocessingPipeline.Add/Fit, and
TransformTargets methods in the change.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1f1d0898-a649-4b33-8528-ec19dbf861ca

📥 Commits

Reviewing files that changed from the base of the PR and between ae9eed6 and 5aca2de.

📒 Files selected for processing (3)
  • src/Preprocessing/PreprocessingPipeline.cs
  • src/Preprocessing/TransformerBase.cs
  • tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs

Comment thread src/Preprocessing/PreprocessingPipeline.cs Outdated
Comment thread src/Preprocessing/PreprocessingPipeline.cs Outdated
Comment thread src/Preprocessing/TransformerBase.cs
…ests

- Made PipelineStep.Name private set to prevent mutation after construction
- PipelineStep constructor now validates name (non-whitespace) and transformer
  (non-null), failing fast on invalid serialized steps
- Added ColumnIndices round-trip test proving non-default indices survive
  serialization and untouched columns remain unchanged
- Added TargetPipeline round-trip tests proving both null and non-null target
  pipelines survive serialization with transform equivalence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs (1)

138-181: 🧹 Nitpick | 🔵 Trivial

Consider explicit range property assertions.

The behavioral verification (checking transformed column min/max values) is valid, but directly asserting the deserialized properties would make the test intent clearer and catch deserialization issues that happen to produce equivalent transforms.

Optional: Add explicit range assertions
         // Assert
         Assert.NotNull(deserialized);
+        Assert.Equal(-1.0, deserialized.FeatureRangeMin, 1e-10);
+        Assert.Equal(1.0, deserialized.FeatureRangeMax, 1e-10);
 
         // Verify the custom range [-1, 1] is preserved
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs`
around lines 138 - 181, After Assert.NotNull(deserialized) add explicit
assertions that the deserialized MinMaxScaler<double> preserved its target range
by checking the scaler range properties on the deserialized instance (e.g.,
assert deserialized.Min == -1.0 and deserialized.Max == 1.0 or, if your API uses
different names, assert the equivalent properties such as LowerBound/UpperBound
or TargetMin/TargetMax equal -1.0 and 1.0); place these checks immediately after
the null check (before the transformed-data min/max and element-wise
comparisons) so deserialization of the range is validated directly on the
deserialized variable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f97c7cb8-8e9c-4cf1-8446-7d6234fe9bb1

📥 Commits

Reviewing files that changed from the base of the PR and between 5aca2de and 7dce4cb.

📒 Files selected for processing (2)
  • src/Preprocessing/PreprocessingPipeline.cs
  • tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingSerializationIntegrationTests.cs

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 12 comments.



Comment thread src/Preprocessing/Scalers/StandardScaler.cs
Comment thread src/Preprocessing/Scalers/RobustScaler.cs
Comment thread src/Preprocessing/PreprocessingPipeline.cs Outdated
Comment thread src/Preprocessing/Imputers/SimpleImputer.cs
Comment thread src/Preprocessing/PreprocessingPipeline.cs
Comment thread src/Preprocessing/Scalers/GlobalContrastScaler.cs
Comment thread src/Preprocessing/Scalers/MinMaxScaler.cs
Comment thread src/Preprocessing/Scalers/Normalizer.cs
Comment thread src/Preprocessing/Scalers/MaxAbsScaler.cs
Comment thread src/Preprocessing/Scalers/LogScaler.cs
…ameHandling

Add [JsonIgnore] to public getter-only properties that mirror [JsonProperty]
backing fields in all transformers (StandardScaler, RobustScaler, SimpleImputer,
DecimalScaler, GlobalContrastScaler, MinMaxScaler, Normalizer, MaxAbsScaler,
LogScaler) to prevent Newtonsoft emitting both field and property.

Remove per-member TypeNameHandling.Auto from PipelineStep.Transformer and
_finalTransformer — rely on serializer-level settings with SafeSerializationBinder
instead, preventing unsafe TypeNameHandling bypass.

Add Deconstruct method to PipelineStep for backward-compatible tuple destructuring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Preprocessing/PreprocessingPipeline.cs (1)

407-421: ⚠️ Potential issue | 🟡 Minor

Shallow clone shares transformer state - potential for unexpected mutation.

The Clone() method shares PipelineStep instances (and thus Transformer instances) between original and clone. Since transformers hold fitted state (_isFitted, computed statistics, etc.), calling Fit() on the clone will mutate the shared transformers and affect the original pipeline.

The current comment mentions transformers need ICloneable for deep cloning, but doesn't warn callers about the mutation risk. Consider either:

  1. Adding a warning to the XML doc about mutation side effects, or
  2. Returning an unfitted clone that requires re-fitting (as the current doc suggests with "without fitted state", but the code doesn't enforce this)
📝 Suggested documentation improvement
     /// <summary>
     /// Creates a clone of this pipeline without fitted state.
     /// </summary>
-    /// <returns>A new unfitted pipeline with the same structure.</returns>
+    /// <returns>
+    /// A new pipeline referencing the same transformer instances as the original.
+    /// <para><b>Warning:</b> This is a shallow clone. Fitting the clone will mutate
+    /// the shared transformers and affect the original pipeline. The clone starts
+    /// with <c>_isFitted = false</c>, but the underlying transformers retain their
+    /// fitted state from the original.</para>
+    /// </returns>
     public PreprocessingPipeline<T, TInput, TOutput> Clone()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Preprocessing/PreprocessingPipeline.cs` around lines 407 - 421, The Clone
method currently copies PipelineStep references (via _steps) and
_finalTransformer, sharing fitted transformer state; change Clone() in
PreprocessingPipeline<T,TInput,TOutput> so it returns an unfitted clone: for
each PipelineStep in _steps create a new PipelineStep instance (or otherwise
copy only its configuration, not the fitted Transformer) and add that to
clone._steps, and set clone._finalTransformer to a non-fitted transformer (null
or a new instance without fitted state) so calling Fit() on the clone won't
mutate the original's transformer state; if PipelineStep or transformers expose
a Reset/Unfit method use that to clear _isFitted and any computed statistics,
otherwise copy only immutable configuration fields and omit fitted fields when
constructing the new steps.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Preprocessing/Scalers/Normalizer.cs`:
- Around line 48-55: The backing field _normType is currently marked with
[JsonProperty] without a name which serializes as "_normType" and breaks the
existing "NormType" JSON contract; change the attribute to explicitly pin the
JSON name (e.g., [JsonProperty("NormType")]) on the _normType field so the
serialized key remains "NormType" and add a regression test that deserializes a
payload containing "NormType" into the Normalizer (and verifies NormType
property value) to ensure backward compatibility.

---

Outside diff comments:
In `@src/Preprocessing/PreprocessingPipeline.cs`:
- Around lines 407-421: The Clone method currently copies PipelineStep references (via `_steps`) and `_finalTransformer`, so the clone shares fitted transformer state with the original. Change `Clone()` in `PreprocessingPipeline<T,TInput,TOutput>` to return an unfitted clone: for each PipelineStep in `_steps`, create a new PipelineStep instance (or otherwise copy only its configuration, not the fitted Transformer) and add that to `clone._steps`; set `clone._finalTransformer` to a non-fitted transformer (null or a new instance without fitted state) so that calling `Fit()` on the clone cannot mutate the original's transformer state. If PipelineStep or the transformers expose a Reset/Unfit method, use it to clear `_isFitted` and any computed statistics; otherwise copy only immutable configuration fields and omit fitted fields when constructing the new steps.


📥 Commits

Reviewing files that changed from the base of the PR and between 7dce4cb and 8a18af4.

📒 Files selected for processing (10)
  • src/Preprocessing/Imputers/SimpleImputer.cs
  • src/Preprocessing/PreprocessingPipeline.cs
  • src/Preprocessing/Scalers/DecimalScaler.cs
  • src/Preprocessing/Scalers/GlobalContrastScaler.cs
  • src/Preprocessing/Scalers/LogScaler.cs
  • src/Preprocessing/Scalers/MaxAbsScaler.cs
  • src/Preprocessing/Scalers/MinMaxScaler.cs
  • src/Preprocessing/Scalers/Normalizer.cs
  • src/Preprocessing/Scalers/RobustScaler.cs
  • src/Preprocessing/Scalers/StandardScaler.cs

@ooples ooples merged commit 49903c7 into master Mar 7, 2026
37 of 40 checks passed
@ooples ooples deleted the fix/preprocessing-serialization branch March 7, 2026 19:37
ooples added a commit that referenced this pull request Mar 9, 2026
* fix: add serialization support to preprocessing pipeline fitted state

- Add [JsonProperty] attributes to private fields across all preprocessing
  transformers (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler,
  DecimalScaler, GlobalContrastScaler, LogScaler, LogMeanVarianceScaler,
  LpNormScaler, Normalizer, SimpleImputer) so Newtonsoft.Json serializes
  fitted state (means, std devs, min/max, etc.)

- Add [JsonProperty] to TransformerBase.IsFitted and ColumnIndices for
  proper deserialization of base class state

- Add [JsonConstructor] to MinMaxScaler and RobustScaler to resolve
  ambiguous constructor selection during deserialization

- Replace ValueTuple in PreprocessingPipeline._steps with a new
  PipelineStep<T, TInput> class that serializes reliably with
  Newtonsoft.Json (ValueTuples lose field names during serialization)

- Add [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] to
  PreprocessingInfo.Pipeline and TargetPipeline for polymorphic
  deserialization of pipeline step transformers

- Add 16 integration tests verifying serialization round-trips for all
  modified transformers, pipelines, and inverse transforms

Fixes #930

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
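The round-trip property the new tests verify can be sketched generically (Python and `json` here; the project itself uses Newtonsoft.Json in C#, and `mean_`/`std_` stand in for the private backing fields): fitted statistics must survive serialize then deserialize, and the restored transformer must produce identical output.

```python
import json

class StandardScalerSketch:
    """Toy 1-D standard scaler; mean_/std_ play the role of fitted private fields."""
    def __init__(self):
        self.mean_ = None
        self.std_ = None

    def fit(self, xs):
        n = len(xs)
        self.mean_ = sum(xs) / n
        self.std_ = (sum((x - self.mean_) ** 2 for x in xs) / n) ** 0.5
        return self

    def transform(self, xs):
        return [(x - self.mean_) / self.std_ for x in xs]

    def to_json(self):
        # Fitted state must be written out explicitly: the analogue of
        # adding [JsonProperty] to the private backing fields.
        return json.dumps({"mean": self.mean_, "std": self.std_})

    @classmethod
    def from_json(cls, payload):
        d = json.loads(payload)
        s = cls()
        s.mean_, s.std_ = d["mean"], d["std"]
        return s

data = [1.0, 2.0, 3.0, 4.0]
fitted = StandardScalerSketch().fit(data)
restored = StandardScalerSketch.from_json(fitted.to_json())
```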

* fix: resolve 13 pr review comments - serialization safety, tests, perf

- Remove null-forgiving operator from PipelineStep (use JsonConstructor)
- Make PipelineStep.Transformer setter private (immutable after construction)
- Fix ColumnIndices deserialization (add private setter for Json.NET)
- Fix Steps property to return cached read-only list (no alloc per call)
- Add explicit range assertions to MinMaxScaler custom range test
- Add transform equivalence check to SimpleImputer test
- Add LpNormScaler serialization round-trip test
- Add Normalizer serialization round-trip test
- Add LogMeanVarianceScaler serialization round-trip test
- TypeNameHandling.Auto is safe: SafeSerializationBinder restricts types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make pipelinestep immutable, validate constructor, add missing tests

- Made PipelineStep.Name private set to prevent mutation after construction
- PipelineStep constructor now validates name (non-whitespace) and transformer
  (non-null), failing fast on invalid serialized steps
- Added ColumnIndices round-trip test proving non-default indices survive
  serialization and untouched columns remain unchanged
- Added TargetPipeline round-trip tests proving both null and non-null target
  pipelines survive serialization with transform equivalence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent JSON duplicate serialization and remove per-member TypeNameHandling

Add [JsonIgnore] to public getter-only properties that mirror [JsonProperty]
backing fields in all transformers (StandardScaler, RobustScaler, SimpleImputer,
DecimalScaler, GlobalContrastScaler, MinMaxScaler, Normalizer, MaxAbsScaler,
LogScaler) to prevent Newtonsoft emitting both field and property.

Remove per-member TypeNameHandling.Auto from PipelineStep.Transformer and
_finalTransformer — rely on serializer-level settings with SafeSerializationBinder
instead, preventing unsafe TypeNameHandling bypass.

Add Deconstruct method to PipelineStep for backward-compatible tuple destructuring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ooples added a commit that referenced this pull request Mar 10, 2026
…ining directories (#933) (#941)

* fix: prevent data leakage by fitting preprocessing only on training data

In BuildSupervisedInternalAsync(), the preprocessing pipeline (e.g.,
StandardScaler) was fitted on the ENTIRE dataset before the train/test
split. This leaked test/validation statistics (mean, std dev) into the
training pipeline, causing artificially inflated metrics.

The fix restructures the method:
1. Split data into train/val/test FIRST using DataSplitter.Split()
2. FitTransform preprocessing pipeline on training data ONLY
3. Transform (not FitTransform) validation and test data

The federated learning path is unchanged — it correctly uses all data
as training data, so FitTransform on everything remains correct.

Fixes #929

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
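The fitting-order change can be sketched with a mean-only "scaler" (Python; the point is which rows contribute statistics, not the transform itself): fitting before the split lets extreme test values leak into the training statistics.

```python
def mean_of(rows):
    return sum(rows) / len(rows)

data = [1.0, 2.0, 3.0, 100.0]   # last row is an extreme test sample
train, test = data[:3], data[3:]

# Leaky order: statistics computed over ALL rows, then split.
leaky_mean = mean_of(data)       # the test value inflates the training stats

# Fixed order: split first, fit on train only, transform test with train stats.
train_mean = mean_of(train)      # the test row contributed nothing
test_centered = [x - train_mean for x in test]
```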

* fix: resolve 12 pr review comments - leakage tests, pragma scope, registry sync

- Scope #pragma warning disable CS8600/CS8604 with matching restore
- Restore PreprocessingRegistry.Current in all ConfigurePreprocessing overloads
- Update PreprocessingRegistry docs to reflect thread-safety and usage pattern
- Rewrite tests to actually detect leakage: verify scaler produces different
  output than all-data scaler, verify transform shape and non-identity
- Add end-to-end predict test with preprocessing
- Add test verifying no-preprocessing path still works with predict

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move fitresample after train/test split to prevent data leakage

FitResample (SMOTE, outlier removal, augmentation) was applied to the
full dataset before splitting, allowing synthetic samples derived from
test/validation data to leak into training. Now FitResample runs after
the split on training data only in both standard and AutoML paths.
Also fixed AutoML preprocessing to FitTransform on training only and
Transform on validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update registry docs and use consistent shuffle policy for splits

Updated PreprocessingRegistry docs to clearly warn about concurrent build
safety issues. Applied same time-series non-shuffle policy to the final
supervised training split that AutoML already uses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use post-preparation training data in no-preprocessing else branch

When _dataPreparationPipeline runs (resampling/outlier removal) it modifies
XTrain/yTrain. The else branch (no preprocessing) was incorrectly using
preparedX/preparedY (pre-preparation) instead of XTrain/yTrain, losing
training-only preparation results for downstream logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make data leakage tests self-sufficient without PR 928

Refactor tests that called result.Predict() (which fails due to the
feature selection bug fixed in PR #928) to verify preprocessing pipeline
state directly via PreprocessingInfo.TransformFeatures() instead.
Tests now verify the data leakage fix independently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: preserve preprocessing pipeline state during serialization (#936)

* fix: add serialization support to preprocessing pipeline fitted state

- Add [JsonProperty] attributes to private fields across all preprocessing
  transformers (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler,
  DecimalScaler, GlobalContrastScaler, LogScaler, LogMeanVarianceScaler,
  LpNormScaler, Normalizer, SimpleImputer) so Newtonsoft.Json serializes
  fitted state (means, std devs, min/max, etc.)

- Add [JsonProperty] to TransformerBase.IsFitted and ColumnIndices for
  proper deserialization of base class state

- Add [JsonConstructor] to MinMaxScaler and RobustScaler to resolve
  ambiguous constructor selection during deserialization

- Replace ValueTuple in PreprocessingPipeline._steps with a new
  PipelineStep<T, TInput> class that serializes reliably with
  Newtonsoft.Json (ValueTuples lose field names during serialization)

- Add [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] to
  PreprocessingInfo.Pipeline and TargetPipeline for polymorphic
  deserialization of pipeline step transformers

- Add 16 integration tests verifying serialization round-trips for all
  modified transformers, pipelines, and inverse transforms

Fixes #930

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 13 pr review comments - serialization safety, tests, perf

- Remove null-forgiving operator from PipelineStep (use JsonConstructor)
- Make PipelineStep.Transformer setter private (immutable after construction)
- Fix ColumnIndices deserialization (add private setter for Json.NET)
- Fix Steps property to return cached read-only list (no alloc per call)
- Add explicit range assertions to MinMaxScaler custom range test
- Add transform equivalence check to SimpleImputer test
- Add LpNormScaler serialization round-trip test
- Add Normalizer serialization round-trip test
- Add LogMeanVarianceScaler serialization round-trip test
- TypeNameHandling.Auto is safe: SafeSerializationBinder restricts types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make pipelinestep immutable, validate constructor, add missing tests

- Made PipelineStep.Name private set to prevent mutation after construction
- PipelineStep constructor now validates name (non-whitespace) and transformer
  (non-null), failing fast on invalid serialized steps
- Added ColumnIndices round-trip test proving non-default indices survive
  serialization and untouched columns remain unchanged
- Added TargetPipeline round-trip tests proving both null and non-null target
  pipelines survive serialization with transform equivalence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent JSON duplicate serialization and remove per-member TypeNameHandling

Add [JsonIgnore] to public getter-only properties that mirror [JsonProperty]
backing fields in all transformers (StandardScaler, RobustScaler, SimpleImputer,
DecimalScaler, GlobalContrastScaler, MinMaxScaler, Normalizer, MaxAbsScaler,
LogScaler) to prevent Newtonsoft emitting both field and property.

Remove per-member TypeNameHandling.Auto from PipelineStep.Transformer and
_finalTransformer — rely on serializer-level settings with SafeSerializationBinder
instead, preventing unsafe TypeNameHandling bypass.

Add Deconstruct method to PipelineStep for backward-compatible tuple destructuring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove static PreprocessingRegistry — prevents concurrent build race conditions (#935)

* fix: remove static PreprocessingRegistry usage from AiModelBuilder (#931)

Remove the 3 `PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline`
assignments from AiModelBuilder.ConfigurePreprocessing() and the 3 analogous
PostprocessingRegistry assignments from ConfigurePostprocessing(). The pipeline
is already stored as an instance field on the builder and flows to AiModelResult
via PreprocessingInfo — the global static registry was unnecessary and caused
race conditions when multiple builders ran concurrently.

Changes:
- Remove 6 static registry assignments from AiModelBuilder.cs
- Add [Obsolete] attribute to PreprocessingRegistry<T, TInput> class
- Replace PreprocessingRegistry usage in DocumentNeuralNetworkBase with
  an instance-level PreprocessingTransformer property
- Add 4 integration tests verifying concurrent builds don't cross-contaminate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 7 pr review comments - registry deprecation, thread safety, docs

- Fix PreprocessingTransformer docs: clarify protected accessibility
- Add PostprocessingTransformer instance property to DocumentNeuralNetworkBase
- Replace PostprocessingRegistry static usage with instance-based transformer
- Mark PostprocessingRegistry as [Obsolete] (same treatment as PreprocessingRegistry)
- Update PreprocessingRegistry.Current docs to reflect no longer auto-set
- Fix concurrent test to use separate data loaders (thread-safety)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: wire preprocessing transformer onto document models in buildasync

Changed PreprocessingTransformer and PostprocessingTransformer from
protected to protected internal so AiModelBuilder can set them.
BuildSupervisedInternalAsync now wires the preprocessing pipeline onto
DocumentNeuralNetworkBase models, replacing the static registry approach
that caused race conditions during concurrent builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow transformer surface, fail fast on unfitted, wire postprocessing

- Changed PreprocessingTransformer/PostprocessingTransformer back to protected
  and added internal ConfigureTransformers() method for AiModelBuilder wiring
- PreprocessDocument/PostprocessOutput now throw if transformer is set but not
  fitted instead of silently falling back to defaults
- Wired both preprocessing AND postprocessing onto DocumentNeuralNetworkBase
- Updated PreprocessingRegistry docs to reflect deprecated/removed status
- Added [Collection] to registry tests and try/finally cleanup
- Fixed "override" wording to "assign" in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: centralize document transformer injection and strengthen test assertions

Extract transformer wiring into ConfigureDocumentTransformers() helper method
that always passes current pipeline values (clearing prior state on reuse).
Called from both supervised build path and BuildProgramSynthesisInferenceOnlyResult().

Strengthen PreprocessingRegistryIntegrationTests to verify PreprocessingInfo
is non-null and fitted, not just Assert.NotNull(result).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unfitted check from PostprocessOutput, update docs, strengthen tests

PostprocessOutput no longer throws on unfitted transformers since
postprocessing transforms (Softmax, LabelDecoder) are often stateless.
Updated XML docs to reflect instance-level transformer priority.
Strengthen concurrent and sequential test assertions to verify
PreprocessingInfo is non-null and fitted on each result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from video

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from neural network layers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: warn on empty attributions and use guarded locals in jwt attestation verifier

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: include gain layer in backward pass and parameter updates in deepfilternet

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rebuild streaming buffers after deserializing fft/hop size in neural noise reducer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from federated learning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from federated learning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: respect bootstrap flag and align estimator probabilities to ensemble label space

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from clustering modules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: throw on invalid ensemble members and cold model in adaptive random forest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify trained-state error messages in ordinal regression

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unused alpha_s and use guarded local consistently in unipc scheduler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add guard for latent variance in compute elbo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist cached gradient lookup out of per-sample loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace console.writeline with debug.writeline for checkpoint reporting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cache compressed momentum after resampling projection matrices

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: show total sharded memory independently of per-device memory in tostring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address 21 pr review comments for code quality improvements

- CLIPScore: throw on embedding/weight dimension mismatch instead of Math.Min
- Adam8BitOptimizer: remove redundant _vScales null check
- BFGSOptimizer: move _inverseHessian null check before UpdateInverseHessian call
- FTRLOptimizer: use post-update accumulator (nPlusGradSq) in both paths
- MomentumOptimizer: snapshot _velocity before serialization loop
- OptimizerBase: use RequireModel() instead of Model!
- SymbolicPhysicsLearner: Clone throws on null children like Evaluate/Format
- IncrementalPCA: fix error message to "internal state inconsistent"
- ContradictionDetector: wrap JToken.Value<bool>() in try-catch
- CriticModel: wrap JToken.Value<double>() in try-catch
- LogNormalAFT: hoist Coefficients null check outside loops (3 methods)
- WeibullAFT: hoist Coefficients null check outside loops (3 methods)
- NHiTSModel: remove redundant ContainsKey before TryGetValue
- DiffusionTools: support both 1D and 2D audio_data deserialization
- GMMDetector: validate feature count before scoring a point
- TimeSeriesForestClassifier: reject NumTrees <= 0 during training

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use guarded sigmaTs local variable consistently in dpm solver methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply feature selection in Predict() — fixes critical dimension mismatch bug (#928)

* fix: apply feature selection in AiModelResult.Predict() — fixes dimension mismatch bug

The NormalOptimizer (default) randomly selects a subset of features during
training via PrepareAndEvaluateSolution, but AiModelResult.Predict() never
applied the same feature selection to new input data. This caused the model
to receive all columns while it was trained on fewer, throwing "Number of
columns must equal length of vector" errors.

Changes:
- Add SelectedFeatureIndices (List<int>) to OptimizationStepData,
  OptimizationResult, and ModelResult to store actual column indices
- Propagate indices through optimizer flow: PrepareAndEvaluateSolution →
  UpdateBestSolution → CreateOptimizationResult
- Apply feature selection in AiModelResult.Predict() after preprocessing
  but before Model.Predict(), using OptimizerHelper.SelectFeatures()
- Add 12 integration tests verifying the full BuildAsync → Predict pipeline

Closes #927

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 7 pr review comments - feature selection, copy semantics, docs

- Remove unused LooseTolerance constant
- Fix off-by-one: use != instead of > for feature index count check
- Apply feature selection in InferenceSession.Predict() (critical gap)
- Add comprehensive docs to SelectedFeatureIndices property
- Fix WithParameters to deep-copy collections (consistent with DeepCopy)
- Add copy independence assertion to WithParameters test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: support 1D tensor feature selection and add predict integration test

SelectFeaturesTensor previously threw for 1D tensors. Now handles them
by directly selecting elements at the specified indices. Added integration
test proving Predict applies SelectedFeatureIndices correctly with a real
FeedForwardNeuralNetwork.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use identity-check instead of count heuristic for feature selection

Replaced inputSize == featureIndices.Count heuristic with proper
identity-mapping check using SequenceEqual. A reordered selection like
[2, 0, 1] has the same count but still needs reordering. Also validates
indices are in-range before calling SelectFeatures. Renamed misleading
test names and added shape assertion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
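Why a count heuristic is insufficient: a permutation like [2, 0, 1] selects the same number of columns as the identity [0, 1, 2] but still reorders them. A sketch of the identity test in the spirit of the SequenceEqual check (Python; function names are illustrative):

```python
def is_identity_selection(indices, input_size):
    # Every position i must map back to column i; a matching count alone proves nothing.
    return len(indices) == input_size and all(ix == i for i, ix in enumerate(indices))

def select_features(row, indices):
    # Reorders (and/or subsets) columns according to the selection.
    return [row[i] for i in indices]

permuted = select_features([10, 20, 30], [2, 0, 1])
```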

* fix: fix collinear test fixture and add permuted feature selection test

- Fixed direct-model test to use non-collinear features with correct
  formula comment (y = 2*x1 + 3*x2 + 1) and tighter assertion ranges
- Added permutation test proving [2, 0, 1] selection is not treated as
  identity — verifies the SequenceEqual check works correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add null guards to OptimizationResult Clone and WithParameters

Callers that create OptimizationResult via object initializer (bypassing
the constructor) leave CoefficientLowerBounds, CoefficientUpperBounds,
FitnessHistory, SelectedFeatures, and SelectedFeatureIndices as null.
Deep-copy calls in Clone/WithParameters would throw NullReferenceException.
Now falls back to empty defaults when properties are null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify feature selection in InferenceSequence, one-time bounds check, robust assertions

- InferenceSequence.Predict() now delegates to ApplySelectedFeaturesForPrediction
  instead of using a count-based check that skipped permuted selections
- Move bounds validation to one-time check (cached _featureIndicesValidated flag)
  instead of scanning on every Predict() call
- Assert permuted predictions actually differ from identity predictions
- Replace fragile selectedCount < 10 with valid range assertion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: assert feature selection actually occurred in regression test

Add assertion verifying optimizer selected a feature subset before testing
Predict() with full-width input, proving the test exercises the reduced-column
code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cache identity check with volatile int, set deterministic weights in test

Replace per-call SequenceEqual + Enumerable.Range allocation with cached
volatile int state (0=unknown, 1=identity, 2=needs-selection). Uses a
simple loop instead of LINQ for zero-allocation identity check.
Thread-safe via volatile read/write semantics.

Set explicit non-uniform weights [1, 2, 3] in permuted feature selection
test to ensure permutation deterministically produces different output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve first batch of pr #941 review comments (10 of 54)

- InterpretabilityDashboard: filter empty attributions instead of using
  console output, use validCount for averaging
- BaggingClassifier: validate sampleSize vs data size when bootstrap=false
- Denclue: hoist _attractors and _attractorDensities guards out of loops
- BIRCH: use Labels ?? Predict(x) instead of Labels! in FitPredict
- SelfOrganizingMap: hoist _weights and _neuronLabels guards out of loops
- CLIQUE: hoist Labels guard out of accumulation loop
- SUBCLU: hoist _trainingData and Labels guards out of loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve second batch of pr #941 review comments (20 of 54)

- GraphIsomorphismLayer: add local variable aliases for _lastMlpHiddenPreRelu
  in Forward and BackwardManual for clearer null safety
- PagedAttentionKernel: add braces to ToFloat/FromFloat if blocks, replace
  null-forgiving with explicit boxing guards
- KnowledgeDistillationTrainerBase: remove Console.WriteLine from library
  checkpoint loading code
- DiffusionMemoryManager: separate PerDeviceMemory and TotalShardedMemory
  formatting into independent checks
- MixtureOfExpertsLayer: add null guard before ApplyActivationToGraph
  instead of null-forgiving operator
- KVCache: replace null-forgiving with explicit boxing guard for Half
  conversion
- CachedMultiHeadAttention: hoist _cache null guard before Append call
- CachedGroupedQueryAttention: hoist _cache null guard before Append call
- DeformableConvolutionalLayer: stale comments (variables already renamed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve build errors from linter's automated null-forgiving removal

Fix syntax and compilation errors introduced by the automated linter that
removed null-forgiving operators without proper replacements:
- Fix missing closing parentheses in foreach statements (COPKMeans, AgglomerativeClustering)
- Fix duplicate variable declarations (AntColonyOptimization, CachedGroupedQueryAttention, CachedMultiHeadAttention)
- Fix dropped underscore prefixes on private field references (KVCache, ConvLSTMLayer, PrincipalNeighbourhoodAggregationLayer)
- Add validated accessor properties for nullable cache arrays (KVCache)
- Fix variable ordering issues (FloraAdapter)
- Fix nullable reference handling (VectorModel, UncertaintyCalibrationData, CrossValidationResult, AiModelResult.Uncertainty)
- Fix Debug.WriteLine() call with no arguments (KnowledgeDistillationTrainerBase)
- Add null guard for ClassLabels in PredictProbabilities (BaggingClassifier)
- Fix nullable permutation array access (ConcatenateLayer)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist null guards for lambdas and alphaTs in DPMSolverMultistepScheduler

Replace null-forgiving operators with proper null-coalescing throws
for _lambdas and _alphaTs in FirstOrderUpdate, SecondOrderUpdate,
and ThirdOrderUpdate methods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove redundant null guard in LeafSent140FederatedDatasetLoader

The testFilePath is already validated by IsNullOrWhiteSpace check
in the enclosing if statement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: capture PreprocessingInfo in local variable to prevent TOCTOU

Snapshot PreprocessingInfo once before accessing its members
to avoid potential race conditions with concurrent access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: dispose pre-permutation tensor to prevent memory leak in ConcatenateLayer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: accumulate projector gradients from both views in BarlowTwins, SimCLR, and iBOT

BarlowTwins: capture view1 projector grads before view2 backward clears them,
then combine both views' grads for the projector update.

SimCLR: replay projector forward per-view before backward to use correct caches,
and accumulate projector gradients from both views.

iBOT: backpropagate the full objective (clsLoss1 + clsLoss2 + lambda*(patchLoss1 + patchLoss2))
instead of only clsLoss1. Add ComputeMaskedPatchGradient for MSE gradient on masked patches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct backward pass in RIFE, FILM, and RAFT video models

RIFE: use cached block input shape instead of previous block output shape
for gradient splitting to correctly match the fused input width.

FILM: fail fast on all required backward-pass caches instead of silently
skipping fusion and flow estimator backprop when caches are null.

RAFT: replace incorrect upsampleConv.Backward with bilinear UpsampleFlowBackward
that mirrors the actual forward upsampling, and hoist invariant layer guards
out of the refinement loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: strengthen DataLeakagePrevention tests with behavioral assertions

ScalerStatisticsReflectTrainingDataOnly: add positive assertion that the
pipeline transformation is not identity, confirming it was fitted on data.

StillWorksCorrectly: exercise the model with actual predictions and assert
outputs are finite numbers, not just that objects are non-null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address major PR review comments across multiple files

KNNEvaluator: hoist _trainLabels null check outside the voting loop.
JointMutualInformation: use local miScores instead of field on line 212.
ContradictionDetector: treat malformed contradictory fields as false instead
of falling through to text scan that matches JSON key names.
BFGSOptimizer: guard inverse Hessian update with curvature condition (y*s > 0)
and use validated locals instead of null-forgiving operators.
IncrementalPCA: validate all fitted state upfront in PartialFit and remove
null-forgiving on _explainedVariance.
RealGatedLinearRecurrenceLayer: hoist all cached tensor validation to top of
Backward to prevent partial gradient corruption.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
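The curvature guard in the BFGS change rests on a standard condition: the inverse-Hessian update preserves positive-definiteness only when y·s > 0, where s is the step taken and y is the change in gradient; otherwise the update is skipped. A minimal sketch (Python; names and the tolerance are illustrative):

```python
def should_update_inverse_hessian(s, y, eps=1e-10):
    """Curvature condition: y·s must be strictly positive (within tolerance),
    otherwise the BFGS update can destroy positive-definiteness."""
    ys = sum(si * yi for si, yi in zip(s, y))
    return ys > eps

# A descent-consistent (s, y) pair satisfies the condition...
ok = should_update_inverse_hessian([0.1, 0.2], [0.5, 0.4])
# ...a negative-curvature pair does not, and the update should be skipped.
bad = should_update_inverse_hessian([0.1, -0.2], [-0.5, 0.4])
```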

* fix: address remaining major PR review comments

PreprocessingRegistry: align documentation — clarify get/set is thread-safe
but concurrent builds still overwrite each other's pipeline.

InMemoryFederatedTrainer: fail fast when HE is enabled but no parameter
indices are resolved instead of silently downgrading. Fix DP accounting:
FedAsync uses 1/totalClients per iteration, FedBuff uses bufferSize/totalClients.

AiModelBuilder: extract IsTimeSeriesTask() helper that checks both AutoML
options and model type name for chronological split decisions. Reject
row-changing data preparation in natural-client federated mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address trivial PR review comments across multiple files

DeformableConvolutionalLayer: rename misleading variable names (mwShape/mbShape
to maskWeights/maskBias, mw2/mwg/mbg to maskW/maskWeightGrad/maskBiasGrad).

S4DLayer: hoist _lastHiddenStatesReal/Imag guards out of the BPTT loop.

OnlineFeatureSelector: pass resolved running arrays into ComputeOnlineCorrelation
instead of re-checking nullable fields inside the hot loop.

SSLPretrainingPipeline: extract encoder to a local variable to avoid duplicate
null-check expressions.

SubpixelConvolutionalLayer: hoist ScalarActivation out of the per-element loop.

BaggingClassifier: replace OrderBy(random.Next()) with Fisher-Yates partial
shuffle for O(sampleSize) unbiased sampling without replacement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
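The partial Fisher-Yates replacement for OrderBy(random.Next()) draws sampleSize items without replacement by shuffling only the first sampleSize positions, costing O(sampleSize) swaps instead of a full sort. A sketch (Python; function name is illustrative):

```python
import random

def sample_without_replacement(pool, sample_size, rng=random):
    """Partial Fisher-Yates: only the first sample_size positions get shuffled."""
    pool = list(pool)               # copy so the caller's sequence is untouched
    n = len(pool)
    for i in range(sample_size):
        j = rng.randrange(i, n)     # pick uniformly from the not-yet-fixed suffix
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:sample_size]

picked = sample_without_replacement(range(100), 10)
```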

* fix: hoist gradient tensor validations in HedgehogLayer.FeatureMapBackward

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address 12 pr review comments across multiple files

- FloraAdapter: fix momentum reset bug (was zeroed every step)
- KnowledgeDistillationTrainerBase: remove dead checkpoint branch
- KVCache: hoist loop-invariant lookups outside inner loops
- AiModelResult: capture stable model reference in explainer delegate
- VectorModel: guard empty coefficients, remove dead BoxOrDefault
- ConvLSTMLayer: use local lastInput instead of field reference
- PrincipalNeighbourhoodAggregationLayer: hoist invariant tensors,
  remove null-forgiving operators in fallback path
- BarlowTwins: remove redundant projector null-check
- FILM: use local-cache pattern for cached features
- DeepFilterNet: propagate gain layer backward gradient

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: net471 build errors in baggingclassifier and leafsent140loader

- Replace range syntax pool[..sampleSize] with Array.Copy for net471
  compatibility (RuntimeHelpers.GetSubArray not available)
- Add explicit null check before string.IsNullOrWhiteSpace for net471
  compatibility (missing NotNullWhen attribute)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use correct inner class name and remove null-forgiving in dashboard

Replace InterpretabilityExplanation with Explanation (the actual inner
class name) and replace null-forgiving operators with proper null guards.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address pr #941 review comments

- KDTree: use single guarded local for _data null check in Compare
- BallTree: use single guarded local for _data null check in Sort
- DiffusionMemoryManager: remove duplicate unconditional total sharded line
- InfluenceFunctionExplainer: hoist cachedGradients null check outside loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

fix: preprocessing pipeline state lost during serialization/deserialization
