Conversation
Remove the 3 `PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline` assignments from `AiModelBuilder.ConfigurePreprocessing()` and the 3 analogous `PostprocessingRegistry` assignments from `ConfigurePostprocessing()`. The pipeline is already stored as an instance field on the builder and flows to `AiModelResult` via `PreprocessingInfo`; the global static registry was unnecessary and caused race conditions when multiple builders ran concurrently.

Changes:
- Remove 6 static registry assignments from AiModelBuilder.cs
- Add `[Obsolete]` attribute to the `PreprocessingRegistry<T, TInput>` class
- Replace `PreprocessingRegistry` usage in `DocumentNeuralNetworkBase` with an instance-level `PreprocessingTransformer` property
- Add 4 integration tests verifying concurrent builds don't cross-contaminate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
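The race the commit message describes can be sketched in isolation. The types below are hypothetical simplifications, not the actual AiDotNet classes; they only illustrate why a static `Current` slot is last-writer-wins across concurrent builders while an instance field is not:

```csharp
// Hypothetical minimal types for illustration only; not the AiDotNet API.
public static class StaticRegistry<TPipeline>
{
    // One slot shared by every builder on every thread: last writer wins.
    public static TPipeline? Current;
}

public class Builder<TPipeline>
{
    // One slot per builder instance: isolated from concurrent builders.
    private TPipeline? _pipeline;

    public Builder<TPipeline> ConfigurePreprocessing(TPipeline pipeline)
    {
        _pipeline = pipeline;                 // safe: scoped to this builder
        // StaticRegistry<TPipeline>.Current = pipeline;  // racy: another
        // builder may overwrite this before Build() reads it back
        return this;
    }

    public TPipeline? Build() => _pipeline;   // always sees its own pipeline
}
```

With the static assignment enabled, two builders configured concurrently could both observe whichever pipeline happened to be written last; with only the instance field, each `Build()` is deterministic.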
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
Walkthrough

Replaces global static preprocessing/postprocessing registry wiring with per-instance transformers: AiModelBuilder no longer mutates static registries, DocumentNeuralNetworkBase gains instance-level PreprocessingTransformer/PostprocessingTransformer used at runtime, both registries are marked [Obsolete], and integration tests assert no cross-model contamination.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer/Test
    participant Builder as AiModelBuilder
    participant Model as DocumentNeuralNetworkBase
    participant Registry as Pre/PostprocessingRegistry (obsolete)
    Dev->>Builder: ConfigurePreprocessing(pre) / ConfigurePostprocessing(post)
    Builder-->>Builder: store pipelines on builder instance
    Dev->>Builder: BuildSupervisedAsync(...) / BuildProgramSynthesisInferenceOnlyResult
    Builder->>Model: instantiate model
    Builder->>Model: call ConfigureTransformers(pre, post)
    alt PreprocessingTransformer present & fitted
        Model->>Model: Preprocess using instance transformer
    else
        Model->>Model: ApplyDefaultPreprocessing
    end
    Model->>Model: Run training/inference
    alt PostprocessingTransformer present & fitted
        Model->>Model: Postprocess using instance transformer
    else
        Model->>Model: ApplyDefaultPostprocessing
    end
    Note over Registry: Registry marked obsolete and no longer written by Builder<br/>(kept readable for backward compatibility)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Pull request overview
This PR addresses issue #931 by removing reliance on static/global preprocessing (and postprocessing) registries that can be overwritten during concurrent AiModelBuilder usage, aiming to prevent cross-contamination between parallel model builds.
Changes:
- Removed `PreprocessingRegistry<T,TInput>.Current = ...` assignments from `AiModelBuilder.ConfigurePreprocessing()` overloads.
- Marked `PreprocessingRegistry<T,TInput>` as `[Obsolete]` and added deprecation guidance in its XML docs.
- Switched `DocumentNeuralNetworkBase` preprocessing from a static registry lookup to an instance-level `PreprocessingTransformer` property.
- Added integration tests asserting the static preprocessing registry is not set by model building and remains usable as a deprecated API.
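As a rough sketch of the deprecation pattern in the changes above (member shapes beyond the `[Obsolete]` attribute are assumptions, not the actual AiDotNet signatures), the registry can stay readable for old callers while warning at compile time:

```csharp
// Sketch only: assumed member shapes, not the real AiDotNet source.
[Obsolete("No longer written by AiModelBuilder; preprocessing now flows " +
          "per-instance via AiModelBuilder.ConfigurePreprocessing(). " +
          "Kept readable for backward compatibility.")]
public static class PreprocessingRegistry<T, TInput>
{
    public static IDataTransformer<T, TInput, TInput>? Current { get; set; }

    public static bool IsConfigured => Current is not null;

    // Pass input through unchanged when nothing is configured.
    public static TInput Transform(TInput input) =>
        Current is null ? input : Current.Transform(input);
}
```

Consumers referencing the class get a compiler warning pointing at the per-builder API, while existing reads keep working until the class is removed.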
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs | Adds integration coverage for concurrent/sequential builds and deprecated registry behavior. |
| src/Preprocessing/PreprocessingRegistry.cs | Deprecates the static preprocessing registry with [Obsolete] and updated documentation. |
| src/Document/DocumentNeuralNetworkBase.cs | Switches document preprocessing from static registry usage to an instance-level transformer property. |
| src/AiModelBuilder.cs | Removes global registry writes for preprocessing and postprocessing configuration. |
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Document/DocumentNeuralNetworkBase.cs (1)
239-249: ⚠️ Potential issue | 🟠 Major

Inconsistency: `PostprocessOutput` still uses static `PostprocessingRegistry`.

The PR objectives explicitly state removing `PostprocessingRegistry` assignments from `ConfigurePostprocessing()` to prevent the same race conditions. However, this method still reads from the static registry:

```csharp
if (PostprocessingRegistry<T, Tensor<T>>.IsConfigured)
{
    return PostprocessingRegistry<T, Tensor<T>>.Transform(modelOutput);
}
```

This creates the same TOCTOU race condition for postprocessing that you fixed for preprocessing. For consistency, this should use an instance-level `PostprocessingTransformer` property analogous to `PreprocessingTransformer`.
Proposed fix

Add an instance-level postprocessing transformer similar to preprocessing:

```diff
+/// <summary>
+/// Gets or sets the instance-level postprocessing transformer for this document model.
+/// </summary>
+protected IDataTransformer<T, Tensor<T>, Tensor<T>>? PostprocessingTransformer { get; set; }
```

Then update `PostprocessOutput`:

```diff
 protected Tensor<T> PostprocessOutput(Tensor<T> modelOutput)
 {
-    // Priority 1: User-configured pipeline via AiModelBuilder
-    if (PostprocessingRegistry<T, Tensor<T>>.IsConfigured)
+    // Priority 1: Instance-level transformer (set explicitly on this model)
+    var transformer = PostprocessingTransformer;
+    if (transformer is not null && transformer.IsFitted)
     {
-        return PostprocessingRegistry<T, Tensor<T>>.Transform(modelOutput);
+        return transformer.Transform(modelOutput);
     }

     // Priority 2: Model-specific industry-standard defaults
     return ApplyDefaultPostprocessing(modelOutput);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Document/DocumentNeuralNetworkBase.cs` around lines 239 - 249, PostprocessOutput currently reads the static PostprocessingRegistry<T, Tensor<T>> (PostprocessingRegistry.IsConfigured / Transform) which reintroduces the TOCTOU race you eliminated for preprocessing; replace use of the static registry with an instance-level PostprocessingTransformer property (mirror how PreprocessingTransformer is used), set that transformer in ConfigurePostprocessing instead of assigning the static registry, and have PostprocessOutput call PostprocessingTransformer.Transform(modelOutput) when non-null (falling back to ApplyDefaultPostprocessing when null) so all postprocessing is instance-scoped and consistent with preprocessing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 64-73: Add positive assertions to verify instance-level
preprocessing on each result: for both results (results[0] and results[1] or the
loop variable), assert that result.PreprocessingInfo is not null and that
result.PreprocessingInfo.IsFitted is true (use the null-conditional check like
result.PreprocessingInfo?.IsFitted ?? false if needed). Place these assertions
immediately after the existing NotNull checks so the test confirms preprocessing
flows via the instance-level PreprocessingInfo rather than the static
PreprocessingRegistry.
---
Outside diff comments:
In `@src/Document/DocumentNeuralNetworkBase.cs`:
- Around line 239-249: PostprocessOutput currently reads the static
PostprocessingRegistry<T, Tensor<T>> (PostprocessingRegistry.IsConfigured /
Transform) which reintroduces the TOCTOU race you eliminated for preprocessing;
replace use of the static registry with an instance-level
PostprocessingTransformer property (mirror how PreprocessingTransformer is
used), set that transformer in ConfigurePostprocessing instead of assigning the
static registry, and have PostprocessOutput call
PostprocessingTransformer.Transform(modelOutput) when non-null (falling back to
ApplyDefaultPostprocessing when null) so all postprocessing is instance-scoped
and consistent with preprocessing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: faf8dd41-26e8-4977-ac42-41533098adcd
📒 Files selected for processing (4)
- src/AiModelBuilder.cs
- src/Document/DocumentNeuralNetworkBase.cs
- src/Preprocessing/PreprocessingRegistry.cs
- tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs
💤 Files with no reviewable changes (1)
- src/AiModelBuilder.cs
…ty, docs

- Fix PreprocessingTransformer docs: clarify protected accessibility
- Add PostprocessingTransformer instance property to DocumentNeuralNetworkBase
- Replace PostprocessingRegistry static usage with instance-based transformer
- Mark PostprocessingRegistry as [Obsolete] (same treatment as PreprocessingRegistry)
- Update PreprocessingRegistry.Current docs to reflect no longer auto-set
- Fix concurrent test to use separate data loaders (thread-safety)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs (1)
72-81: ⚠️ Potential issue | 🔴 Critical

Blocking: these tests still don't prove the new instance-local preprocessing path works.

`Assert.NotNull(...)` plus `IsConfigured == false` only proves the old global path stayed empty. These tests still pass if preprocessing is silently dropped altogether. Add positive assertions on `AiModelResult.PreprocessingInfo` so the test fails when the fitted pipeline is not carried through the new instance-local path.

✅ Concrete assertions to add

```diff
 Assert.NotNull(results[0]);
+Assert.NotNull(results[0].PreprocessingInfo);
+Assert.True(results[0].PreprocessingInfo?.IsFitted ?? false);
 Assert.NotNull(results[1]);
+Assert.NotNull(results[1].PreprocessingInfo);
+Assert.True(results[1].PreprocessingInfo?.IsFitted ?? false);
@@
 Assert.NotNull(result);
+Assert.NotNull(result.PreprocessingInfo);
+Assert.True(result.PreprocessingInfo?.IsFitted ?? false);
@@
 Assert.NotNull(result1);
+Assert.NotNull(result1.PreprocessingInfo);
+Assert.True(result1.PreprocessingInfo?.IsFitted ?? false);
 Assert.NotNull(result2);
+Assert.NotNull(result2.PreprocessingInfo);
+Assert.True(result2.PreprocessingInfo?.IsFitted ?? false);
```

As per coding guidelines, "Tests MUST be production-quality" and "Trivial assertions: Tests that only check `Assert.NotNull` when they should verify actual behavior/values" are blocking issues.

Also applies to: 114-121, 185-193
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs` around lines 72 - 81, The tests currently only assert results[0]/results[1] are non-null and that PreprocessingRegistry<double, Matrix<double>>.IsConfigured is false; add positive assertions that the fitted preprocessing pipeline is passed via the instance-local path by checking AiModelResult.PreprocessingInfo on each result (e.g., results[0].PreprocessingInfo and results[1].PreprocessingInfo) is not null and contains expected metadata/transform count or types from the training run so the test fails if preprocessing was dropped; update the assertions around the same test cases (including the other occurrences at the referenced ranges) to validate concrete properties of PreprocessingInfo rather than only NotNull.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Preprocessing/PreprocessingRegistry.cs`:
- Around line 10-16: Update the XML documentation in PreprocessingRegistry (and
any remaining remarks in this file) to remove the old singleton/default
guidance: delete or rewrite any text that claims this registry is the active
global preprocessing path or that configured preprocessing is automatically
applied to all models; instead clearly state the API is obsolete, refer callers
to AiModelBuilder.ConfigurePreprocessing() for per-builder pipelines, and
mention that preprocessing is carried forward via AiModelResult and
PreprocessingInfo rather than applied globally by PreprocessingRegistry.
---
Duplicate comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 72-81: The tests currently only assert results[0]/results[1] are
non-null and that PreprocessingRegistry<double, Matrix<double>>.IsConfigured is
false; add positive assertions that the fitted preprocessing pipeline is passed
via the instance-local path by checking AiModelResult.PreprocessingInfo on each
result (e.g., results[0].PreprocessingInfo and results[1].PreprocessingInfo) is
not null and contains expected metadata/transform count or types from the
training run so the test fails if preprocessing was dropped; update the
assertions around the same test cases (including the other occurrences at the
referenced ranges) to validate concrete properties of PreprocessingInfo rather
than only NotNull.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eba46efd-bea3-447f-b68e-32f802c2777b
📒 Files selected for processing (4)
- src/Document/DocumentNeuralNetworkBase.cs
- src/Postprocessing/PostprocessingRegistry.cs
- src/Preprocessing/PreprocessingRegistry.cs
- tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs
Changed PreprocessingTransformer and PostprocessingTransformer from protected to protected internal so AiModelBuilder can set them. BuildSupervisedInternalAsync now wires the preprocessing pipeline onto DocumentNeuralNetworkBase models, replacing the static registry approach that caused race conditions during concurrent builds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Document/DocumentNeuralNetworkBase.cs`:
- Around line 201-206: The code currently ignores a configured but unfitted
transformer; update the preprocessing and postprocessing methods that reference
PreprocessingTransformer and PostprocessingTransformer so that if a transformer
is attached but transformer.IsFitted is false you fail fast (throw a clear
InvalidOperationException or similar) instead of falling back to the default
pipeline; specifically, change the logic from "if (transformer is not null &&
transformer.IsFitted) return transformer.Transform(...)" to first check "if
(transformer is not null) { if (!transformer.IsFitted) throw new
InvalidOperationException(...); return transformer.Transform(...); }" for both
the preprocessing and postprocessing code paths so misconfiguration is surfaced
immediately.
- Around line 126-139: The properties PreprocessingTransformer and
PostprocessingTransformer on DocumentNeuralNetworkBase are currently declared
protected internal, exposing builder plumbing to external inheritors; change
their accessibility to protected only and add an internal wiring method (e.g.,
ConfigureTransformers) that accepts the two IDataTransformer<T, Tensor<T>,
Tensor<T>>? parameters and assigns them to PreprocessingTransformer and
PostprocessingTransformer so AiModelBuilder (same assembly) can set them without
widening the public inheritance surface.
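The fail-fast behavior requested in the first comment above could look roughly like this. This is a sketch assuming the `IDataTransformer` shape and method names cited elsewhere in this review, not the actual implementation:

```csharp
// Sketch: fail fast on a configured-but-unfitted transformer instead of
// silently falling back to the default pipeline.
protected Tensor<T> PreprocessDocument(Tensor<T> input)
{
    var transformer = PreprocessingTransformer;
    if (transformer is not null)
    {
        if (!transformer.IsFitted)
        {
            // Surface the misconfiguration immediately.
            throw new InvalidOperationException(
                "A preprocessing transformer was configured but never fitted.");
        }
        return transformer.Transform(input);
    }

    // No transformer attached: use model-specific defaults.
    return ApplyDefaultPreprocessing(input);
}
```

The same null-then-fitted check order would apply to the postprocessing path.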
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6df55347-4489-4d7d-a5ec-22e39a96e792
📒 Files selected for processing (2)
- src/AiModelBuilder.cs
- src/Document/DocumentNeuralNetworkBase.cs
…essing

- Changed PreprocessingTransformer/PostprocessingTransformer back to protected and added internal ConfigureTransformers() method for AiModelBuilder wiring
- PreprocessDocument/PostprocessOutput now throw if transformer is set but not fitted instead of silently falling back to defaults
- Wired both preprocessing AND postprocessing onto DocumentNeuralNetworkBase
- Updated PreprocessingRegistry docs to reflect deprecated/removed status
- Added [Collection] to registry tests and try/finally cleanup
- Fixed "override" wording to "assign" in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
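The `protected` properties plus internal wiring method described in this commit could be sketched as follows (signatures are assumptions based on the types named in this review, not verified source):

```csharp
// Sketch: keep the properties protected for subclasses, and expose an
// internal setter so AiModelBuilder (same assembly) can wire them without
// widening the inheritance surface for external subclasses.
protected IDataTransformer<T, Tensor<T>, Tensor<T>>? PreprocessingTransformer { get; set; }
protected IDataTransformer<T, Tensor<T>, Tensor<T>>? PostprocessingTransformer { get; set; }

internal void ConfigureTransformers(
    IDataTransformer<T, Tensor<T>, Tensor<T>>? preprocessing,
    IDataTransformer<T, Tensor<T>, Tensor<T>>? postprocessing)
{
    // Passing null intentionally clears any transformer left over from a
    // previous build of a reused model instance.
    PreprocessingTransformer = preprocessing;
    PostprocessingTransformer = postprocessing;
}
```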
Actionable comments posted: 2
♻️ Duplicate comments (3)
src/Preprocessing/PreprocessingRegistry.cs (1)
17-20: ⚠️ Potential issue | 🟡 Minor

Finish the obsolete-registry doc cleanup.

The class-level remarks are now correct, but `Transform()` still says models can safely call this API when preprocessing is missing. That leaves the old global-registry guidance in place and contradicts the new backward-compatibility-only message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Preprocessing/PreprocessingRegistry.cs` around lines 17 - 20, Update the XML doc on PreprocessingRegistry.Transform() to remove the outdated guidance that models can safely call this API when preprocessing is missing and instead state that the method exists only for backward compatibility with external code and will be removed in a future version; ensure the Transform() summary/remarks align with the class-level remarks in PreprocessingRegistry and mention deprecation/backward-compatibility status rather than recommending new callers use it.

tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs (2)
75-84: ⚠️ Potential issue | 🟠 Major

Assertions are insufficient to verify the test's stated goal.

The test claims to verify "DoNotCrossContaminate" but only checks:
- Results are not null (trivial)
- Static registry was not set (proves registry isn't used)

This does NOT verify that preprocessing actually worked correctly through the instance-level path. If preprocessing is completely broken (neither registry nor instance-level), this test still passes. Add assertions on `PreprocessingInfo` to confirm the instance-level flow:

```diff
 // Assert: both builds succeed
 Assert.NotNull(results[0]);
 Assert.NotNull(results[1]);
+
+// Assert: preprocessing flowed through instance-level PreprocessingInfo
+Assert.NotNull(results[0].PreprocessingInfo);
+Assert.True(results[0].PreprocessingInfo?.IsFitted ?? false,
+    "Preprocessing should be fitted via instance-level flow");
+Assert.NotNull(results[1].PreprocessingInfo);
+Assert.True(results[1].PreprocessingInfo?.IsFitted ?? false,
+    "Preprocessing should be fitted via instance-level flow");
```

To truly verify "no cross-contamination", consider also asserting that each model's scaler type is preserved (e.g., by checking pipeline step types or transformed output characteristics).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs` around lines 75 - 84, The current assertions only check non-null results and that PreprocessingRegistry<double, Matrix<double>>.IsConfigured is false; update the test to also validate instance-level preprocessing by inspecting each result's PreprocessingInfo (e.g., results[0].PreprocessingInfo and results[1].PreprocessingInfo) to ensure they are not null and contain the expected pipeline/step types (such as scaler/transformer type) for each model, and add assertions that the two PreprocessingInfo instances preserve their respective scaler/transform types (or produce expected transformed-output characteristics) to prove preprocessing flowed instance-locally and did not cross-contaminate.
193-200: ⚠️ Potential issue | 🟠 Major

Same trivial assertion pattern; doesn't verify preprocessing actually worked.

Both builds could produce garbage results and this test would still pass. Add assertions verifying `PreprocessingInfo` was properly configured for each result:

```diff
 // Assert: both succeeded
 Assert.NotNull(result1);
 Assert.NotNull(result2);
+
+// Assert: preprocessing flowed through instance-level PreprocessingInfo
+Assert.NotNull(result1.PreprocessingInfo);
+Assert.True(result1.PreprocessingInfo?.IsFitted ?? false);
+Assert.NotNull(result2.PreprocessingInfo);
+Assert.True(result2.PreprocessingInfo?.IsFitted ?? false);
```

Without these assertions, the test verifies that the registry remains empty, but not that instance-level preprocessing actually functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs` around lines 193 - 200, The test only checks registry emptiness but not that each build actually configured preprocessing; update the assertions to verify instance-level preprocessing on result1 and result2 by asserting their PreprocessingInfo is not null and indicates configuration (e.g., Assert.NotNull(result1.PreprocessingInfo); Assert.True(result1.PreprocessingInfo.IsConfigured) or equivalent property/check on PreprocessingInfo), and repeat for result2 so the test validates that PreprocessingInfo was properly set on each result while PreprocessingRegistry<double, Matrix<double>> remains unconfigured.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/AiModelBuilder.cs`:
- Around line 1871-1882: Extract the transformer wiring into a private helper
(e.g., ConfigureDocumentTransformers) that accepts the model (IFullModel<T,
TInput, TOutput>? or object) and inside checks "if (model is
Document.DocumentNeuralNetworkBase<T> documentModel)" then casts
_preprocessingPipeline/_postprocessingPipeline to IDataTransformer<T, Tensor<T>,
Tensor<T>> and calls documentModel.ConfigureTransformers(preTransformer,
postTransformer) (allowing nulls to clear prior state); call this helper from
every build path that may return a DocumentNeuralNetworkBase<T> including
BuildProgramSynthesisInferenceOnlyResult() so transformers are always
set/cleared for reused instances.
In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 117-124: The test currently only asserts the builder returned a
non-null result and that PreprocessingRegistry<double,
Matrix<double>>.IsConfigured is false; instead, verify preprocessing actually
ran by asserting the built model/result contains a non-null PreprocessingInfo
(e.g., result.PreprocessingInfo or result.Model.PreprocessingInfo), that
PreprocessingInfo lists the expected transforms (scaling/normalization entries),
and that the stored scale/shift parameters match the training data statistics
(means/variances) used in the test dataset; keep the original check that the
static PreprocessingRegistry<double, Matrix<double>>.IsConfigured remains false
so the registry wasn't set globally, and replace the lone Assert.NotNull(result)
with these behavioral assertions to prove preprocessing was applied and
configured.
---
Duplicate comments:
In `@src/Preprocessing/PreprocessingRegistry.cs`:
- Around line 17-20: Update the XML doc on PreprocessingRegistry.Transform() to
remove the outdated guidance that models can safely call this API when
preprocessing is missing and instead state that the method exists only for
backward compatibility with external code and will be removed in a future
version; ensure the Transform() summary/remarks align with the class-level
remarks in PreprocessingRegistry and mention deprecation/backward-compatibility
status rather than recommending new callers use it.
In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 75-84: The current assertions only check non-null results and that
PreprocessingRegistry<double, Matrix<double>>.IsConfigured is false; update the
test to also validate instance-level preprocessing by inspecting each result's
PreprocessingInfo (e.g., results[0].PreprocessingInfo and
results[1].PreprocessingInfo) to ensure they are not null and contain the
expected pipeline/step types (such as scaler/transformer type) for each model,
and add assertions that the two PreprocessingInfo instances preserve their
respective scaler/transform types (or produce expected transformed-output
characteristics) to prove preprocessing flowed instance-locally and did not
cross-contaminate.
- Around line 193-200: The test only checks registry emptiness but not that each
build actually configured preprocessing; update the assertions to verify
instance-level preprocessing on result1 and result2 by asserting their
PreprocessingInfo is not null and indicates configuration (e.g.,
Assert.NotNull(result1.PreprocessingInfo);
Assert.True(result1.PreprocessingInfo.IsConfigured) or equivalent property/check
on PreprocessingInfo), and repeat for result2 so the test validates that
PreprocessingInfo was properly set on each result while
PreprocessingRegistry<double, Matrix<double>> remains unconfigured.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 57bc0144-6e6a-4b17-9661-038cc33b4930
📒 Files selected for processing (4)
- src/AiModelBuilder.cs
- src/Document/DocumentNeuralNetworkBase.cs
- src/Preprocessing/PreprocessingRegistry.cs
- tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs
…sertions

Extract transformer wiring into ConfigureDocumentTransformers() helper method that always passes current pipeline values (clearing prior state on reuse). Called from both the supervised build path and BuildProgramSynthesisInferenceOnlyResult().

Strengthen PreprocessingRegistryIntegrationTests to verify PreprocessingInfo is non-null and fitted, not just Assert.NotNull(result).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
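A minimal sketch of the `ConfigureDocumentTransformers()` helper this commit describes. The casts and field names are assumptions drawn from the types named in this review, not verified source:

```csharp
// Sketch: centralize transformer wiring so every build path can call it,
// always passing the current pipeline values so stale state on a reused
// model instance is cleared (null clears).
private void ConfigureDocumentTransformers(IFullModel<T, TInput, TOutput>? model)
{
    if (model is Document.DocumentNeuralNetworkBase<T> documentModel)
    {
        documentModel.ConfigureTransformers(
            _preprocessingPipeline as IDataTransformer<T, Tensor<T>, Tensor<T>>,
            _postprocessingPipeline as IDataTransformer<T, Tensor<T>, Tensor<T>>);
    }
}
```

Calling this unconditionally from each build path means a builder whose pipelines were cleared also clears the model, rather than leaving a previous build's transformers attached.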
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/AiModelBuilder.cs (1)
1250-1252: ⚠️ Potential issue | 🔴 Critical

Blocking: the streaming supervised path still bypasses the new transformer wiring.

These additions cover the regular supervised and inference-only paths, but `BuildStreamingSupervisedAsync()` at Line 1487 still never calls `ConfigureDocumentTransformers(_model)`. For `Document.DocumentNeuralNetworkBase<T>`, that leaves the streaming path able to retain stale preprocess/postprocess transformers from a previous build, or to miss the current builder's pipelines entirely.

🔧 Proposed fix

```diff
 private async Task<AiModelResult<T, TInput, TOutput>> BuildStreamingSupervisedAsync(
     IStreamingDataLoader<T, TInput, TOutput> streamingLoader)
 {
     // Apply GPU configuration first
     ApplyGpuConfiguration();

     // Apply memory management configuration (gradient checkpointing, etc.)
     ApplyMemoryConfiguration();

     // Ensure we have a model configured
     if (_model is null)
     {
         throw new InvalidOperationException(
             "Streaming training requires a model to be configured. Use ConfigureModel() before calling BuildAsync().");
     }
+
+    ConfigureDocumentTransformers(_model);

     var numOps = MathHelper.GetNumericOperations<T>();
```

A regression test that reuses the same `Document.DocumentNeuralNetworkBase<T>` through the streaming loader path would pin this down.

As per coding guidelines: "Incomplete features: Half-implemented patterns where some code paths work but others silently do nothing" are blocking issues.
Also applies to: 1874-1877, 6294-6308
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/AiModelBuilder.cs` around lines 1250 - 1252, The streaming supervised path never wires the new transformers: modify BuildStreamingSupervisedAsync to call ConfigureDocumentTransformers(_model) (same wiring used for the regular supervised/inference-only paths) after the model is constructed and before it is used by Document.DocumentNeuralNetworkBase<T>, ensuring _model's preprocess/postprocess pipelines are replaced; add a regression test that reuses the same Document.DocumentNeuralNetworkBase<T> via the streaming loader to assert that transformers are updated (no stale transformers remain) when BuildStreamingSupervisedAsync runs.

tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs (1)
75-78: ⚠️ Potential issue | 🟠 Major

Add PreprocessingInfo assertions to verify instance-level preprocessing flow.

The concurrent builds test only verifies that results are non-null, but doesn't confirm preprocessing was actually applied through the instance-level `PreprocessingInfo`. Without these assertions, the test could pass even if preprocessing is completely broken.

```diff
 // Assert: both builds succeed
 Assert.NotNull(results[0]);
 Assert.NotNull(results[1]);
+
+// Assert: preprocessing flowed through instance-level PreprocessingInfo for both
+Assert.NotNull(results[0].PreprocessingInfo);
+Assert.True(results[0].PreprocessingInfo?.IsFitted ?? false,
+    "First build's preprocessing should be fitted via instance-level flow");
+Assert.NotNull(results[1].PreprocessingInfo);
+Assert.True(results[1].PreprocessingInfo?.IsFitted ?? false,
+    "Second build's preprocessing should be fitted via instance-level flow");
```

Per coding guidelines, tests that only check `Assert.NotNull` when they should verify actual behavior are blocking issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs` around lines 75 - 78, Update the test to assert that instance-level preprocessing actually ran by checking the PreprocessingInfo on the produced results instead of only asserting non-null; locate the assertions around results[0] and results[1] in PreprocessingRegistryIntegrationTests and add checks that each result.PreprocessingInfo (or the specific property on the result that records applied preprocessors) is non-null and contains the expected entries (e.g., contains the instance-level preprocessor ID/name or that the processed output differs from the raw input), so the test verifies the preprocessing flow end-to-end.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 198-206: The test only asserts that result1 and result2 are
non-null but never verifies that preprocessing actually ran; update the
assertions to mirror BuildAsync_WithPreprocessing_Succeeds_WithoutRegistry by
checking the PreprocessingInfo on both results (e.g., assert
result1.PreprocessingInfo and result2.PreprocessingInfo are not null and contain
the expected preprocessing details/steps), and retain the existing assertion
that PreprocessingRegistry<double, Matrix<double>>.IsConfigured is false;
reference result1, result2, and PreprocessingInfo to locate and add the new
assertions.
---
Duplicate comments:
In `@src/AiModelBuilder.cs`:
- Around line 1250-1252: The streaming supervised path never wires the new
transformers: modify BuildStreamingSupervisedAsync to call
ConfigureDocumentTransformers(_model) (same wiring used for the regular
supervised/inference-only paths) after the model is constructed and before it is
used by Document.DocumentNeuralNetworkBase<T>, ensuring _model's
preprocess/postprocess pipelines are replaced; add a regression test that reuses
the same Document.DocumentNeuralNetworkBase<T> via the streaming loader to
assert that transformers are updated (no stale transformers remain) when
BuildStreamingSupervisedAsync runs.
In
`@tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs`:
- Around line 75-78: Update the test to assert that instance-level preprocessing
actually ran by checking the PreprocessingInfo on the produced results instead
of only asserting non-null; locate the assertions around results[0] and
results[1] in PreprocessingRegistryIntegrationTests and add checks that each
result.PreprocessingInfo (or the specific property on the result that records
applied preprocessors) is non-null and contains the expected entries (e.g.,
contains the instance-level preprocessor ID/name or that the processed output
differs from the raw input), so the test verifies the preprocessing flow
end-to-end.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e30f40c2-bf08-454a-bc84-8a89ae6abd79
📒 Files selected for processing (2)
src/AiModelBuilder.cs
tests/AiDotNet.Tests/IntegrationTests/Preprocessing/PreprocessingRegistryIntegrationTests.cs
…gthen tests PostprocessOutput no longer throws on unfitted transformers since postprocessing transforms (Softmax, LabelDecoder) are often stateless. Updated XML docs to reflect instance-level transformer priority. Strengthen concurrent and sequential test assertions to verify PreprocessingInfo is non-null and fitted on each result. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
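The statelessness argument in the commit above — postprocessing transforms such as Softmax depend only on their input, so requiring a fitted transformer would reject valid configurations — can be illustrated generically. This is a minimal Python sketch of a stateless softmax, not the library's transformer API:

```python
import math

def softmax(logits):
    """Stateless postprocessing: output depends only on the input,
    so no fitted parameters (and no prior Fit step) are needed."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
assert abs(sum(probs) - 1.0) < 1e-9     # valid probability distribution
assert probs[0] > probs[1] > probs[2]   # ordering of logits preserved
```

A stateful transform (a scaler with learned means), by contrast, genuinely cannot run unfitted — which is why only preprocessing keeps the fail-fast check.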
…race conditions (#935)

* fix: remove static PreprocessingRegistry usage from AiModelBuilder (#931)

Remove the 3 `PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline` assignments from AiModelBuilder.ConfigurePreprocessing() and the 3 analogous PostprocessingRegistry assignments from ConfigurePostprocessing(). The pipeline is already stored as an instance field on the builder and flows to AiModelResult via PreprocessingInfo — the global static registry was unnecessary and caused race conditions when multiple builders ran concurrently.

Changes:
- Remove 6 static registry assignments from AiModelBuilder.cs
- Add [Obsolete] attribute to PreprocessingRegistry<T, TInput> class
- Replace PreprocessingRegistry usage in DocumentNeuralNetworkBase with an instance-level PreprocessingTransformer property
- Add 4 integration tests verifying concurrent builds don't cross-contaminate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 7 pr review comments - registry deprecation, thread safety, docs

- Fix PreprocessingTransformer docs: clarify protected accessibility
- Add PostprocessingTransformer instance property to DocumentNeuralNetworkBase
- Replace PostprocessingRegistry static usage with instance-based transformer
- Mark PostprocessingRegistry as [Obsolete] (same treatment as PreprocessingRegistry)
- Update PreprocessingRegistry.Current docs to reflect no longer auto-set
- Fix concurrent test to use separate data loaders (thread-safety)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: wire preprocessing transformer onto document models in buildasync

Changed PreprocessingTransformer and PostprocessingTransformer from protected to protected internal so AiModelBuilder can set them. BuildSupervisedInternalAsync now wires the preprocessing pipeline onto DocumentNeuralNetworkBase models, replacing the static registry approach that caused race conditions during concurrent builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow transformer surface, fail fast on unfitted, wire postprocessing

- Changed PreprocessingTransformer/PostprocessingTransformer back to protected and added internal ConfigureTransformers() method for AiModelBuilder wiring
- PreprocessDocument/PostprocessOutput now throw if transformer is set but not fitted instead of silently falling back to defaults
- Wired both preprocessing AND postprocessing onto DocumentNeuralNetworkBase
- Updated PreprocessingRegistry docs to reflect deprecated/removed status
- Added [Collection] to registry tests and try/finally cleanup
- Fixed "override" wording to "assign" in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: centralize document transformer injection and strengthen test assertions

Extract transformer wiring into ConfigureDocumentTransformers() helper method that always passes current pipeline values (clearing prior state on reuse). Called from both supervised build path and BuildProgramSynthesisInferenceOnlyResult(). Strengthen PreprocessingRegistryIntegrationTests to verify PreprocessingInfo is non-null and fitted, not just Assert.NotNull(result).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unfitted check from PostprocessOutput, update docs, strengthen tests

PostprocessOutput no longer throws on unfitted transformers since postprocessing transforms (Softmax, LabelDecoder) are often stateless. Updated XML docs to reflect instance-level transformer priority. Strengthen concurrent and sequential test assertions to verify PreprocessingInfo is non-null and fitted on each result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
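The race these commits remove boils down to a process-wide slot versus a per-instance field. A minimal Python sketch of the failure mode (names like StaticRegistry and Builder are hypothetical, not the AiDotNet API):

```python
class StaticRegistry:
    # One process-wide slot: last writer wins, so concurrent builders
    # silently read each other's pipelines.
    current = None

class Builder:
    def __init__(self, pipeline):
        self.pipeline = pipeline           # instance field: isolated per builder
        StaticRegistry.current = pipeline  # the removed global assignment

a = Builder("scaler-A")
b = Builder("scaler-B")   # clobbers the global slot that `a` just set

assert StaticRegistry.current == "scaler-B"  # builder A's registry view is now wrong
assert a.pipeline == "scaler-A"              # the instance field is unaffected
assert b.pipeline == "scaler-B"
```

Even with a thread-safe getter/setter on the static slot, two builders overwrite each other; only the instance field (flowing into PreprocessingInfo) is correct under concurrency.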
…ining directories (#933) (#941) * fix: prevent data leakage by fitting preprocessing only on training data In BuildSupervisedInternalAsync(), the preprocessing pipeline (e.g., StandardScaler) was fitted on the ENTIRE dataset before the train/test split. This leaked test/validation statistics (mean, std dev) into the training pipeline, causing artificially inflated metrics. The fix restructures the method: 1. Split data into train/val/test FIRST using DataSplitter.Split() 2. FitTransform preprocessing pipeline on training data ONLY 3. Transform (not FitTransform) validation and test data The federated learning path is unchanged — it correctly uses all data as training data, so FitTransform on everything remains correct. Fixes #929 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve 12 pr review comments - leakage tests, pragma scope, registry sync - Scope #pragma warning disable CS8600/CS8604 with matching restore - Restore PreprocessingRegistry.Current in all ConfigurePreprocessing overloads - Update PreprocessingRegistry docs to reflect thread-safety and usage pattern - Rewrite tests to actually detect leakage: verify scaler produces different output than all-data scaler, verify transform shape and non-identity - Add end-to-end predict test with preprocessing - Add test verifying no-preprocessing path still works with predict Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: move fitresample after train/test split to prevent data leakage FitResample (SMOTE, outlier removal, augmentation) was applied to the full dataset before splitting, allowing synthetic samples derived from test/validation data to leak into training. Now FitResample runs after the split on training data only in both standard and AutoML paths. Also fixed AutoML preprocessing to FitTransform on training only and Transform on validation. 
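The fit-on-train-only rule described above can be shown with plain standardization statistics — a sketch, not the library's StandardScaler:

```python
def mean_std(xs):
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, var ** 0.5

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # the outlier lands in the test split
train, test = data[:4], data[4:]

# Correct: fit on training data only, then reuse those statistics for test data.
m_train, s_train = mean_std(train)

# Leaky: fitting before the split lets the test outlier shift the statistics.
m_all, s_all = mean_std(data)

assert m_train == 2.5
assert m_all == 22.0                 # the test point leaked into the "fitted" mean
scaled_test = (test[0] - m_train) / s_train   # Transform, never FitTransform, on test
```

The same reasoning drives moving FitResample after the split: synthetic samples derived from test rows would otherwise leak into training.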
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update registry docs and use consistent shuffle policy for splits Updated PreprocessingRegistry docs to clearly warn about concurrent build safety issues. Applied same time-series non-shuffle policy to the final supervised training split that AutoML already uses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use post-preparation training data in no-preprocessing else branch When _dataPreparationPipeline runs (resampling/outlier removal) it modifies XTrain/yTrain. The else branch (no preprocessing) was incorrectly using preparedX/preparedY (pre-preparation) instead of XTrain/yTrain, losing training-only preparation results for downstream logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: make data leakage tests self-sufficient without PR 928 Refactor tests that called result.Predict() (which fails due to the feature selection bug fixed in PR #928) to verify preprocessing pipeline state directly via PreprocessingInfo.TransformFeatures() instead. Tests now verify the data leakage fix independently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: preserve preprocessing pipeline state during serialization (#936) * fix: add serialization support to preprocessing pipeline fitted state - Add [JsonProperty] attributes to private fields across all preprocessing transformers (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler, DecimalScaler, GlobalContrastScaler, LogScaler, LogMeanVarianceScaler, LpNormScaler, Normalizer, SimpleImputer) so Newtonsoft.Json serializes fitted state (means, std devs, min/max, etc.) 
- Add [JsonProperty] to TransformerBase.IsFitted and ColumnIndices for proper deserialization of base class state - Add [JsonConstructor] to MinMaxScaler and RobustScaler to resolve ambiguous constructor selection during deserialization - Replace ValueTuple in PreprocessingPipeline._steps with a new PipelineStep<T, TInput> class that serializes reliably with Newtonsoft.Json (ValueTuples lose field names during serialization) - Add [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] to PreprocessingInfo.Pipeline and TargetPipeline for polymorphic deserialization of pipeline step transformers - Add 16 integration tests verifying serialization round-trips for all modified transformers, pipelines, and inverse transforms Fixes #930 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve 13 pr review comments - serialization safety, tests, perf - Remove null-forgiving operator from PipelineStep (use JsonConstructor) - Make PipelineStep.Transformer setter private (immutable after construction) - Fix ColumnIndices deserialization (add private setter for Json.NET) - Fix Steps property to return cached read-only list (no alloc per call) - Add explicit range assertions to MinMaxScaler custom range test - Add transform equivalence check to SimpleImputer test - Add LpNormScaler serialization round-trip test - Add Normalizer serialization round-trip test - Add LogMeanVarianceScaler serialization round-trip test - TypeNameHandling.Auto is safe: SafeSerializationBinder restricts types Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: make pipelinestep immutable, validate constructor, add missing tests - Made PipelineStep.Name private set to prevent mutation after construction - PipelineStep constructor now validates name (non-whitespace) and transformer (non-null), failing fast on invalid serialized steps - Added ColumnIndices round-trip test proving non-default indices survive serialization and untouched columns remain unchanged - Added 
TargetPipeline round-trip tests proving both null and non-null target pipelines survive serialization with transform equivalence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent JSON duplicate serialization and remove per-member TypeNameHandling

Add [JsonIgnore] to public getter-only properties that mirror [JsonProperty] backing fields in all transformers (StandardScaler, RobustScaler, SimpleImputer, DecimalScaler, GlobalContrastScaler, MinMaxScaler, Normalizer, MaxAbsScaler, LogScaler) to prevent Newtonsoft emitting both field and property. Remove per-member TypeNameHandling.Auto from PipelineStep.Transformer and _finalTransformer — rely on serializer-level settings with SafeSerializationBinder instead, preventing unsafe TypeNameHandling bypass. Add Deconstruct method to PipelineStep for backward-compatible tuple destructuring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove static PreprocessingRegistry — prevents concurrent build race conditions (#935)

* fix: remove null-forgiving operators from video
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from neural network layers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: warn on empty attributions and use guarded locals in jwt attestation verifier
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: include gain layer in backward pass and parameter updates in deepfilternet
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rebuild streaming buffers after deserializing fft/hop size in neural noise reducer
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from federated learning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from federated learning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: respect bootstrap flag and align estimator probabilities to ensemble label space
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from clustering modules
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: throw on invalid ensemble members and cold model in adaptive random forest
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify trained-state error messages in ordinal regression
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unused alpha_s and use guarded local consistently in unipc scheduler
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add guard for latent variance in compute elbo
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist cached gradient lookup out of per-sample loops
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace console.writeline with
debug.writeline for checkpoint reporting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: cache compressed momentum after resampling projection matrices Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: show total sharded memory independently of per-device memory in tostring Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address 21 pr review comments for code quality improvements - CLIPScore: throw on embedding/weight dimension mismatch instead of Math.Min - Adam8BitOptimizer: remove redundant _vScales null check - BFGSOptimizer: move _inverseHessian null check before UpdateInverseHessian call - FTRLOptimizer: use post-update accumulator (nPlusGradSq) in both paths - MomentumOptimizer: snapshot _velocity before serialization loop - OptimizerBase: use RequireModel() instead of Model! - SymbolicPhysicsLearner: Clone throws on null children like Evaluate/Format - IncrementalPCA: fix error message to "internal state inconsistent" - ContradictionDetector: wrap JToken.Value<bool>() in try-catch - CriticModel: wrap JToken.Value<double>() in try-catch - LogNormalAFT: hoist Coefficients null check outside loops (3 methods) - WeibullAFT: hoist Coefficients null check outside loops (3 methods) - NHiTSModel: remove redundant ContainsKey before TryGetValue - DiffusionTools: support both 1D and 2D audio_data deserialization - GMMDetector: validate feature count before scoring a point - TimeSeriesForestClassifier: reject NumTrees <= 0 during training Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use guarded sigmaTs local variable consistently in dpm solver methods Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: apply feature selection in Predict() — fixes critical dimension mismatch bug (#928) * fix: apply feature selection in AiModelResult.Predict() — fixes dimension mismatch bug The NormalOptimizer (default) randomly selects a subset of features during training via PrepareAndEvaluateSolution, but 
AiModelResult.Predict() never applied the same feature selection to new input data. This caused the model to receive all columns while it was trained on fewer, throwing "Number of columns must equal length of vector" errors. Changes: - Add SelectedFeatureIndices (List<int>) to OptimizationStepData, OptimizationResult, and ModelResult to store actual column indices - Propagate indices through optimizer flow: PrepareAndEvaluateSolution → UpdateBestSolution → CreateOptimizationResult - Apply feature selection in AiModelResult.Predict() after preprocessing but before Model.Predict(), using OptimizerHelper.SelectFeatures() - Add 12 integration tests verifying the full BuildAsync → Predict pipeline Closes #927 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve 7 pr review comments - feature selection, copy semantics, docs - Remove unused LooseTolerance constant - Fix off-by-one: use != instead of > for feature index count check - Apply feature selection in InferenceSession.Predict() (critical gap) - Add comprehensive docs to SelectedFeatureIndices property - Fix WithParameters to deep-copy collections (consistent with DeepCopy) - Add copy independence assertion to WithParameters test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: support 1D tensor feature selection and add predict integration test SelectFeaturesTensor previously threw for 1D tensors. Now handles them by directly selecting elements at the specified indices. Added integration test proving Predict applies SelectedFeatureIndices correctly with a real FeedForwardNeuralNetwork. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use identity-check instead of count heuristic for feature selection Replaced inputSize == featureIndices.Count heuristic with proper identity-mapping check using SequenceEqual. A reordered selection like [2, 0, 1] has the same count but still needs reordering. Also validates indices are in-range before calling SelectFeatures. 
Renamed misleading test names and added shape assertion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: fix collinear test fixture and add permuted feature selection test - Fixed direct-model test to use non-collinear features with correct formula comment (y = 2*x1 + 3*x2 + 1) and tighter assertion ranges - Added permutation test proving [2, 0, 1] selection is not treated as identity — verifies the SequenceEqual check works correctly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add null guards to OptimizationResult Clone and WithParameters Callers that create OptimizationResult via object initializer (bypassing the constructor) leave CoefficientLowerBounds, CoefficientUpperBounds, FitnessHistory, SelectedFeatures, and SelectedFeatureIndices as null. Deep-copy calls in Clone/WithParameters would throw NullReferenceException. Now falls back to empty defaults when properties are null. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: unify feature selection in InferenceSequence, one-time bounds check, robust assertions - InferenceSequence.Predict() now delegates to ApplySelectedFeaturesForPrediction instead of using a count-based check that skipped permuted selections - Move bounds validation to one-time check (cached _featureIndicesValidated flag) instead of scanning on every Predict() call - Assert permuted predictions actually differ from identity predictions - Replace fragile selectedCount < 10 with valid range assertion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: assert feature selection actually occurred in regression test Add assertion verifying optimizer selected a feature subset before testing Predict() with full-width input, proving the test exercises the reduced-column code path. 
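The predict-time wiring those commits describe — store the trained column indices, detect the identity mapping position-by-position, and otherwise select/reorder — can be sketched generically (hypothetical helper names, not the AiDotNet API):

```python
def is_identity(indices, width):
    # A permutation like [2, 0, 1] has the right count but still needs
    # reordering, so compare position-by-position, never just by count.
    return len(indices) == width and all(i == p for p, i in enumerate(indices))

def apply_selected_features(row, indices):
    if is_identity(indices, len(row)):
        return row                       # fast path: no copy needed
    if any(i < 0 or i >= len(row) for i in indices):
        raise IndexError("selected feature index out of range")
    return [row[i] for i in indices]     # select / reorder the trained columns

row = [10.0, 20.0, 30.0]
assert apply_selected_features(row, [0, 1, 2]) is row        # identity mapping
assert apply_selected_features(row, [2, 0, 1]) == [30.0, 10.0, 20.0]
assert apply_selected_features(row, [2, 0]) == [30.0, 10.0]  # true subset
```

Applying this after preprocessing but before the model call keeps prediction input width consistent with the columns the model was trained on.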
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: cache identity check with volatile int, set deterministic weights in test Replace per-call SequenceEqual + Enumerable.Range allocation with cached volatile int state (0=unknown, 1=identity, 2=needs-selection). Uses a simple loop instead of LINQ for zero-allocation identity check. Thread-safe via volatile read/write semantics. Set explicit non-uniform weights [1, 2, 3] in permuted feature selection test to ensure permutation deterministically produces different output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: franklinic <franklin@ivorycloud.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve first batch of pr #941 review comments (10 of 54) - InterpretabilityDashboard: filter empty attributions instead of using console output, use validCount for averaging - BaggingClassifier: validate sampleSize vs data size when bootstrap=false - Denclue: hoist _attractors and _attractorDensities guards out of loops - BIRCH: use Labels ?? Predict(x) instead of Labels! 
in FitPredict - SelfOrganizingMap: hoist _weights and _neuronLabels guards out of loops - CLIQUE: hoist Labels guard out of accumulation loop - SUBCLU: hoist _trainingData and Labels guards out of loops Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve second batch of pr #941 review comments (20 of 54) - GraphIsomorphismLayer: add local variable aliases for _lastMlpHiddenPreRelu in Forward and BackwardManual for clearer null safety - PagedAttentionKernel: add braces to ToFloat/FromFloat if blocks, replace null-forgiving with explicit boxing guards - KnowledgeDistillationTrainerBase: remove Console.WriteLine from library checkpoint loading code - DiffusionMemoryManager: separate PerDeviceMemory and TotalShardedMemory formatting into independent checks - MixtureOfExpertsLayer: add null guard before ApplyActivationToGraph instead of null-forgiving operator - KVCache: replace null-forgiving with explicit boxing guard for Half conversion - CachedMultiHeadAttention: hoist _cache null guard before Append call - CachedGroupedQueryAttention: hoist _cache null guard before Append call - DeformableConvolutionalLayer: stale comments (variables already renamed) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve build errors from linter's automated null-forgiving removal Fix syntax and compilation errors introduced by the automated linter that removed null-forgiving operators without proper replacements: - Fix missing closing parentheses in foreach statements (COPKMeans, AgglomerativeClustering) - Fix duplicate variable declarations (AntColonyOptimization, CachedGroupedQueryAttention, CachedMultiHeadAttention) - Fix dropped underscore prefixes on private field references (KVCache, ConvLSTMLayer, PrincipalNeighbourhoodAggregationLayer) - Add validated accessor properties for nullable cache arrays (KVCache) - Fix variable ordering issues (FloraAdapter) - Fix nullable reference handling (VectorModel, UncertaintyCalibrationData, 
CrossValidationResult, AiModelResult.Uncertainty) - Fix Debug.WriteLine() call with no arguments (KnowledgeDistillationTrainerBase) - Add null guard for ClassLabels in PredictProbabilities (BaggingClassifier) - Fix nullable permutation array access (ConcatenateLayer) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: hoist null guards for lambdas and alphaTs in DPMSolverMultistepScheduler Replace null-forgiving operators with proper null-coalescing throws for _lambdas and _alphaTs in FirstOrderUpdate, SecondOrderUpdate, and ThirdOrderUpdate methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove redundant null guard in LeafSent140FederatedDatasetLoader The testFilePath is already validated by IsNullOrWhiteSpace check in the enclosing if statement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: capture PreprocessingInfo in local variable to prevent TOCTOU Snapshot PreprocessingInfo once before accessing its members to avoid potential race conditions with concurrent access. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: dispose pre-permutation tensor to prevent memory leak in ConcatenateLayer Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: accumulate projector gradients from both views in BarlowTwins, SimCLR, and iBOT BarlowTwins: capture view1 projector grads before view2 backward clears them, then combine both views' grads for the projector update. SimCLR: replay projector forward per-view before backward to use correct caches, and accumulate projector gradients from both views. iBOT: backpropagate the full objective (clsLoss1 + clsLoss2 + lambda*(patchLoss1 + patchLoss2)) instead of only clsLoss1. Add ComputeMaskedPatchGradient for MSE gradient on masked patches. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct backward pass in RIFE, FILM, and RAFT video models RIFE: use cached block input shape instead of previous block output shape for gradient splitting to correctly match the fused input width. FILM: fail fast on all required backward-pass caches instead of silently skipping fusion and flow estimator backprop when caches are null. RAFT: replace incorrect upsampleConv.Backward with bilinear UpsampleFlowBackward that mirrors the actual forward upsampling, and hoist invariant layer guards out of the refinement loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: strengthen DataLeakagePrevention tests with behavioral assertions ScalerStatisticsReflectTrainingDataOnly: add positive assertion that the pipeline transformation is not identity, confirming it was fitted on data. StillWorksCorrectly: exercise the model with actual predictions and assert outputs are finite numbers, not just that objects are non-null. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address major PR review comments across multiple files KNNEvaluator: hoist _trainLabels null check outside the voting loop. JointMutualInformation: use local miScores instead of field on line 212. ContradictionDetector: treat malformed contradictory fields as false instead of falling through to text scan that matches JSON key names. BFGSOptimizer: guard inverse Hessian update with curvature condition (y*s > 0) and use validated locals instead of null-forgiving operators. IncrementalPCA: validate all fitted state upfront in PartialFit and remove null-forgiving on _explainedVariance. RealGatedLinearRecurrenceLayer: hoist all cached tensor validation to top of Backward to prevent partial gradient corruption. 
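The BFGS curvature guard mentioned above follows from the inverse-Hessian update: with rho = 1/(y·s), H ← (I − rho·s·yᵀ)·H·(I − rho·y·sᵀ) + rho·s·sᵀ is only well-defined and positive-definiteness-preserving when y·s > 0. A pure-Python 2×2 sketch (hypothetical helper names, not the optimizer's code):

```python
def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def bfgs_inverse_hessian_update(H, s, y):
    ys = sum(yi * si for yi, si in zip(y, s))
    if ys <= 1e-10:
        return H  # curvature condition violated: skip the update entirely
    rho, n = 1.0 / ys, len(s)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    left = [[I[i][j] - rho * s[i] * y[j] for j in range(n)] for i in range(n)]
    right = [[I[i][j] - rho * y[i] * s[j] for j in range(n)] for i in range(n)]
    core = matmul(matmul(left, H), right)
    return [[core[i][j] + rho * s[i] * s[j] for j in range(n)] for i in range(n)]

H = [[1.0, 0.0], [0.0, 1.0]]
s, y = [0.5, 0.1], [0.4, 0.2]
H2 = bfgs_inverse_hessian_update(H, s, y)
Hy = [sum(H2[i][j] * y[j] for j in range(2)) for i in range(2)]
assert all(abs(a - b) < 1e-9 for a, b in zip(Hy, s))        # secant: H2 @ y == s
assert bfgs_inverse_hessian_update(H, s, [-1.0, 0.0]) == H  # y.s <= 0: skipped
```

Skipping on y·s ≤ 0 (rather than dividing by it) is what keeps the approximation positive definite on non-convex or noisy steps.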
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address remaining major PR review comments PreprocessingRegistry: align documentation — clarify get/set is thread-safe but concurrent builds still overwrite each other's pipeline. InMemoryFederatedTrainer: fail fast when HE is enabled but no parameter indices are resolved instead of silently downgrading. Fix DP accounting: FedAsync uses 1/totalClients per iteration, FedBuff uses bufferSize/totalClients. AiModelBuilder: extract IsTimeSeriesTask() helper that checks both AutoML options and model type name for chronological split decisions. Reject row-changing data preparation in natural-client federated mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address trivial PR review comments across multiple files DeformableConvolutionalLayer: rename misleading variable names (mwShape/mbShape to maskWeights/maskBias, mw2/mwg/mbg to maskW/maskWeightGrad/maskBiasGrad). S4DLayer: hoist _lastHiddenStatesReal/Imag guards out of the BPTT loop. OnlineFeatureSelector: pass resolved running arrays into ComputeOnlineCorrelation instead of re-checking nullable fields inside the hot loop. SSLPretrainingPipeline: extract encoder to a local variable to avoid duplicate null-check expressions. SubpixelConvolutionalLayer: hoist ScalarActivation out of the per-element loop. BaggingClassifier: replace OrderBy(random.Next()) with Fisher-Yates partial shuffle for O(sampleSize) unbiased sampling without replacement. 
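The BaggingClassifier change replaces an OrderBy-based shuffle with a partial Fisher–Yates pass that draws sampleSize items without replacement in O(sampleSize) swaps. A generic sketch of the technique:

```python
import random

def sample_without_replacement(n, sample_size, rng):
    """Partial Fisher-Yates: only the first `sample_size` positions are
    shuffled; each swap draws uniformly from the remaining suffix, so the
    prefix is an unbiased sample without replacement."""
    pool = list(range(n))
    for i in range(sample_size):
        j = rng.randrange(i, n)          # pick uniformly from pool[i:]
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:sample_size]

rng = random.Random(42)
sample = sample_without_replacement(100, 10, rng)
assert len(sample) == 10
assert len(set(sample)) == 10            # no duplicates: without replacement
assert all(0 <= x < 100 for x in sample)
```

Stopping after sample_size swaps is what makes this O(sampleSize) rather than O(n log n) for a full sort-based shuffle.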
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist gradient tensor validations in HedgehogLayer.FeatureMapBackward

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address 12 pr review comments across multiple files
  - FloraAdapter: fix momentum reset bug (was zeroed every step)
  - KnowledgeDistillationTrainerBase: remove dead checkpoint branch
  - KVCache: hoist loop-invariant lookups outside inner loops
  - AiModelResult: capture stable model reference in explainer delegate
  - VectorModel: guard empty coefficients, remove dead BoxOrDefault
  - ConvLSTMLayer: use local lastInput instead of field reference
  - PrincipalNeighbourhoodAggregationLayer: hoist invariant tensors, remove null-forgiving operators in fallback path
  - BarlowTwins: remove redundant projector null-check
  - FILM: use local-cache pattern for cached features
  - DeepFilterNet: propagate gain layer backward gradient

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: net471 build errors in baggingclassifier and leafsent140loader
  - Replace range syntax pool[..sampleSize] with Array.Copy for net471 compatibility (RuntimeHelpers.GetSubArray not available)
  - Add explicit null check before string.IsNullOrWhiteSpace for net471 compatibility (missing NotNullWhen attribute)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use correct inner class name and remove null-forgiving in dashboard

  Replace InterpretabilityExplanation with Explanation (the actual inner class name) and replace null-forgiving operators with proper null guards.
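Several of these commits apply the same local-cache pattern: validate a nullable cached field once at the top of the backward pass, bind it to a local, and keep the hot loop free of per-element checks. A Python stand-in for the idea (names are illustrative, not the C# layers listed above):

```python
def backward_pass(cached_input, output_grads):
    """Sketch of the 'single guarded local' pattern used in the C# fixes:
    fail fast on missing cached state, then use a plain local in the loop
    instead of re-checking a nullable field on every iteration."""
    if cached_input is None:
        raise RuntimeError("Backward called before Forward: no cached input")
    local = cached_input  # guarded once; the loop below assumes it is valid
    return [g * x for g, x in zip(output_grads, local)]
```

Failing fast also prevents the partial-gradient-corruption problem called out for RealGatedLinearRecurrenceLayer: either all caches are valid before any gradient is written, or nothing is touched.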
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address pr #941 review comments
  - KDTree: use single guarded local for _data null check in Compare
  - BallTree: use single guarded local for _data null check in Sort
  - DiffusionMemoryManager: remove duplicate unconditional total sharded line
  - InfluenceFunctionExplainer: hoist cachedGradients null check outside loop

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

--------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes #931 — `PreprocessingRegistry<T, TInput>` was a static global singleton that stored the "current" preprocessing pipeline. When multiple `AiModelBuilder` instances built models concurrently, they overwrote each other's pipeline, causing race conditions and cross-contamination.

- Removed the 3 `PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline` assignments from `AiModelBuilder.ConfigurePreprocessing()` (lines 391, 435, 480). The pipeline was already stored as an instance field and flows to `AiModelResult` via `PreprocessingInfo`.
- Removed the 3 analogous `PostprocessingRegistry` assignments from `ConfigurePostprocessing()` (same pattern, same race condition).
- Marked `PreprocessingRegistry<T, TInput>` with an `[Obsolete]` attribute — not deleted, in case external code references it.
- Replaced the `PreprocessingRegistry` usage in `DocumentNeuralNetworkBase.PreprocessDocument()` with a new instance-level `PreprocessingTransformer` property (thread-safe, no global state).

Test plan
- `PreprocessingRegistryIntegrationTests` — 4 tests all pass:
  - `ConcurrentBuilds_WithDifferentPreprocessing_DoNotCrossContaminate`
  - `BuildAsync_WithPreprocessing_Succeeds_WithoutRegistry`
  - `DeprecatedRegistry_StillFunctionsForBackwardCompatibility`
  - `SequentialBuilds_DoNotLeaveStaleRegistryState`
- `LinearRegressionIntegrationTests` — all 24 existing tests pass
- `dotnet build src/AiDotNet.csproj -f net10.0` — succeeds
- `dotnet build tests/AiDotNet.Tests/AiDotNetTests.csproj -f net10.0` — succeeds

🤖 Generated with Claude Code
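The hazard the concurrency tests guard against can be sketched compactly: with a global "current pipeline" slot, the last `configure()` call wins across all builders, while per-instance storage keeps each build isolated. A Python stand-in with hypothetical names (the real types are the C# `AiModelBuilder` and the obsolete `PreprocessingRegistry`):

```python
class StaticRegistry:
    """Global 'current pipeline' slot, analogous to the obsolete registry."""
    current = None

class RegistryBuilder:
    """Builder that publishes its pipeline through the shared static slot."""
    def configure(self, pipeline):
        StaticRegistry.current = pipeline  # last writer wins, process-wide
        return self
    def build(self):
        return {"pipeline": StaticRegistry.current}

class InstanceBuilder:
    """Builder that keeps the pipeline as instance state (the fix)."""
    def configure(self, pipeline):
        self._pipeline = pipeline  # stored on this builder only
        return self
    def build(self):
        return {"pipeline": self._pipeline}

# Interleaved configuration: the static version cross-contaminates.
a = RegistryBuilder().configure("normalize")
b = RegistryBuilder().configure("one-hot")
assert a.build()["pipeline"] == "one-hot"   # a silently picked up b's pipeline

# The instance version does not: each build keeps its own pipeline.
c = InstanceBuilder().configure("normalize")
d = InstanceBuilder().configure("one-hot")
assert c.build()["pipeline"] == "normalize"
assert d.build()["pipeline"] == "one-hot"
```

Note the contamination is deterministic even without threads: any interleaving of configure and build across two builders sharing the static slot is enough, which is why the sequential-build test matters alongside the concurrent one.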
Summary by CodeRabbit
- New Features: instance-level `PreprocessingTransformer` and `PostprocessingTransformer` properties on `DocumentNeuralNetworkBase`, used at runtime instead of global state.
- Improvements: `AiModelBuilder` no longer mutates the static registries; pipelines stay on the builder instance and flow to `AiModelResult`.
- Deprecations: `PreprocessingRegistry<T, TInput>` and `PostprocessingRegistry` marked `[Obsolete]`.
- Tests: integration tests asserting that concurrent and sequential builds do not cross-contaminate.