Conversation
…sion mismatch bug

The NormalOptimizer (default) randomly selects a subset of features during training via PrepareAndEvaluateSolution, but AiModelResult.Predict() never applied the same feature selection to new input data. This caused the model to receive all columns while it was trained on fewer, throwing "Number of columns must equal length of vector" errors.

Changes:
- Add SelectedFeatureIndices (List<int>) to OptimizationStepData, OptimizationResult, and ModelResult to store actual column indices
- Propagate indices through the optimizer flow: PrepareAndEvaluateSolution → UpdateBestSolution → CreateOptimizationResult
- Apply feature selection in AiModelResult.Predict() after preprocessing but before Model.Predict(), using OptimizerHelper.SelectFeatures()
- Add 12 integration tests verifying the full BuildAsync → Predict pipeline

Closes #927

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
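The failure mode and the fix can be illustrated with a small, language-agnostic sketch (Python here; `select_features` is a hypothetical stand-in for `OptimizerHelper.SelectFeatures`, not the actual C# implementation):

```python
# Sketch of the bug: the model is trained on a column subset chosen by the
# optimizer, so prediction inputs must be reduced with the SAME indices.
# `selected` stands in for the stored SelectedFeatureIndices.

def select_features(rows, selected):
    """Keep only the columns whose indices were chosen during training."""
    return [[row[i] for i in selected] for row in rows]

train_x = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # 4 features
selected = [0, 2]                                        # optimizer kept columns 0 and 2

train_subset = select_features(train_x, selected)        # the model only ever sees 2 columns
assert all(len(row) == 2 for row in train_subset)

# At prediction time the same indices must be applied; otherwise a model
# trained on 2 columns receives 4 and throws a dimension mismatch.
new_x = [[9.0, 10.0, 11.0, 12.0]]
pred_input = select_features(new_x, selected)
assert pred_input == [[9.0, 11.0]]
```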
Walkthrough

Tracks and persists numeric selected-feature indices through optimizers and results, applies those indices at prediction time (including tensor/1D handling), updates helper/optimizer APIs to propagate indices, and adds extensive integration tests validating end-to-end feature-selection behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Builder as "AiModelBuilder"
    participant Optimizer as "OptimizerBase"
    participant Helper as "OptimizerHelper"
    participant Result as "OptimizationResult"
    participant Predictor as "AiModelResult"
    participant Model as "IFullModel"
    Builder->>Optimizer: Optimize(trainingX, y)
    Optimizer->>Optimizer: RandomlySelectFeatures(totalFeatures)
    Optimizer->>Helper: SelectFeatures(X, selectedIndices)
    Optimizer->>Model: Train(X_subset, y)
    Optimizer->>Helper: CreateOptimizationResult(bestStepData, bestSelectedFeatureIndices)
    Helper-->>Result: OptimizationResult (with SelectedFeatureIndices)
    Predictor->>Predictor: PreprocessingInfo.TransformFeatures(newX)
    Predictor->>Result: read SelectedFeatureIndices
    alt SelectedFeatureIndices non-empty
        Predictor->>Helper: SelectFeatures(normalizedX, SelectedFeatureIndices)
        Helper-->>Model: Model.Predict(X_subset)
    else
        Predictor-->>Model: Model.Predict(normalizedX)
    end
    Model-->>Predictor: predictions
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR fixes a critical bug where AiModelResult.Predict() did not apply the same feature selection used during training by the NormalOptimizer (and all OptimizerBase-derived optimizers), causing dimension mismatch errors ("Number of columns must equal length of vector") for all models built through AiModelBuilder. The fix introduces a new SelectedFeatureIndices (List<int>) property that stores the actual column indices selected during training and applies them to new input data in Predict().
Changes:
- Added `SelectedFeatureIndices` property to `OptimizationStepData`, `OptimizationResult`, and `ModelResult` to carry feature selection indices through the optimizer pipeline
- Propagated `SelectedFeatureIndices` through `OptimizerBase.PrepareAndEvaluateSolution` → `UpdateBestSolution` → `CreateOptimizationResult`, and updated `OptimizerHelper.CreateOptimizationResult` to accept the indices
- Applied feature selection in `AiModelResult.Predict()` using the stored indices, plus 12 new integration tests covering the full `BuildAsync` → `Predict` pipeline
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Models/OptimizationStepData.cs | Added SelectedFeatureIndices property with default initializer |
| src/Models/Results/ModelResult.cs | Added SelectedFeatureIndices property to the struct |
| src/Models/Results/OptimizationResult.cs | Added SelectedFeatureIndices property; updated constructor, DeepCopy(), and WithParameters() |
| src/Helpers/OptimizerHelper.cs | Added optional bestSelectedFeatureIndices parameter to CreateOptimizationResult |
| src/Optimizers/OptimizerBase.cs | Stores and propagates SelectedFeatureIndices in PrepareAndEvaluateSolution, UpdateBestSolution, and CreateOptimizationResult |
| src/Models/Results/AiModelResult.cs | Core fix: applies feature selection using SelectedFeatureIndices in Predict() after preprocessing but before Model.Predict() |
| tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs | 12 new integration tests covering the bug fix and related scenarios |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Models/Results/AiModelResult.cs`:
- Around line 1833-1845: Add a single helper method
ApplySelectedFeaturesForPrediction(TInput input) in AiModelResult that checks
OptimizationResult?.SelectedFeatureIndices, uses
Helpers.InputHelper<T,TInput>.GetInputSize(input) and when needed calls
Helpers.OptimizerHelper<T,TInput,TOutput>.SelectFeatures(input, featureIndices)
and returns the (possibly) reduced input; then replace the inline
feature-selection block in AiModelResult.Predict() with a call to this new
method and update InferenceSequence.Predict() (the path that currently
preprocesses full input and calls _result.Model.Predict(...)) to call
AiModelResult.ApplySelectedFeaturesForPrediction(...) before invoking
_result.Model.Predict so both prediction paths reuse the same centralized logic.
In `@src/Models/Results/ModelResult.cs`:
- Around line 166-170: Add full XML documentation for the SelectedFeatureIndices
property in the ModelResult struct/class to match the rest of the file: expand
the <summary> to describe the property, add a <remarks> section explaining how
indices map to feature columns and when/why they are populated, and include a
"For Beginners" paragraph that gives a simple example of using
SelectedFeatureIndices (e.g., selecting columns 0,2,4 after optimization).
Ensure you update the XML tags immediately above the SelectedFeatureIndices
property so it follows the same documentation pattern as the other properties in
ModelResult.
In `@src/Models/Results/OptimizationResult.cs`:
- Line 512: WithParameters currently assigns SelectedFeatureIndices by reference
which is inconsistent with DeepCopy; update the WithParameters implementation in
OptimizationResult (method/class name: WithParameters on OptimizationResult) to
make a defensive copy of SelectedFeatureIndices (e.g., construct a new List<int>
from the existing SelectedFeatureIndices) so the new instance does not share the
same list reference and matches DeepCopy semantics.
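The defensive-copy requirement described above can be sketched in a few lines (a language-agnostic Python sketch, not the actual C# `OptimizationResult`):

```python
# Sketch of the defensive-copy fix: the instance returned by WithParameters
# must own its own list, so mutating the original does not leak into it
# (matching DeepCopy semantics).

class OptimizationResult:
    def __init__(self, selected_feature_indices):
        self.selected_feature_indices = selected_feature_indices

    def with_parameters(self):
        # list(...) constructs a new list instead of sharing the reference
        return OptimizationResult(list(self.selected_feature_indices))

original = OptimizationResult([1, 3])
updated = original.with_parameters()
original.selected_feature_indices.append(99)   # mutate the original afterwards
assert updated.selected_feature_indices == [1, 3]  # the copy is unaffected
```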
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 23-24: Remove the unused constant LooseTolerance from the
AiModelBuilderPredictIntegrationTests class to eliminate dead code; locate the
field declaration "private const double LooseTolerance = 0.5;" and delete it (or
alternatively, if a looser numeric comparison is intended, replace usages of
Tolerance with LooseTolerance where appropriate in methods like
BuildAsync_PredictTwice_ReturnsSameResults), but prefer simply removing the
unused LooseTolerance constant.
- Around line 361-375: The test only checks value equality but not independence;
update the test
OptimizationResult_WithParameters_PreservesSelectedFeatureIndices to also verify
mutation isolation by mutating original.SelectedFeatureIndices after calling
WithParameters and asserting updated.SelectedFeatureIndices remains unchanged
(e.g., original.SelectedFeatureIndices.Add(99); Assert.Equal(new List<int>{1,3},
updated.SelectedFeatureIndices)); if you prefer to fix the implementation
instead, ensure OptimizationResult.WithParameters returns a new
OptimizationResult that clones SelectedFeatureIndices (copy the list) so the
returned instance does not share the same list reference.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 33f37fd1-2716-46a7-938d-6ff40840bb46
📒 Files selected for processing (7)
- src/Helpers/OptimizerHelper.cs
- src/Models/OptimizationStepData.cs
- src/Models/Results/AiModelResult.cs
- src/Models/Results/ModelResult.cs
- src/Models/Results/OptimizationResult.cs
- src/Optimizers/OptimizerBase.cs
- tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs
…, docs

- Remove unused LooseTolerance constant
- Fix off-by-one: use != instead of > for feature index count check
- Apply feature selection in InferenceSession.Predict() (critical gap)
- Add comprehensive docs to SelectedFeatureIndices property
- Fix WithParameters to deep-copy collections (consistent with DeepCopy)
- Add copy independence assertion to WithParameters test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/Models/Results/AiModelResult.cs (1)
2197-2207: 🛠️ Refactor suggestion | 🟠 Major — Extract this into the shared prediction helper.

This is the same feature-selection block as in `AiModelResult.Predict()`. Keep one private helper for both paths so the selection predicate and validation cannot drift again.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Models/Results/AiModelResult.cs` around lines 2197 - 2207, Extract the duplicated feature-selection block into a single private helper method and call it from both AiModelResult.Predict() and the other path using the same logic: read _result.OptimizationResult?.SelectedFeatureIndices, call Helpers.InputHelper<T, TInput>.GetInputSize(normalizedNewData) for validation, and call Helpers.OptimizerHelper<T, TInput, TOutput>.SelectFeatures(normalizedNewData, featureIndices) when sizes differ; preserve the same null/Count checks and return (or set) the normalizedNewData from the helper so both callers use the identical selection/validation logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Models/Results/AiModelResult.cs`:
- Around line 1836-1844: Replace the current heuristic that skips feature
selection based on counts with a proper identity-check: create a helper (e.g.,
ApplySelectedFeaturesForPrediction) that reads
OptimizationResult.SelectedFeatureIndices, returns input immediately if
null/empty, computes inputSize via Helpers.InputHelper<T,
TInput>.GetInputSize(input), then determines identity mapping by checking
featureIndices.Count == inputSize AND
featureIndices.SequenceEqual(Enumerable.Range(0, inputSize)); only skip
SelectFeatures when that identity check passes; otherwise validate indices are
in-range and call Helpers.OptimizerHelper<T, TInput,
TOutput>.SelectFeatures(input, featureIndices). Apply the same helper in the
session prediction path so both branches use identical logic.
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 234-237: The test FeatureSelectionReducesColumns is misleading
because it only checks selectedCount > 0 and <= 10 but not that reduction
occurred; update the assertions in AiModelBuilderPredictIntegrationTests.cs to
match intent: if you want to assert actual column reduction change the second
assertion to Assert.True(optimizationResult.SelectedFeatureIndices.Count < 10,
"Should select fewer features than the original 10"), otherwise rename the test
(and update the inline comment) to something like
FeatureSelectionHandlesSelectionGracefully and adjust the comment to say the
optimizer may select any subset (including all) and we only assert non-empty
selection via selectedCount/optimizationResult.SelectedFeatureIndices.Count.
---
Duplicate comments:
In `@src/Models/Results/AiModelResult.cs`:
- Around line 2197-2207: Extract the duplicated feature-selection block into a
single private helper method and call it from both AiModelResult.Predict() and
the other path using the same logic: read
_result.OptimizationResult?.SelectedFeatureIndices, call Helpers.InputHelper<T,
TInput>.GetInputSize(normalizedNewData) for validation, and call
Helpers.OptimizerHelper<T, TInput, TOutput>.SelectFeatures(normalizedNewData,
featureIndices) when sizes differ; preserve the same null/Count checks and
return (or set) the normalizedNewData from the helper so both callers use the
identical selection/validation logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: db2dd249-403e-49db-9b42-e06f49bc8a89
📒 Files selected for processing (4)
- src/Models/Results/AiModelResult.cs
- src/Models/Results/ModelResult.cs
- src/Models/Results/OptimizationResult.cs
- tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs
…test

SelectFeaturesTensor previously threw for 1D tensors. Now handles them by directly selecting elements at the specified indices. Added integration test proving Predict applies SelectedFeatureIndices correctly with a real FeedForwardNeuralNetwork.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
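The 1D fallback described in this commit can be sketched as follows (a Python sketch using nested lists in place of `Tensor<T>`; `select_features_tensor` is a hypothetical stand-in for the C# helper):

```python
# Sketch of the 1D fix: a rank-1 tensor is a flat vector of features, so
# selection means picking elements directly rather than matrix columns.

def select_features_tensor(data, indices):
    if data and isinstance(data[0], list):       # 2D case: select columns per row
        return [[row[i] for i in indices] for row in data]
    return [data[i] for i in indices]            # 1D case: select elements directly

# 2D: column selection per row
assert select_features_tensor([[1, 2, 3], [4, 5, 6]], [0, 2]) == [[1, 3], [4, 6]]
# 1D: element selection (this shape previously threw)
assert select_features_tensor([10, 20, 30], [2, 0]) == [30, 10]
```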
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 386-442: The test method
InferenceSession_Predict_AppliesFeatureSelection is misnamed and has a weak
assertion; rename it to AiModelResult_Predict_AppliesTensorFeatureSelection to
reflect that it constructs AiModelResult and calls AiModelResult.Predict, and
replace or augment the Assert.NotNull(prediction) with a stronger assertion that
verifies the prediction shape/size (e.g., assert prediction length or output
dimension equals outputSize) using the existing outputSize variable so the test
ensures feature selection produced an output of the expected shape.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3339c2b3-a58b-415b-8b33-8753b1ce71b8
📒 Files selected for processing (2)
- src/Helpers/OptimizerHelper.cs
- tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs
Replaced inputSize == featureIndices.Count heuristic with proper identity-mapping check using SequenceEqual. A reordered selection like [2, 0, 1] has the same count but still needs reordering. Also validates indices are in-range before calling SelectFeatures. Renamed misleading test names and added shape assertion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
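The identity-mapping check can be sketched like this (Python; `apply_selected_features` is a hypothetical stand-in for the C# helper, with `indices == list(range(n))` playing the role of `SequenceEqual(Enumerable.Range(0, n))`):

```python
# Sketch of the identity check: only skip selection when the indices are
# exactly [0, 1, ..., n-1]. A same-length permutation like [2, 0, 1] must
# still be applied, and out-of-range indices must fail fast.

def apply_selected_features(row, indices):
    if not indices:
        return row                                       # nothing stored: pass through
    n = len(row)
    if len(indices) == n and indices == list(range(n)):  # SequenceEqual-style check
        return row                                       # true identity mapping only
    if any(i < 0 or i >= n for i in indices):
        raise IndexError("selected feature index out of range")
    return [row[i] for i in indices]

assert apply_selected_features([10, 20, 30], [0, 1, 2]) == [10, 20, 30]  # skipped
assert apply_selected_features([10, 20, 30], [2, 0, 1]) == [30, 10, 20]  # reordered
```

A count-only heuristic would wrongly treat `[2, 0, 1]` as identity because it has the same length as the input, which is exactly the bug this commit removes.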
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/Models/Results/AiModelResult.cs (1)
2222-2232: ⚠️ Potential issue | 🔴 Critical — Blocking: session inference still bypasses the new selection helper.

`InferenceSequence.Predict()` still uses the old `inputSize != featureIndices.Count` shortcut. Line 2227 will skip valid reorderings like `[2, 0, 1]`, and this path never gets the bounds validation now implemented in `ApplySelectedFeaturesForPrediction(...)`. That leaves `BeginInferenceSession().CreateSequence().Predict(...)` behavior inconsistent with `AiModelResult.Predict(...)` and can return incorrect predictions through a public inference path.

Suggested fix:

```diff
-// Apply feature selection if the optimizer selected a subset of features during training.
-var featureIndices = _result.OptimizationResult?.SelectedFeatureIndices;
-if (featureIndices != null && featureIndices.Count > 0)
-{
-    int inputSize = Helpers.InputHelper<T, TInput>.GetInputSize(normalizedNewData);
-    if (inputSize != featureIndices.Count)
-    {
-        normalizedNewData = Helpers.OptimizerHelper<T, TInput, TOutput>
-            .SelectFeatures(normalizedNewData, featureIndices);
-    }
-}
+normalizedNewData = _result.ApplySelectedFeaturesForPrediction(normalizedNewData);
```

Based on learnings and coding guidelines, "Every PR must contain production-ready code" and "Incomplete features: Half-implemented patterns where some code paths work but others silently do nothing" are blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Models/Results/AiModelResult.cs` around lines 2222 - 2232, InferenceSequence.Predict() is still using the old fast-path check "inputSize != featureIndices.Count" which skips valid reorderings (e.g. [2,0,1]) and bypasses the bounds validation in ApplySelectedFeaturesForPrediction(...); update InferenceSequence.Predict() (the method invoked via BeginInferenceSession().CreateSequence().Predict(...)) to call the same ApplySelectedFeaturesForPrediction(normalizedNewData, featureIndices) helper (or copy its exact validation + selection logic) instead of relying on inputSize comparison so reorderings are applied and SelectedFeatureIndices bounds are validated consistently with AiModelResult.Predict().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 387-444: The test
AiModelResult_Predict_AppliesTensorFeatureSelection only checks a
length-reducing ascending subset; add a deterministic permutation case using
SelectedFeatureIndices = new List<int> {2,0,3,1} (or similar full-length
reordering) inside the same test (or a new test) to feed a known fullInput
Tensor<float> and assert that result.Predict(...) returns values in the
reordered order (check actual prediction values, not just shape) to ensure
AiModelResult.Predict applies reordering (referencing SelectedFeatureIndices,
AiModelResult.Predict, OptimizationResult and Tensor<float>/prediction.Shape to
locate where to change).
- Around line 312-334: The test
BuildAsync_DirectModelPredict_WithoutBuilder_StillWorks uses a collinear fixture
(second column = 2*first) and an incorrect formula comment (says y = x1 + 2*x2 +
1 while the data corresponds to a different mapping), and the assertions use
overly wide ranges; fix by replacing the input matrix created via CreateMatrix
so columns are non-collinear (e.g., choose independent values for the second
feature) or explicitly construct x so the intended mapping y = x1 + 2*x2 + 1
holds, then compute the exact expected predictions for newX and assert those
values with a tight tolerance (use Assert.Equal with a delta or narrow
Assert.InRange) against model.Predict; update the comment to match the fixture
and keep references to RidgeRegression<double>,
BuildAsync_DirectModelPredict_WithoutBuilder_StillWorks, CreateMatrix,
CreateVector, and Predict so reviewers can find the changes.
---
Duplicate comments:
In `@src/Models/Results/AiModelResult.cs`:
- Around line 2222-2232: InferenceSequence.Predict() is still using the old
fast-path check "inputSize != featureIndices.Count" which skips valid
reorderings (e.g. [2,0,1]) and bypasses the bounds validation in
ApplySelectedFeaturesForPrediction(...); update InferenceSequence.Predict() (the
method invoked via BeginInferenceSession().CreateSequence().Predict(...)) to
call the same ApplySelectedFeaturesForPrediction(normalizedNewData,
featureIndices) helper (or copy its exact validation + selection logic) instead
of relying on inputSize comparison so reorderings are applied and
SelectedFeatureIndices bounds are validated consistently with
AiModelResult.Predict().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 181238a3-54fb-4a33-80ea-6b541a37df36
📒 Files selected for processing (2)
- src/Models/Results/AiModelResult.cs
- tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/Models/Results/AiModelResult.cs:1844
- The guard `if (inputSize != featureIndices.Count)` skips feature selection when the input already has the same number of features as selected indices. This is intended to handle the case where the user has already pre-selected features, but it creates a silent correctness risk: if the user passes a matrix with the same number of columns as selected features but those columns are not the correct selected features (e.g., the original training data had 5 features and 2 were selected, and the user happens to pass a 2-column matrix of different features), no feature selection is applied and the model receives the wrong columns without error. While this is a design tradeoff, there's no documentation of this assumption and no validation that the input size equals the selected count for valid reasons. Consider either documenting this behavior clearly or storing the original full feature count alongside `SelectedFeatureIndices` so that you can distinguish "pre-selected" vs. "wrong-size" inputs.
```csharp
// INFERENCE OPTIMIZATION PATH: apply configured inference optimizations for neural network models
if (InferenceOptimizationConfig != null &&
    Model is NeuralNetworkBase<T> neuralModel &&
    normalizedNewData is Tensor<T> inputTensor)
```
- Fixed direct-model test to use non-collinear features with correct formula comment (y = 2*x1 + 3*x2 + 1) and tighter assertion ranges
- Added permutation test proving [2, 0, 1] selection is not treated as identity — verifies the SequenceEqual check works correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Callers that create OptimizationResult via object initializer (bypassing the constructor) leave CoefficientLowerBounds, CoefficientUpperBounds, FitnessHistory, SelectedFeatures, and SelectedFeatureIndices as null. Deep-copy calls in Clone/WithParameters would throw NullReferenceException. Now falls back to empty defaults when properties are null. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
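The null-fallback pattern this commit describes can be sketched as (Python, with `None` standing in for an unset C# collection property; `deep_copy_indices` is a hypothetical illustration, not the library API):

```python
# Sketch of the null-fallback fix: when a collection property was never set
# (object-initializer construction bypassed the constructor), deep-copy paths
# substitute an empty default instead of dereferencing null.

def deep_copy_indices(selected_feature_indices):
    # `x or []` maps None to an empty list; list(...) still makes a fresh copy
    return list(selected_feature_indices or [])

assert deep_copy_indices(None) == []        # previously a NullReferenceException
assert deep_copy_indices([1, 3]) == [1, 3]  # normal path: independent copy
```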
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Models/Results/OptimizationResult.cs (1)
477-494: ⚠️ Potential issue | 🟠 Major — `DeepCopy()` and `WithParameters()` still return inconsistent nested state.

`TrainingResult`, `ValidationResult`, `TestResult`, and `FitDetectionResult` are still reused by reference. That breaks isolation in `DeepCopy()`, and `WithParameters()` also leaves predictions/statistics from the old model attached to a different `BestSolution`. Deep-clone those nested results in `DeepCopy()`, and recompute or clear them in `WithParameters()` when the parameters change.

Also applies to: 502-519
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Models/Results/OptimizationResult.cs` around lines 477 - 494, DeepCopy() currently shallow-copies nested result objects (TrainingResult, ValidationResult, TestResult, FitDetectionResult), causing shared mutable state; update OptimizationResult<T, TInput, TOutput>.DeepCopy() to deep-clone those nested result objects (call their DeepCopy/Clone methods or construct new instances copying all fields) instead of reusing references, and ensure BestSolution is deep-copied before attaching any nested results. Likewise, in WithParameters(...) detect when parameters change and either recompute dependent predictions/statistics for TrainingResult/ValidationResult/TestResult/FitDetectionResult based on the new BestSolution or explicitly clear/reset those fields so stale predictions aren’t carried to a different solution; update the WithParameters method to create fresh result objects or null/empty them when parameters change.
♻️ Duplicate comments (2)
tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs (2)
508-523: ⚠️ Potential issue | 🔴 Critical — This permutation test still always passes on the broken branch.

`predictionsMatch` is computed and then ignored, so the only real assertion here is the output shape. A regression that skips reordering entirely still passes. Use a manual oracle instead: permute the input tensor yourself, run it through the identity-path model, and assert that output matches `result.Predict(input)`.

🔧 Assert the actual permutation result:

```diff
-var identityPrediction = identityResult.Predict(input);
+var manuallyPermutedInput = new Tensor<float>(new[] { featureCount });
+manuallyPermutedInput[0] = 30.0f;
+manuallyPermutedInput[1] = 10.0f;
+manuallyPermutedInput[2] = 20.0f;
+var expectedPrediction = identityResult.Predict(manuallyPermutedInput);
@@
-bool predictionsMatch = Math.Abs(permutedPrediction[0] - identityPrediction[0]) < 1e-6f;
-// Note: predictions *might* match if all weights happen to be equal (unlikely with random init),
-// but in practice DenseLayer initializes weights differently per position.
-// The key thing is the test doesn't throw — proving permutation is applied, not skipped.
 Assert.Equal(outputSize, permutedPrediction.Shape[0]);
+Assert.Equal(outputSize, expectedPrediction.Shape[0]);
+Assert.Equal(expectedPrediction[0], permutedPrediction[0], 1e-6f);
```

As per coding guidelines, "Tests MUST be production-quality" and "Good tests should assert specific expected values, not just non-null."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs` around lines 508 - 523, The test currently computes predictionsMatch but never uses it, so it only asserts shape and misses verifying permutation; fix by explicitly applying the same feature permutation to the input tensor and using that permuted input as the oracle for the identity-path model: create permutedInput by reordering the original input according to the permutation used by the model (the same mapping that should have been applied in result), then call var oraclePrediction = identityResult.Predict(permutedInput) and assert that oraclePrediction equals permutedPrediction (element-wise equality within an epsilon) and that shapes match; use the existing variables result.Predict, identityResult.Predict, permutedPrediction and input to locate where to insert this check.
331-337: ⚠️ Potential issue | 🔴 Critical — These bounds are still too wide for a fixed regression fixture.

The dataset encodes a concrete mapping, but `39..43` and `44..48` still allow materially wrong outputs to pass. Either switch this direct-path regression to a non-regularized model and assert `41`/`46` tightly, or keep `RidgeRegression<double>` and narrow the tolerance to a bound that is justified by its default regularization.

🔧 One deterministic way to tighten this:

```diff
-var model = new RidgeRegression<double>();
+var model = new MultipleRegression<double>();
@@
-Assert.InRange(predictions[0], 39, 43);
-Assert.InRange(predictions[1], 44, 48);
+Assert.Equal(41.0, predictions[0], 1e-6);
+Assert.Equal(46.0, predictions[1], 1e-6);
```

As per coding guidelines, "Good tests should assert specific expected values, not just non-null."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs` around lines 331 - 337, The test currently allows overly wide tolerances for a deterministic regression; in AiModelBuilderPredictIntegrationTests update the model or the assertions so the expected outputs are tight: either switch the model creation from RidgeRegression<double> to a non-regularized estimator (e.g., use OrdinaryLeastSquares / the equivalent non-regularized builder) so the mapping produces exact values 41 and 46, then replace the loose Assert.InRange checks with exact equality assertions for predictions[0] == 41 and predictions[1] == 46 (or use Assert.Equal with an appropriate minimal precision), or keep RidgeRegression<double> but reduce the ranges to a narrowly-justified tolerance based on its default regularization (e.g., ±1 or use Assert.Equal with a small delta); update the assertions accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 28-56: The test never guarantees the optimizer actually reduced
features; after BuildAsync() assert that the trained model selected a strict
subset by checking SelectedFeatureIndices.Count < 5 (i.e. add
Assert.True(result.SelectedFeatureIndices.Count < 5) or the equivalent accessor
on the built model), or instead configure the optimizer (NormalOptimizer)
deterministically to force feature selection; ensure this unconditional
assertion runs before calling result.Predict(newData).
---
Outside diff comments:
In `@src/Models/Results/OptimizationResult.cs`:
- Around line 477-494: DeepCopy() currently shallow-copies nested result objects
(TrainingResult, ValidationResult, TestResult, FitDetectionResult), causing
shared mutable state; update OptimizationResult<T, TInput, TOutput>.DeepCopy()
to deep-clone those nested result objects (call their DeepCopy/Clone methods or
construct new instances copying all fields) instead of reusing references, and
ensure BestSolution is deep-copied before attaching any nested results.
Likewise, in WithParameters(...) detect when parameters change and either
recompute dependent predictions/statistics for
TrainingResult/ValidationResult/TestResult/FitDetectionResult based on the new
BestSolution or explicitly clear/reset those fields so stale predictions aren’t
carried to a different solution; update the WithParameters method to create
fresh result objects or null/empty them when parameters change.
---
Duplicate comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 508-523: The test currently computes predictionsMatch but never
uses it, so it only asserts shape and misses verifying permutation; fix by
explicitly applying the same feature permutation to the input tensor and using
that permuted input as the oracle for the identity-path model: create
permutedInput by reordering the original input according to the permutation used
by the model (the same mapping that should have been applied in result), then
call var oraclePrediction = identityResult.Predict(permutedInput) and assert
that oraclePrediction equals permutedPrediction (element-wise equality within an
epsilon) and that shapes match; use the existing variables result.Predict,
identityResult.Predict, permutedPrediction and input to locate where to insert
this check.
- Around line 331-337: The test currently allows overly wide tolerances for a
deterministic regression; in AiModelBuilderPredictIntegrationTests update the
model or the assertions so the expected outputs are tight: either switch the
model creation from RidgeRegression<double> to a non-regularized estimator
(e.g., use OrdinaryLeastSquares / the equivalent non-regularized builder) so the
mapping produces exact values 41 and 46, then replace the loose Assert.InRange
checks with exact equality assertions for predictions[0] == 41 and
predictions[1] == 46 (or use Assert.Equal with an appropriate minimal
precision), or keep RidgeRegression<double> but reduce the ranges to a
narrowly-justified tolerance based on its default regularization (e.g., ±1 or
use Assert.Equal with a small delta); update the assertions accordingly.
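Why ridge cannot hit the exact values while an unregularized fit can: a small closed-form comparison in pure Python (illustrative numbers only; this is not the AiDotNet estimator):

```python
# 1-feature, no-intercept closed forms:
#   OLS:   w = Σxy / Σx²
#   Ridge: w = Σxy / (Σx² + λ)
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                        # exactly y = 2x
sxy = sum(x * y for x, y in zip(xs, ys))    # 28
sxx = sum(x * x for x in xs)                # 14

w_ols = sxy / sxx                           # 2.0, exact
w_ridge = sxy / (sxx + 1.0)                 # 28/15 ≈ 1.867 with λ = 1

assert w_ols == 2.0                         # exact equality is justified
assert abs(w_ridge - 2.0) > 0.1            # shrinkage: exact asserts would fail
```

This is the trade-off behind the comment: switch to the unregularized fit and assert exact values, or keep ridge and justify the tolerance from its shrinkage.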
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1a135915-d41d-4cc6-bfd5-e56f1b28ceec
📒 Files selected for processing (2)
src/Models/Results/OptimizationResult.cs
tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs
…eck, robust assertions
- InferenceSequence.Predict() now delegates to ApplySelectedFeaturesForPrediction instead of using a count-based check that skipped permuted selections
- Move bounds validation to one-time check (cached _featureIndicesValidated flag) instead of scanning on every Predict() call
- Assert permuted predictions actually differ from identity predictions
- Replace fragile selectedCount < 10 with valid range assertion
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add assertion verifying optimizer selected a feature subset before testing Predict() with full-width input, proving the test exercises the reduced-column code path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
…s in test
Replace per-call SequenceEqual + Enumerable.Range allocation with cached volatile int state (0=unknown, 1=identity, 2=needs-selection). Uses a simple loop instead of LINQ for zero-allocation identity check. Thread-safe via volatile read/write semantics.
Set explicit non-uniform weights [1, 2, 3] in permuted feature selection test to ensure permutation deterministically produces different output.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor tests that called result.Predict() (which fails due to the feature selection bug fixed in PR #928) to verify preprocessing pipeline state directly via PreprocessingInfo.TransformFeatures() instead. Tests now verify the data leakage fix independently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mismatch bug (#928)

* fix: apply feature selection in AiModelResult.Predict() — fixes dimension mismatch bug

The NormalOptimizer (default) randomly selects a subset of features during training via PrepareAndEvaluateSolution, but AiModelResult.Predict() never applied the same feature selection to new input data. This caused the model to receive all columns while it was trained on fewer, throwing "Number of columns must equal length of vector" errors.

Changes:
- Add SelectedFeatureIndices (List<int>) to OptimizationStepData, OptimizationResult, and ModelResult to store actual column indices
- Propagate indices through optimizer flow: PrepareAndEvaluateSolution → UpdateBestSolution → CreateOptimizationResult
- Apply feature selection in AiModelResult.Predict() after preprocessing but before Model.Predict(), using OptimizerHelper.SelectFeatures()
- Add 12 integration tests verifying the full BuildAsync → Predict pipeline

Closes #927
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 7 pr review comments - feature selection, copy semantics, docs
- Remove unused LooseTolerance constant
- Fix off-by-one: use != instead of > for feature index count check
- Apply feature selection in InferenceSession.Predict() (critical gap)
- Add comprehensive docs to SelectedFeatureIndices property
- Fix WithParameters to deep-copy collections (consistent with DeepCopy)
- Add copy independence assertion to WithParameters test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: support 1D tensor feature selection and add predict integration test
SelectFeaturesTensor previously threw for 1D tensors. Now handles them by directly selecting elements at the specified indices. Added integration test proving Predict applies SelectedFeatureIndices correctly with a real FeedForwardNeuralNetwork.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use identity-check instead of count heuristic for feature selection
Replaced inputSize == featureIndices.Count heuristic with proper identity-mapping check using SequenceEqual. A reordered selection like [2, 0, 1] has the same count but still needs reordering. Also validates indices are in-range before calling SelectFeatures. Renamed misleading test names and added shape assertion.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: fix collinear test fixture and add permuted feature selection test
- Fixed direct-model test to use non-collinear features with correct formula comment (y = 2*x1 + 3*x2 + 1) and tighter assertion ranges
- Added permutation test proving [2, 0, 1] selection is not treated as identity — verifies the SequenceEqual check works correctly
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add null guards to OptimizationResult Clone and WithParameters
Callers that create OptimizationResult via object initializer (bypassing the constructor) leave CoefficientLowerBounds, CoefficientUpperBounds, FitnessHistory, SelectedFeatures, and SelectedFeatureIndices as null. Deep-copy calls in Clone/WithParameters would throw NullReferenceException. Now falls back to empty defaults when properties are null.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify feature selection in InferenceSequence, one-time bounds check, robust assertions
- InferenceSequence.Predict() now delegates to ApplySelectedFeaturesForPrediction instead of using a count-based check that skipped permuted selections
- Move bounds validation to one-time check (cached _featureIndicesValidated flag) instead of scanning on every Predict() call
- Assert permuted predictions actually differ from identity predictions
- Replace fragile selectedCount < 10 with valid range assertion
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: assert feature selection actually occurred in regression test
Add assertion verifying optimizer selected a feature subset before testing Predict() with full-width input, proving the test exercises the reduced-column code path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cache identity check with volatile int, set deterministic weights in test
Replace per-call SequenceEqual + Enumerable.Range allocation with cached volatile int state (0=unknown, 1=identity, 2=needs-selection). Uses a simple loop instead of LINQ for zero-allocation identity check. Thread-safe via volatile read/write semantics. Set explicit non-uniform weights [1, 2, 3] in permuted feature selection test to ensure permutation deterministically produces different output.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------
Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
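The core fix this merge describes — re-applying the training-time column selection (with bounds validation) before the model sees new rows — can be sketched language-agnostically. A Python illustration with hypothetical names, not the AiDotNet API:

```python
def predict_with_selection(model_predict, selected_indices, matrix):
    """Apply training-time feature selection before prediction.

    Without this step, a model trained on a column subset receives
    full-width rows and fails with a dimension mismatch.
    """
    n_cols = len(matrix[0])
    # validate indices once, before selecting (mirrors the bounds check)
    if any(i < 0 or i >= n_cols for i in selected_indices):
        raise ValueError("selected feature index out of range")
    reduced = [[row[i] for i in selected_indices] for row in matrix]
    return [model_predict(row) for row in reduced]

# toy model "trained" on columns [0, 2] with weights [2.0, 3.0]
weights = [2.0, 3.0]
model = lambda row: sum(w * x for w, x in zip(weights, row))

X_new = [[1.0, 99.0, 2.0],    # middle column was never seen in training
         [3.0, 99.0, 4.0]]
preds = predict_with_selection(model, [0, 2], X_new)
assert preds == [8.0, 18.0]
```

Feeding the full-width rows straight into `model` would silently pair the wrong columns with the wrong weights, which is the quieter cousin of the hard dimension-mismatch error the PR fixes.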
…934)

* fix: prevent data leakage by fitting preprocessing only on training data

In BuildSupervisedInternalAsync(), the preprocessing pipeline (e.g., StandardScaler) was fitted on the ENTIRE dataset before the train/test split. This leaked test/validation statistics (mean, std dev) into the training pipeline, causing artificially inflated metrics.

The fix restructures the method:
1. Split data into train/val/test FIRST using DataSplitter.Split()
2. FitTransform preprocessing pipeline on training data ONLY
3. Transform (not FitTransform) validation and test data

The federated learning path is unchanged — it correctly uses all data as training data, so FitTransform on everything remains correct.

Fixes #929
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 12 pr review comments - leakage tests, pragma scope, registry sync
- Scope #pragma warning disable CS8600/CS8604 with matching restore
- Restore PreprocessingRegistry.Current in all ConfigurePreprocessing overloads
- Update PreprocessingRegistry docs to reflect thread-safety and usage pattern
- Rewrite tests to actually detect leakage: verify scaler produces different output than all-data scaler, verify transform shape and non-identity
- Add end-to-end predict test with preprocessing
- Add test verifying no-preprocessing path still works with predict
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move fitresample after train/test split to prevent data leakage
FitResample (SMOTE, outlier removal, augmentation) was applied to the full dataset before splitting, allowing synthetic samples derived from test/validation data to leak into training. Now FitResample runs after the split on training data only in both standard and AutoML paths. Also fixed AutoML preprocessing to FitTransform on training only and Transform on validation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update registry docs and use consistent shuffle policy for splits
Updated PreprocessingRegistry docs to clearly warn about concurrent build safety issues. Applied same time-series non-shuffle policy to the final supervised training split that AutoML already uses.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use post-preparation training data in no-preprocessing else branch
When _dataPreparationPipeline runs (resampling/outlier removal) it modifies XTrain/yTrain. The else branch (no preprocessing) was incorrectly using preparedX/preparedY (pre-preparation) instead of XTrain/yTrain, losing training-only preparation results for downstream logic.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make data leakage tests self-sufficient without PR 928
Refactor tests that called result.Predict() (which fails due to the feature selection bug fixed in PR #928) to verify preprocessing pipeline state directly via PreprocessingInfo.TransformFeatures() instead. Tests now verify the data leakage fix independently.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make data leakage test deterministic with widely-spaced unique values
Replace clustered bimodal data with linearly-spaced unique values (0, 10, 20, ..., 990) to guarantee any 70/30 shuffle split produces significantly different scaler statistics from the full dataset, eliminating flaky test failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from default!, null!, and remaining field! patterns
Replaces default! with proper initialization (NumOps.Zero, throw, or nullable types), null! with nullable parameters or removed dead code, and field! with extracted local variables with null-coalescing throw expressions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address pr review comments for aimodelbuilder and totem
- Fix exception text to reference ConfigureModel() instead of WithModel()
- Remove nullable T? from TOTEM._lastCommitmentLoss field, initialize with NumOps.Zero in both constructors to avoid ambiguous nullability
- Fix incorrect comment about Matrix<T>/Vector<T> being value types
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use numops.equals instead of equalitycomparer for net471 compat
Replace EqualityComparer<T>.Default.Equals with _numOps.Equals to fix CS8604 null reference error on net471 and follow codebase patterns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow pragma scope and snapshot transform input in tests
- Narrow CS8600/CS8604 pragma scopes to specific declaration and call sites instead of spanning 800+ lines
- Clarify StatisticsHelper comment about default(T) vs NumOps.Zero
- Snapshot original values before Transform in test to prevent in-place mutation from invalidating the comparison
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------
Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
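The split-first discipline from this merge in miniature: fit the scaler on training data only, then only transform the held-out data. A Python sketch with a toy scaler (not the AiDotNet API):

```python
import statistics

def fit_scaler(rows):
    """Learn standardization statistics from the given rows only."""
    mean = statistics.mean(rows)
    std = statistics.pstdev(rows) or 1.0
    return mean, std

def transform(rows, mean, std):
    return [(x - mean) / std for x in rows]

data = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]

# 1) split FIRST
train, test = data[:7], data[7:]

# 2) fit on training data ONLY
mean, std = fit_scaler(train)          # (30.0, 20.0)

# 3) transform (not fit_transform) the held-out data
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)

# leaked statistics (fitted on everything) differ from the honest ones
leaked_mean, leaked_std = fit_scaler(data)   # (45.0, ...)
assert (mean, std) != (leaked_mean, leaked_std)
```

Fitting on `data` before splitting would quietly fold the test set's mean and spread into the training pipeline, which is exactly the inflated-metrics leakage the commit removes.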
…ining directories (#933) (#941)

* fix: prevent data leakage by fitting preprocessing only on training data
* fix: resolve 12 pr review comments - leakage tests, pragma scope, registry sync
* fix: move fitresample after train/test split to prevent data leakage
* fix: update registry docs and use consistent shuffle policy for splits
* fix: use post-preparation training data in no-preprocessing else branch
* fix: make data leakage tests self-sufficient without PR 928

* fix: preserve preprocessing pipeline state during serialization (#936)

* fix: add serialization support to preprocessing pipeline fitted state
- Add [JsonProperty] attributes to private fields across all preprocessing transformers (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler, DecimalScaler, GlobalContrastScaler, LogScaler, LogMeanVarianceScaler, LpNormScaler, Normalizer, SimpleImputer) so Newtonsoft.Json serializes fitted state (means, std devs, min/max, etc.)
- Add [JsonProperty] to TransformerBase.IsFitted and ColumnIndices for proper deserialization of base class state
- Add [JsonConstructor] to MinMaxScaler and RobustScaler to resolve ambiguous constructor selection during deserialization
- Replace ValueTuple in PreprocessingPipeline._steps with a new PipelineStep<T, TInput> class that serializes reliably with Newtonsoft.Json (ValueTuples lose field names during serialization)
- Add [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] to PreprocessingInfo.Pipeline and TargetPipeline for polymorphic deserialization of pipeline step transformers
- Add 16 integration tests verifying serialization round-trips for all modified transformers, pipelines, and inverse transforms
Fixes #930
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 13 pr review comments - serialization safety, tests, perf
- Remove null-forgiving operator from PipelineStep (use JsonConstructor)
- Make PipelineStep.Transformer setter private (immutable after construction)
- Fix ColumnIndices deserialization (add private setter for Json.NET)
- Fix Steps property to return cached read-only list (no alloc per call)
- Add explicit range assertions to MinMaxScaler custom range test
- Add transform equivalence check to SimpleImputer test
- Add LpNormScaler serialization round-trip test
- Add Normalizer serialization round-trip test
- Add LogMeanVarianceScaler serialization round-trip test
- TypeNameHandling.Auto is safe: SafeSerializationBinder restricts types
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make pipelinestep immutable, validate constructor, add missing tests
- Made PipelineStep.Name private set to prevent mutation after construction
- PipelineStep constructor now validates name (non-whitespace) and transformer (non-null), failing fast on invalid serialized steps
- Added ColumnIndices round-trip test proving non-default indices survive serialization and untouched columns remain unchanged
- Added TargetPipeline round-trip tests proving both null and non-null target pipelines survive serialization with transform equivalence
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent JSON duplicate serialization and remove per-member TypeNameHandling
Add [JsonIgnore] to public getter-only properties that mirror [JsonProperty] backing fields in all transformers (StandardScaler, RobustScaler, SimpleImputer, DecimalScaler, GlobalContrastScaler, MinMaxScaler, Normalizer, MaxAbsScaler, LogScaler) to prevent Newtonsoft emitting both field and property. Remove per-member TypeNameHandling.Auto from PipelineStep.Transformer and _finalTransformer — rely on serializer-level settings with SafeSerializationBinder instead, preventing unsafe TypeNameHandling bypass. Add Deconstruct method to PipelineStep for backward-compatible tuple destructuring.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------
Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove static PreprocessingRegistry — prevents concurrent build race conditions (#935)

* fix: remove static PreprocessingRegistry usage from AiModelBuilder (#931)
Remove the 3 `PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline` assignments from AiModelBuilder.ConfigurePreprocessing() and the 3 analogous PostprocessingRegistry assignments from ConfigurePostprocessing(). The pipeline is already stored as an instance field on the builder and flows to AiModelResult via PreprocessingInfo — the global static registry was unnecessary and caused race conditions when multiple builders ran concurrently.
Changes:
- Remove 6 static registry assignments from AiModelBuilder.cs
- Add [Obsolete] attribute to PreprocessingRegistry<T, TInput> class
- Replace PreprocessingRegistry usage in DocumentNeuralNetworkBase with an instance-level PreprocessingTransformer property
- Add 4 integration tests verifying concurrent builds don't cross-contaminate
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 7 pr review comments - registry deprecation, thread safety, docs
- Fix PreprocessingTransformer docs: clarify protected accessibility
- Add PostprocessingTransformer instance property to DocumentNeuralNetworkBase
- Replace PostprocessingRegistry static usage with instance-based transformer
- Mark PostprocessingRegistry as [Obsolete] (same treatment as PreprocessingRegistry)
- Update PreprocessingRegistry.Current docs to reflect no longer auto-set
- Fix concurrent test to use separate data loaders (thread-safety)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: wire preprocessing transformer onto document models in buildasync
Changed PreprocessingTransformer and PostprocessingTransformer from protected to protected internal so AiModelBuilder can set them. BuildSupervisedInternalAsync now wires the preprocessing pipeline onto DocumentNeuralNetworkBase models, replacing the static registry approach that caused race conditions during concurrent builds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow transformer surface, fail fast on unfitted, wire postprocessing
- Changed PreprocessingTransformer/PostprocessingTransformer back to protected and added internal ConfigureTransformers() method for AiModelBuilder wiring
- PreprocessDocument/PostprocessOutput now throw if transformer is set but not fitted instead of silently falling back to defaults
- Wired both preprocessing AND postprocessing onto DocumentNeuralNetworkBase
- Updated PreprocessingRegistry docs to reflect deprecated/removed status
- Added [Collection] to registry tests and try/finally cleanup
- Fixed "override" wording to "assign" in docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: centralize document transformer injection and strengthen test assertions
Extract transformer wiring into ConfigureDocumentTransformers() helper method that always passes current pipeline values (clearing prior state on reuse). Called from both supervised build path and BuildProgramSynthesisInferenceOnlyResult(). Strengthen PreprocessingRegistryIntegrationTests to verify PreprocessingInfo is non-null and fitted, not just Assert.NotNull(result).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unfitted check from PostprocessOutput, update docs, strengthen tests
PostprocessOutput no longer throws on unfitted transformers since postprocessing transforms (Softmax, LabelDecoder) are often stateless. Updated XML docs to reflect instance-level transformer priority. Strengthen concurrent and sequential test assertions to verify PreprocessingInfo is non-null and fitted on each result.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------
Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from video Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove null-forgiving operators from neural network layers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: warn on empty attributions and use guarded locals in jwt attestation verifier Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: include gain layer in backward pass and parameter updates in deepfilternet Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rebuild streaming buffers after deserializing fft/hop size in neural noise reducer Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove null-forgiving operators from federated learning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove null-forgiving operators from federated learning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: respect bootstrap flag and align estimator probabilities to ensemble label space Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove null-forgiving operators from clustering modules Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: throw on invalid ensemble members and cold model in adaptive random forest Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: unify trained-state error messages in ordinal regression Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove unused alpha_s and use guarded local consistently in unipc scheduler Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add guard for latent variance in compute elbo Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: hoist cached gradient lookup out of per-sample loops Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: replace console.writeline with
debug.writeline for checkpoint reporting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: cache compressed momentum after resampling projection matrices Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: show total sharded memory independently of per-device memory in tostring Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address 21 pr review comments for code quality improvements
- CLIPScore: throw on embedding/weight dimension mismatch instead of Math.Min
- Adam8BitOptimizer: remove redundant _vScales null check
- BFGSOptimizer: move _inverseHessian null check before UpdateInverseHessian call
- FTRLOptimizer: use post-update accumulator (nPlusGradSq) in both paths
- MomentumOptimizer: snapshot _velocity before serialization loop
- OptimizerBase: use RequireModel() instead of Model!
- SymbolicPhysicsLearner: Clone throws on null children like Evaluate/Format
- IncrementalPCA: fix error message to "internal state inconsistent"
- ContradictionDetector: wrap JToken.Value<bool>() in try-catch
- CriticModel: wrap JToken.Value<double>() in try-catch
- LogNormalAFT: hoist Coefficients null check outside loops (3 methods)
- WeibullAFT: hoist Coefficients null check outside loops (3 methods)
- NHiTSModel: remove redundant ContainsKey before TryGetValue
- DiffusionTools: support both 1D and 2D audio_data deserialization
- GMMDetector: validate feature count before scoring a point
- TimeSeriesForestClassifier: reject NumTrees <= 0 during training
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use guarded sigmaTs local variable consistently in dpm solver methods Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply feature selection in Predict() — fixes critical dimension mismatch bug (#928)

---------
Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve first batch of pr #941 review comments (10 of 54)
- InterpretabilityDashboard: filter empty attributions instead of using console output, use validCount for averaging
- BaggingClassifier: validate sampleSize vs data size when bootstrap=false
- Denclue: hoist _attractors and _attractorDensities guards out of loops
- BIRCH: use Labels ?? Predict(x) instead of Labels!
in FitPredict - SelfOrganizingMap: hoist _weights and _neuronLabels guards out of loops - CLIQUE: hoist Labels guard out of accumulation loop - SUBCLU: hoist _trainingData and Labels guards out of loops Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve second batch of pr #941 review comments (20 of 54) - GraphIsomorphismLayer: add local variable aliases for _lastMlpHiddenPreRelu in Forward and BackwardManual for clearer null safety - PagedAttentionKernel: add braces to ToFloat/FromFloat if blocks, replace null-forgiving with explicit boxing guards - KnowledgeDistillationTrainerBase: remove Console.WriteLine from library checkpoint loading code - DiffusionMemoryManager: separate PerDeviceMemory and TotalShardedMemory formatting into independent checks - MixtureOfExpertsLayer: add null guard before ApplyActivationToGraph instead of null-forgiving operator - KVCache: replace null-forgiving with explicit boxing guard for Half conversion - CachedMultiHeadAttention: hoist _cache null guard before Append call - CachedGroupedQueryAttention: hoist _cache null guard before Append call - DeformableConvolutionalLayer: stale comments (variables already renamed) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: resolve build errors from linter's automated null-forgiving removal Fix syntax and compilation errors introduced by the automated linter that removed null-forgiving operators without proper replacements: - Fix missing closing parentheses in foreach statements (COPKMeans, AgglomerativeClustering) - Fix duplicate variable declarations (AntColonyOptimization, CachedGroupedQueryAttention, CachedMultiHeadAttention) - Fix dropped underscore prefixes on private field references (KVCache, ConvLSTMLayer, PrincipalNeighbourhoodAggregationLayer) - Add validated accessor properties for nullable cache arrays (KVCache) - Fix variable ordering issues (FloraAdapter) - Fix nullable reference handling (VectorModel, UncertaintyCalibrationData, 
CrossValidationResult, AiModelResult.Uncertainty) - Fix Debug.WriteLine() call with no arguments (KnowledgeDistillationTrainerBase) - Add null guard for ClassLabels in PredictProbabilities (BaggingClassifier) - Fix nullable permutation array access (ConcatenateLayer) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: hoist null guards for lambdas and alphaTs in DPMSolverMultistepScheduler Replace null-forgiving operators with proper null-coalescing throws for _lambdas and _alphaTs in FirstOrderUpdate, SecondOrderUpdate, and ThirdOrderUpdate methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove redundant null guard in LeafSent140FederatedDatasetLoader The testFilePath is already validated by IsNullOrWhiteSpace check in the enclosing if statement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: capture PreprocessingInfo in local variable to prevent TOCTOU Snapshot PreprocessingInfo once before accessing its members to avoid potential race conditions with concurrent access. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: dispose pre-permutation tensor to prevent memory leak in ConcatenateLayer Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: accumulate projector gradients from both views in BarlowTwins, SimCLR, and iBOT BarlowTwins: capture view1 projector grads before view2 backward clears them, then combine both views' grads for the projector update. SimCLR: replay projector forward per-view before backward to use correct caches, and accumulate projector gradients from both views. iBOT: backpropagate the full objective (clsLoss1 + clsLoss2 + lambda*(patchLoss1 + patchLoss2)) instead of only clsLoss1. Add ComputeMaskedPatchGradient for MSE gradient on masked patches. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct backward pass in RIFE, FILM, and RAFT video models RIFE: use cached block input shape instead of previous block output shape for gradient splitting to correctly match the fused input width. FILM: fail fast on all required backward-pass caches instead of silently skipping fusion and flow estimator backprop when caches are null. RAFT: replace incorrect upsampleConv.Backward with bilinear UpsampleFlowBackward that mirrors the actual forward upsampling, and hoist invariant layer guards out of the refinement loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: strengthen DataLeakagePrevention tests with behavioral assertions ScalerStatisticsReflectTrainingDataOnly: add positive assertion that the pipeline transformation is not identity, confirming it was fitted on data. StillWorksCorrectly: exercise the model with actual predictions and assert outputs are finite numbers, not just that objects are non-null. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address major PR review comments across multiple files KNNEvaluator: hoist _trainLabels null check outside the voting loop. JointMutualInformation: use local miScores instead of field on line 212. ContradictionDetector: treat malformed contradictory fields as false instead of falling through to text scan that matches JSON key names. BFGSOptimizer: guard inverse Hessian update with curvature condition (y*s > 0) and use validated locals instead of null-forgiving operators. IncrementalPCA: validate all fitted state upfront in PartialFit and remove null-forgiving on _explainedVariance. RealGatedLinearRecurrenceLayer: hoist all cached tensor validation to top of Backward to prevent partial gradient corruption. 
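The volatile tri-state identity cache from the first commit above can be sketched as follows. This is an illustrative Python analogue, not the C# code: it is single-threaded, so it only demonstrates the 0/1/2 state machine and the allocation-free comparison loop, not the volatile read/write semantics; all names here are made up for the sketch.

```python
UNKNOWN, IDENTITY, NEEDS_SELECTION = 0, 1, 2

class SelectionCache:
    """Caches whether the stored indices are the identity mapping [0..n-1]."""

    def __init__(self, indices):
        self.indices = indices
        self._state = UNKNOWN          # stands in for the cached volatile int

    def is_identity(self, n_features):
        state = self._state
        if state == UNKNOWN:
            # plain loop instead of SequenceEqual(Enumerable.Range(...)):
            # no intermediate sequence is allocated
            ok = len(self.indices) == n_features
            if ok:
                for i, idx in enumerate(self.indices):
                    if idx != i:
                        ok = False
                        break
            state = IDENTITY if ok else NEEDS_SELECTION
            self._state = state        # later calls skip the loop entirely
        return state == IDENTITY
```

The first call pays for the comparison once; every later call is a single cached-state read.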
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address remaining major PR review comments

  PreprocessingRegistry: align documentation — clarify get/set is thread-safe but concurrent builds still overwrite each other's pipeline.

  InMemoryFederatedTrainer: fail fast when HE is enabled but no parameter indices are resolved instead of silently downgrading. Fix DP accounting: FedAsync uses 1/totalClients per iteration, FedBuff uses bufferSize/totalClients.

  AiModelBuilder: extract IsTimeSeriesTask() helper that checks both AutoML options and model type name for chronological split decisions. Reject row-changing data preparation in natural-client federated mode.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address trivial PR review comments across multiple files

  - DeformableConvolutionalLayer: rename misleading variable names (mwShape/mbShape to maskWeights/maskBias, mw2/mwg/mbg to maskW/maskWeightGrad/maskBiasGrad).
  - S4DLayer: hoist _lastHiddenStatesReal/Imag guards out of the BPTT loop.
  - OnlineFeatureSelector: pass resolved running arrays into ComputeOnlineCorrelation instead of re-checking nullable fields inside the hot loop.
  - SSLPretrainingPipeline: extract encoder to a local variable to avoid duplicate null-check expressions.
  - SubpixelConvolutionalLayer: hoist ScalarActivation out of the per-element loop.
  - BaggingClassifier: replace OrderBy(random.Next()) with Fisher-Yates partial shuffle for O(sampleSize) unbiased sampling without replacement.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist gradient tensor validations in HedgehogLayer.FeatureMapBackward

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address 12 pr review comments across multiple files

  - FloraAdapter: fix momentum reset bug (was zeroed every step)
  - KnowledgeDistillationTrainerBase: remove dead checkpoint branch
  - KVCache: hoist loop-invariant lookups outside inner loops
  - AiModelResult: capture stable model reference in explainer delegate
  - VectorModel: guard empty coefficients, remove dead BoxOrDefault
  - ConvLSTMLayer: use local lastInput instead of field reference
  - PrincipalNeighbourhoodAggregationLayer: hoist invariant tensors, remove null-forgiving operators in fallback path
  - BarlowTwins: remove redundant projector null-check
  - FILM: use local-cache pattern for cached features
  - DeepFilterNet: propagate gain layer backward gradient

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: net471 build errors in baggingclassifier and leafsent140loader

  - Replace range syntax pool[..sampleSize] with Array.Copy for net471 compatibility (RuntimeHelpers.GetSubArray not available)
  - Add explicit null check before string.IsNullOrWhiteSpace for net471 compatibility (missing NotNullWhen attribute)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use correct inner class name and remove null-forgiving in dashboard

  Replace InterpretabilityExplanation with Explanation (the actual inner class name) and replace null-forgiving operators with proper null guards.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address pr #941 review comments

  - KDTree: use single guarded local for _data null check in Compare
  - BallTree: use single guarded local for _data null check in Sort
  - DiffusionMemoryManager: remove duplicate unconditional total sharded line
  - InfluenceFunctionExplainer: hoist cachedGradients null check outside loop

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
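One of the commits above replaces `OrderBy(random.Next())` in BaggingClassifier with a partial Fisher-Yates shuffle for O(sampleSize) unbiased sampling without replacement. A minimal Python sketch of that sampling scheme (the function name is illustrative, not from the codebase):

```python
import random

def sample_without_replacement(n, sample_size, rng=None):
    """Partial Fisher-Yates: only sample_size swaps, unbiased, no replacement."""
    rng = rng or random.Random()
    pool = list(range(n))
    for i in range(sample_size):
        j = rng.randrange(i, n)          # uniform over the not-yet-fixed suffix
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:sample_size]            # net471-friendly analogue: Array.Copy
```

Each position `i` receives a uniformly random element of the remaining suffix, so every size-`sample_size` subset is equally likely, without the bias and O(n log n) cost of sorting by random keys.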
## Summary

Fixes a critical bug where `AiModelResult.Predict()` did not apply the same feature selection that was used during training, causing dimension mismatch errors for all models built through `AiModelBuilder` with the default optimizer.

## The Bug
- `NormalOptimizer` (the default optimizer) randomly selects a subset of features during training via `PrepareAndEvaluateSolution`
- `OptimizationResult.SelectedFeatures` stores column vectors, but `Predict()` never applied this selection to new input data
- When `Predict()` passed all N columns to a model trained on M < N columns, it threw: `"Number of columns must equal length of vector"`
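The failure mode is easy to reproduce in miniature. Below is a hypothetical NumPy sketch (the real code is C#; `selected` stands in for the optimizer's chosen columns):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 5))             # 5 candidate features
selected = [2]                                # optimizer kept only column 2
w, *_ = np.linalg.lstsq(X_train[:, selected], rng.normal(size=8), rcond=None)

X_new = rng.normal(size=(3, 5))               # predict-time input: all 5 columns
try:
    X_new @ w                                 # (3, 5) x (1,): dimension mismatch
except ValueError:
    pass                                      # this is the reported failure

y = X_new[:, selected] @ w                    # applying the stored indices fixes it
```

The model's weight vector has length M (here 1), so multiplying it against all N raw columns can never line up unless the same column indices are applied first.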
## Impact

This affects every model built through `AiModelBuilder` using the default optimizer (which is the primary use case). The default `MaximumFeatures = 0` setting means the optimizer always selects just 1 feature, making this bug trigger 100% of the time for multi-feature datasets.

## The Fix
1. Added `SelectedFeatureIndices` (`List<int>`) to `OptimizationStepData`, `OptimizationResult`, and `ModelResult` — stores actual column indices (the existing `SelectedFeatures` stores column vectors, which require fragile heuristic-based index recovery)
2. Propagated indices through the optimizer flow: `PrepareAndEvaluateSolution` → `UpdateBestSolution` → `CreateOptimizationResult`
3. Applied feature selection in `AiModelResult.Predict()` — after preprocessing but before `Model.Predict()`, using `OptimizerHelper.SelectFeatures()` with the stored indices
## Files Changed

| File | Change |
| --- | --- |
| `OptimizationStepData.cs` | Added `SelectedFeatureIndices` property |
| `OptimizationResult.cs` | Added `SelectedFeatureIndices`; updated `DeepCopy`/`WithParameters` |
| `ModelResult.cs` | Added `SelectedFeatureIndices` property |
| `OptimizerHelper.cs` | Updated `CreateOptimizationResult` to accept indices |
| `OptimizerBase.cs` | Updated `PrepareAndEvaluateSolution`, `UpdateBestSolution`, `CreateOptimizationResult` |
| `AiModelResult.cs` | Applied feature selection in `Predict()` |

## Test plan
- 12 new integration tests covering the full `BuildAsync` → `Predict` pipeline:
  - `BuildAsync_DefaultOptimizer_PredictDoesNotThrowDimensionMismatch` — the exact bug repro
  - `BuildAsync_SelectedFeatureIndices_AreStoredInOptimizationResult` — verifies indices are stored
  - `BuildAsync_FeatureSelectionReducesColumns_PredictHandlesIt` — 10 features, verifies selection
  - `BuildAsync_PredictTwice_ReturnsSameResults` — determinism check
  - `OptimizationResult_DeepCopy_PreservesSelectedFeatureIndices` — deep copy safety
  - `OptimizationResult_WithParameters_PreservesSelectedFeatureIndices` — parameter update safety
  - `BuildAsync_DirectModelPredict_WithoutBuilder_StillWorks` — no regression for direct use
- Existing `LinearRegressionIntegrationTests` still pass

Closes #927
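The two `OptimizationResult` preservation tests guard against a classic pitfall: a shallow copy would share the index list, so mutating the copy would corrupt the original result. A toy Python illustration of the property being asserted (the dict is a stand-in for the result object, not the real type):

```python
import copy

# toy analogue of an OptimizationResult carrying its selected indices
result = {"parameters": [0.5, -2.0], "selected_feature_indices": [1, 4]}

shallow = copy.copy(result)
deep = copy.deepcopy(result)

deep["selected_feature_indices"].append(7)
assert result["selected_feature_indices"] == [1, 4]      # deep copy is safe

shallow["selected_feature_indices"].append(9)
assert result["selected_feature_indices"] == [1, 4, 9]   # shallow copy is not
```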
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests