
fix: apply feature selection in Predict() — fixes critical dimension mismatch bug#928

Merged
ooples merged 9 commits into master from fix/critical-predict-and-type-safety-bugs on Mar 8, 2026

Conversation

@ooples
Owner

@ooples ooples commented Mar 7, 2026

Summary

Fixes a critical bug where AiModelResult.Predict() did not apply the same feature selection that was used during training, causing dimension mismatch errors for all models built through AiModelBuilder with the default optimizer.

The Bug

  • NormalOptimizer (the default optimizer) randomly selects a subset of features during training via PrepareAndEvaluateSolution
  • The model is trained on this subset (e.g., 1 out of 5 columns)
  • OptimizationResult.SelectedFeatures stores column vectors, but Predict() never applied this selection to new input data
  • When Predict() passed all N columns to a model trained on M < N columns, it threw: "Number of columns must equal length of vector"
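The failure mode can be reproduced in isolation. The following is a minimal sketch, not the AiDotNet API: TinyLinearModel is a hypothetical stand-in for a linear model trained on a feature subset, where a prediction is the dot product of the weight vector with an input row, so the row length must equal the number of training columns.

```csharp
using System;

// Hypothetical stand-in for a model trained on M selected columns.
public class TinyLinearModel
{
    private readonly double[] _weights; // length M = number of TRAINING columns

    public TinyLinearModel(double[] weights) => _weights = weights;

    public double Predict(double[] row)
    {
        if (row.Length != _weights.Length)
            throw new ArgumentException("Number of columns must equal length of vector");
        double sum = 0;
        for (int i = 0; i < row.Length; i++) sum += _weights[i] * row[i];
        return sum;
    }
}

public static class Repro
{
    public static void Main()
    {
        // The optimizer trained on 1 selected column out of 5...
        var model = new TinyLinearModel(new[] { 0.5 });

        // ...but Predict() forwarded all 5 raw columns without re-applying
        // the feature selection, producing the reported error.
        try
        {
            model.Predict(new double[] { 1, 2, 3, 4, 5 });
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message); // the dimension-mismatch error
        }
    }
}
```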

Impact

This affects every model built through AiModelBuilder using the default optimizer (which is the primary use case). The default MaximumFeatures=0 setting means the optimizer always selects just 1 feature, making this bug trigger 100% of the time for multi-feature datasets.

The Fix

  1. Added SelectedFeatureIndices (List<int>) to OptimizationStepData, OptimizationResult, and ModelResult — stores actual column indices (the existing SelectedFeatures stores column vectors which require fragile heuristic-based index recovery)

  2. Propagated indices through the optimizer flow: PrepareAndEvaluateSolution → UpdateBestSolution → CreateOptimizationResult

  3. Applied feature selection in AiModelResult.Predict() — after preprocessing but before Model.Predict(), using OptimizerHelper.SelectFeatures() with the stored indices
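Step 3 can be sketched on plain arrays. SelectColumns below is an illustrative analogue of OptimizerHelper.SelectFeatures, not its real signature: keep only the stored column indices, in stored order, after preprocessing and before the (already reduced-width) model sees the data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FeatureSelectionSketch
{
    // Illustrative analogue of OptimizerHelper.SelectFeatures on a jagged array.
    public static double[][] SelectColumns(double[][] matrix, IReadOnlyList<int> indices)
        => matrix.Select(row => indices.Select(i => row[i]).ToArray()).ToArray();

    public static void Main()
    {
        // Suppose training selected columns 3 and 1 out of 5; prediction
        // must apply the same indices to new input data.
        var selectedFeatureIndices = new List<int> { 3, 1 };
        var newX = new[]
        {
            new double[] { 10, 11, 12, 13, 14 },
            new double[] { 20, 21, 22, 23, 24 },
        };

        var reduced = SelectColumns(newX, selectedFeatureIndices);
        Console.WriteLine(string.Join(",", reduced[0])); // 13,11
    }
}
```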

Files Changed

  • OptimizationStepData.cs: Added SelectedFeatureIndices property
  • OptimizationResult.cs: Added SelectedFeatureIndices + updated DeepCopy/WithParameters
  • ModelResult.cs: Added SelectedFeatureIndices property
  • OptimizerHelper.cs: Updated CreateOptimizationResult to accept indices
  • OptimizerBase.cs: Store & propagate indices in PrepareAndEvaluateSolution, UpdateBestSolution, CreateOptimizationResult
  • AiModelResult.cs: Core fix: Apply feature selection in Predict()

Test plan

  • 12 new integration tests covering the full BuildAsync → Predict pipeline
  • BuildAsync_DefaultOptimizer_PredictDoesNotThrowDimensionMismatch — the exact bug repro
  • BuildAsync_SelectedFeatureIndices_AreStoredInOptimizationResult — verifies indices are stored
  • BuildAsync_FeatureSelectionReducesColumns_PredictHandlesIt — 10 features, verifies selection
  • BuildAsync_PredictTwice_ReturnsSameResults — determinism check
  • OptimizationResult_DeepCopy_PreservesSelectedFeatureIndices — deep copy safety
  • OptimizationResult_WithParameters_PreservesSelectedFeatureIndices — parameter update safety
  • BuildAsync_DirectModelPredict_WithoutBuilder_StillWorks — no regression for direct use
  • All 24 existing LinearRegressionIntegrationTests still pass

Closes #927

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Models now record numeric indices of input columns chosen during optimization and expose them in optimization and model results.
    • Feature-index selections are propagated through result copies and updates so the chosen subset is preserved.
  • Bug Fixes

    • Prediction now supports 1D input selection, preventing dimension-mismatch errors.
  • Tests

    • New integration tests cover end-to-end feature-selection storage, application during prediction, determinism, bounds, and large-data scenarios.

…sion mismatch bug

The NormalOptimizer (default) randomly selects a subset of features during
training via PrepareAndEvaluateSolution, but AiModelResult.Predict() never
applied the same feature selection to new input data. This caused the model
to receive all columns while it was trained on fewer, throwing "Number of
columns must equal length of vector" errors.

Changes:
- Add SelectedFeatureIndices (List<int>) to OptimizationStepData,
  OptimizationResult, and ModelResult to store actual column indices
- Propagate indices through optimizer flow: PrepareAndEvaluateSolution →
  UpdateBestSolution → CreateOptimizationResult
- Apply feature selection in AiModelResult.Predict() after preprocessing
  but before Model.Predict(), using OptimizerHelper.SelectFeatures()
- Add 12 integration tests verifying the full BuildAsync → Predict pipeline

Closes #927

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
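The propagation described in the commit can be sketched with stand-in record types. The names below are illustrative only; the real carriers are OptimizationStepData, OptimizationResult, and ModelResult inside AiDotNet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins, not AiDotNet types.
public record StepDataSketch(List<int> SelectedFeatureIndices);
public record OptimizationResultSketch(List<int> SelectedFeatureIndices);

public static class PropagationSketch
{
    // Mirrors the flow: indices chosen in PrepareAndEvaluateSolution must
    // survive, copy by copy, into the result object Predict() later reads.
    public static OptimizationResultSketch CreateResult(StepDataSketch best)
        => new(new List<int>(best.SelectedFeatureIndices)); // defensive copy

    public static void Main()
    {
        var step = new StepDataSketch(new List<int> { 0, 3 });
        var result = CreateResult(step);
        Console.WriteLine(string.Join(",", result.SelectedFeatureIndices)); // 0,3
    }
}
```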
Copilot AI review requested due to automatic review settings March 7, 2026 11:27
@vercel

vercel bot commented Mar 7, 2026

The latest updates on your projects.

Project Deployment Actions Updated (UTC)
aidotnet_website Ready Ready Preview, Comment Mar 7, 2026 4:52pm
aidotnet-playground-api Ready Ready Preview, Comment Mar 7, 2026 4:52pm

@coderabbitai
Contributor

coderabbitai bot commented Mar 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Tracks and persists numeric selected-feature indices through optimizers and results, applies those indices at prediction time (including tensor/1D handling), updates helper/optimizer APIs to propagate indices, and adds extensive integration tests validating end-to-end feature-selection behavior.

Changes

Cohort / File(s) Summary
Data Model Extensions
src/Models/Results/OptimizationResult.cs, src/Models/Results/ModelResult.cs, src/Models/OptimizationStepData.cs
Add public SelectedFeatureIndices : List<int> to OptimizationResult, ModelResult, and OptimizationStepData; initialize and deep-copy/propagate in constructors/methods. BLOCKING: verify no null/uninitialized cases and API compatibility for public properties.
Optimizer Integration
src/Optimizers/OptimizerBase.cs
Propagate SelectedFeatureIndices through PrepareAndEvaluateSolution, UpdateBestSolution, UpdateAndApplyBestSolution; add overload to derive selection from total feature count. BLOCKING: ensure overrides in custom optimizers remain valid, bounds checks present, and no TODOs/stubs remain.
Helper Changes
src/Helpers/OptimizerHelper.cs
Extend CreateOptimizationResult signature with optional List<int>? bestSelectedFeatureIndices = null and set OptimizationResult.SelectedFeatureIndices; adjust SelectFeatures to handle 1D tensors (select elements and return 1D tensor). BLOCKING: confirm 1D tensor handling is robust and production-ready (no placeholder logic).
Prediction Logic
src/Models/Results/AiModelResult.cs
Add private ApplySelectedFeaturesForPrediction to apply OptimizationResult.SelectedFeatureIndices after preprocessing and before Model.Predict; invoked in AiModelResult.Predict(TInput) and InferenceSequence path. BLOCKING: validate identity short-circuit, strict index-range checks, and correct matrix/tensor shape handling.
Tests
tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs
Add comprehensive integration tests exercising storage/propagation and application of SelectedFeatureIndices, single/multi-row prediction, deterministic repeats, explicit optimizer paths, tensor-selection behavior, DeepCopy/WithParameters preservation, and large-data scenarios. BLOCKING: large test surface—ensure CI capacity, remove any test-only TODOs, and confirm determinism of randomized parts.
Call-site Adjustments
src/.../OptimizerHelper.CreateOptimizationResult callers
Update internal call sites to pass optional bestSelectedFeatureIndices where applicable. BLOCKING: confirm no public-facing binary incompatibility where helper is consumed externally.

Sequence Diagram(s)

sequenceDiagram
    participant Builder as "AiModelBuilder"
    participant Optimizer as "OptimizerBase"
    participant Helper as "OptimizerHelper"
    participant Result as "OptimizationResult"
    participant Predictor as "AiModelResult"
    participant Model as "IFullModel"

    Builder->>Optimizer: Optimize(trainingX, y)
    Optimizer->>Optimizer: RandomlySelectFeatures(totalFeatures)
    Optimizer->>Helper: SelectFeatures(X, selectedIndices)
    Optimizer->>Model: Train(X_subset, y)
    Optimizer->>Helper: CreateOptimizationResult(bestStepData, bestSelectedFeatureIndices)
    Helper-->>Result: OptimizationResult (with SelectedFeatureIndices)

    Predictor->>Predictor: PreprocessingInfo.TransformFeatures(newX)
    Predictor->>Result: read SelectedFeatureIndices
    alt SelectedFeatureIndices non-empty
        Predictor->>Helper: SelectFeatures(normalizedX, SelectedFeatureIndices)
        Helper-->>Model: Model.Predict(X_subset)
    else
        Predictor-->>Model: Model.Predict(normalizedX)
    end
    Model-->>Predictor: predictions

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

feature

Poem

✂️ Columns pruned, indices in tow,
From training fields to inference flow.
Predictions now match the taught routine,
Tests guard the shapes, no mismatch seen. 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the critical bug fix: applying feature selection in Predict() to resolve dimension-mismatch errors.
Linked Issues check ✅ Passed All objectives from #927 are met: feature selection indices persisted through OptimizationResult/ModelResult, propagated via OptimizerBase, applied in Predict(), DeepCopy/WithParameters preserve indices, and comprehensive integration tests validate the fix.
Out of Scope Changes check ✅ Passed All changes directly address #927: adding SelectedFeatureIndices properties, propagating through optimizer flows, implementing feature selection in Predict(), and supporting tests—no extraneous modifications detected.



Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a critical bug where AiModelResult.Predict() did not apply the same feature selection used during training by the NormalOptimizer (and all OptimizerBase-derived optimizers), causing dimension mismatch errors ("Number of columns must equal length of vector") for all models built through AiModelBuilder. The fix introduces a new SelectedFeatureIndices (List<int>) property that stores the actual column indices selected during training and applies them to new input data in Predict().

Changes:

  • Added SelectedFeatureIndices property to OptimizationStepData, OptimizationResult, and ModelResult to carry feature selection indices through the optimizer pipeline
  • Propagated SelectedFeatureIndices through OptimizerBase.PrepareAndEvaluateSolution → UpdateBestSolution → CreateOptimizationResult, and updated OptimizerHelper.CreateOptimizationResult to accept the indices
  • Applied feature selection in AiModelResult.Predict() using the stored indices, plus 12 new integration tests covering the full BuildAsync → Predict pipeline

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

File summary
  • src/Models/OptimizationStepData.cs: Added SelectedFeatureIndices property with default initializer
  • src/Models/Results/ModelResult.cs: Added SelectedFeatureIndices property to the struct
  • src/Models/Results/OptimizationResult.cs: Added SelectedFeatureIndices property; updated constructor, DeepCopy(), and WithParameters()
  • src/Helpers/OptimizerHelper.cs: Added optional bestSelectedFeatureIndices parameter to CreateOptimizationResult
  • src/Optimizers/OptimizerBase.cs: Stores and propagates SelectedFeatureIndices in PrepareAndEvaluateSolution, UpdateBestSolution, and CreateOptimizationResult
  • src/Models/Results/AiModelResult.cs: Core fix: applies feature selection using SelectedFeatureIndices in Predict() after preprocessing but before Model.Predict()
  • tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs: 12 new integration tests covering the bug fix and related scenarios


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Models/Results/AiModelResult.cs`:
- Around line 1833-1845: Add a single helper method
ApplySelectedFeaturesForPrediction(TInput input) in AiModelResult that checks
OptimizationResult?.SelectedFeatureIndices, uses
Helpers.InputHelper<T,TInput>.GetInputSize(input) and when needed calls
Helpers.OptimizerHelper<T,TInput,TOutput>.SelectFeatures(input, featureIndices)
and returns the (possibly) reduced input; then replace the inline
feature-selection block in AiModelResult.Predict() with a call to this new
method and update InferenceSequence.Predict() (the path that currently
preprocesses full input and calls _result.Model.Predict(...)) to call
AiModelResult.ApplySelectedFeaturesForPrediction(...) before invoking
_result.Model.Predict so both prediction paths reuse the same centralized logic.

In `@src/Models/Results/ModelResult.cs`:
- Around line 166-170: Add full XML documentation for the SelectedFeatureIndices
property in the ModelResult struct/class to match the rest of the file: expand
the <summary> to describe the property, add a <remarks> section explaining how
indices map to feature columns and when/why they are populated, and include a
"For Beginners" paragraph that gives a simple example of using
SelectedFeatureIndices (e.g., selecting columns 0,2,4 after optimization).
Ensure you update the XML tags immediately above the SelectedFeatureIndices
property so it follows the same documentation pattern as the other properties in
ModelResult.

In `@src/Models/Results/OptimizationResult.cs`:
- Line 512: WithParameters currently assigns SelectedFeatureIndices by reference
which is inconsistent with DeepCopy; update the WithParameters implementation in
OptimizationResult (method/class name: WithParameters on OptimizationResult) to
make a defensive copy of SelectedFeatureIndices (e.g., construct a new List<int>
from the existing SelectedFeatureIndices) so the new instance does not share the
same list reference and matches DeepCopy semantics.
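The hazard this comment describes is ordinary List&lt;T&gt; reference semantics and can be demonstrated without any AiDotNet types:

```csharp
using System;
using System.Collections.Generic;

public static class CopySemantics
{
    public static void Main()
    {
        var original = new List<int> { 1, 3 };

        // By-reference assignment (the behavior flagged in WithParameters):
        var shared = original;

        // Defensive copy (the DeepCopy-consistent fix):
        var copied = new List<int>(original);

        original.Add(99);

        Console.WriteLine(shared.Count); // 3: the mutation leaks through
        Console.WriteLine(copied.Count); // 2: the copy is isolated
    }
}
```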

In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 23-24: Remove the unused constant LooseTolerance from the
AiModelBuilderPredictIntegrationTests class to eliminate dead code; locate the
field declaration "private const double LooseTolerance = 0.5;" and delete it (or
alternatively, if a looser numeric comparison is intended, replace usages of
Tolerance with LooseTolerance where appropriate in methods like
BuildAsync_PredictTwice_ReturnsSameResults), but prefer simply removing the
unused LooseTolerance constant.
- Around line 361-375: The test only checks value equality but not independence;
update the test
OptimizationResult_WithParameters_PreservesSelectedFeatureIndices to also verify
mutation isolation by mutating original.SelectedFeatureIndices after calling
WithParameters and asserting updated.SelectedFeatureIndices remains unchanged
(e.g., original.SelectedFeatureIndices.Add(99); Assert.Equal(new List<int>{1,3},
updated.SelectedFeatureIndices)); if you prefer to fix the implementation
instead, ensure OptimizationResult.WithParameters returns a new
OptimizationResult that clones SelectedFeatureIndices (copy the list) so the
returned instance does not share the same list reference.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 33f37fd1-2716-46a7-938d-6ff40840bb46

📥 Commits

Reviewing files that changed from the base of the PR and between 640251b and 362a433.

📒 Files selected for processing (7)
  • src/Helpers/OptimizerHelper.cs
  • src/Models/OptimizationStepData.cs
  • src/Models/Results/AiModelResult.cs
  • src/Models/Results/ModelResult.cs
  • src/Models/Results/OptimizationResult.cs
  • src/Optimizers/OptimizerBase.cs
  • tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs

…, docs

- Remove unused LooseTolerance constant
- Fix off-by-one: use != instead of > for feature index count check
- Apply feature selection in InferenceSession.Predict() (critical gap)
- Add comprehensive docs to SelectedFeatureIndices property
- Fix WithParameters to deep-copy collections (consistent with DeepCopy)
- Add copy independence assertion to WithParameters test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
src/Models/Results/AiModelResult.cs (1)

2197-2207: 🛠️ Refactor suggestion | 🟠 Major

Extract this into the shared prediction helper.

This is the same feature-selection block as in AiModelResult.Predict(). Keep one private helper for both paths so the selection predicate and validation cannot drift again.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Models/Results/AiModelResult.cs` around lines 2197 - 2207, Extract the
duplicated feature-selection block into a single private helper method and call
it from both AiModelResult.Predict() and the other path using the same logic:
read _result.OptimizationResult?.SelectedFeatureIndices, call
Helpers.InputHelper<T, TInput>.GetInputSize(normalizedNewData) for validation,
and call Helpers.OptimizerHelper<T, TInput,
TOutput>.SelectFeatures(normalizedNewData, featureIndices) when sizes differ;
preserve the same null/Count checks and return (or set) the normalizedNewData
from the helper so both callers use the identical selection/validation logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Models/Results/AiModelResult.cs`:
- Around line 1836-1844: Replace the current heuristic that skips feature
selection based on counts with a proper identity-check: create a helper (e.g.,
ApplySelectedFeaturesForPrediction) that reads
OptimizationResult.SelectedFeatureIndices, returns input immediately if
null/empty, computes inputSize via Helpers.InputHelper<T,
TInput>.GetInputSize(input), then determines identity mapping by checking
featureIndices.Count == inputSize AND
featureIndices.SequenceEqual(Enumerable.Range(0, inputSize)); only skip
SelectFeatures when that identity check passes; otherwise validate indices are
in-range and call Helpers.OptimizerHelper<T, TInput,
TOutput>.SelectFeatures(input, featureIndices). Apply the same helper in the
session prediction path so both branches use identical logic.

In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 234-237: The test FeatureSelectionReducesColumns is misleading
because it only checks selectedCount > 0 and <= 10 but not that reduction
occurred; update the assertions in AiModelBuilderPredictIntegrationTests.cs to
match intent: if you want to assert actual column reduction change the second
assertion to Assert.True(optimizationResult.SelectedFeatureIndices.Count < 10,
"Should select fewer features than the original 10"), otherwise rename the test
(and update the inline comment) to something like
FeatureSelectionHandlesSelectionGracefully and adjust the comment to say the
optimizer may select any subset (including all) and we only assert non-empty
selection via selectedCount/optimizationResult.SelectedFeatureIndices.Count.

---

Duplicate comments:
In `@src/Models/Results/AiModelResult.cs`:
- Around line 2197-2207: Extract the duplicated feature-selection block into a
single private helper method and call it from both AiModelResult.Predict() and
the other path using the same logic: read
_result.OptimizationResult?.SelectedFeatureIndices, call Helpers.InputHelper<T,
TInput>.GetInputSize(normalizedNewData) for validation, and call
Helpers.OptimizerHelper<T, TInput, TOutput>.SelectFeatures(normalizedNewData,
featureIndices) when sizes differ; preserve the same null/Count checks and
return (or set) the normalizedNewData from the helper so both callers use the
identical selection/validation logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: db2dd249-403e-49db-9b42-e06f49bc8a89

📥 Commits

Reviewing files that changed from the base of the PR and between 362a433 and b5ebf53.

📒 Files selected for processing (4)
  • src/Models/Results/AiModelResult.cs
  • src/Models/Results/ModelResult.cs
  • src/Models/Results/OptimizationResult.cs
  • tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs

…test

SelectFeaturesTensor previously threw for 1D tensors. Now handles them
by directly selecting elements at the specified indices. Added integration
test proving Predict applies SelectedFeatureIndices correctly with a real
FeedForwardNeuralNetwork.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
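The new 1D branch can be sketched on a plain array. SelectFeatures1D is an illustrative analogue only; the real SelectFeaturesTensor operates on the library's tensor type.

```csharp
using System;
using System.Linq;

public static class OneDSelectionSketch
{
    // A rank-1 input has no column axis, so selection picks individual
    // elements at the stored indices and returns another rank-1 result.
    public static int[] SelectFeatures1D(int[] tensor1D, int[] indices)
        => indices.Select(i => tensor1D[i]).ToArray();

    public static void Main()
    {
        var input = new[] { 10, 20, 30, 40, 50 };
        var reduced = SelectFeatures1D(input, new[] { 4, 0 });
        Console.WriteLine(string.Join(",", reduced)); // 50,10
    }
}
```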
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 386-442: The test method
InferenceSession_Predict_AppliesFeatureSelection is misnamed and has a weak
assertion; rename it to AiModelResult_Predict_AppliesTensorFeatureSelection to
reflect that it constructs AiModelResult and calls AiModelResult.Predict, and
replace or augment the Assert.NotNull(prediction) with a stronger assertion that
verifies the prediction shape/size (e.g., assert prediction length or output
dimension equals outputSize) using the existing outputSize variable so the test
ensures feature selection produced an output of the expected shape.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3339c2b3-a58b-415b-8b33-8753b1ce71b8

📥 Commits

Reviewing files that changed from the base of the PR and between b5ebf53 and c71e314.

📒 Files selected for processing (2)
  • src/Helpers/OptimizerHelper.cs
  • tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs

Replaced inputSize == featureIndices.Count heuristic with proper
identity-mapping check using SequenceEqual. A reordered selection like
[2, 0, 1] has the same count but still needs reordering. Also validates
indices are in-range before calling SelectFeatures. Renamed misleading
test names and added shape assertion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
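The corrected identity check can be sketched as follows (IsIdentitySelection is an illustrative name, not the actual method): selection is a no-op only when the indices are exactly [0, 1, ..., inputSize - 1], which a count-based comparison cannot establish.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdentityCheckSketch
{
    // A count-only check cannot tell [2, 0, 1] apart from [0, 1, 2];
    // SequenceEqual against Range(0, inputSize) can.
    public static bool IsIdentitySelection(IReadOnlyList<int> indices, int inputSize)
        => indices.Count == inputSize
           && indices.SequenceEqual(Enumerable.Range(0, inputSize));

    public static void Main()
    {
        // Same count as the input width, but NOT identity: must still reorder.
        Console.WriteLine(IsIdentitySelection(new List<int> { 2, 0, 1 }, 3)); // False
        // True identity mapping: selection can be skipped.
        Console.WriteLine(IsIdentitySelection(new List<int> { 0, 1, 2 }, 3)); // True
    }
}
```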
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/Models/Results/AiModelResult.cs (1)

2222-2232: ⚠️ Potential issue | 🔴 Critical

Blocking: session inference still bypasses the new selection helper.

InferenceSequence.Predict() still uses the old inputSize != featureIndices.Count shortcut. Line 2227 will skip valid reorderings like [2, 0, 1], and this path never gets the bounds validation now implemented in ApplySelectedFeaturesForPrediction(...). That leaves BeginInferenceSession().CreateSequence().Predict(...) behavior inconsistent with AiModelResult.Predict(...) and can return incorrect predictions through a public inference path.

Suggested fix
-            // Apply feature selection if the optimizer selected a subset of features during training.
-            var featureIndices = _result.OptimizationResult?.SelectedFeatureIndices;
-            if (featureIndices != null && featureIndices.Count > 0)
-            {
-                int inputSize = Helpers.InputHelper<T, TInput>.GetInputSize(normalizedNewData);
-                if (inputSize != featureIndices.Count)
-                {
-                    normalizedNewData = Helpers.OptimizerHelper<T, TInput, TOutput>
-                        .SelectFeatures(normalizedNewData, featureIndices);
-                }
-            }
+            normalizedNewData = _result.ApplySelectedFeaturesForPrediction(normalizedNewData);

Based on learnings and coding guidelines, "Every PR must contain production-ready code" and "Incomplete features: Half-implemented patterns where some code paths work but others silently do nothing" are blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Models/Results/AiModelResult.cs` around lines 2222 - 2232,
InferenceSequence.Predict() is still using the old fast-path check "inputSize !=
featureIndices.Count" which skips valid reorderings (e.g. [2,0,1]) and bypasses
the bounds validation in ApplySelectedFeaturesForPrediction(...); update
InferenceSequence.Predict() (the method invoked via
BeginInferenceSession().CreateSequence().Predict(...)) to call the same
ApplySelectedFeaturesForPrediction(normalizedNewData, featureIndices) helper (or
copy its exact validation + selection logic) instead of relying on inputSize
comparison so reorderings are applied and SelectedFeatureIndices bounds are
validated consistently with AiModelResult.Predict().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 387-444: The test
AiModelResult_Predict_AppliesTensorFeatureSelection only checks a
length-reducing ascending subset; add a deterministic permutation case using
SelectedFeatureIndices = new List<int> {2,0,3,1} (or similar full-length
reordering) inside the same test (or a new test) to feed a known fullInput
Tensor<float> and assert that result.Predict(...) returns values in the
reordered order (check actual prediction values, not just shape) to ensure
AiModelResult.Predict applies reordering (referencing SelectedFeatureIndices,
AiModelResult.Predict, OptimizationResult and Tensor<float>/prediction.Shape to
locate where to change).
- Around line 312-334: The test
BuildAsync_DirectModelPredict_WithoutBuilder_StillWorks uses a collinear fixture
(second column = 2*first) and an incorrect formula comment (says y = x1 + 2*x2 +
1 while the data corresponds to a different mapping), and the assertions use
overly wide ranges; fix by replacing the input matrix created via CreateMatrix
so columns are non-collinear (e.g., choose independent values for the second
feature) or explicitly construct x so the intended mapping y = x1 + 2*x2 + 1
holds, then compute the exact expected predictions for newX and assert those
values with a tight tolerance (use Assert.Equal with a delta or narrow
Assert.InRange) against model.Predict; update the comment to match the fixture
and keep references to RidgeRegression<double>,
BuildAsync_DirectModelPredict_WithoutBuilder_StillWorks, CreateMatrix,
CreateVector, and Predict so reviewers can find the changes.

---

Duplicate comments:
In `@src/Models/Results/AiModelResult.cs`:
- Around line 2222-2232: InferenceSequence.Predict() is still using the old
fast-path check "inputSize != featureIndices.Count" which skips valid
reorderings (e.g. [2,0,1]) and bypasses the bounds validation in
ApplySelectedFeaturesForPrediction(...); update InferenceSequence.Predict() (the
method invoked via BeginInferenceSession().CreateSequence().Predict(...)) to
call the same ApplySelectedFeaturesForPrediction(normalizedNewData,
featureIndices) helper (or copy its exact validation + selection logic) instead
of relying on inputSize comparison so reorderings are applied and
SelectedFeatureIndices bounds are validated consistently with
AiModelResult.Predict().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 181238a3-54fb-4a33-80ea-6b541a37df36

📥 Commits

Reviewing files that changed from the base of the PR and between c71e314 and e2c8e52.

📒 Files selected for processing (2)
  • src/Models/Results/AiModelResult.cs
  • tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/Models/Results/AiModelResult.cs:1844

  • The guard if (inputSize != featureIndices.Count) skips feature selection when the input already has the same number of features as selected indices. This is intended to handle the case where the user has already pre-selected features, but it creates a silent correctness risk: if the user passes a matrix with the same number of columns as selected features but those columns are not the correct selected features (e.g., the original training data had 5 features and 2 were selected, and the user happens to pass a 2-column matrix of different features), no feature selection is applied and the model receives the wrong columns without error. While this is a design tradeoff, there's no documentation of this assumption and no validation that the input size equals the selected count for valid reasons. Consider either documenting this behavior clearly or storing the original full feature count alongside SelectedFeatureIndices so that you can distinguish "pre-selected" vs. "wrong-size" inputs.

        // INFERENCE OPTIMIZATION PATH: apply configured inference optimizations for neural network models
        if (InferenceOptimizationConfig != null &&
            Model is NeuralNetworkBase<T> neuralModel &&
            normalizedNewData is Tensor<T> inputTensor)


- Fixed direct-model test to use non-collinear features with correct
  formula comment (y = 2*x1 + 3*x2 + 1) and tighter assertion ranges
- Added permutation test proving [2, 0, 1] selection is not treated as
  identity — verifies the SequenceEqual check works correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Callers that create OptimizationResult via object initializer (bypassing
the constructor) leave CoefficientLowerBounds, CoefficientUpperBounds,
FitnessHistory, SelectedFeatures, and SelectedFeatureIndices as null.
Deep-copy calls in Clone/WithParameters would throw NullReferenceException.
Now falls back to empty defaults when properties are null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
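The fallback described in this commit can be sketched as follows (a minimal illustration with hypothetical property names, not the actual OptimizationResult code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal sketch: a Clone() that tolerates null collections left behind
// by object-initializer construction (which bypasses the constructor).
class ResultSketch
{
    public List<int>? SelectedFeatureIndices { get; set; }
    public List<double>? FitnessHistory { get; set; }

    public ResultSketch Clone() => new ResultSketch
    {
        // Null-coalesce to an empty list before copying, so deep-copy
        // cannot throw a NullReferenceException on an uninitialized result.
        SelectedFeatureIndices = (SelectedFeatureIndices ?? new List<int>()).ToList(),
        FitnessHistory = (FitnessHistory ?? new List<double>()).ToList()
    };
}
```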
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Models/Results/OptimizationResult.cs (1)

477-494: ⚠️ Potential issue | 🟠 Major

DeepCopy() and WithParameters() still return inconsistent nested state.

TrainingResult, ValidationResult, TestResult, and FitDetectionResult are still reused by reference. That breaks isolation in DeepCopy(), and WithParameters() also leaves predictions/statistics from the old model attached to a different BestSolution. Deep-clone those nested results in DeepCopy(), and recompute or clear them in WithParameters() when the parameters change.

Also applies to: 502-519

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Models/Results/OptimizationResult.cs` around lines 477 - 494, DeepCopy()
currently shallow-copies nested result objects (TrainingResult,
ValidationResult, TestResult, FitDetectionResult), causing shared mutable state;
update OptimizationResult<T, TInput, TOutput>.DeepCopy() to deep-clone those
nested result objects (call their DeepCopy/Clone methods or construct new
instances copying all fields) instead of reusing references, and ensure
BestSolution is deep-copied before attaching any nested results. Likewise, in
WithParameters(...) detect when parameters change and either recompute dependent
predictions/statistics for
TrainingResult/ValidationResult/TestResult/FitDetectionResult based on the new
BestSolution or explicitly clear/reset those fields so stale predictions aren’t
carried to a different solution; update the WithParameters method to create
fresh result objects or null/empty them when parameters change.
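The difference between the shallow reference reuse flagged above and the deep clone the comment asks for can be sketched like this (hypothetical types, not the library's actual result classes):

```csharp
using System;

// A nested result object with its own deep-copy method.
class MetricsSketch
{
    public double Mse;
    public MetricsSketch DeepCopy() => new MetricsSketch { Mse = Mse };
}

class OptimizationResultSketch
{
    public MetricsSketch TrainingResult = new MetricsSketch();

    public OptimizationResultSketch DeepCopy() => new OptimizationResultSketch
    {
        // Deep-clone the nested result instead of reusing the reference;
        // writing `TrainingResult = TrainingResult` here would leave the
        // copy sharing mutable state with the original.
        TrainingResult = TrainingResult.DeepCopy()
    };
}
```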
♻️ Duplicate comments (2)
tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs (2)

508-523: ⚠️ Potential issue | 🔴 Critical

This permutation test still always passes on the broken branch.

predictionsMatch is computed and then ignored, so the only real assertion here is the output shape. A regression that skips reordering entirely still passes. Use a manual oracle instead: permute the input tensor yourself, run it through the identity-path model, and assert that output matches result.Predict(input).

🔧 Assert the actual permutation result
-        var identityPrediction = identityResult.Predict(input);
+        var manuallyPermutedInput = new Tensor<float>(new[] { featureCount });
+        manuallyPermutedInput[0] = 30.0f;
+        manuallyPermutedInput[1] = 10.0f;
+        manuallyPermutedInput[2] = 20.0f;
+        var expectedPrediction = identityResult.Predict(manuallyPermutedInput);
@@
-        bool predictionsMatch = Math.Abs(permutedPrediction[0] - identityPrediction[0]) < 1e-6f;
-        // Note: predictions *might* match if all weights happen to be equal (unlikely with random init),
-        // but in practice DenseLayer initializes weights differently per position.
-        // The key thing is the test doesn't throw — proving permutation is applied, not skipped.
         Assert.Equal(outputSize, permutedPrediction.Shape[0]);
+        Assert.Equal(outputSize, expectedPrediction.Shape[0]);
+        Assert.Equal(expectedPrediction[0], permutedPrediction[0], 1e-6f);

As per coding guidelines, Tests MUST be production-quality and Good tests should assert specific expected values, not just non-null.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`
around lines 508 - 523, The test currently computes predictionsMatch but never
uses it, so it only asserts shape and misses verifying permutation; fix by
explicitly applying the same feature permutation to the input tensor and using
that permuted input as the oracle for the identity-path model: create
permutedInput by reordering the original input according to the permutation used
by the model (the same mapping that should have been applied in result), then
call var oraclePrediction = identityResult.Predict(permutedInput) and assert
that oraclePrediction equals permutedPrediction (element-wise equality within an
epsilon) and that shapes match; use the existing variables result.Predict,
identityResult.Predict, permutedPrediction and input to locate where to insert
this check.

331-337: ⚠️ Potential issue | 🔴 Critical

These bounds are still too wide for a fixed regression fixture.

The dataset encodes a concrete mapping, but 39..43 and 44..48 still allow materially wrong outputs to pass. Either switch this direct-path regression to a non-regularized model and assert 41/46 tightly, or keep RidgeRegression<double> and narrow the tolerance to a bound that is justified by its default regularization.

🔧 One deterministic way to tighten this
-        var model = new RidgeRegression<double>();
+        var model = new MultipleRegression<double>();
@@
-        Assert.InRange(predictions[0], 39, 43);
-        Assert.InRange(predictions[1], 44, 48);
+        Assert.Equal(41.0, predictions[0], 1e-6);
+        Assert.Equal(46.0, predictions[1], 1e-6);

As per coding guidelines, Good tests should assert specific expected values, not just non-null.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`
around lines 331 - 337, The test currently allows overly wide tolerances for a
deterministic regression; in AiModelBuilderPredictIntegrationTests update the
model or the assertions so the expected outputs are tight: either switch the
model creation from RidgeRegression<double> to a non-regularized estimator
(e.g., use OrdinaryLeastSquares / the equivalent non-regularized builder) so the
mapping produces exact values 41 and 46, then replace the loose Assert.InRange
checks with exact equality assertions for predictions[0] == 41 and
predictions[1] == 46 (or use Assert.Equal with an appropriate minimal
precision), or keep RidgeRegression<double> but reduce the ranges to a
narrowly-justified tolerance based on its default regularization (e.g., ±1 or
use Assert.Equal with a small delta); update the assertions accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 28-56: The test never guarantees the optimizer actually reduced
features; after BuildAsync() assert that the trained model selected a strict
subset by checking SelectedFeatureIndices.Count < 5 (i.e. add
Assert.True(result.SelectedFeatureIndices.Count < 5) or the equivalent accessor
on the built model), or instead configure the optimizer (NormalOptimizer)
deterministically to force feature selection; ensure this unconditional
assertion runs before calling result.Predict(newData).

---

Outside diff comments:
In `@src/Models/Results/OptimizationResult.cs`:
- Around line 477-494: DeepCopy() currently shallow-copies nested result objects
(TrainingResult, ValidationResult, TestResult, FitDetectionResult), causing
shared mutable state; update OptimizationResult<T, TInput, TOutput>.DeepCopy()
to deep-clone those nested result objects (call their DeepCopy/Clone methods or
construct new instances copying all fields) instead of reusing references, and
ensure BestSolution is deep-copied before attaching any nested results.
Likewise, in WithParameters(...) detect when parameters change and either
recompute dependent predictions/statistics for
TrainingResult/ValidationResult/TestResult/FitDetectionResult based on the new
BestSolution or explicitly clear/reset those fields so stale predictions aren’t
carried to a different solution; update the WithParameters method to create
fresh result objects or null/empty them when parameters change.

---

Duplicate comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs`:
- Around line 508-523: The test currently computes predictionsMatch but never
uses it, so it only asserts shape and misses verifying permutation; fix by
explicitly applying the same feature permutation to the input tensor and using
that permuted input as the oracle for the identity-path model: create
permutedInput by reordering the original input according to the permutation used
by the model (the same mapping that should have been applied in result), then
call var oraclePrediction = identityResult.Predict(permutedInput) and assert
that oraclePrediction equals permutedPrediction (element-wise equality within an
epsilon) and that shapes match; use the existing variables result.Predict,
identityResult.Predict, permutedPrediction and input to locate where to insert
this check.
- Around line 331-337: The test currently allows overly wide tolerances for a
deterministic regression; in AiModelBuilderPredictIntegrationTests update the
model or the assertions so the expected outputs are tight: either switch the
model creation from RidgeRegression<double> to a non-regularized estimator
(e.g., use OrdinaryLeastSquares / the equivalent non-regularized builder) so the
mapping produces exact values 41 and 46, then replace the loose Assert.InRange
checks with exact equality assertions for predictions[0] == 41 and
predictions[1] == 46 (or use Assert.Equal with an appropriate minimal
precision), or keep RidgeRegression<double> but reduce the ranges to a
narrowly-justified tolerance based on its default regularization (e.g., ±1 or
use Assert.Equal with a small delta); update the assertions accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a135915-d41d-4cc6-bfd5-e56f1b28ceec

📥 Commits

Reviewing files that changed from the base of the PR and between e2c8e52 and 3c94dab.

📒 Files selected for processing (2)
  • src/Models/Results/OptimizationResult.cs
  • tests/AiDotNet.Tests/IntegrationTests/Regression/AiModelBuilderPredictIntegrationTests.cs

…eck, robust assertions

- InferenceSequence.Predict() now delegates to ApplySelectedFeaturesForPrediction
  instead of using a count-based check that skipped permuted selections
- Move bounds validation to one-time check (cached _featureIndicesValidated flag)
  instead of scanning on every Predict() call
- Assert permuted predictions actually differ from identity predictions
- Replace fragile selectedCount < 10 with valid range assertion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add assertion verifying optimizer selected a feature subset before testing
Predict() with full-width input, proving the test exercises the reduced-column
code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.



…s in test

Replace per-call SequenceEqual + Enumerable.Range allocation with cached
volatile int state (0=unknown, 1=identity, 2=needs-selection). Uses a
simple loop instead of LINQ for zero-allocation identity check.
Thread-safe via volatile read/write semantics.

Set explicit non-uniform weights [1, 2, 3] in permuted feature selection
test to ensure permutation deterministically produces different output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
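The cached identity check described in this commit can be sketched as follows (a hypothetical class; the state encoding follows the commit message, not necessarily the actual source):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the cached, zero-allocation identity check.
class FeatureSelectorSketch
{
    private readonly List<int> _indices;
    // 0 = unknown, 1 = identity mapping, 2 = needs selection.
    // Assumes a constant input width across Predict() calls.
    private volatile int _identityState;

    public FeatureSelectorSketch(List<int> indices) => _indices = indices;

    public bool NeedsSelection(int inputWidth)
    {
        int state = _identityState;
        if (state == 0)
        {
            // Plain loop instead of SequenceEqual + Enumerable.Range,
            // so the identity check allocates nothing.
            bool identity = _indices.Count == inputWidth;
            if (identity)
            {
                for (int i = 0; i < _indices.Count; i++)
                {
                    if (_indices[i] != i) { identity = false; break; }
                }
            }
            state = identity ? 1 : 2;
            _identityState = state; // benign race: every writer computes the same value
        }
        return state == 2;
    }
}
```

A permuted selection such as [2, 0, 1] has the same count as an identity input but fails the positional loop, so it is correctly classified as needing selection.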
ooples added a commit that referenced this pull request Mar 7, 2026
Refactor tests that called result.Predict() (which fails due to the
feature selection bug fixed in PR #928) to verify preprocessing pipeline
state directly via PreprocessingInfo.TransformFeatures() instead.
Tests now verify the data leakage fix independently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ooples ooples merged commit 60d1df1 into master Mar 8, 2026
36 of 39 checks passed
@ooples ooples deleted the fix/critical-predict-and-type-safety-bugs branch March 8, 2026 15:03
ooples added a commit that referenced this pull request Mar 9, 2026
…mismatch bug (#928)

* fix: apply feature selection in AiModelResult.Predict() — fixes dimension mismatch bug

The NormalOptimizer (default) randomly selects a subset of features during
training via PrepareAndEvaluateSolution, but AiModelResult.Predict() never
applied the same feature selection to new input data. This caused the model
to receive all columns while it was trained on fewer, throwing "Number of
columns must equal length of vector" errors.

Changes:
- Add SelectedFeatureIndices (List<int>) to OptimizationStepData,
  OptimizationResult, and ModelResult to store actual column indices
- Propagate indices through optimizer flow: PrepareAndEvaluateSolution →
  UpdateBestSolution → CreateOptimizationResult
- Apply feature selection in AiModelResult.Predict() after preprocessing
  but before Model.Predict(), using OptimizerHelper.SelectFeatures()
- Add 12 integration tests verifying the full BuildAsync → Predict pipeline

Closes #927

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
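The core of the fix, selecting the stored column indices from incoming data before calling the model, can be sketched like this (illustrative only; the library delegates to OptimizerHelper.SelectFeatures rather than using this helper):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FeatureSelectionSketch
{
    // Pick out the trained-on columns from a full-width input row,
    // validating indices first so a bad index fails loudly instead of
    // producing a silent dimension mismatch downstream.
    public static double[] SelectFeatures(double[] row, IReadOnlyList<int> indices)
    {
        foreach (int idx in indices)
        {
            if (idx < 0 || idx >= row.Length)
                throw new ArgumentOutOfRangeException(nameof(indices),
                    $"Index {idx} is out of range for input width {row.Length}.");
        }
        return indices.Select(i => row[i]).ToArray();
    }
}
```

A model trained on column 3 of five-column data would then receive a single-column input, matching the shape it was trained on.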

* fix: resolve 7 pr review comments - feature selection, copy semantics, docs

- Remove unused LooseTolerance constant
- Fix off-by-one: use != instead of > for feature index count check
- Apply feature selection in InferenceSession.Predict() (critical gap)
- Add comprehensive docs to SelectedFeatureIndices property
- Fix WithParameters to deep-copy collections (consistent with DeepCopy)
- Add copy independence assertion to WithParameters test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: support 1D tensor feature selection and add predict integration test

SelectFeaturesTensor previously threw for 1D tensors. Now handles them
by directly selecting elements at the specified indices. Added integration
test proving Predict applies SelectedFeatureIndices correctly with a real
FeedForwardNeuralNetwork.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use identity-check instead of count heuristic for feature selection

Replaced inputSize == featureIndices.Count heuristic with proper
identity-mapping check using SequenceEqual. A reordered selection like
[2, 0, 1] has the same count but still needs reordering. Also validates
indices are in-range before calling SelectFeatures. Renamed misleading
test names and added shape assertion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: fix collinear test fixture and add permuted feature selection test

- Fixed direct-model test to use non-collinear features with correct
  formula comment (y = 2*x1 + 3*x2 + 1) and tighter assertion ranges
- Added permutation test proving [2, 0, 1] selection is not treated as
  identity — verifies the SequenceEqual check works correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add null guards to OptimizationResult Clone and WithParameters

Callers that create OptimizationResult via object initializer (bypassing
the constructor) leave CoefficientLowerBounds, CoefficientUpperBounds,
FitnessHistory, SelectedFeatures, and SelectedFeatureIndices as null.
Deep-copy calls in Clone/WithParameters would throw NullReferenceException.
Now falls back to empty defaults when properties are null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify feature selection in InferenceSequence, one-time bounds check, robust assertions

- InferenceSequence.Predict() now delegates to ApplySelectedFeaturesForPrediction
  instead of using a count-based check that skipped permuted selections
- Move bounds validation to one-time check (cached _featureIndicesValidated flag)
  instead of scanning on every Predict() call
- Assert permuted predictions actually differ from identity predictions
- Replace fragile selectedCount < 10 with valid range assertion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: assert feature selection actually occurred in regression test

Add assertion verifying optimizer selected a feature subset before testing
Predict() with full-width input, proving the test exercises the reduced-column
code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cache identity check with volatile int, set deterministic weights in test

Replace per-call SequenceEqual + Enumerable.Range allocation with cached
volatile int state (0=unknown, 1=identity, 2=needs-selection). Uses a
simple loop instead of LINQ for zero-allocation identity check.
Thread-safe via volatile read/write semantics.

Set explicit non-uniform weights [1, 2, 3] in permuted feature selection
test to ensure permutation deterministically produces different output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ooples added a commit that referenced this pull request Mar 9, 2026
…934)

* fix: prevent data leakage by fitting preprocessing only on training data

In BuildSupervisedInternalAsync(), the preprocessing pipeline (e.g.,
StandardScaler) was fitted on the ENTIRE dataset before the train/test
split. This leaked test/validation statistics (mean, std dev) into the
training pipeline, causing artificially inflated metrics.

The fix restructures the method:
1. Split data into train/val/test FIRST using DataSplitter.Split()
2. FitTransform preprocessing pipeline on training data ONLY
3. Transform (not FitTransform) validation and test data

The federated learning path is unchanged — it correctly uses all data
as training data, so FitTransform on everything remains correct.

Fixes #929

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
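The split-then-fit ordering above can be sketched with a toy standard scaler (an assumed API shape for illustration; the real pipeline classes differ):

```csharp
using System;
using System.Linq;

class ScalerSketch
{
    private double _mean, _std;

    // Fit statistics on TRAINING data only, then transform it.
    public double[] FitTransform(double[] train)
    {
        _mean = train.Average();
        _std = Math.Sqrt(train.Select(v => (v - _mean) * (v - _mean)).Average());
        return Transform(train);
    }

    // Validation/test data reuse the training statistics; there is no
    // refit, so their mean and std dev never leak into the pipeline.
    public double[] Transform(double[] data) =>
        data.Select(v => (v - _mean) / _std).ToArray();
}
```

Calling FitTransform on the full dataset and only then splitting would bake test-set statistics into _mean and _std, which is exactly the leakage this commit removes.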

* fix: resolve 12 pr review comments - leakage tests, pragma scope, registry sync

- Scope #pragma warning disable CS8600/CS8604 with matching restore
- Restore PreprocessingRegistry.Current in all ConfigurePreprocessing overloads
- Update PreprocessingRegistry docs to reflect thread-safety and usage pattern
- Rewrite tests to actually detect leakage: verify scaler produces different
  output than all-data scaler, verify transform shape and non-identity
- Add end-to-end predict test with preprocessing
- Add test verifying no-preprocessing path still works with predict

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move fitresample after train/test split to prevent data leakage

FitResample (SMOTE, outlier removal, augmentation) was applied to the
full dataset before splitting, allowing synthetic samples derived from
test/validation data to leak into training. Now FitResample runs after
the split on training data only in both standard and AutoML paths.
Also fixed AutoML preprocessing to FitTransform on training only and
Transform on validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update registry docs and use consistent shuffle policy for splits

Updated PreprocessingRegistry docs to clearly warn about concurrent build
safety issues. Applied same time-series non-shuffle policy to the final
supervised training split that AutoML already uses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use post-preparation training data in no-preprocessing else branch

When _dataPreparationPipeline runs (resampling/outlier removal) it modifies
XTrain/yTrain. The else branch (no preprocessing) was incorrectly using
preparedX/preparedY (pre-preparation) instead of XTrain/yTrain, losing
training-only preparation results for downstream logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make data leakage tests self-sufficient without PR 928

Refactor tests that called result.Predict() (which fails due to the
feature selection bug fixed in PR #928) to verify preprocessing pipeline
state directly via PreprocessingInfo.TransformFeatures() instead.
Tests now verify the data leakage fix independently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make data leakage test deterministic with widely-spaced unique values

Replace clustered bimodal data with linearly-spaced unique values (0, 10, 20, ..., 990)
to guarantee any 70/30 shuffle split produces significantly different scaler statistics
from the full dataset, eliminating flaky test failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from default!, null!, and remaining field! patterns

Replaces default! with proper initialization (NumOps.Zero, throw, or nullable types),
null! with nullable parameters or removed dead code, and field! with extracted local
variables with null-coalescing throw expressions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
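The `field!` replacement pattern from this commit looks roughly like this (hypothetical field and method names):

```csharp
using System;

class BuilderSketch
{
    private object? _pipeline;

    public void ConfigurePipeline(object pipeline) => _pipeline = pipeline;

    public object GetPipeline()
    {
        // An extracted local with a null-coalescing throw replaces the
        // null-forgiving operator (_pipeline!), failing fast with a clear
        // message instead of deferring to a NullReferenceException later.
        var pipeline = _pipeline ?? throw new InvalidOperationException(
            "Pipeline was not configured before use.");
        return pipeline;
    }
}
```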

* fix: address pr review comments for aimodelbuilder and totem

- Fix exception text to reference ConfigureModel() instead of WithModel()
- Remove nullable T? from TOTEM._lastCommitmentLoss field, initialize
  with NumOps.Zero in both constructors to avoid ambiguous nullability
- Fix incorrect comment about Matrix<T>/Vector<T> being value types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use numops.equals instead of equalitycomparer for net471 compat

Replace EqualityComparer<T>.Default.Equals with _numOps.Equals to
fix CS8604 null reference error on net471 and follow codebase patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow pragma scope and snapshot transform input in tests

- Narrow CS8600/CS8604 pragma scopes to specific declaration and call
  sites instead of spanning 800+ lines
- Clarify StatisticsHelper comment about default(T) vs NumOps.Zero
- Snapshot original values before Transform in test to prevent
  in-place mutation from invalidating the comparison

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ooples added a commit that referenced this pull request Mar 10, 2026
…ining directories (#933) (#941)

* fix: prevent data leakage by fitting preprocessing only on training data

In BuildSupervisedInternalAsync(), the preprocessing pipeline (e.g.,
StandardScaler) was fitted on the ENTIRE dataset before the train/test
split. This leaked test/validation statistics (mean, std dev) into the
training pipeline, causing artificially inflated metrics.

The fix restructures the method:
1. Split data into train/val/test FIRST using DataSplitter.Split()
2. FitTransform preprocessing pipeline on training data ONLY
3. Transform (not FitTransform) validation and test data

The federated learning path is unchanged — it correctly uses all data
as training data, so FitTransform on everything remains correct.

Fixes #929

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 12 pr review comments - leakage tests, pragma scope, registry sync

- Scope #pragma warning disable CS8600/CS8604 with matching restore
- Restore PreprocessingRegistry.Current in all ConfigurePreprocessing overloads
- Update PreprocessingRegistry docs to reflect thread-safety and usage pattern
- Rewrite tests to actually detect leakage: verify scaler produces different
  output than all-data scaler, verify transform shape and non-identity
- Add end-to-end predict test with preprocessing
- Add test verifying no-preprocessing path still works with predict

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move fitresample after train/test split to prevent data leakage

FitResample (SMOTE, outlier removal, augmentation) was applied to the
full dataset before splitting, allowing synthetic samples derived from
test/validation data to leak into training. Now FitResample runs after
the split on training data only in both standard and AutoML paths.
Also fixed AutoML preprocessing to FitTransform on training only and
Transform on validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update registry docs and use consistent shuffle policy for splits

Updated PreprocessingRegistry docs to clearly warn about concurrent build
safety issues. Applied same time-series non-shuffle policy to the final
supervised training split that AutoML already uses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use post-preparation training data in no-preprocessing else branch

When _dataPreparationPipeline runs (resampling/outlier removal) it modifies
XTrain/yTrain. The else branch (no preprocessing) was incorrectly using
preparedX/preparedY (pre-preparation) instead of XTrain/yTrain, losing
training-only preparation results for downstream logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make data leakage tests self-sufficient without PR 928

Refactor tests that called result.Predict() (which fails due to the
feature selection bug fixed in PR #928) to verify preprocessing pipeline
state directly via PreprocessingInfo.TransformFeatures() instead.
Tests now verify the data leakage fix independently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: preserve preprocessing pipeline state during serialization (#936)

* fix: add serialization support to preprocessing pipeline fitted state

- Add [JsonProperty] attributes to private fields across all preprocessing
  transformers (StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler,
  DecimalScaler, GlobalContrastScaler, LogScaler, LogMeanVarianceScaler,
  LpNormScaler, Normalizer, SimpleImputer) so Newtonsoft.Json serializes
  fitted state (means, std devs, min/max, etc.)

- Add [JsonProperty] to TransformerBase.IsFitted and ColumnIndices for
  proper deserialization of base class state

- Add [JsonConstructor] to MinMaxScaler and RobustScaler to resolve
  ambiguous constructor selection during deserialization

- Replace ValueTuple in PreprocessingPipeline._steps with a new
  PipelineStep<T, TInput> class that serializes reliably with
  Newtonsoft.Json (ValueTuples lose field names during serialization)

- Add [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] to
  PreprocessingInfo.Pipeline and TargetPipeline for polymorphic
  deserialization of pipeline step transformers

- Add 16 integration tests verifying serialization round-trips for all
  modified transformers, pipelines, and inverse transforms

Fixes #930

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 13 pr review comments - serialization safety, tests, perf

- Remove null-forgiving operator from PipelineStep (use JsonConstructor)
- Make PipelineStep.Transformer setter private (immutable after construction)
- Fix ColumnIndices deserialization (add private setter for Json.NET)
- Fix Steps property to return cached read-only list (no alloc per call)
- Add explicit range assertions to MinMaxScaler custom range test
- Add transform equivalence check to SimpleImputer test
- Add LpNormScaler serialization round-trip test
- Add Normalizer serialization round-trip test
- Add LogMeanVarianceScaler serialization round-trip test
- TypeNameHandling.Auto is safe: SafeSerializationBinder restricts types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make pipelinestep immutable, validate constructor, add missing tests

- Made PipelineStep.Name private set to prevent mutation after construction
- PipelineStep constructor now validates name (non-whitespace) and transformer
  (non-null), failing fast on invalid serialized steps
- Added ColumnIndices round-trip test proving non-default indices survive
  serialization and untouched columns remain unchanged
- Added TargetPipeline round-trip tests proving both null and non-null target
  pipelines survive serialization with transform equivalence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent JSON duplicate serialization and remove per-member TypeNameHandling

Add [JsonIgnore] to public getter-only properties that mirror [JsonProperty]
backing fields in all transformers (StandardScaler, RobustScaler, SimpleImputer,
DecimalScaler, GlobalContrastScaler, MinMaxScaler, Normalizer, MaxAbsScaler,
LogScaler) to prevent Newtonsoft emitting both field and property.

Remove per-member TypeNameHandling.Auto from PipelineStep.Transformer and
_finalTransformer — rely on serializer-level settings with SafeSerializationBinder
instead, preventing unsafe TypeNameHandling bypass.

Add Deconstruct method to PipelineStep for backward-compatible tuple destructuring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove static PreprocessingRegistry — prevents concurrent build race conditions (#935)

* fix: remove static PreprocessingRegistry usage from AiModelBuilder (#931)

Remove the 3 `PreprocessingRegistry<T, TInput>.Current = _preprocessingPipeline`
assignments from AiModelBuilder.ConfigurePreprocessing() and the 3 analogous
PostprocessingRegistry assignments from ConfigurePostprocessing(). The pipeline
is already stored as an instance field on the builder and flows to AiModelResult
via PreprocessingInfo — the global static registry was unnecessary and caused
race conditions when multiple builders ran concurrently.

Changes:
- Remove 6 static registry assignments from AiModelBuilder.cs
- Add [Obsolete] attribute to PreprocessingRegistry<T, TInput> class
- Replace PreprocessingRegistry usage in DocumentNeuralNetworkBase with
  an instance-level PreprocessingTransformer property
- Add 4 integration tests verifying concurrent builds don't cross-contaminate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 7 pr review comments - registry deprecation, thread safety, docs

- Fix PreprocessingTransformer docs: clarify protected accessibility
- Add PostprocessingTransformer instance property to DocumentNeuralNetworkBase
- Replace PostprocessingRegistry static usage with instance-based transformer
- Mark PostprocessingRegistry as [Obsolete] (same treatment as PreprocessingRegistry)
- Update PreprocessingRegistry.Current docs to reflect no longer auto-set
- Fix concurrent test to use separate data loaders (thread-safety)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: wire preprocessing transformer onto document models in buildasync

Changed PreprocessingTransformer and PostprocessingTransformer from
protected to protected internal so AiModelBuilder can set them.
BuildSupervisedInternalAsync now wires the preprocessing pipeline onto
DocumentNeuralNetworkBase models, replacing the static registry approach
that caused race conditions during concurrent builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow transformer surface, fail fast on unfitted, wire postprocessing

- Changed PreprocessingTransformer/PostprocessingTransformer back to protected
  and added internal ConfigureTransformers() method for AiModelBuilder wiring
- PreprocessDocument/PostprocessOutput now throw if transformer is set but not
  fitted instead of silently falling back to defaults
- Wired both preprocessing AND postprocessing onto DocumentNeuralNetworkBase
- Updated PreprocessingRegistry docs to reflect deprecated/removed status
- Added [Collection] to registry tests and try/finally cleanup
- Fixed "override" wording to "assign" in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: centralize document transformer injection and strengthen test assertions

Extract transformer wiring into ConfigureDocumentTransformers() helper method
that always passes current pipeline values (clearing prior state on reuse).
Called from both supervised build path and BuildProgramSynthesisInferenceOnlyResult().

Strengthen PreprocessingRegistryIntegrationTests to verify PreprocessingInfo
is non-null and fitted, not just Assert.NotNull(result).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unfitted check from PostprocessOutput, update docs, strengthen tests

PostprocessOutput no longer throws on unfitted transformers since
postprocessing transforms (Softmax, LabelDecoder) are often stateless.
Updated XML docs to reflect instance-level transformer priority.
Strengthen concurrent and sequential test assertions to verify
PreprocessingInfo is non-null and fitted on each result.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from video

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from neural network layers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: warn on empty attributions and use guarded locals in jwt attestation verifier

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: include gain layer in backward pass and parameter updates in deepfilternet

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rebuild streaming buffers after deserializing fft/hop size in neural noise reducer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from federated learning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from federated learning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: respect bootstrap flag and align estimator probabilities to ensemble label space

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove null-forgiving operators from clustering modules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: throw on invalid ensemble members and cold model in adaptive random forest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify trained-state error messages in ordinal regression

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unused alpha_s and use guarded local consistently in unipc scheduler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add guard for latent variance in compute elbo

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist cached gradient lookup out of per-sample loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace console.writeline with debug.writeline for checkpoint reporting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cache compressed momentum after resampling projection matrices

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: show total sharded memory independently of per-device memory in tostring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address 21 pr review comments for code quality improvements

- CLIPScore: throw on embedding/weight dimension mismatch instead of Math.Min
- Adam8BitOptimizer: remove redundant _vScales null check
- BFGSOptimizer: move _inverseHessian null check before UpdateInverseHessian call
- FTRLOptimizer: use post-update accumulator (nPlusGradSq) in both paths
- MomentumOptimizer: snapshot _velocity before serialization loop
- OptimizerBase: use RequireModel() instead of Model!
- SymbolicPhysicsLearner: Clone throws on null children like Evaluate/Format
- IncrementalPCA: fix error message to "internal state inconsistent"
- ContradictionDetector: wrap JToken.Value<bool>() in try-catch
- CriticModel: wrap JToken.Value<double>() in try-catch
- LogNormalAFT: hoist Coefficients null check outside loops (3 methods)
- WeibullAFT: hoist Coefficients null check outside loops (3 methods)
- NHiTSModel: remove redundant ContainsKey before TryGetValue
- DiffusionTools: support both 1D and 2D audio_data deserialization
- GMMDetector: validate feature count before scoring a point
- TimeSeriesForestClassifier: reject NumTrees <= 0 during training

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use guarded sigmaTs local variable consistently in dpm solver methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply feature selection in Predict() — fixes critical dimension mismatch bug (#928)

* fix: apply feature selection in AiModelResult.Predict() — fixes dimension mismatch bug

The NormalOptimizer (default) randomly selects a subset of features during
training via PrepareAndEvaluateSolution, but AiModelResult.Predict() never
applied the same feature selection to new input data. This caused the model
to receive all columns while it was trained on fewer, throwing "Number of
columns must equal length of vector" errors.

Changes:
- Add SelectedFeatureIndices (List<int>) to OptimizationStepData,
  OptimizationResult, and ModelResult to store actual column indices
- Propagate indices through optimizer flow: PrepareAndEvaluateSolution →
  UpdateBestSolution → CreateOptimizationResult
- Apply feature selection in AiModelResult.Predict() after preprocessing
  but before Model.Predict(), using OptimizerHelper.SelectFeatures()
- Add 12 integration tests verifying the full BuildAsync → Predict pipeline

Closes #927
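The core of the fix, selecting only the trained-on columns before handing data to the model, can be sketched language-agnostically. This Python snippet mirrors the C# logic; the names `apply_feature_selection` and `selected_feature_indices` are illustrative, not the actual AiDotNet API:

```python
def apply_feature_selection(X, selected_feature_indices):
    """Keep only the columns the model was trained on, in training order.

    X: list of rows (each row a list of feature values).
    selected_feature_indices: column indices the optimizer chose at training time.
    """
    if not selected_feature_indices:
        return X  # no selection recorded: pass data through unchanged
    n_cols = len(X[0]) if X else 0
    # Validate indices up front rather than failing mid-selection.
    if any(i < 0 or i >= n_cols for i in selected_feature_indices):
        raise ValueError("selected feature index out of range for input width")
    return [[row[i] for i in selected_feature_indices] for row in X]

# Predict() applies selection after preprocessing, before the model call:
X_new = [[1.0, 2.0, 3.0, 4.0, 5.0]]           # 5 raw input features
X_sel = apply_feature_selection(X_new, [2])   # model was trained on column 2 only
```

This is why the model trained on 1 of 5 columns no longer sees a 5-wide input.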

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 7 pr review comments - feature selection, copy semantics, docs

- Remove unused LooseTolerance constant
- Fix off-by-one: use != instead of > for feature index count check
- Apply feature selection in InferenceSession.Predict() (critical gap)
- Add comprehensive docs to SelectedFeatureIndices property
- Fix WithParameters to deep-copy collections (consistent with DeepCopy)
- Add copy independence assertion to WithParameters test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: support 1D tensor feature selection and add predict integration test

SelectFeaturesTensor previously threw for 1D tensors. Now handles them
by directly selecting elements at the specified indices. Added integration
test proving Predict applies SelectedFeatureIndices correctly with a real
FeedForwardNeuralNetwork.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use identity-check instead of count heuristic for feature selection

Replaced inputSize == featureIndices.Count heuristic with proper
identity-mapping check using SequenceEqual. A reordered selection like
[2, 0, 1] has the same count but still needs reordering. Also validates
indices are in-range before calling SelectFeatures. Renamed misleading
test names and added shape assertion.
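The difference between the count heuristic and a true identity-mapping check can be shown in a few lines (a Python stand-in for the C# `SequenceEqual` check; `is_identity_selection` is a hypothetical helper name):

```python
def is_identity_selection(indices, input_width):
    """True only if indices are exactly [0, 1, ..., input_width - 1]."""
    return len(indices) == input_width and all(
        idx == pos for pos, idx in enumerate(indices)
    )

# A count-based check wrongly treats a permutation as a no-op:
assert is_identity_selection([0, 1, 2], 3)      # genuine identity: selection can be skipped
assert not is_identity_selection([2, 0, 1], 3)  # same count, but columns must be reordered
```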

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: fix collinear test fixture and add permuted feature selection test

- Fixed direct-model test to use non-collinear features with correct
  formula comment (y = 2*x1 + 3*x2 + 1) and tighter assertion ranges
- Added permutation test proving [2, 0, 1] selection is not treated as
  identity — verifies the SequenceEqual check works correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add null guards to OptimizationResult Clone and WithParameters

Callers that create OptimizationResult via object initializer (bypassing
the constructor) leave CoefficientLowerBounds, CoefficientUpperBounds,
FitnessHistory, SelectedFeatures, and SelectedFeatureIndices as null.
Deep-copy calls in Clone/WithParameters would throw NullReferenceException.
Now falls back to empty defaults when properties are null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify feature selection in InferenceSequence, one-time bounds check, robust assertions

- InferenceSequence.Predict() now delegates to ApplySelectedFeaturesForPrediction
  instead of using a count-based check that skipped permuted selections
- Move bounds validation to one-time check (cached _featureIndicesValidated flag)
  instead of scanning on every Predict() call
- Assert permuted predictions actually differ from identity predictions
- Replace fragile selectedCount < 10 with valid range assertion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: assert feature selection actually occurred in regression test

Add assertion verifying optimizer selected a feature subset before testing
Predict() with full-width input, proving the test exercises the reduced-column
code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cache identity check with volatile int, set deterministic weights in test

Replace per-call SequenceEqual + Enumerable.Range allocation with cached
volatile int state (0=unknown, 1=identity, 2=needs-selection). Uses a
simple loop instead of LINQ for zero-allocation identity check.
Thread-safe via volatile read/write semantics.
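The tri-state caching logic can be sketched as follows. Python has no `volatile`, so this shows only the caching and zero-allocation loop; the thread-safe publication comes from C#'s volatile read/write semantics. Class and field names here are illustrative:

```python
# Tri-state cache: 0 = unknown, 1 = identity (skip selection), 2 = needs selection.
STATE_UNKNOWN, STATE_IDENTITY, STATE_SELECT = 0, 1, 2

class FeatureSelector:
    def __init__(self, indices):
        self.indices = indices
        self._identity_state = STATE_UNKNOWN  # volatile int in the C# code

    def _check_identity(self, width):
        if self._identity_state == STATE_UNKNOWN:
            # Simple loop instead of building a range and comparing sequences,
            # so the check allocates nothing once per selector.
            identity = len(self.indices) == width
            if identity:
                for pos, idx in enumerate(self.indices):
                    if idx != pos:
                        identity = False
                        break
            self._identity_state = STATE_IDENTITY if identity else STATE_SELECT
        return self._identity_state == STATE_IDENTITY

    def select(self, row):
        if self._check_identity(len(row)):
            return row  # cached identity result: no copy needed
        return [row[i] for i in self.indices]
```

The cached state is computed at most once per selector; every later `select` call is a single int comparison plus, when needed, the column gather.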

Set explicit non-uniform weights [1, 2, 3] in permuted feature selection
test to ensure permutation deterministically produces different output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve first batch of pr #941 review comments (10 of 54)

- InterpretabilityDashboard: filter empty attributions instead of using
  console output, use validCount for averaging
- BaggingClassifier: validate sampleSize vs data size when bootstrap=false
- Denclue: hoist _attractors and _attractorDensities guards out of loops
- BIRCH: use Labels ?? Predict(x) instead of Labels! in FitPredict
- SelfOrganizingMap: hoist _weights and _neuronLabels guards out of loops
- CLIQUE: hoist Labels guard out of accumulation loop
- SUBCLU: hoist _trainingData and Labels guards out of loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve second batch of pr #941 review comments (20 of 54)

- GraphIsomorphismLayer: add local variable aliases for _lastMlpHiddenPreRelu
  in Forward and BackwardManual for clearer null safety
- PagedAttentionKernel: add braces to ToFloat/FromFloat if blocks, replace
  null-forgiving with explicit boxing guards
- KnowledgeDistillationTrainerBase: remove Console.WriteLine from library
  checkpoint loading code
- DiffusionMemoryManager: separate PerDeviceMemory and TotalShardedMemory
  formatting into independent checks
- MixtureOfExpertsLayer: add null guard before ApplyActivationToGraph
  instead of null-forgiving operator
- KVCache: replace null-forgiving with explicit boxing guard for Half
  conversion
- CachedMultiHeadAttention: hoist _cache null guard before Append call
- CachedGroupedQueryAttention: hoist _cache null guard before Append call
- DeformableConvolutionalLayer: update stale comments (variables already renamed)
- DeformableConvolutionalLayer: update stale comments (variables already renamed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve build errors from linter's automated null-forgiving removal

Fix syntax and compilation errors introduced by the automated linter that
removed null-forgiving operators without proper replacements:
- Fix missing closing parentheses in foreach statements (COPKMeans, AgglomerativeClustering)
- Fix duplicate variable declarations (AntColonyOptimization, CachedGroupedQueryAttention, CachedMultiHeadAttention)
- Fix dropped underscore prefixes on private field references (KVCache, ConvLSTMLayer, PrincipalNeighbourhoodAggregationLayer)
- Add validated accessor properties for nullable cache arrays (KVCache)
- Fix variable ordering issues (FloraAdapter)
- Fix nullable reference handling (VectorModel, UncertaintyCalibrationData, CrossValidationResult, AiModelResult.Uncertainty)
- Fix Debug.WriteLine() call with no arguments (KnowledgeDistillationTrainerBase)
- Add null guard for ClassLabels in PredictProbabilities (BaggingClassifier)
- Fix nullable permutation array access (ConcatenateLayer)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist null guards for lambdas and alphaTs in DPMSolverMultistepScheduler

Replace null-forgiving operators with proper null-coalescing throws
for _lambdas and _alphaTs in FirstOrderUpdate, SecondOrderUpdate,
and ThirdOrderUpdate methods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove redundant null guard in LeafSent140FederatedDatasetLoader

The testFilePath is already validated by IsNullOrWhiteSpace check
in the enclosing if statement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: capture PreprocessingInfo in local variable to prevent TOCTOU

Snapshot PreprocessingInfo once before accessing its members
to avoid potential race conditions with concurrent access.
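The snapshot pattern is small but worth spelling out. This is a hedged Python sketch of the idea, not the AiDotNet code; `Result` and `predict` are illustrative names:

```python
class Result:
    """Stand-in for an object whose preprocessing_info may be swapped concurrently."""
    def __init__(self):
        self.preprocessing_info = None

def predict(result, x):
    # Snapshot the property once; every later member access uses the local,
    # so a concurrent writer cannot change the object between the null check
    # and its use (the TOCTOU window the commit closes).
    info = result.preprocessing_info
    if info is None:
        return x
    return info.transform(x)
```

Checking `result.preprocessing_info is not None` and then dereferencing the property again would re-read it, which is exactly the race being avoided.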

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: dispose pre-permutation tensor to prevent memory leak in ConcatenateLayer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: accumulate projector gradients from both views in BarlowTwins, SimCLR, and iBOT

BarlowTwins: capture view1 projector grads before view2 backward clears them,
then combine both views' grads for the projector update.

SimCLR: replay projector forward per-view before backward to use correct caches,
and accumulate projector gradients from both views.

iBOT: backpropagate the full objective (clsLoss1 + clsLoss2 + lambda*(patchLoss1 + patchLoss2))
instead of only clsLoss1. Add ComputeMaskedPatchGradient for MSE gradient on masked patches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct backward pass in RIFE, FILM, and RAFT video models

RIFE: use cached block input shape instead of previous block output shape
for gradient splitting to correctly match the fused input width.

FILM: fail fast on all required backward-pass caches instead of silently
skipping fusion and flow estimator backprop when caches are null.

RAFT: replace incorrect upsampleConv.Backward with bilinear UpsampleFlowBackward
that mirrors the actual forward upsampling, and hoist invariant layer guards
out of the refinement loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: strengthen DataLeakagePrevention tests with behavioral assertions

ScalerStatisticsReflectTrainingDataOnly: add positive assertion that the
pipeline transformation is not identity, confirming it was fitted on data.

StillWorksCorrectly: exercise the model with actual predictions and assert
outputs are finite numbers, not just that objects are non-null.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address major PR review comments across multiple files

KNNEvaluator: hoist _trainLabels null check outside the voting loop.
JointMutualInformation: use local miScores instead of field on line 212.
ContradictionDetector: treat malformed contradictory fields as false instead
of falling through to text scan that matches JSON key names.
BFGSOptimizer: guard inverse Hessian update with curvature condition (y*s > 0)
and use validated locals instead of null-forgiving operators.
IncrementalPCA: validate all fitted state upfront in PartialFit and remove
null-forgiving on _explainedVariance.
RealGatedLinearRecurrenceLayer: hoist all cached tensor validation to top of
Backward to prevent partial gradient corruption.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address remaining major PR review comments

PreprocessingRegistry: align documentation — clarify get/set is thread-safe
but concurrent builds still overwrite each other's pipeline.

InMemoryFederatedTrainer: fail fast when HE is enabled but no parameter
indices are resolved instead of silently downgrading. Fix DP accounting:
FedAsync uses 1/totalClients per iteration, FedBuff uses bufferSize/totalClients.

AiModelBuilder: extract IsTimeSeriesTask() helper that checks both AutoML
options and model type name for chronological split decisions. Reject
row-changing data preparation in natural-client federated mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address trivial PR review comments across multiple files

DeformableConvolutionalLayer: rename misleading variable names (mwShape/mbShape
to maskWeights/maskBias, mw2/mwg/mbg to maskW/maskWeightGrad/maskBiasGrad).

S4DLayer: hoist _lastHiddenStatesReal/Imag guards out of the BPTT loop.

OnlineFeatureSelector: pass resolved running arrays into ComputeOnlineCorrelation
instead of re-checking nullable fields inside the hot loop.

SSLPretrainingPipeline: extract encoder to a local variable to avoid duplicate
null-check expressions.

SubpixelConvolutionalLayer: hoist ScalarActivation out of the per-element loop.

BaggingClassifier: replace OrderBy(random.Next()) with Fisher-Yates partial
shuffle for O(sampleSize) unbiased sampling without replacement.
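  A partial Fisher-Yates shuffle stops after `sampleSize` swaps, unlike sorting by a random key, which is both O(n log n) and biased when keys are re-drawn per comparison. A Python stand-in for the C# change (the function name is illustrative):

```python
import random

def sample_without_replacement(pool, sample_size, rng=random):
    """Partial Fisher-Yates shuffle: after i swaps, pool[:i] is a uniform
    random sample without replacement. Only sample_size swaps run, so the
    cost is O(sample_size) swaps rather than a full O(n log n) sort."""
    pool = list(pool)  # copy so the caller's list is untouched
    n = len(pool)
    if sample_size > n:
        raise ValueError("sample_size exceeds population size")
    for i in range(sample_size):
        j = rng.randrange(i, n)  # pick uniformly from the not-yet-fixed suffix
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:sample_size]
```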

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist gradient tensor validations in HedgehogLayer.FeatureMapBackward

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address 12 pr review comments across multiple files

- FloraAdapter: fix momentum reset bug (was zeroed every step)
- KnowledgeDistillationTrainerBase: remove dead checkpoint branch
- KVCache: hoist loop-invariant lookups outside inner loops
- AiModelResult: capture stable model reference in explainer delegate
- VectorModel: guard empty coefficients, remove dead BoxOrDefault
- ConvLSTMLayer: use local lastInput instead of field reference
- PrincipalNeighbourhoodAggregationLayer: hoist invariant tensors,
  remove null-forgiving operators in fallback path
- BarlowTwins: remove redundant projector null-check
- FILM: use local-cache pattern for cached features
- DeepFilterNet: propagate gain layer backward gradient

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: net471 build errors in baggingclassifier and leafsent140loader

- Replace range syntax pool[..sampleSize] with Array.Copy for net471
  compatibility (RuntimeHelpers.GetSubArray not available)
- Add explicit null check before string.IsNullOrWhiteSpace for net471
  compatibility (missing NotNullWhen attribute)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use correct inner class name and remove null-forgiving in dashboard

Replace InterpretabilityExplanation with Explanation (the actual inner
class name) and replace null-forgiving operators with proper null guards.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address pr #941 review comments

- KDTree: use single guarded local for _data null check in Compare
- BallTree: use single guarded local for _data null check in Sort
- DiffusionMemoryManager: remove duplicate unconditional total sharded line
- InfluenceFunctionExplainer: hoist cachedGradients null check outside loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Successfully merging this pull request may close these issues.

AiModelResult.Predict() ignores feature selection — dimension mismatch on all optimized models