This document outlines the implementation plan for addressing all findings from the PR #430 audit. Every item must be implemented with production-ready code before the PR can be considered complete.
Total Items to Address: 13
- Missing Implementations: 3
- Simplified Code Requiring Production-Ready Fixes: 10
## 1. Missing Implementations

### 1.1 Progressive Neural Networks (PNN)

Priority: HIGH. Estimated Complexity: High.

Current state:
- Configuration parameters exist in `ContinualLearnerConfig.cs` (lines 156-162): `PnnUseLateralConnections`, `PnnLateralScaling`
- No actual strategy implementation exists
File to Create: src/ContinualLearning/Strategies/ProgressiveNeuralNetworksStrategy.cs
Algorithm Description: Progressive Neural Networks freeze previous task columns and add new columns with lateral connections for each new task. This prevents catastrophic forgetting by preserving old knowledge while allowing new learning.
Key Components to Implement:
- Column Management
  - Maintain list of frozen neural network columns (one per task)
  - Add new trainable column for each new task
  - Freeze previous columns when task completes
- Lateral Connections
  - Implement adapter layers connecting previous columns to new column
  - Apply lateral scaling factor from config
  - Forward pass must aggregate activations from all columns
- Training Logic
  - Only train parameters in the newest column
  - Lateral connection weights are trainable
  - Previous column weights remain frozen
Reference Paper: Rusu et al., "Progressive Neural Networks" (2016)
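To make the lateral-connection mechanics concrete before the C# implementation, here is a minimal NumPy sketch of the PNN forward pass (illustrative only: `columns`, `laterals`, and the uniform layer shapes are simplifying assumptions, and `scale` plays the role of the `PnnLateralScaling` config value):

```python
import numpy as np

def pnn_forward(x, columns, laterals, scale=0.5):
    """Forward pass through a Progressive Neural Network.

    columns:  list of columns; each column is a list of weight matrices.
              All but the last column are frozen; the last is trainable.
    laterals: laterals[c][l] maps layer-l activations of frozen column c
              into layer l+1 of the active (last) column.
    scale:    lateral scaling factor (cf. PnnLateralScaling).
    """
    relu = lambda a: np.maximum(a, 0.0)
    acts = [[x] for _ in columns]          # per-column activation stacks
    n_layers = len(columns[0])
    for l in range(n_layers):
        for c, col in enumerate(columns):
            h = col[l] @ acts[c][l]
            if c == len(columns) - 1:      # active column only
                # aggregate scaled lateral input from every frozen column
                for fc in range(len(columns) - 1):
                    h = h + scale * (laterals[fc][l] @ acts[fc][l])
            acts[c].append(relu(h))
    return acts[-1][-1]                    # output of the active column
```

Frozen columns are evaluated in the forward pass but never receive gradients; only the last column's weights and the lateral matrices would be trained.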
Interface to Implement:
public class ProgressiveNeuralNetworksStrategy<T, TInput, TOutput>
: ContinualStrategyBase<T, TInput, TOutput>, IProgressiveStrategy<T, TInput, TOutput>
{
// Column storage
private readonly List<ILayer<T>[]> _frozenColumns;
private ILayer<T>[]? _activeColumn;
private readonly List<Matrix<T>[]> _lateralConnections;
// Core methods
public override void BeforeTaskTraining(int taskId, IDataset<T, TInput, TOutput> taskData);
public override void AfterTaskTraining(int taskId, IFullModel<T, TInput, TOutput> model);
public override Tensor<T> Forward(Tensor<T> input, int taskId);
public void FreezeCurrentColumn();
public void AddNewColumn(int[] layerSizes);
public Matrix<T> ComputeLateralActivations(int columnIndex, int layerIndex, Tensor<T> input);
}

### 1.2 PackNet Strategy

Priority: HIGH. Estimated Complexity: High.

Current state:
- Configuration parameters exist in `ContinualLearnerConfig.cs` (lines 146-152): `PackNetPruneRatio`, `PackNetRetrainEpochs`
- No actual strategy implementation exists
File to Create: src/ContinualLearning/Strategies/PackNetStrategy.cs
Algorithm Description: PackNet iteratively prunes and freezes network weights after each task, freeing capacity for new tasks while preserving performance on old tasks.
Key Components to Implement:
- Weight Masking System
  - Binary masks for each layer indicating which weights are "owned" by which task
  - Cumulative mask tracking all frozen weights
  - Available capacity mask for new task training
- Pruning Algorithm
  - Magnitude-based pruning (prune smallest weights)
  - Configurable prune ratio per task
  - Preserve minimum weights needed for task performance
- Retraining Phase
  - After pruning, retrain remaining weights
  - Only train weights not frozen by previous tasks
  - Validate performance doesn't degrade
- Inference Logic
  - Apply appropriate mask for each task during inference
  - Support multi-task inference with task ID
Reference Paper: Mallya & Lazebnik, "PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning" (CVPR 2018)
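The prune-freeze cycle above can be illustrated with a small NumPy sketch of one magnitude-based pruning step over a flat weight vector (a hypothetical helper, not part of the codebase; `prune_ratio` corresponds to the `PackNetPruneRatio` config value):

```python
import numpy as np

def packnet_prune(weights, frozen_mask, prune_ratio):
    """One PackNet pruning step for a single weight tensor.

    weights:     flat weight array after training the current task.
    frozen_mask: True where a weight is owned by an earlier task.
    prune_ratio: fraction of the *free* weights to prune (zero out).

    Returns (pruned_weights, task_mask) where task_mask marks the
    surviving free weights now owned by the current task.
    """
    free = ~frozen_mask
    free_idx = np.flatnonzero(free)
    n_prune = int(len(free_idx) * prune_ratio)
    # Magnitude-based pruning: drop the smallest free weights
    order = np.argsort(np.abs(weights[free_idx]))
    pruned_idx = free_idx[order[:n_prune]]
    task_mask = free.copy()
    task_mask[pruned_idx] = False
    out = weights.copy()
    out[pruned_idx] = 0.0
    return out, task_mask
```

After pruning, the surviving free weights are retrained, and `task_mask` is OR-ed into the cumulative frozen mask before the next task begins.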
Interface to Implement:
public class PackNetStrategy<T, TInput, TOutput>
: ContinualStrategyBase<T, TInput, TOutput>, IPackNetStrategy<T, TInput, TOutput>
{
// Mask storage
private readonly Dictionary<int, List<Tensor<T>>> _taskMasks; // Task ID -> layer masks
private readonly List<Tensor<T>> _frozenMasks; // Cumulative frozen weights
// Core methods
public override void AfterTaskTraining(int taskId, IFullModel<T, TInput, TOutput> model);
public void PruneNetwork(IFullModel<T, TInput, TOutput> model, T pruneRatio);
public void FreezeTaskWeights(int taskId);
public Tensor<T> GetAvailableCapacityMask(int layerIndex);
public void ApplyMaskForTask(IFullModel<T, TInput, TOutput> model, int taskId);
public void RetrainAfterPruning(IFullModel<T, TInput, TOutput> model, IDataset<T, TInput, TOutput> data, int epochs);
}

### 1.3 Expected Gradient Length (EGL)

Priority: HIGH. Estimated Complexity: Medium.

Current state:
- Completely missing from Active Learning
- No configuration, no implementation
File to Create: src/ActiveLearning/Strategies/ExpectedGradientLengthStrategy.cs
Algorithm Description: EGL selects samples that would cause the largest gradient update if labeled. It estimates the expected gradient length across possible labels, selecting samples that would most change the model.
Key Components to Implement:
- Gradient Computation
  - Compute gradients for each possible label
  - Use the model's actual loss function (not simplified MSE)
  - Handle multi-class and regression cases
- Expected Length Calculation
  - Weight gradients by predicted probability of each label
  - Compute L2 norm of expected gradient
  - Normalize across samples for fair comparison
- Efficient Implementation
  - Batch gradient computation where possible
  - Cache intermediate computations
  - Support for models with the `IGradientComputable` interface
Reference Paper: Settles & Craven, "An Analysis of Active Learning Strategies for Sequence Labeling Tasks" (EMNLP 2008)
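The selection criterion can be sketched for a linear softmax classifier (a simplified NumPy illustration; the production strategy must use the model's actual loss function and gradients rather than this toy model):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def expected_gradient_length(x, W):
    """EGL score for one sample under a linear softmax classifier.

    For each possible label y, the loss gradient w.r.t. W is
    (p - onehot(y)) outer x; the score is the probability-weighted
    sum of the gradient L2 norms. Higher score = more informative.
    """
    p = softmax(W @ x)
    score = 0.0
    for y in range(len(p)):
        onehot = np.zeros_like(p)
        onehot[y] = 1.0
        grad = np.outer(p - onehot, x)   # dL/dW for hypothetical label y
        score += p[y] * np.linalg.norm(grad)
    return score
```

Confident predictions yield near-zero expected gradients, so uncertain, high-impact samples score highest.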
Interface to Implement:
public class ExpectedGradientLengthStrategy<T, TInput, TOutput>
: IActiveLearningStrategy<T, TInput, TOutput>
{
public string Name => "Expected Gradient Length";
// Core methods
public Vector<T> ComputeScores(
IDataset<T, TInput, TOutput> unlabeledPool,
IFullModel<T, TInput, TOutput> model);
public int[] SelectSamples(
IDataset<T, TInput, TOutput> unlabeledPool,
IFullModel<T, TInput, TOutput> model,
int count);
// Internal methods
private T ComputeExpectedGradientLength(
TInput input,
IFullModel<T, TInput, TOutput> model);
private Vector<T> ComputeGradientForLabel(
TInput input,
TOutput hypotheticalLabel,
IFullModel<T, TInput, TOutput> model);
}

## 2. Simplified Code Requiring Production-Ready Fixes

### 2.1 ExperienceReplayBuffer - HerdingSample

File: src/ContinualLearning/Memory/ExperienceReplayBuffer.cs
Line: 420
Current Issue: "Simplified herding: select diverse examples using hash-based diversity". The code uses `GetHashCode()` as a diversity measure instead of proper feature extraction and mean matching.
Proper Herding Algorithm:
- Extract feature representations from model's penultimate layer
- Compute running mean of selected samples
- Iteratively select sample that moves mean closest to population mean
- Use actual feature distances, not hash codes
private List<ReplayExperience<T, TInput, TOutput>> HerdingSample(int count)
{
var selected = new List<ReplayExperience<T, TInput, TOutput>>();
var remaining = new List<int>(Enumerable.Range(0, _buffer.Count));
// Extract features for all samples (requires model access)
var features = ExtractFeatures(_buffer);
// Compute population mean
var populationMean = ComputeMean(features);
// Running sum of selected features
var selectedSum = new Vector<T>(features[0].Length);
for (int i = 0; i < count && remaining.Count > 0; i++)
{
int bestIdx = -1;
T bestDistance = NumOps.MaxValue;
foreach (var idx in remaining)
{
// Compute mean if we add this sample
var newSum = VectorAdd(selectedSum, features[idx]);
var newMean = VectorDivide(newSum, NumOps.FromDouble(i + 1));
// Distance to population mean
var distance = ComputeL2Distance(newMean, populationMean);
if (NumOps.Compare(distance, bestDistance) < 0)
{
bestDistance = distance;
bestIdx = idx;
}
}
if (bestIdx >= 0)
{
selected.Add(_buffer[bestIdx]);
selectedSum = VectorAdd(selectedSum, features[bestIdx]);
remaining.Remove(bestIdx);
}
}
return selected;
}

### 2.2 ExperienceReplayBuffer - KCenterSample

File: src/ContinualLearning/Memory/ExperienceReplayBuffer.cs
Line: 449
Current Issue: "Simplified K-Center greedy: use hash-based distance approximation". The code uses hash codes as a distance proxy instead of actual feature-space distances.
Proper K-Center Greedy Algorithm:
- Extract feature representations for all samples
- Initialize with random or farthest-first sample
- Iteratively select sample farthest from all selected samples
- Use actual L2 or cosine distance in feature space
private List<ReplayExperience<T, TInput, TOutput>> KCenterSample(int count)
{
if (_buffer.Count <= count)
return new List<ReplayExperience<T, TInput, TOutput>>(_buffer);
var selected = new List<int>();
var features = ExtractFeatures(_buffer);
// Start with random sample
selected.Add(RandomHelper.Shared.Next(_buffer.Count));
// Track minimum distance to any selected sample for each point
var minDistances = new T[_buffer.Count];
for (int i = 0; i < _buffer.Count; i++)
{
minDistances[i] = NumOps.MaxValue;
}
while (selected.Count < count)
{
int lastSelected = selected[^1];
// Update minimum distances
for (int i = 0; i < _buffer.Count; i++)
{
if (selected.Contains(i)) continue;
var dist = ComputeL2Distance(features[i], features[lastSelected]);
if (NumOps.Compare(dist, minDistances[i]) < 0)
{
minDistances[i] = dist;
}
}
// Select point with maximum minimum distance (farthest from all selected)
int bestIdx = -1;
T maxMinDist = NumOps.MinValue;
for (int i = 0; i < _buffer.Count; i++)
{
if (selected.Contains(i)) continue;
if (NumOps.Compare(minDistances[i], maxMinDist) > 0)
{
maxMinDist = minDistances[i];
bestIdx = i;
}
}
if (bestIdx >= 0)
{
selected.Add(bestIdx);
}
}
return selected.Select(i => _buffer[i]).ToList();
}

### 2.3 MemoryAwareSynapses - Random Projection Importance

File: src/ContinualLearning/Strategies/MemoryAwareSynapses.cs
Line: 527-529
Current Issue: The method simply falls back to `ComputeOutputSensitivity`.
Proper Random Projection Implementation:
- Generate stable random projection matrix
- Project parameter gradients onto random directions
- Compute importance as gradient magnitude in projected space
private Vector<T> ComputeRandomProjectionImportance(
IFullModel<T, TInput, TOutput> model,
IDataset<T, TInput, TOutput> dataset)
{
var parameters = model.GetParameters();
int paramCount = parameters.Length;
int projectionDim = Math.Min(paramCount, 100); // Reduced dimensionality
// Generate stable random projection matrix (seeded for reproducibility)
var projectionMatrix = GenerateRandomProjectionMatrix(paramCount, projectionDim, seed: 42);
var importanceAccumulator = new Vector<T>(paramCount);
foreach (var (input, output) in dataset.GetBatches(1))
{
// Compute gradients
var gradients = ComputeParameterGradients(model, input, output);
// Project gradients
var projectedGrad = MatrixVectorMultiply(projectionMatrix, gradients);
// Backproject to get importance estimate
var backprojected = MatrixTransposeVectorMultiply(projectionMatrix, projectedGrad);
// Accumulate squared importance
for (int i = 0; i < paramCount; i++)
{
var squared = NumOps.Multiply(backprojected[i], backprojected[i]);
importanceAccumulator[i] = NumOps.Add(importanceAccumulator[i], squared);
}
}
// Normalize by dataset size
var scale = NumOps.FromDouble(1.0 / dataset.Count);
return VectorScale(importanceAccumulator, scale);
}

### 2.4 MemoryAwareSynapses - Fisher Information Diagonal

File: src/ContinualLearning/Strategies/MemoryAwareSynapses.cs
Line: 535-540
Current Issue: "Fall back for now"
Proper Fisher Information Diagonal:
- Compute log-likelihood gradients for each sample
- Square the gradients (Fisher = E[grad log p * grad log p^T])
- Average across dataset for diagonal Fisher approximation
private Vector<T> ComputeFisherDiagonalImportance(
IFullModel<T, TInput, TOutput> model,
IDataset<T, TInput, TOutput> dataset)
{
var parameters = model.GetParameters();
int paramCount = parameters.Length;
var fisherDiagonal = new Vector<T>(paramCount);
foreach (var (input, output) in dataset.GetBatches(1))
{
// Get model prediction (for log-likelihood gradient)
var prediction = model.Predict(input);
// Compute gradient of log-likelihood w.r.t. parameters
// For classification: grad log p(y|x,θ)
// For regression: grad log p(y|x,θ) under Gaussian assumption
var logLikelihoodGradients = ComputeLogLikelihoodGradients(model, input, output, prediction);
// Fisher diagonal is E[g * g^T] diagonal = E[g^2]
for (int i = 0; i < paramCount; i++)
{
var gradSquared = NumOps.Multiply(logLikelihoodGradients[i], logLikelihoodGradients[i]);
fisherDiagonal[i] = NumOps.Add(fisherDiagonal[i], gradSquared);
}
}
// Average over dataset
var scale = NumOps.FromDouble(1.0 / dataset.Count);
return VectorScale(fisherDiagonal, scale);
}
private Vector<T> ComputeLogLikelihoodGradients(
IFullModel<T, TInput, TOutput> model,
TInput input,
TOutput target,
TOutput prediction)
{
if (model is IGradientComputable<T, Tensor<T>, Tensor<T>> gradModel)
{
// Use cross-entropy loss for classification (gives log-likelihood gradient)
var inputTensor = ConvertToTensor(input);
var targetTensor = ConvertToTensor(target);
return gradModel.ComputeGradients(inputTensor, targetTensor);
}
// Numerical gradient fallback
return ComputeNumericalLogLikelihoodGradients(model, input, target);
}

### 2.5 MemoryAwareSynapses - Hebbian Importance

File: src/ContinualLearning/Strategies/MemoryAwareSynapses.cs
Line: 545-549
Current Issue: "Fall back for now"
Proper Hebbian Importance:
- Track co-activation patterns between connected neurons
- Importance = strength of learned associations (Hebb's rule)
- Weights that fire together frequently are more important
private Vector<T> ComputeHebbianImportance(
IFullModel<T, TInput, TOutput> model,
IDataset<T, TInput, TOutput> dataset)
{
var parameters = model.GetParameters();
int paramCount = parameters.Length;
var hebbianImportance = new Vector<T>(paramCount);
if (model is INeuralNetworkModel<T> nnModel)
{
var layers = nnModel.GetLayers();
int paramOffset = 0;
foreach (var layer in layers)
{
var layerParams = layer.GetParameters();
int layerParamCount = layerParams.Length;
// Compute average activations for this layer
var preActivations = new List<Vector<T>>();
var postActivations = new List<Vector<T>>();
foreach (var (input, _) in dataset.GetBatches(1))
{
var (pre, post) = GetLayerActivations(nnModel, layer, input);
preActivations.Add(pre);
postActivations.Add(post);
}
// Hebbian importance: correlation between pre and post activations
// For weight w_ij: importance = E[pre_i * post_j]
var layerImportance = ComputeHebbianForLayer(
layerParams, preActivations, postActivations);
for (int i = 0; i < layerParamCount; i++)
{
hebbianImportance[paramOffset + i] = layerImportance[i];
}
paramOffset += layerParamCount;
}
}
else
{
// For non-neural network models, fall back to output sensitivity
return ComputeOutputSensitivity(model, dataset);
}
return hebbianImportance;
}
private Vector<T> ComputeHebbianForLayer(
Vector<T> weights,
List<Vector<T>> preActivations,
List<Vector<T>> postActivations)
{
var importance = new Vector<T>(weights.Length);
int n = preActivations.Count;
// Assuming weight matrix of shape [out_features, in_features]
// Weight at position (i,j) connects pre[j] to post[i]
int outDim = postActivations[0].Length;
int inDim = preActivations[0].Length;
for (int sample = 0; sample < n; sample++)
{
for (int i = 0; i < outDim; i++)
{
for (int j = 0; j < inDim; j++)
{
int weightIdx = i * inDim + j;
if (weightIdx < weights.Length)
{
// Hebbian: pre * post
var hebbian = NumOps.Multiply(
preActivations[sample][j],
postActivations[sample][i]);
// Use absolute value as importance
var absHebbian = NumOps.Abs(hebbian);
importance[weightIdx] = NumOps.Add(importance[weightIdx], absHebbian);
}
}
}
}
// Average over samples
var scale = NumOps.FromDouble(1.0 / n);
return VectorScale(importance, scale);
}

### 2.6 SynapticIntelligence - Layer Statistics

File: src/ContinualLearning/Strategies/SynapticIntelligence.cs
Line: 555
Current Issue: "This is a simplified version - in practice, you'd need layer boundary info"
Proper Layer-Aware Statistics:
- Access actual layer structure from model
- Compute statistics per layer, not arbitrary chunks
- Include layer type information in statistics
private Dictionary<string, object> ComputeLayerStatistics(Vector<T> importance)
{
var stats = new Dictionary<string, object>();
if (_model is INeuralNetworkModel<T> nnModel)
{
var layers = nnModel.GetLayers();
int paramOffset = 0;
var layerStats = new List<Dictionary<string, object>>();
foreach (var layer in layers)
{
var layerParams = layer.GetParameters();
int layerParamCount = layerParams.Length;
if (layerParamCount == 0)
{
paramOffset += layerParamCount;
continue;
}
// Extract importance values for this layer
var layerImportance = new T[layerParamCount];
for (int i = 0; i < layerParamCount; i++)
{
layerImportance[i] = importance[paramOffset + i];
}
// Compute layer statistics
var layerStat = new Dictionary<string, object>
{
["LayerName"] = layer.Name,
["LayerType"] = layer.GetType().Name,
["ParameterCount"] = layerParamCount,
["MeanImportance"] = NumOps.ToDouble(ComputeMean(layerImportance)),
["MaxImportance"] = NumOps.ToDouble(ComputeMax(layerImportance)),
["MinImportance"] = NumOps.ToDouble(ComputeMin(layerImportance)),
["StdImportance"] = NumOps.ToDouble(ComputeStd(layerImportance)),
["SparsityRatio"] = ComputeSparsityRatio(layerImportance, threshold: 1e-6)
};
layerStats.Add(layerStat);
paramOffset += layerParamCount;
}
stats["LayerStatistics"] = layerStats;
stats["TotalLayers"] = layers.Count;
stats["TotalParameters"] = importance.Length;
}
else
{
// For non-neural network models, provide aggregate statistics only
stats["MeanImportance"] = NumOps.ToDouble(ComputeMean(importance.ToArray()));
stats["MaxImportance"] = NumOps.ToDouble(ComputeMax(importance.ToArray()));
stats["TotalParameters"] = importance.Length;
stats["Note"] = "Model does not expose layer structure";
}
return stats;
}
private double ComputeSparsityRatio(T[] values, double threshold)
{
int sparseCount = values.Count(v =>
NumOps.Compare(NumOps.Abs(v), NumOps.FromDouble(threshold)) < 0);
return (double)sparseCount / values.Length;
}

### 2.7 CoreSetStrategy - ComputeDensityWeights

File: src/ActiveLearning/Strategies/CoreSetStrategy.cs
Line: 184-187
Current Issue: Returns equal weights instead of actual density-based weights
Proper Density-Based Weighting:
- Extract features from samples
- Compute local density using k-nearest neighbors
- Weight samples inversely to density (rare samples are more valuable)
private Vector<T> ComputeDensityWeights(
IDataset<T, TInput, TOutput> pool,
IFullModel<T, TInput, TOutput>? model)
{
int n = pool.Count;
var weights = new T[n];
// Extract features
var features = new Vector<T>[n];
for (int i = 0; i < n; i++)
{
features[i] = ExtractFeatures(pool.GetInput(i), model);
}
// Compute k-NN density for each sample
int k = Math.Min(10, n - 1); // k neighbors for density estimation
for (int i = 0; i < n; i++)
{
// Find k nearest neighbors
var distances = new List<(int idx, T dist)>();
for (int j = 0; j < n; j++)
{
if (i == j) continue;
var dist = ComputeL2Distance(features[i], features[j]);
distances.Add((j, dist));
}
// Sort by distance and take k smallest
distances.Sort((a, b) => NumOps.Compare(a.dist, b.dist));
var kNearest = distances.Take(k).ToList();
// Density = 1 / (average distance to k neighbors)
var avgDist = NumOps.Zero;
foreach (var (_, dist) in kNearest)
{
avgDist = NumOps.Add(avgDist, dist);
}
avgDist = NumOps.Divide(avgDist, NumOps.FromDouble(k));
// Add small epsilon to avoid division by zero
var epsilon = NumOps.FromDouble(1e-10);
avgDist = NumOps.Add(avgDist, epsilon);
// Weight inversely proportional to density
// Low density (isolated samples) get high weight
weights[i] = avgDist; // avgDist is already inverse of density
}
// Normalize weights to sum to 1
var weightSum = NumOps.Zero;
foreach (var w in weights)
{
weightSum = NumOps.Add(weightSum, w);
}
for (int i = 0; i < n; i++)
{
weights[i] = NumOps.Divide(weights[i], weightSum);
}
return new Vector<T>(weights);
}
private Vector<T> ExtractFeatures(TInput input, IFullModel<T, TInput, TOutput>? model)
{
// Try to get features from model
if (model is IFeatureExtractor<T, TInput> featureExtractor)
{
return featureExtractor.ExtractFeatures(input);
}
// Fall back to using prediction as features
if (model != null)
{
var prediction = model.Predict(input);
return ConvertToVector(prediction);
}
// Last resort: convert input directly
return ConvertToVector(input);
}

### 2.8 ActiveLearner - ComputeSampleLoss

File: src/ActiveLearning/Core/ActiveLearner.cs
Line: 642
Current Issue: "This is a simplified version - real implementations would use the model's loss function"
Use Model's Actual Loss Function:
- Check if model exposes its loss function
- Use appropriate loss for model type (cross-entropy, MSE, etc.)
- Support custom loss functions
private T ComputeSampleLoss(
TInput input,
TOutput expectedOutput,
IFullModel<T, TInput, TOutput> model)
{
var prediction = model.Predict(input);
// Try to use model's native loss function
if (model is ILossComputable<T, TOutput> lossModel)
{
return lossModel.ComputeLoss(prediction, expectedOutput);
}
// Determine appropriate loss based on output type
if (prediction is Vector<T> predVec && expectedOutput is Vector<T> targetVec)
{
// Check if this looks like classification (softmax output)
if (IsClassificationOutput(predVec))
{
return ComputeCrossEntropyLoss(predVec, targetVec);
}
else
{
return ComputeMSELoss(predVec, targetVec);
}
}
// Scalar output - use squared error
if (prediction is T predScalar && expectedOutput is T targetScalar)
{
var diff = NumOps.Subtract(predScalar, targetScalar);
return NumOps.Multiply(diff, diff);
}
// Fallback to generic comparison
return ComputeGenericLoss(prediction, expectedOutput);
}
private bool IsClassificationOutput(Vector<T> output)
{
// Classification outputs typically sum to ~1 (softmax)
var sum = NumOps.Zero;
foreach (var v in output)
{
sum = NumOps.Add(sum, v);
}
var diff = NumOps.Subtract(sum, NumOps.One);
return NumOps.Compare(NumOps.Abs(diff), NumOps.FromDouble(0.1)) < 0;
}
private T ComputeCrossEntropyLoss(Vector<T> prediction, Vector<T> target)
{
var loss = NumOps.Zero;
var epsilon = NumOps.FromDouble(1e-15);
for (int i = 0; i < prediction.Length; i++)
{
// Clamp prediction to avoid log(0)
var clampedPred = NumOps.Compare(prediction[i], epsilon) > 0
? prediction[i]
: epsilon;
// -target * log(prediction)
var logPred = NumOps.Log(clampedPred);
var term = NumOps.Multiply(target[i], logPred);
loss = NumOps.Subtract(loss, term);
}
return loss;
}
private T ComputeMSELoss(Vector<T> prediction, Vector<T> target)
{
var loss = NumOps.Zero;
int length = Math.Min(prediction.Length, target.Length);
for (int i = 0; i < length; i++)
{
var diff = NumOps.Subtract(prediction[i], target[i]);
loss = NumOps.Add(loss, NumOps.Multiply(diff, diff));
}
return NumOps.Divide(loss, NumOps.FromDouble(length));
}
private T ComputeGenericLoss(TOutput prediction, TOutput target)
{
// Try to convert to vectors and compute MSE
var predVec = ConvertToVector(prediction);
var targetVec = ConvertToVector(target);
return ComputeMSELoss(predVec, targetVec);
}

### 2.9 CurriculumLearner - Evaluate

File: src/CurriculumLearning/CurriculumLearner.cs
Line: 520
Current Issue: "simplified - assumes comparable outputs": correctness is checked with a direct `.Equals()` comparison.
Proper Evaluation with Tolerance and Type-Awareness:
- Use appropriate comparison for output type
- Support classification accuracy (argmax comparison)
- Support regression with configurable tolerance
private CurriculumEvaluationResult<T> Evaluate(
IFullModel<T, TInput, TOutput> model,
IDataset<T, TInput, TOutput> dataset)
{
if (dataset.Count == 0)
{
return new CurriculumEvaluationResult<T>
{
Accuracy = NumOps.Zero,
Loss = NumOps.Zero,
SampleCount = 0
};
}
int correct = 0;
var totalLoss = NumOps.Zero;
for (int i = 0; i < dataset.Count; i++)
{
var input = dataset.GetInput(i);
var expectedOutput = dataset.GetOutput(i);
var prediction = model.Predict(input);
// Compute loss
var sampleLoss = ComputeLoss(prediction, expectedOutput);
totalLoss = NumOps.Add(totalLoss, sampleLoss);
// Check correctness
if (IsCorrectPrediction(prediction, expectedOutput))
{
correct++;
}
}
var accuracy = NumOps.FromDouble((double)correct / dataset.Count);
var avgLoss = NumOps.Divide(totalLoss, NumOps.FromDouble(dataset.Count));
return new CurriculumEvaluationResult<T>
{
Accuracy = accuracy,
Loss = avgLoss,
SampleCount = dataset.Count
};
}
private bool IsCorrectPrediction(TOutput prediction, TOutput expected)
{
// Classification: compare argmax
if (prediction is Vector<T> predVec && expected is Vector<T> expVec)
{
// If looks like one-hot or probability distribution, compare argmax
if (predVec.Length > 1 && expVec.Length > 1)
{
int predClass = ArgMax(predVec);
int expClass = ArgMax(expVec);
return predClass == expClass;
}
// Otherwise, use tolerance-based comparison
return VectorsApproximatelyEqual(predVec, expVec, tolerance: 0.01);
}
// Regression: use tolerance
if (prediction is T predScalar && expected is T expScalar)
{
var diff = NumOps.Abs(NumOps.Subtract(predScalar, expScalar));
var tolerance = NumOps.FromDouble(0.01);
return NumOps.Compare(diff, tolerance) < 0;
}
// Last resort: direct equality
return prediction?.Equals(expected) ?? expected == null;
}
private int ArgMax(Vector<T> vec)
{
if (vec.Length == 0) return 0;
int maxIdx = 0;
T maxVal = vec[0];
for (int i = 1; i < vec.Length; i++)
{
if (NumOps.Compare(vec[i], maxVal) > 0)
{
maxVal = vec[i];
maxIdx = i;
}
}
return maxIdx;
}
private bool VectorsApproximatelyEqual(Vector<T> a, Vector<T> b, double tolerance)
{
if (a.Length != b.Length) return false;
var tolT = NumOps.FromDouble(tolerance);
for (int i = 0; i < a.Length; i++)
{
var diff = NumOps.Abs(NumOps.Subtract(a[i], b[i]));
if (NumOps.Compare(diff, tolT) > 0)
{
return false;
}
}
return true;
}

### 2.10 CurriculumLearner - Logging

File: src/CurriculumLearning/CurriculumLearner.cs
Line: 780
Current Issue: "Logs a message (placeholder for actual logging infrastructure)"
Proper Logging Integration:
- Use ILogger interface for dependency injection
- Support multiple log levels
- Include structured logging with context
// Add to class fields
private readonly ILogger<CurriculumLearner<T, TInput, TOutput>>? _logger;
// Update constructor to accept logger
public CurriculumLearner(
ICurriculumScheduler<T> scheduler,
IDifficultyEstimator<T, TInput, TOutput> difficultyEstimator,
CurriculumLearnerConfig<T>? config = null,
ILogger<CurriculumLearner<T, TInput, TOutput>>? logger = null)
{
_scheduler = scheduler ?? throw new ArgumentNullException(nameof(scheduler));
_difficultyEstimator = difficultyEstimator ?? throw new ArgumentNullException(nameof(difficultyEstimator));
_config = config ?? new CurriculumLearnerConfig<T>();
_logger = logger;
// ... rest of constructor
}
// Replace Log method
private void Log(string message, LogLevel level = LogLevel.Information)
{
if (_logger != null)
{
switch (level)
{
case LogLevel.Debug:
_logger.LogDebug(message);
break;
case LogLevel.Information:
_logger.LogInformation(message);
break;
case LogLevel.Warning:
_logger.LogWarning(message);
break;
case LogLevel.Error:
_logger.LogError(message);
break;
default:
_logger.LogInformation(message);
break;
}
}
// Also invoke event for backward compatibility
OnLogMessage?.Invoke(this, new LogEventArgs(message, level));
}
// Add logging event for non-DI scenarios
public event EventHandler<LogEventArgs>? OnLogMessage;
public class LogEventArgs : EventArgs
{
public string Message { get; }
public LogLevel Level { get; }
public DateTime Timestamp { get; }
public LogEventArgs(string message, LogLevel level)
{
Message = message;
Level = level;
Timestamp = DateTime.UtcNow;
}
}

## Suggested Implementation Order

- 2.8 ActiveLearner - ComputeSampleLoss - Many components depend on proper loss computation
- 2.9 CurriculumLearner - Evaluate - Needed for accurate training metrics
- 2.10 CurriculumLearner - Logging - Helps debug subsequent implementations
- 2.1 ExperienceReplayBuffer - HerdingSample - Requires feature extraction helper
- 2.2 ExperienceReplayBuffer - KCenterSample - Uses same feature extraction
- 2.7 CoreSetStrategy - ComputeDensityWeights - Similar feature extraction needs
- 2.3 MemoryAwareSynapses - RandomProjection - Independent
- 2.4 MemoryAwareSynapses - FisherDiagonal - Independent
- 2.5 MemoryAwareSynapses - Hebbian - Requires layer access
- 2.6 SynapticIntelligence - LayerStatistics - Requires layer access
- 1.3 Expected Gradient Length (EGL) - Medium complexity, no dependencies
- 1.1 Progressive Neural Networks (PNN) - High complexity, requires column management
- 1.2 PackNet Strategy - High complexity, requires masking system
## Shared Utilities

Several fixes require common functionality. Create these shared utilities first:

File: src/Common/FeatureExtractionHelper.cs
public static class FeatureExtractionHelper<T>
{
public static Vector<T> ExtractFeatures<TInput, TOutput>(
TInput input,
IFullModel<T, TInput, TOutput>? model)
{
// Implementation that tries multiple approaches
}
public static Matrix<T> ExtractBatchFeatures<TInput, TOutput>(
IDataset<T, TInput, TOutput> dataset,
IFullModel<T, TInput, TOutput>? model)
{
// Efficient batch feature extraction
}
}

File: src/Common/DistanceHelper.cs
public static class DistanceHelper<T>
{
public static T ComputeL2Distance(Vector<T> a, Vector<T> b);
public static T ComputeCosineDistance(Vector<T> a, Vector<T> b);
public static T ComputeSquaredL2Distance(Vector<T> a, Vector<T> b);
public static Matrix<T> ComputePairwiseDistances(Vector<T>[] vectors);
}

File: src/Common/LossFunctionHelper.cs
public static class LossFunctionHelper<T>
{
public static T ComputeCrossEntropy(Vector<T> prediction, Vector<T> target);
public static T ComputeMSE(Vector<T> prediction, Vector<T> target);
public static T ComputeMAE(Vector<T> prediction, Vector<T> target);
public static T ComputeHuberLoss(Vector<T> prediction, Vector<T> target, T delta);
}

## Testing Requirements

Each implementation must include:
- Unit Tests - Test individual methods in isolation
- Integration Tests - Test interaction with real models
- Performance Benchmarks - Ensure implementations are efficient
- Edge Case Tests - Empty datasets, single samples, large datasets
Test files to create:
- tests/ContinualLearning/Strategies/ProgressiveNeuralNetworksStrategyTests.cs
- tests/ContinualLearning/Strategies/PackNetStrategyTests.cs
- tests/ActiveLearning/Strategies/ExpectedGradientLengthStrategyTests.cs
## Completion Checklist

Before marking any item complete:
- Code compiles without errors or warnings
- No `// simplified`, `// placeholder`, or `// TODO` comments remain
- Unit tests pass
- Integration with existing code verified
- XML documentation complete
- No hardcoded `double` or `float` (use generic `T`)
- Proper error handling with meaningful exceptions
- Thread-safety considered for parallel scenarios
## Summary

| Category | Items | Priority |
|---|---|---|
| Missing Implementations | 3 (PNN, PackNet, EGL) | HIGH |
| Simplified Code Fixes | 10 | HIGH |
| Shared Utilities | 3 | MEDIUM (before dependent fixes) |
| Total Work Items | 16 | - |
Estimated Effort: Significant - each item requires careful implementation with proper algorithms, not quick fixes.