Skip to content

Latest commit

 

History

History
447 lines (375 loc) · 12.1 KB

File metadata and controls

447 lines (375 loc) · 12.1 KB

FinML Technical Documentation

Overview

FinML is a high-performance C++ machine learning library specifically designed for financial applications. It implements a dynamic computational graph architecture with automatic differentiation, supporting LSTM, CNN, and sequential neural networks. The library is built with a focus on efficiency, scalability, and numerical stability.

Core Components

1. Computational Graph (src/core/value.cpp)

The foundation of FinML's automatic differentiation system:

class Value {
    float data;           // Scalar value
    float grad;          // Gradient
    std::string op;      // Operation type
    std::vector<ValuePtr> prev;  // Previous nodes
    std::function<void()> backward;  // Backward pass function
};

Key features:

  • Dynamic graph construction
  • Automatic gradient computation
  • Memory-efficient backpropagation
  • Support for complex mathematical operations
  • Chain rule implementation for composite functions

2. Matrix Operations (src/core/matrix.cpp)

Implements efficient matrix operations with automatic differentiation:

class Matrix {
    std::vector<std::vector<ValuePtr>> data;
    size_t numRows;
    size_t numCols;
};

Operations include:

  • Matrix multiplication with gradient tracking
  • Element-wise operations
  • Broadcasting support
  • Efficient memory management
  • Transpose operations
  • Matrix decomposition methods

3. Time Series Processing (src/data/timeseries.cpp)

Implements time series data handling and preprocessing:

class TimeSeries {
    std::vector<double> data;
    std::vector<std::chrono::system_clock::time_point> timestamps;
    size_t sequence_length;
};

Features:

  • Sliding window sequence generation
  • Time-based feature extraction
  • Missing data handling
  • Trend and seasonality decomposition
  • Rolling statistics computation
  • Data normalization and scaling

4. Neural Network Layers

4.1 Sequential Layer (src/layers/sequential_layer.cpp)

Implements a container for sequential layer operations:

class SequentialLayer {
    std::vector<std::shared_ptr<Layer>> layers;
    bool training_mode;
};

Features:

  • Dynamic layer composition
  • Forward pass through all layers
  • Backward pass with gradient flow
  • Layer-wise dropout application
  • Batch normalization support
  • Layer freezing capability

4.2 LSTM Layer (src/layers/lstm_layer.cpp)

Implements Long Short-Term Memory networks:

class LSTMLayer {
    // Gates
    Matrix W_forget_, b_forget_;  // Forget gate
    Matrix W_input_, b_input_;    // Input gate
    Matrix W_cell_, b_cell_;      // Cell state
    Matrix W_output_, b_output_;  // Output gate
    
    // States
    Matrix hidden_state_;         // Hidden state
    Matrix cell_state_;           // Cell state
};

Features:

  • Full LSTM implementation with all gates
  • Gradient flow through time
  • State management
  • Dropout support
  • Bidirectional processing
  • Layer normalization
  • Peephole connections

4.3 Convolutional Layer (src/layers/conv_layer.cpp)

Implements 1D and 2D convolutional operations:

class ConvLayer {
    Matrix filters_;              // Convolution filters
    Matrix bias_;                 // Bias terms
    size_t kernel_size_;          // Kernel dimensions
    size_t stride_;               // Stride length
    size_t padding_;              // Padding size
    std::string padding_mode_;    // Padding mode (same/valid)
};

Features:

  • 1D and 2D convolution support
  • Multiple filter banks
  • Stride and padding options
  • Activation function integration
  • Pooling operations
  • Transposed convolution
  • Dilated convolution

4.4 Linear Layer (src/layers/linear_layer.cpp)

Implements fully connected layers:

class LinearLayer {
    Matrix weights_;    // Weight matrix
    Matrix bias_;       // Bias vector
    Matrix dW_;         // Weight gradients
    Matrix db_;         // Bias gradients
};

Features:

  • Weight initialization options
  • Bias term control
  • Gradient clipping
  • L1/L2 regularization
  • Sparse weight updates

5. Optimization (src/optim/)

5.1 Adam Optimizer (src/optim/adam.cpp)

Implements the Adam optimization algorithm:

class Adam {
    float beta1;        // First moment decay rate
    float beta2;        // Second moment decay rate
    float epsilon;      // Numerical stability term
    std::vector<float> m;  // First moment estimates
    std::vector<float> v;  // Second moment estimates
};

Features:

  • Adaptive learning rates
  • Bias correction
  • Weight decay support
  • Momentum tracking
  • Learning rate scheduling
  • Gradient clipping

5.2 Loss Functions (src/optim/loss.cpp)

Implements various loss functions:

class Loss {
    virtual ValuePtr forward(const Matrix& predictions, const Matrix& targets) = 0;
    virtual Matrix backward(const Matrix& predictions, const Matrix& targets) = 0;
};

Available losses:

  • MSE (Mean Squared Error)
  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • Binary Cross-Entropy
  • Huber Loss
  • Custom loss functions

6. Model Training Process

6.1 Forward Pass

Matrix forward(const Matrix& input) {
    // LSTM forward pass
    Matrix i_t = sigmoid(matmul(W_input_, input) + b_input_);
    Matrix f_t = sigmoid(matmul(W_forget_, input) + b_forget_);
    Matrix o_t = sigmoid(matmul(W_output_, input) + b_output_);
    Matrix c_t = tanh(matmul(W_cell_, input) + b_cell_);
    
    cell_state_ = f_t * cell_state_ + i_t * c_t;
    hidden_state_ = o_t * tanh(cell_state_);
    
    return hidden_state_;
}

6.2 Loss Computation

ValuePtr computeLoss(const Matrix& output, const Matrix& target) {
    // MSE loss with automatic differentiation
    ValuePtr sum = Value::create(0.0f);
    for (size_t i = 0; i < output.numRows(); ++i) {
        ValuePtr diff = Value::subtract(output.at(i), target.at(i));
        ValuePtr squared = Value::multiply(diff, diff);
        sum = Value::add(sum, squared);
    }
    return Value::divide(sum, Value::create(static_cast<float>(n)));
}

6.3 Backward Pass

void backward(const Matrix& gradient) {
    // Compute gradients through the computational graph
    loss_val->backward();
    
    // Update parameters using Adam optimizer
    for (size_t i = 0; i < parameters.size(); ++i) {
        auto& param = parameters[i];
        
        // Update biased first moment estimate
        m[i] = beta1 * m[i] + (1.0f - beta1) * param->grad;
        
        // Update biased second raw moment estimate
        v[i] = beta2 * v[i] + (1.0f - beta2) * param->grad * param->grad;
        
        // Compute bias-corrected estimates
        float m_hat = m[i] / (1.0f - std::pow(beta1, t));
        float v_hat = v[i] / (1.0f - std::pow(beta2, t));
        
        // Update parameter
        param->data -= learning_rate * m_hat / (std::sqrt(v_hat) + epsilon);
    }
}

7. Advanced Features

7.1 Dropout Regularization

void updateWeightsWithDropout(double learning_rate, std::function<bool()> dropout_mask) {
    for (size_t i = 0; i < weights_.numRows(); ++i) {
        for (size_t j = 0; j < weights_.numCols(); ++j) {
            if (dropout_mask()) {
                weights_.at(i, j)->data = 0.0f;
            }
        }
    }
}

7.2 Early Stopping

if (val_loss < best_val_loss) {
    best_val_loss = val_loss;
    no_improvement_count = 0;
} else {
    no_improvement_count++;
    if (no_improvement_count >= patience) {
        break;
    }
}

7.3 Time Series Specific Features

class TimeSeriesProcessor {
    // Rolling window operations
    Matrix createRollingWindow(const Matrix& data, size_t window_size) {
        Matrix result(data.numRows() - window_size + 1, window_size);
        for (size_t i = 0; i < result.numRows(); ++i) {
            for (size_t j = 0; j < window_size; ++j) {
                result.at(i, j) = data.at(i + j, 0);
            }
        }
        return result;
    }
    
    // Feature extraction
    Matrix extractFeatures(const Matrix& data) {
        Matrix features(data.numRows(), 4);  // OHLC features
        for (size_t i = 0; i < data.numRows(); ++i) {
            features.at(i, 0) = data.at(i, 0);  // Open
            features.at(i, 1) = data.at(i, 1);  // High
            features.at(i, 2) = data.at(i, 2);  // Low
            features.at(i, 3) = data.at(i, 3);  // Close
        }
        return features;
    }
};

8. Performance Optimizations

8.1 OpenMP Parallelization

#pragma omp parallel for num_threads(num_threads)
for (size_t i = 0; i < current_batch_size; ++i) {
    // Parallel processing of batch samples
}

8.2 Memory Management

  • Smart pointer usage for automatic memory management
  • Efficient matrix operations with minimal copying
  • Gradient accumulation optimization
  • Memory pooling for frequently allocated objects
  • Cache-friendly data structures

8.3 Numerical Stability

// Softmax with numerical stability
float max_val = -std::numeric_limits<float>::infinity();
for (const auto& input : inputs) {
    max_val = std::max(max_val, input->data);
}

9. Usage Examples

9.1 LSTM Model Creation

LSTM model(input_size, hidden_size, output_size, num_layers, dropout_rate);

9.2 Training Configuration

Adam optimizer(model.parameters(), learning_rate, beta1, beta2, epsilon, weight_decay);

9.3 Training Loop

for (size_t epoch = 0; epoch < epochs; ++epoch) {
    for (size_t batch = 0; batch < num_batches; ++batch) {
        model.zeroGrad();
        Matrix output = model.forward(input);
        auto loss = optim::mse_loss(output, target);
        loss->backward();
        optimizer.step();
    }
}

9.4 Time Series Processing

// Create time series processor
TimeSeriesProcessor processor(window_size);

// Prepare data
Matrix data = loadFinancialData();
Matrix sequences = processor.createRollingWindow(data, window_size);
Matrix features = processor.extractFeatures(sequences);

// Train model
model.train(features, targets, epochs, learning_rate);

Technical Specifications

Memory Requirements

  • Minimum: 4GB RAM
  • Recommended: 8GB+ RAM for large models
  • GPU memory: Optional for CUDA support

Performance Metrics

  • Training speed: ~1000 samples/second on CPU
  • Memory efficiency: O(n) where n is the number of parameters
  • Gradient computation: O(n) time complexity
  • Time series processing: O(n) for window operations
  • Convolution operations: O(n * k) where k is kernel size

Dependencies

  • C++17 or later
  • OpenMP for parallelization
  • Eigen (optional) for optimized matrix operations
  • CUDA (optional) for GPU acceleration

Best Practices

1. Model Architecture

  • Start with smaller architectures
  • Use dropout for regularization
  • Monitor validation loss
  • Implement residual connections
  • Use batch normalization
  • Consider layer-wise learning rates

2. Training

  • Use appropriate batch sizes
  • Implement early stopping
  • Monitor gradient norms
  • Use learning rate scheduling
  • Implement gradient clipping
  • Regular model checkpointing

3. Performance

  • Enable OpenMP for parallel processing
  • Use appropriate data types
  • Implement efficient data loading
  • Optimize memory usage
  • Profile critical sections
  • Use SIMD instructions where applicable

4. Time Series Specific

  • Handle missing data appropriately
  • Consider seasonality
  • Implement proper train/test splitting
  • Use appropriate sequence lengths
  • Consider multiple time scales
  • Implement proper feature scaling

Future Enhancements

1. Planned Features

  • GPU acceleration
  • Distributed training
  • More layer types
  • Attention mechanisms
  • Transformer architecture
  • Reinforcement learning support

2. Optimization Goals

  • Reduced memory usage
  • Faster training speed
  • Better numerical stability
  • Improved time series handling
  • Enhanced parallelization
  • Better error handling

Conclusion

FinML provides a robust foundation for financial machine learning applications, with a focus on efficiency, scalability, and numerical stability. The library's architecture enables easy extension and customization while maintaining high performance. Its comprehensive support for time series processing, sequential layers, and convolutional operations makes it particularly suitable for financial applications.