"Ten[sors with] a[utomatic] Diff[erentiation]"
The TenaDiff library implements a tensor abstraction that automatically builds compute graphs and handles reverse-mode automatic differentiation, allowing users to easily build custom neural networks.
The library is implemented from scratch in C++23 and is designed to align with the PyTorch API.
Let's take a look at a simple example: performing a single step of gradient descent while fitting a model to a line.
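For reference, here is the math this snippet implements. With inputs $x_i$, targets $y_i$, and a batch of size $n$, the model and mean squared error loss are

$$
\hat{y}_i = \text{bias} + \text{slope} \cdot x_i,
\qquad
L = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2,
$$

and the gradients that the backward pass computes for the two parameters are

$$
\frac{\partial L}{\partial \text{bias}} = \frac{2}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right),
\qquad
\frac{\partial L}{\partial \text{slope}} = \frac{2}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right) x_i.
$$

The optimizer then applies $\theta \leftarrow \theta - \eta \, \partial L / \partial \theta$ to each parameter $\theta$, with learning rate $\eta = 0.05$ in this example.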
```cpp
// Create parameters for bias and slope.
// We will fit the line using a model of the form y = bias + slope * x.
// Each parameter will be of shape (1,).
// The bias will be initialized to -0.03.
// The slope will be initialized to 0.01.
auto bias = Tensor<double>::create(Shape{1}, {-0.03});
auto slope = Tensor<double>::create(Shape{1}, {0.01});
// Set each tensor to require gradients so that
// they accumulate gradients during the backward pass.
bias->set_requires_grad(true);
slope->set_requires_grad(true);
// Create tensors to store intermediate results during the forward
// and backward passes.
// The input will be of shape (batch_size,).
// We will need to broadcast the bias and slope parameters
// to match the shape of the input.
// We will use `bias_broadcasted` and `slope_broadcasted` to store
// the broadcasted bias and slope.
// We will use `mul_out` and `add_out` to store the intermediate
// results of the multiplication and addition.
auto bias_broadcasted = Tensor<double>::create(Shape{batch_size});
auto slope_broadcasted = Tensor<double>::create(Shape{batch_size});
auto mul_out = Tensor<double>::create(Shape{batch_size});
auto add_out = Tensor<double>::create(Shape{batch_size});
// Use mean squared error loss.
// The input to the loss function will be of shape (batch_size,).
auto mse = MeanSquaredErrorLoss<double>(Shape{batch_size});
// Use stochastic gradient descent with a learning rate of 0.05.
auto optimizer = SGD<double>(0.05);
// Conduct the forward pass.
// Tensor operations are read from left to right.
// For instance, `inputs->mul(*slope_broadcasted, *mul_out)`
// describes element-wise multiplying the `inputs` tensor by the
// `slope_broadcasted` tensor and storing the result in the `mul_out` tensor.
bias->broadcast(*bias_broadcasted);
slope->broadcast(*slope_broadcasted);
inputs->mul(*slope_broadcasted, *mul_out);
auto &predictions = mul_out->add(*bias_broadcasted, *add_out);
auto &loss = mse(MeanSquaredErrorLoss<double>::Predictions(predictions),
MeanSquaredErrorLoss<double>::Targets(*targets));
// Retrieve the compute graph.
// The compute graph is a directed acyclic graph (DAG)
// that represents the computation of a neural network.
// It is constructed during the forward pass and is used
// to compute the gradients during the backward pass.
auto compute_graph = loss.compute_graph();
// Conduct the backward pass.
// The reverse mode of automatic differentiation is used.
// Starting at the loss, the compute graph is traversed
// backwards using differentiation rules to
// propagate the gradients back to the parameters.
compute_graph.backward();
// Update the parameters.
// The optimizer uses the gradients from the backward pass
// to update the parameters.
optimizer.step(compute_graph);
// Reset the compute graph.
// This is necessary to detach the nodes in the compute graph
// and zero the gradients from the previous backward pass.
compute_graph.reset();
```
Expanding this snippet to iterate over multiple mini-batches, we can fit our model.
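A rough sketch of that loop is shown below; it reuses the tensors and optimizer from the snippet above. The `load_batch` helper and the batch count are illustrative placeholders rather than part of the library; the actual loop lives in `example/line/line.cpp`.

```cpp
// Illustrative sketch of repeating the single step above over mini-batches.
// `load_batch` is a hypothetical helper that fills `inputs` and `targets`
// with fresh samples from the target line; it is not part of TenaDiff.
for (auto batch = 0; batch < 200; ++batch) {
  load_batch(*inputs, *targets);
  // Forward pass (same operations as the single-step snippet).
  bias->broadcast(*bias_broadcasted);
  slope->broadcast(*slope_broadcasted);
  inputs->mul(*slope_broadcasted, *mul_out);
  auto &predictions = mul_out->add(*bias_broadcasted, *add_out);
  auto &loss = mse(MeanSquaredErrorLoss<double>::Predictions(predictions),
                   MeanSquaredErrorLoss<double>::Targets(*targets));
  // Backward pass, parameter update, and graph reset.
  auto compute_graph = loss.compute_graph();
  compute_graph.backward();
  optimizer.step(compute_graph);
  compute_graph.reset();
}
```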
```
> line 0.86 4.3
Batch 0 | Loss: 9.19552 | Bias: 0.406993 | Slope: 0.741539
Batch 10 | Loss: 0.317648 | Bias: 0.875876 | Slope: 3.37687
Batch 20 | Loss: 0.0273313 | Bias: 0.863401 | Slope: 4.06392
Batch 30 | Loss: 0.0013757 | Bias: 0.86017 | Slope: 4.24451
Batch 40 | Loss: 7.21439e-05 | Bias: 0.859589 | Slope: 4.28779
Batch 50 | Loss: 3.51138e-06 | Bias: 0.859878 | Slope: 4.29712
Batch 60 | Loss: 2.22195e-07 | Bias: 0.860021 | Slope: 4.29938
Batch 70 | Loss: 8.30594e-09 | Bias: 0.86 | Slope: 4.29985
Batch 80 | Loss: 6.48038e-10 | Bias: 0.859999 | Slope: 4.29996
Batch 90 | Loss: 2.9512e-11 | Bias: 0.86 | Slope: 4.29999
Batch 100 | Loss: 2.03897e-12 | Bias: 0.86 | Slope: 4.3
Batch 110 | Loss: 1.19486e-13 | Bias: 0.86 | Slope: 4.3
Batch 120 | Loss: 5.47383e-15 | Bias: 0.86 | Slope: 4.3
Batch 130 | Loss: 2.82228e-16 | Bias: 0.86 | Slope: 4.3
----------------------------------------
Converged after 139 batches.
Loss: 1.86122e-17
Bias: 0.86
Slope: 4.3
```
You can find the complete source code for this line fitting example in `example/line/line.cpp`. Try it out for yourself using the `make line` command.
Let's take a look at a more interesting example: training a four-layer feed-forward neural network to fit a heightmap.
First, let's export a heightmap from the Rocky Mountains as a PNG using unrealheightmap.
`example/heightmap/data/png/b42eb40cc0d4.png`
Convert the PNG to PGM format using the `example/heightmap/scripts/png_to_pgm` script included in this repository.
See `example/heightmap/data/pgm/b42eb40cc0d4.pgm` for the resulting PGM file.
From here, we can write our training loop.
```cpp
// Assume that we have an `inputs` tensor of shape (epoch_size, 2)
// that contains the normalized x and y coordinates of the heightmap.
// Also assume we have a `targets` tensor of shape (epoch_size, 1) that
// contains the normalized height values of the heightmap for each sample
// in the `inputs` tensor.
// Define the neural network.
// We will use a four-layer feed-forward neural network.
// This time, let's use the entire epoch as input in each iteration (batch gradient descent).
const dimension_size_t epoch_size = inputs->shape().at(0);
const auto hidden_layer_sizes = std::array<dimension_size_t, 3>{128, 64, 32};
auto lin1 = Linear<double>(epoch_size, inputs->shape().at(1), hidden_layer_sizes[0]);
auto lin1relu = Tensor<double>::create(Shape{epoch_size, hidden_layer_sizes[0]});
auto lin2 = Linear<double>(epoch_size, hidden_layer_sizes[0], hidden_layer_sizes[1]);
auto lin2relu = Tensor<double>::create(Shape{epoch_size, hidden_layer_sizes[1]});
auto lin3 = Linear<double>(epoch_size, hidden_layer_sizes[1], hidden_layer_sizes[2]);
auto lin3relu = Tensor<double>::create(Shape{epoch_size, hidden_layer_sizes[2]});
auto lin4 = Linear<double>(epoch_size, hidden_layer_sizes[2], 1);
// Use mean squared error loss.
// The input to the loss function will be of shape (epoch_size, 1),
// the same as the output of the last layer in our neural network.
auto mse = MeanSquaredErrorLoss<double>(Shape{epoch_size, 1});
// Use stochastic gradient descent with momentum.
auto optimizer = SGDWithMomentum<double>(
SGDWithMomentum<double>::LearningRate(0.05),
SGDWithMomentum<double>::Momentum(0.9));
// Training loop.
for (auto i = 0; i < 5000; ++i) {
// Conduct the forward pass.
lin1(*inputs).relu(*lin1relu);
lin2(*lin1relu).relu(*lin2relu);
lin3(*lin2relu).relu(*lin3relu);
auto &predictions = lin4(*lin3relu);
auto &loss = mse(MeanSquaredErrorLoss<double>::Predictions(predictions),
MeanSquaredErrorLoss<double>::Targets(*targets));
// Retrieve the compute graph.
auto compute_graph = loss.compute_graph();
// Conduct the backward pass.
compute_graph.backward();
// Update the parameters.
optimizer.step(compute_graph);
// Reset the compute graph.
compute_graph.reset();
}
```
With some additional boilerplate for reading and writing PGM files, we can fit our neural network to the heightmap.
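As a rough idea of what that boilerplate involves, here is a minimal sketch of reading an ASCII (P2) PGM file; the actual reader and writer used by the example live in `example/heightmap/heightmap.cpp` and may differ.

```cpp
// Minimal sketch of a P2 (ASCII) PGM reader that normalizes pixel values
// to [0, 1]. For brevity it does not handle '#' comment lines or binary
// (P5) files.
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Pgm {
  std::size_t width = 0;
  std::size_t height = 0;
  std::vector<double> pixels;  // normalized heights in [0, 1]
};

inline Pgm read_pgm(const std::string &path) {
  std::ifstream in(path);
  std::string magic;
  in >> magic;
  if (!in || magic != "P2") {
    throw std::runtime_error("expected an ASCII PGM (P2) file: " + path);
  }
  Pgm pgm;
  double max_value = 0.0;
  in >> pgm.width >> pgm.height >> max_value;
  pgm.pixels.reserve(pgm.width * pgm.height);
  for (std::size_t i = 0; i < pgm.width * pgm.height; ++i) {
    double value = 0.0;
    in >> value;
    pgm.pixels.push_back(value / max_value);  // normalize to [0, 1]
  }
  if (!in) {
    throw std::runtime_error("truncated PGM file: " + path);
  }
  return pgm;
}
```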
```
> heightmap example/heightmap/data/pgm/b42eb40cc0d4.pgm example/heightmap/output/
```
If we output PGM files representing the model's predictions over the epochs, we can visualize the model's training progress.
`example/heightmap/data/gif/b42eb40cc0d4.gif`
Nice! We've trained a feed-forward neural network to fit a heightmap.
You can find the complete source code for this heightmap example in `example/heightmap/heightmap.cpp`. Try it out for yourself using the `make heightmap` command.
First, install the library.
```
> make install
```
After installation, you can write your own neural networks using the API described below. For examples of how to use the API, see the `example` directory. See `example/CMakeLists.txt` for an example of how to link against the library. To run the examples, use the following commands.
```
> make line
> make heightmap
```
As we saw in the examples, the atomic unit of the library is Tensor.
On top of Tensor, there are a number of additional utilities.
```cpp
#include <tenadiff/tensor.hpp>
```
Tensor is responsible for performing the forward pass and building the compute
graph while doing so. The compute graph is constructed fresh for each forward pass
at runtime (like PyTorch and TensorFlow V2).
| Method | Description |
|---|---|
| `Tensor::create` | Create a new tensor. |
| `Tensor::shape` | Get the shape of the tensor. |
| `Tensor::at` | Get the element at a given multi-dimensional index. |
| `Tensor::broadcast` | Broadcast the tensor to an expanded shape. |
| `Tensor::transpose` | Transpose the tensor. |
| `Tensor::matmul` | Matrix multiply with another tensor. |
| `Tensor::add` | Element-wise add with another tensor. |
| `Tensor::sub` | Element-wise subtract another tensor from the current tensor. |
| `Tensor::relu` | ReLU activation. |
| `Tensor::softmax` | Apply the softmax function to the tensor. |
| `Tensor::pow` | Element-wise power of the tensor. |
| `Tensor::log` | Element-wise log of the tensor. |
| `Tensor::mul` | Element-wise multiply with another tensor. |
| `Tensor::neg` | Element-wise negation of the tensor. |
| `Tensor::sum` | Sum the tensor into a 1-dimensional tensor. |
| `Tensor::mean` | Take the mean of the tensor into a 1-dimensional tensor. |
| `Tensor::set_requires_grad` | Set whether the tensor requires a gradient during the backward pass. |
| `Tensor::requires_grad` | Check whether the tensor requires a gradient during the backward pass. |
| `Tensor::grad` | Get the gradient of the tensor from the last backward pass. |
| `Tensor::zero_grad` | Zero the gradient of the tensor. |
| `Tensor::compute_graph` | Get the compute graph rooted at the tensor. |
| `Tensor::update` | Update the tensor in-place using the gradient. |
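As a small sketch of how these methods compose, the snippet below multiplies two tensors, reduces the result to a sum, and inspects the gradient of one input. It follows the out-parameter pattern from the examples above; the exact `sum` signature is an assumption.

```cpp
// Sketch: element-wise multiply, reduce to a sum, back-propagate, and read
// the gradient of `x`. The `sum` out-parameter signature is assumed to
// follow the same pattern as the other operations.
auto x = Tensor<double>::create(Shape{3}, {1.0, 2.0, 3.0});
auto y = Tensor<double>::create(Shape{3}, {4.0, 5.0, 6.0});
x->set_requires_grad(true);
auto product = Tensor<double>::create(Shape{3});
auto sum_out = Tensor<double>::create(Shape{1});
x->mul(*y, *product);
auto &result = product->sum(*sum_out);
auto graph = result.compute_graph();
graph.backward();  // d(sum(x * y))/dx = y, so x's gradient is {4, 5, 6}
// x->grad() now holds that gradient; reset the graph before the next pass.
graph.reset();
```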
```cpp
#include <tenadiff/tensor.hpp>
```
Tensor::Init provides static methods for initializing new tensors using random number distributions.
| Method | Description |
|---|---|
| `Tensor::Init::uniform` | Initialize a new tensor with a uniform distribution. |
| `Tensor::Init::kaiming_uniform` | Initialize a new tensor with a Kaiming uniform distribution. |
```cpp
#include <tenadiff/compute_graph.hpp>
```
ComputeGraph represents the computation of a neural network. The graph consists
of linked ComputeGraphData and ComputeGraphOperation nodes. ComputeGraphData
represents a Tensor in the graph. ComputeGraphOperation represents an operation
between data nodes, such as Tensor::add. Tensor is responsible for building the
graph during the forward pass. ComputeGraph is responsible for resetting the graph
between passes, and handles propagating gradients backwards through the graph.
Check out the `include/compute_graph.hpp` header for details on the graph's structure and ownership model.
| Method | Description |
|---|---|
| `ComputeGraph::backward` | Propagate gradients backwards through the graph. |
| `ComputeGraph::root` | Get the root ComputeGraphData node of the graph. |
| `ComputeGraph::parameters` | Get the parameter ComputeGraphData nodes of the graph. |
| `ComputeGraph::requires_backward` | Check whether the given ComputeGraphData node requires a gradient during the backward pass. |
| `ComputeGraph::reset` | Detach all nodes and zero any gradients from the previous backward pass. |
In the majority of cases, you will only need to interact with ComputeGraph::backward and
ComputeGraph::reset. As you are unlikely to require interacting with ComputeGraphData and
ComputeGraphOperation nodes directly, their descriptions are omitted here for brevity.
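In practice, that interaction is the per-iteration lifecycle already shown in the examples above:

```cpp
// Typical per-iteration lifecycle, assuming `loss` is the tensor produced by
// a forward pass and `optimizer` is one of the optimizers described below.
auto compute_graph = loss.compute_graph();  // graph was built during the forward pass
compute_graph.backward();                   // propagate gradients to the parameters
optimizer.step(compute_graph);              // update the parameters using the gradients
compute_graph.reset();                      // detach nodes and zero gradients for reuse
```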
```cpp
#include <tenadiff/linear.hpp>
```
Linear represents a linear layer in a neural network.
It applies an affine linear transformation to an input tensor of shape (batch_size, input_size).
| Method | Description |
|---|---|
| `Linear::weights` | Get the weights of the linear layer. |
| `Linear::biases` | Get the biases of the linear layer. |
| `Linear::operator()` | Apply an affine linear transformation to the input tensor. |
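For reference, constructing and applying a layer follows the pattern from the heightmap example: the constructor takes the batch size, input size, and output size, and `operator()` returns a reference to the output tensor.

```cpp
// A linear layer mapping 2 input features to 16 outputs for a batch of 8.
auto layer = Linear<double>(8, 2, 16);
auto input = Tensor<double>::create(Shape{8, 2});
auto &output = layer(*input);  // output has shape (8, 16)
// layer.weights() and layer.biases() expose the layer's trainable parameters.
```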
```cpp
#include <tenadiff/loss.hpp>
```
CrossEntropyLoss represents the cross-entropy loss function.
It is used to measure the loss of a multi-class classification model
with an arbitrary number of classes. The predictions must contain the
unnormalized logits for each class, with shape (batch_size, num_classes).
The targets must be one-hot encoded, with shape (batch_size, num_classes).
| Method | Description |
|---|---|
| `CrossEntropyLoss::operator()` | Compute the cross-entropy loss between the predictions and targets. |
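CrossEntropyLoss does not appear in the examples above; assuming it follows the same call pattern as MeanSquaredErrorLoss (a constructor taking the prediction shape and an `operator()` taking wrapped predictions and targets), usage would look roughly like this:

```cpp
// Sketch only: assumes CrossEntropyLoss mirrors the MeanSquaredErrorLoss call
// pattern used in the examples. `logits` holds unnormalized class scores of
// shape (batch_size, num_classes) and `one_hot_targets` holds the one-hot
// labels of the same shape; both are assumed to exist already.
const dimension_size_t batch_size = 32;
const dimension_size_t num_classes = 10;
auto cross_entropy = CrossEntropyLoss<double>(Shape{batch_size, num_classes});
auto &loss = cross_entropy(
    CrossEntropyLoss<double>::Predictions(*logits),
    CrossEntropyLoss<double>::Targets(*one_hot_targets));
```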
```cpp
#include <tenadiff/loss.hpp>
```
MeanSquaredErrorLoss represents the mean squared error loss function.
It is used to measure the loss of a regression model.
| Method | Description |
|---|---|
| `MeanSquaredErrorLoss::operator()` | Compute the mean squared error between the predictions and targets. |
```cpp
#include <tenadiff/optimizer.hpp>
```
SGD represents the stochastic gradient descent optimizer.
Optimizers are responsible for updating the parameters of a neural network
after a backward pass.
| Method | Description |
|---|---|
| `SGD::step()` | Perform a single optimization step on the parameters of a compute graph. |
```cpp
#include <tenadiff/optimizer.hpp>
```
SGDWithMomentum represents the stochastic gradient descent optimizer with momentum.
It extends the SGD optimizer with a momentum term.
| Method | Description |
|---|---|
| `SGDWithMomentum::step()` | Perform a single optimization step on the parameters of a compute graph. |
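For reference, one common formulation of the momentum update is shown below; the exact variant TenaDiff implements may differ (for example, in whether the learning rate scales the velocity or the gradient).

$$
v \leftarrow \mu v + \nabla_\theta L,
\qquad
\theta \leftarrow \theta - \eta \, v,
$$

where $\mu$ is the momentum coefficient and $\eta$ is the learning rate.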
Based on profiling of the heightmap example, we can see that the Tensor::matmul forward and backward
passes are the most expensive operations.
```
> perf record -F 99 -p <pid> -g -- sleep 60
> perf script > scratch/out.perf
> stackcollapse-perf.pl scratch/out.perf > scratch/out.folded
> flamegraph.pl scratch/out.folded > example/heightmap/data/flamegraph.svg
```
`example/heightmap/data/flamegraph.svg`
There are definitely ways to improve performance. Further optimization will be left as future work.
- Optimize `Tensor::matmul`
- Gradient clipping
- Adam optimizer
- Classifier example
- Fused kernel for cross-entropy loss
- Fused kernel for log softmax
- Tensor storage layouts in addition to strided (e.g., sparse)
- Device types in addition to CPU (e.g., GPU)
- Compute graph export to Mermaid diagram
- Python bindings
- Utilities for exporting training metrics to TensorBoard
- Utilities for data loading and preprocessing
- Convolutional layers
- Recurrent layers
- Attention layers
Repositories:
- https://github.com/karpathy/llm.c
- https://github.com/karpathy/nanoGPT
- https://github.com/karpathy/micrograd
- https://github.com/martin226/litenet
Readings:
- https://udlbook.github.io/udlbook
- https://blog.ezyang.com/2019/05/pytorch-internals
- https://pytorch.org/blog/computational-graphs-constructed-in-pytorch
- https://pytorch.org/blog/how-computational-graphs-are-executed-in-pytorch
- https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative

