Dev #3

Merged: Bean91 merged 9 commits into main from dev on Jun 21, 2026

Conversation

Bean91 (Owner) commented Jun 21, 2026

Summary by CodeRabbit

Release Notes

  • New Features
    • Added epoch-based model training with a configurable learning rate; forward now returns a probability distribution.
    • Extended the embedder to embed entire token sequences in one call.
  • Refactor
    • Migrated forward/backward computation to matrix-based operations and expanded backpropagation through blocks, layers, the neural network, and self-attention.
    • Added cached attention intermediates to improve training consistency.
  • Chores
    • Added documentation build/deploy and CI workflows; updated AI usage documentation.

coderabbitai Bot (Contributor) commented Jun 21, 2026

📝 Walkthrough

Backpropagation is added end-to-end across layer, neuralNetwork, selfAttention, block, and embedder components. Forward passes are refactored from std::vector<float> to utility::matrix. A subtract utility is added for gradient signal computation. The model gains a configurable learning_rate, a refactored token-index-based forwardPass returning softmax distributions, and a new train method implementing sliding-window gradient descent. CI/CD workflows for Ubuntu, macOS, and Windows, plus documentation deployment, are added.
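The walkthrough describes train as a sliding-window loop over the training text. The sketch below is only a rough illustration of that idea: it turns a token sequence into (context, next-token) pairs. The helper name slidingWindowPairs and the fixed context_len parameter are hypothetical, and the PR's actual train(string, epochs) may window the text differently.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: slide a window of `context_len` tokens over `tokens`
// and pair each window with the token that immediately follows it.
std::vector<std::pair<std::vector<int>, int>>
slidingWindowPairs(const std::vector<int>& tokens, std::size_t context_len) {
    assert(context_len > 0);
    std::vector<std::pair<std::vector<int>, int>> pairs;
    // At least context_len + 1 tokens are needed to form one (context, target) pair.
    for (std::size_t start = 0; start + context_len < tokens.size(); ++start) {
        const auto first = tokens.begin() + static_cast<std::ptrdiff_t>(start);
        std::vector<int> context(first, first + static_cast<std::ptrdiff_t>(context_len));
        int target = tokens[start + context_len];  // next token after the window
        pairs.emplace_back(std::move(context), target);
    }
    return pairs;
}
```

In a loop like the one described, each pair would drive one forward pass and one backward pass, and an epoch would be one sweep over all pairs.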

Changes

Backpropagation and Training

| Layer | File(s) | Summary |
|---|---|---|
| Matrix subtract utility | include/utility.hpp | Adds subtract(a, b) for element-wise matrix subtraction with dimension validation, used to compute the gradient signal dZ = dist - oneHot in the training loop. |
| layer: matrix feedforward and backward | include/layer.hpp | Replaces the vector-based feedForward with a matrix version that caches the input X and the post-activation output Z and applies relu(dot(x, W) + b); adds backward(dZ) returning {dX, {dW, db}}. |
| neuralNetwork: remove gain/bias, matrix forward/backward | include/neural_network.hpp | Removes the gain/bias fields and their serialization; updates layerNorm to take start/end indices for selective normalization; switches feedForward to utility::matrix; adds a backward pass that walks the layers in reverse; replaces changeOne with layer-indexed overloads. |
| selfAttention: weight init, cached forward, backward | include/self_attention.hpp | Scales the weight-init standard deviation to 1/sqrt(n_embd); caches q, k, v, p, and x during the forward pass; adds backward(dZ), which computes dX and the gradients for wq, wk, and wv via the intermediate matrices dV, dP, and dS, scaled by 1/sqrt(K). |
| block: matrix feedforward and backward | include/block.hpp | Replaces the per-row feedforward loop with a direct network.feedForward(x) call; adds backward(dZ), which chains network.backward and then attention.backward and combines the gradients from both subcomponents. |
| embedder: sequence embedding and backward | include/embedder.hpp | Stores the token sequence in toks; adds an embed(forward_list<int>) overload that builds a (token count × n_embd) matrix by iterating over the tokens; updates backward to map gradient rows back to table rows via toks iterator positions and apply learning-rate-scaled updates. |
| model: training loop and refactored forwardPass | include/model.hpp | Adds a learning_rate field (default 0.01) and constructor parameter; refactors forwardPass to accept token indices and return a softmax distribution matrix; removes string-based argmax decoding; adds train(string, epochs) with a sliding-window epoch loop, one-hot targets, dZ computation, reverse-order block backward passes, an embedder backward pass, and changeOne-based parameter updates scaled by learning_rate. |
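The subtract row rests on a standard identity: for a softmax output followed by cross-entropy loss against a one-hot target, the gradient of the loss with respect to the logits is simply dist - oneHot. The sketch below shows that computation with a plain std::vector-of-rows Matrix alias standing in for utility::matrix, whose real interface is not shown on this page; softmaxRow and oneHotLike are hypothetical names. The one-hot target is built with the same (1, vocab_size) shape as dist, which is also the shape the oneHot review comment asks for.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for utility::matrix (assumption): a row-major vector of rows.
using Matrix = std::vector<std::vector<float>>;

// Numerically stable softmax over one row of logits; returns shape (1, vocab_size).
Matrix softmaxRow(const std::vector<float>& logits) {
    assert(!logits.empty());
    const float maxv = *std::max_element(logits.begin(), logits.end());
    std::vector<float> out(logits.size());
    float sum = 0.0f;
    for (std::size_t j = 0; j < logits.size(); ++j) {
        out[j] = std::exp(logits[j] - maxv);  // shift by the max to avoid overflow
        sum += out[j];
    }
    for (float& v : out) v /= sum;
    return Matrix{out};
}

// Element-wise a - b, rejecting mismatched dimensions.
Matrix subtract(const Matrix& a, const Matrix& b) {
    if (a.size() != b.size()) throw std::invalid_argument("row count mismatch");
    Matrix out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].size() != b[i].size()) throw std::invalid_argument("column count mismatch");
        out[i].resize(a[i].size());
        for (std::size_t j = 0; j < a[i].size(); ++j) out[i][j] = a[i][j] - b[i][j];
    }
    return out;
}

// One-hot target with the same shape as dist; column `next` is the target token.
Matrix oneHotLike(const Matrix& dist, std::size_t next) {
    assert(!dist.empty() && next < dist[0].size());
    Matrix hot(dist.size(), std::vector<float>(dist[0].size(), 0.0f));
    hot[0][next] = 1.0f;
    return hot;
}
```

With these pieces, the training signal is dZ = subtract(dist, oneHotLike(dist, next)). It is positive for tokens the model over-predicts and negative only at the target column.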

CI and Documentation Infrastructure

| Layer | File(s) | Summary |
|---|---|---|
| GitHub Actions workflows for multi-platform CI | .github/workflows/ubuntu.yml, .github/workflows/macos.yml, .github/workflows/windows.yml, .github/workflows/pre-commit.yml | Adds CI workflows for Ubuntu (with Codecov coverage reporting), macOS, Windows, and pre-commit hooks. Each platform workflow checks out the repo, runs make prepare, configures a Debug CMake build, builds unit_tests, and runs the test binary; the Ubuntu workflow also installs coverage tools and uploads the results. |
| Documentation build and GitHub Pages deployment | .github/workflows/documentation.yml | Adds a workflow that builds Doxygen documentation on tag pushes and on pushes to main/master, then deploys the generated HTML site to GitHub Pages using the Cecilapp action. |
| AI usage documentation | AIUsage.md | Records that CodeRabbit AI was used for PR review. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • Bean91/Open-Chat#2: Modifies block::feedForward to chain attention and network.feedForward over the input matrix, directly overlapping with the block-level changes in this PR.

Poem

🐇 Hoppity-hop through the gradient path,
Each weight now learns from the aftermath.
From layer to block to model we go,
subtract the target and watch the loss flow.
Backprop at last, through attention and all —
A training loop blooms from a rabbit's small scrawl! 🌱

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Dev' is vague and generic and conveys nothing about the substantive changes (neural network refactoring, backward propagation, the training loop, and CI/CD workflows). | Use a more descriptive title that summarizes the primary changes, such as 'Add backward propagation and training loop implementation' or 'Refactor neural network for matrix operations and gradient computation'. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/embedder.hpp`:
- Around lines 111-117: The backward method has two issues. First, it uses +=
instead of -= for the gradient updates, which is inconsistent with
layer::changeOne and selfAttention::changeOne. Second, it calls
std::next(this->toks.begin(), i) inside the nested loop; a forward_list iterator
can only advance one element at a time, so each call costs O(i) and the loop as
a whole becomes O(n²). Fix this by changing += to -= to match the gradient
descent pattern, and by keeping a single iterator that advances once per outer
loop iteration instead of calling std::next(this->toks.begin(), i) repeatedly.

In `@include/layer.hpp`:
- Around lines 77-82: The feedForward method in the layer class fails when
processing multi-row input matrices because utility::add requires exact
dimension matching, but the result of utility::dot(x, this->weights) has shape
(x.rows, n_out) while this->biases has shape (1, n_out). Instead of using
utility::add directly on these mismatched dimensions, manually broadcast the
biases across all rows of the dot product result before adding them together,
ensuring the bias vector is applied consistently to each row of the output.
- Around lines 84-101: The backward method does not account for the ReLU
activation derivative, which means gradients are not properly masked. Add a
member variable to cache the pre-ReLU activation values during the forward pass,
then in the backward method, create a derivative mask where elements are 1 where
pre-activation values are greater than 0 and 0 elsewhere. Apply this mask to dZ
before computing dW, dX, and db to ensure gradients are zeroed where the
pre-activation was less than or equal to 0.

In `@include/model.hpp`:
- Around lines 115-125: The inner loop variable l in all three matrix update
loops is incorrectly bounded by adW[n].rows when it should be bounded by
adW[n].cols. This affects the three changeOne calls for updating the 'q', 'k',
and 'v' matrices in the gradient descent updates. Replace the inner loop
condition for each of the three loops (where k iterates over rows for adW[0],
adW[1], and adW[2]) so that the l variable iterates up to the column count
(.cols) instead of the row count (.rows) to properly traverse all matrix
columns.
- Around lines 84-88: The oneHot matrix creation has incorrect dimensions. Since
dist has shape (1, vocab_size), the oneHot matrix should also be created with
shape (1, vocab_size) instead of (1, 1). Change the oneHot initialization from
utility::matrix(dist.rows, 1) to utility::matrix(dist.rows, dist.cols).
Additionally, update the indexing to set the hot value from oneHot[next][0] = 1
to oneHot[0][next] = 1 so the element at column position next (the token index)
is set correctly in the single row matrix.
- Around lines 106-113: The backward loop over ndW declares its layer index as
size_t, an unsigned type. A condition such as layer >= 0 is therefore always
true, so decrementing past zero wraps around to SIZE_MAX instead of ending the
loop, and the next ndW access is out of bounds. Use a signed index type (for
example int or std::ptrdiff_t), use the test-then-decrement form
for (size_t layer = n; layer-- > 0;), or restructure the loop to iterate
forward instead of backward.
- Around lines 93-127: The gradient collection loop iterates through blocks in
reverse order (using blocks.rbegin()) and stores gradients in bdW, but the
gradient application loop iterates through blocks in forward order (for (block&
b : this->blocks)), causing gradients to be applied to the wrong blocks. Fix
this by changing the gradient application loop to also iterate in reverse order,
so that bdW[0] (which contains gradients from the last block) is applied to the
last block, not the first block. Use reverse iteration similar to the collection
phase when applying the gradients from bdW to each block in the blocks
container.

In `@include/neural_network.hpp`:
- Around lines 58-59: The layerNorm function passes the float values x[start]
and x[end] to the std::vector constructor where iterators are required. Floats
cannot match the iterator-range constructor, so the call resolves to a different
overload (with parentheses, the (count, value) constructor), which produces the
wrong vector and can cause undefined behavior, for example when x[start] is
negative or when end indexes past the last element. Pass x.begin() + start and
x.begin() + end instead, so the range constructor copies exactly the elements in
the requested range.
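The embedder comment asks for two changes to backward: descend with -= and walk toks with a single iterator. Here is a minimal sketch under stated assumptions: the embedding table is a Matrix with one row per token id, row i of the incoming gradient dX corresponds to the i-th token in toks, and the name embedderBackward and the Matrix alias are hypothetical rather than the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // stand-in for utility::matrix (assumption)

// Apply gradient-descent updates to the embedding-table rows used by `toks`.
// Row i of dX is assumed to be the gradient for the i-th token in toks.
void embedderBackward(Matrix& table, const std::forward_list<int>& toks,
                      const Matrix& dX, float learning_rate) {
    auto tok_it = toks.begin();  // advanced once per row: O(n) total, no std::next in the loop
    for (std::size_t i = 0; i < dX.size(); ++i, ++tok_it) {
        assert(tok_it != toks.end());
        assert(*tok_it >= 0 && static_cast<std::size_t>(*tok_it) < table.size());
        std::vector<float>& row = table[static_cast<std::size_t>(*tok_it)];
        assert(row.size() == dX[i].size());
        for (std::size_t j = 0; j < dX[i].size(); ++j) {
            row[j] -= learning_rate * dX[i][j];  // descend: subtract the scaled gradient
        }
    }
}
```

Because the iterator moves one step per gradient row, the whole update costs O(n · n_embd). A token that appears twice in the sequence simply receives both updates.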
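The two layer.hpp comments (bias broadcasting and the missing ReLU derivative) fit together in one small dense layer. The DenseLayer struct below is a hypothetical stand-in, not the PR's layer class. It caches the pre-activation A during forward, adds the same bias to every input row, and in backward zeroes dZ wherever A was not positive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // stand-in for utility::matrix (assumption)

// Hypothetical dense layer: Z = relu(X·W + b), with b broadcast over every row.
struct DenseLayer {
    Matrix W;              // (n_in, n_out)
    std::vector<float> b;  // (n_out): one bias per output column
    Matrix X, A;           // cached input and pre-activation from the last forward pass

    Matrix forward(const Matrix& x) {
        X = x;
        const std::size_t n_in = W.size(), n_out = b.size();
        A.assign(x.size(), std::vector<float>(n_out, 0.0f));
        Matrix Z = A;
        for (std::size_t r = 0; r < x.size(); ++r) {
            assert(x[r].size() == n_in);
            for (std::size_t c = 0; c < n_out; ++c) {
                float s = b[c];  // the same bias is added to every row (broadcast)
                for (std::size_t k = 0; k < n_in; ++k) s += x[r][k] * W[k][c];
                A[r][c] = s;
                Z[r][c] = s > 0.0f ? s : 0.0f;  // ReLU
            }
        }
        return Z;
    }

    // Given dL/dZ, returns dL/dX and fills dW and db. The ReLU mask zeroes the
    // gradient wherever the cached pre-activation was <= 0.
    Matrix backward(const Matrix& dZ, Matrix& dW, std::vector<float>& db) const {
        const std::size_t n_in = W.size(), n_out = b.size();
        assert(dZ.size() == A.size());
        Matrix dA = dZ;
        for (std::size_t r = 0; r < dA.size(); ++r)
            for (std::size_t c = 0; c < n_out; ++c)
                if (A[r][c] <= 0.0f) dA[r][c] = 0.0f;
        dW.assign(n_in, std::vector<float>(n_out, 0.0f));
        db.assign(n_out, 0.0f);
        Matrix dX(dA.size(), std::vector<float>(n_in, 0.0f));
        for (std::size_t r = 0; r < dA.size(); ++r)
            for (std::size_t c = 0; c < n_out; ++c) {
                db[c] += dA[r][c];  // bias gradient sums over rows (undoes the broadcast)
                for (std::size_t k = 0; k < n_in; ++k) {
                    dW[k][c] += X[r][k] * dA[r][c];  // X^T · dA
                    dX[r][k] += dA[r][c] * W[k][c];  // dA · W^T
                }
            }
        return dX;
    }
};
```

The PR's backward is described as returning {dX, {dW, db}}, whereas this sketch returns dX and writes dW and db through out-parameters. Either shape works as long as the caller applies the gradients with -=.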
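Two of the model.hpp comments concern ordering. Gradients collected while walking blocks with rbegin() must be applied in the same reversed order, and a reverse index loop must not rely on size_t layer >= 0. The sketch below isolates both points with a hypothetical one-weight Block struct. It is not the PR's block class, and the names applyReversedGrads and reverseIndices are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: each block holds a single scalar weight.
struct Block {
    float w;
};

// grads was collected back to front (grads[0] belongs to blocks.back()), so
// apply it with a reverse iterator to keep each gradient on its own block.
void applyReversedGrads(std::vector<Block>& blocks, const std::vector<float>& grads,
                        float learning_rate) {
    assert(grads.size() == blocks.size());
    std::size_t i = 0;
    for (auto it = blocks.rbegin(); it != blocks.rend(); ++it, ++i) {
        it->w -= learning_rate * grads[i];
    }
}

// `for (size_t l = n - 1; l >= 0; --l)` never stops on its own: l >= 0 is always
// true for an unsigned l, which wraps to SIZE_MAX. Test-then-decrement avoids that.
std::vector<std::size_t> reverseIndices(std::size_t n) {
    std::vector<std::size_t> order;
    for (std::size_t l = n; l-- > 0;) {
        order.push_back(l);  // visits n-1, n-2, ..., 0, and nothing when n == 0
    }
    return order;
}
```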
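For the neural_network.hpp comment, here is a minimal sketch of normalizing a sub-range through an iterator-built slice. It assumes [start, end) is half-open, which matches the suggested x.begin() + end but may not match how the PR's callers pass end. The name layerNormRange and its eps default are hypothetical, and there is no gain or bias because the PR removes those fields.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical layer norm over the half-open range [start, end) of x.
// The slice is built from iterators; std::vector<float>(x[start], x[end])
// would instead pick the (count, value) constructor.
std::vector<float> layerNormRange(const std::vector<float>& x, std::size_t start,
                                  std::size_t end, float eps = 1e-5f) {
    assert(start < end && end <= x.size());
    std::vector<float> slice(x.begin() + static_cast<std::ptrdiff_t>(start),
                             x.begin() + static_cast<std::ptrdiff_t>(end));
    const float n = static_cast<float>(slice.size());
    const float mean = std::accumulate(slice.begin(), slice.end(), 0.0f) / n;
    float var = 0.0f;
    for (float v : slice) var += (v - mean) * (v - mean);
    var /= n;  // population variance
    const float inv_std = 1.0f / std::sqrt(var + eps);
    for (float& v : slice) v = (v - mean) * inv_std;
    return slice;
}
```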

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb18cb7c-c20d-4f45-89bc-8a081d86ee66

📥 Commits

Reviewing files that changed from the base of the PR and between efc2e21 and dcc57bc.

📒 Files selected for processing (7)
  • include/block.hpp
  • include/embedder.hpp
  • include/layer.hpp
  • include/model.hpp
  • include/neural_network.hpp
  • include/self_attention.hpp
  • include/utility.hpp

Comment thread include/embedder.hpp
Comment thread include/layer.hpp
Comment thread include/layer.hpp
Comment thread include/model.hpp Outdated
Comment thread include/model.hpp
Comment thread include/model.hpp Outdated
Comment thread include/model.hpp
Comment thread include/neural_network.hpp Outdated
Bean91 and others added 3 commits June 20, 2026 20:03
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/embedder.hpp`:
- Line 118: The statement using the unary plus operator on tok_it is a typo that
prevents compilation. Replace the unary plus operator `+` with the pre-increment
operator `++` in the tok_it expression to properly advance the iterator to the
next token in the forward_list. This will allow the code to compile and function
correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ea4bbca9-bf41-4463-9c96-388d6e616d56

📥 Commits

Reviewing files that changed from the base of the PR and between dcc57bc and e0de468.

📒 Files selected for processing (3)
  • include/embedder.hpp
  • include/layer.hpp
  • include/neural_network.hpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • include/layer.hpp
  • include/neural_network.hpp

Comment thread include/embedder.hpp Outdated

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/documentation.yml:
- Around lines 14-15: Replace the mutable action references in the
documentation.yml workflow file with their corresponding full commit SHAs for
improved supply-chain security. Update the actions/checkout@v2 and
actions/setup-python@v2 uses statements (and any other `@v2` or `@v3` refs mentioned
in the file, including at line 30) to pin them to specific commit SHAs instead.
Additionally, add the parameter persist-credentials: false to the
actions/checkout step to prevent credential persistence in the deploy pipeline,
which further hardens the security posture of the publishing workflow.

In @.github/workflows/macos.yml:
- Line 13: Update the `actions/checkout@v2` action reference in the macos.yml
workflow to address security and maintenance concerns. Replace the deprecated v2
reference with v4 pinned to a specific commit hash instead of just the version
tag to prevent supply-chain risks. Additionally, add the `persist-credentials:
false` configuration option to disable credential persistence, which is a
security best practice for public repositories.

In @.github/workflows/pre-commit.yml:
- Around lines 13-15: The GitHub Actions in the pre-commit.yml workflow are using
outdated and unpinned versions that pose maintenance and security risks. Update
the actions/checkout action from v2 to v4 and add the persist-credentials: false
parameter to enhance security. Upgrade the actions/setup-python action from v2
to v4. Finally, replace the unpinned pre-commit/action@v2.0.0 with a pinned
commit hash reference based on the v3.0.0 tag from the official
pre-commit/action repository to ensure supply-chain security and consistency.

In @.github/workflows/ubuntu.yml:
- Line 13: In every workflow file (ubuntu.yml, windows.yml, pre-commit.yml,
macos.yml, and documentation.yml), replace actions/checkout@v2 with
actions/checkout@v4 pinned to an immutable commit SHA rather than a version tag,
and set persist-credentials: false in the step's with block. This keeps the
token out of the local git configuration for later steps and ensures a verified
version of the action is used.
- Line 31: This step pipes a remote Codecov upload script from curl directly
into bash, which is a supply-chain security risk. Replace it with the official
Codecov GitHub action, pinned to a specific version tag or full SHA for
reproducibility, and configure the action with appropriate authentication
credentials so uploads to Codecov are secure.

In @.github/workflows/windows.yml:
- Line 13: Update the `actions/checkout@v2` action to a recent stable release
pinned to its full commit SHA instead of a version tag, and add
`persist-credentials: false` under the step's with block. This reduces the risk
of token exposure by preventing the action from persisting the GitHub token in
the local git configuration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d05cc6e9-5622-4b48-ad58-cf301a792215

📥 Commits

Reviewing files that changed from the base of the PR and between e0de468 and 5fd98cf.

⛔ Files ignored due to path filters (1)
  • .DS_Store is excluded by !**/.DS_Store
📒 Files selected for processing (9)
  • .github/workflows/documentation.yml
  • .github/workflows/macos.yml
  • .github/workflows/pre-commit.yml
  • .github/workflows/ubuntu.yml
  • .github/workflows/windows.yml
  • AIUsage.md
  • include/embedder.hpp
  • include/layer.hpp
  • include/model.hpp
✅ Files skipped from review due to trivial changes (1)
  • AIUsage.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • include/embedder.hpp
  • include/layer.hpp
  • include/model.hpp

Comment thread .github/workflows/documentation.yml Outdated
Comment thread .github/workflows/macos.yml Outdated
Comment thread .github/workflows/pre-commit.yml Outdated
Comment thread .github/workflows/ubuntu.yml Outdated
Comment thread .github/workflows/ubuntu.yml Outdated
Comment thread .github/workflows/windows.yml Outdated
Bean91 merged commit c3d5287 into main on Jun 21, 2026
1 check passed

Labels: None yet
Projects: None yet
Development: no linked issues
Participants: 1