Dev by Bean91 · Pull Request #3 · Bean91/Open-Chat

Bean91 · 2026-06-21T00:51:35Z

Summary by CodeRabbit

Release Notes

New Features
- Added epoch-based model training with a configurable learning rate; the forward pass now returns a softmax probability distribution over the vocabulary.
- Extended the embedder to embed entire token sequences in one call.
Refactor
- Migrated forward/backward computation to matrix-based operations and expanded backpropagation through blocks, layers, the neural network, and self-attention.
- Self-attention now caches its forward-pass intermediates (`q`, `k`, `v`, `p`, `x`) so the backward pass can reuse them.
Chores
- Added documentation build/deploy and CI workflows; updated AI usage documentation.

coderabbitai · 2026-06-21T00:51:54Z


Walkthrough

Backpropagation is added end-to-end across layer, neuralNetwork, selfAttention, block, and embedder components. Forward passes are refactored from std::vector<float> to utility::matrix. A subtract utility is added for gradient signal computation. The model gains a configurable learning_rate, a refactored token-index-based forwardPass returning softmax distributions, and a new train method implementing sliding-window gradient descent. CI/CD workflows for Ubuntu, macOS, and Windows, plus documentation deployment, are added.

Changes

Backpropagation and Training

| Layer / File(s) | Summary |
| --- | --- |
| Matrix subtract utility (`include/utility.hpp`) | Adds `subtract(a, b)` for element-wise matrix subtraction with dimension validation, used to compute gradient signals `dZ = dist - oneHot` in the training loop. |
| `layer`: matrix feedforward and backward (`include/layer.hpp`) | Replaces vector-based `feedForward` with a matrix version that caches input `X` and post-activation output `Z`; applies `relu(dot(x,W)+b)`; adds `backward(dZ)` returning `{dX, {dW, db}}`. |
| `neuralNetwork`: remove gain/bias, matrix forward/backward (`include/neural_network.hpp`) | Removes `gain`/`bias` fields and their serialization; updates `layerNorm` to use `start`/`end` indices for selective normalization; switches `feedForward` to `utility::matrix`; adds reverse-layer `backward`; replaces `changeOne` with layer-indexed overloads. |
| `selfAttention`: weight init, cached forward, backward (`include/self_attention.hpp`) | Scales weight init stddev to `1/sqrt(n_embd)`; caches `q`, `k`, `v`, `p`, `x` during the forward pass; adds `backward(dZ)` computing `dX` and gradients for `wq`, `wk`, `wv` via intermediate matrices `dV`, `dP`, `dS` scaled by `1/sqrt(K)`. |
| `block`: matrix feedforward and backward (`include/block.hpp`) | Replaces the per-row feedforward loop with a direct `network.feedForward(x)` call; adds `backward(dZ)` chaining `network.backward` then `attention.backward` and assembling combined gradient output from both subcomponents. |
| `embedder`: sequence embedding and backward (`include/embedder.hpp`) | Stores token sequence in `toks`; adds `embed(forward_list<int>)` overload building a token-count × `n_embd` matrix by iterating tokens; updates `backward` to map gradient rows back to `table` rows via `toks` iterator positions and apply learning-rate-scaled updates. |
| `model`: training loop and refactored `forwardPass` (`include/model.hpp`) | Adds `learning_rate` field with default `0.01` and constructor parameter; refactors `forwardPass` to accept token indices and return a softmax distribution matrix; removes string-based argmax decoding; adds `train(string, epochs)` with sliding-window epoch loop, one-hot targets, `dZ` computation, reverse-block backward passes, embedder backward, and `changeOne`-based parameter updates scaled by `learning_rate`. |
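
The `subtract` helper described for `include/utility.hpp` can be sketched as follows. `Matrix` is a hypothetical stand-in for `utility::matrix` (only `.rows`, `.cols`, and `m[i][j]` indexing are assumed), and throwing `std::invalid_argument` on a shape mismatch is an assumption about how the dimension validation reports errors.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for utility::matrix: row-major, indexed as m[row][col].
struct Matrix {
    std::size_t rows, cols;
    std::vector<std::vector<float>> data;
    Matrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(r, std::vector<float>(c, 0.0f)) {}
    std::vector<float>& operator[](std::size_t i) { return data[i]; }
    const std::vector<float>& operator[](std::size_t i) const { return data[i]; }
};

// Element-wise a - b; rejects mismatched shapes instead of reading out of bounds.
Matrix subtract(const Matrix& a, const Matrix& b) {
    if (a.rows != b.rows || a.cols != b.cols) {
        throw std::invalid_argument("subtract: dimension mismatch");
    }
    Matrix out(a.rows, a.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            out[i][j] = a[i][j] - b[i][j];
    return out;
}
```

With `dist` a 1 × vocab_size softmax row and `oneHot` the matching target row, `subtract(dist, oneHot)` yields the `dZ` that seeds backpropagation.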

CI and Documentation Infrastructure

| Layer / File(s) | Summary |
| --- | --- |
| GitHub Actions workflows for multi-platform CI (`.github/workflows/ubuntu.yml`, `.github/workflows/macos.yml`, `.github/workflows/windows.yml`, `.github/workflows/pre-commit.yml`) | Adds complete CI workflows for Ubuntu (with Codecov coverage reporting), macOS, Windows, and pre-commit hooks. Each platform workflow checks out the repo, runs `make prepare`, configures Debug CMake, builds `unit_tests`, and executes the test binary; Ubuntu additionally installs coverage tools and uploads results. |
| Documentation build and GitHub Pages deployment (`.github/workflows/documentation.yml`) | Adds a workflow that builds Doxygen documentation on tag pushes and pushes to `main`/`master` branches, then deploys the generated HTML site to GitHub Pages using the Cecilapp action. |
| AI usage documentation (`AIUsage.md`) | Updates documentation to record that CodeRabbit AI was used for PR review. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Bean91/Open-Chat#2: Modifies block::feedForward to chain attention and network.feedForward over the input matrix, directly overlapping with the block-level changes in this PR.

Poem

🐇 Hoppity-hop through the gradient path,
Each weight now learns from the aftermath.
From layer to block to model we go,
subtract the target and watch the loss flow.
Backprop at last, through attention and all —
A training loop blooms from a rabbit's small scrawl! 🌱

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required 80.00% threshold. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Dev' is vague and generic and conveys nothing about the substantive changes (neural network refactoring, backward propagation, training loop implementation, and CI/CD workflows). | Use a more descriptive title that summarizes the primary changes, such as 'Add backward propagation and training loop implementation' or 'Refactor neural network for matrix operations and gradient computation'. |
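
One way to start on the docstring warning is Doxygen-style comments, which the Doxygen build in the documentation workflow can also consume. The `relu` helper below is hypothetical and only illustrates the comment format (`@brief`, `@param`, `@return`); whether CodeRabbit's coverage check counts this exact style is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

/**
 * @brief Applies ReLU element-wise: max(0, v) for each entry.
 *
 * @param values Pre-activation values; not modified.
 * @return A new vector of the same length with negative entries clamped to 0.
 */
std::vector<float> relu(const std::vector<float>& values) {
    std::vector<float> out(values.size());
    std::transform(values.begin(), values.end(), out.begin(),
                   [](float v) { return std::max(0.0f, v); });
    return out;
}
```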

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/embedder.hpp`:
- Around line 111-117: The backward method has two issues: it uses += instead of
-= for gradient updates (inconsistent with layer::changeOne and
selfAttention::changeOne), and it calls std::next(this->toks.begin(), i) inside
the nested loop, causing O(n²) complexity for a forward_list. Fix this by
changing the += operator to -= to match the gradient descent pattern, and
advance a single iterator incrementally before the inner loop starts, reusing it
across loop iterations instead of calling std::next(this->toks.begin(), i)
repeatedly.
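
A minimal sketch of the corrected update, assuming `table` is a vocab_size × n_embd array of rows and that row i of the incoming gradient belongs to the i-th token in `toks`. `embedderBackward` and `Table` are hypothetical names, not the repository's API.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Hypothetical stand-in: vocab_size rows, each of length n_embd.
using Table = std::vector<std::vector<float>>;

void embedderBackward(Table& table, const std::forward_list<int>& toks,
                      const std::vector<std::vector<float>>& dX, float learning_rate) {
    auto tok_it = toks.begin();
    // Walk dX rows and tokens in lockstep: one ++tok_it per row keeps the whole
    // pass O(n), unlike std::next(toks.begin(), i), which is O(i) on a forward_list.
    for (std::size_t i = 0; i < dX.size() && tok_it != toks.end(); ++i, ++tok_it) {
        assert(*tok_it >= 0 && static_cast<std::size_t>(*tok_it) < table.size());
        std::vector<float>& row = table[static_cast<std::size_t>(*tok_it)];
        assert(row.size() == dX[i].size());
        for (std::size_t j = 0; j < row.size(); ++j) {
            row[j] -= learning_rate * dX[i][j];  // gradient descent: subtract
        }
    }
}
```

Repeated tokens receive one update per occurrence, which is the expected accumulation for a shared embedding row.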

In `@include/layer.hpp`:
- Around line 77-82: The feedForward method in the Layer class fails when
processing multi-row input matrices because utility::add requires exact
dimension matching, but the result of utility::dot(x, this->weights) has shape
(x.rows, n_out) while this->biases has shape (1, n_out). Instead of using
utility::add directly on these mismatched dimensions, manually broadcast the
biases across all rows of the dot product result before adding them together,
ensuring the bias vector is applied consistently to each row of the output.
- Around line 84-101: The backward method does not account for the ReLU
activation derivative, which means gradients are not properly masked. Add a
member variable to cache the pre-ReLU activation values during the forward pass,
then in the backward method, create a derivative mask where elements are 1 where
pre-activation values are greater than 0 and 0 elsewhere. Apply this mask to dZ
before computing dW, dX, and db to ensure gradients are zeroed where the
pre-activation was less than or equal to 0.
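
Both layer fixes can be seen in one minimal sketch: the bias row is added to every row of `x·W`, the pre-ReLU activation is cached, and `backward` masks `dZ` with it before forming `dW`, `db`, and `dX`. `DenseRelu` and `Mat` are hypothetical stand-ins for the repository's `layer` and `utility::matrix`; the loops are written out by hand rather than calling `utility::dot`/`utility::add`, whose exact signatures are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // row-major, Mat[row][col]

// Hypothetical dense ReLU layer mirroring the review notes, not the repo's class.
struct DenseRelu {
    Mat W;                 // n_in x n_out
    std::vector<float> b;  // n_out, broadcast over every input row
    Mat X, A;              // cached input and pre-ReLU activation

    Mat forward(const Mat& x) {
        X = x;
        const std::size_t n = x.size(), in = W.size(), out = b.size();
        A.assign(n, std::vector<float>(out, 0.0f));
        Mat Z = A;
        for (std::size_t r = 0; r < n; ++r)
            for (std::size_t c = 0; c < out; ++c) {
                float s = b[c];  // bias broadcast: the same b is added to each row
                for (std::size_t k = 0; k < in; ++k) s += x[r][k] * W[k][c];
                A[r][c] = s;
                Z[r][c] = s > 0.0f ? s : 0.0f;
            }
        return Z;
    }

    // Returns dX and writes dW, db; dZ has the shape of the forward output.
    Mat backward(const Mat& dZ, Mat& dW, std::vector<float>& db) const {
        const std::size_t n = dZ.size(), in = W.size(), out = b.size();
        assert(n == A.size());
        Mat dA = dZ;
        for (std::size_t r = 0; r < n; ++r)
            for (std::size_t c = 0; c < out; ++c)
                if (A[r][c] <= 0.0f) dA[r][c] = 0.0f;  // ReLU derivative mask
        dW.assign(in, std::vector<float>(out, 0.0f));
        db.assign(out, 0.0f);
        Mat dX(n, std::vector<float>(in, 0.0f));
        for (std::size_t r = 0; r < n; ++r)
            for (std::size_t c = 0; c < out; ++c) {
                db[c] += dA[r][c];  // bias gradient: sum over rows
                for (std::size_t k = 0; k < in; ++k) {
                    dW[k][c] += X[r][k] * dA[r][c];  // X^T dA
                    dX[r][k] += dA[r][c] * W[k][c];  // dA W^T
                }
            }
        return dX;
    }
};
```

Masking with the cached pre-activation gives the same result as masking with the post-ReLU output, since relu(a) > 0 exactly when a > 0; caching the pre-activation simply follows the review's suggestion.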

In `@include/model.hpp`:
- Around line 115-125: The inner loop variable l in all three matrix update
loops is incorrectly bounded by adW[n].rows when it should be bounded by
adW[n].cols. This affects the three changeOne calls for updating the 'q', 'k',
and 'v' matrices in the gradient descent updates. Replace the inner loop
condition for each of the three loops (where k iterates over rows for adW[0],
adW[1], and adW[2]) so that the l variable iterates up to the column count
(.cols) instead of the row count (.rows) to properly traverse all matrix
columns.
- Around line 84-88: The oneHot matrix creation has incorrect dimensions. Since
dist has shape (1, vocab_size), the oneHot matrix should also be created with
shape (1, vocab_size) instead of (1, 1). Change the oneHot initialization from
utility::matrix(dist.rows, 1) to utility::matrix(dist.rows, dist.cols).
Additionally, update the indexing to set the hot value from oneHot[next][0] = 1
to oneHot[0][next] = 1 so the element at column position next (the token index)
is set correctly in the single row matrix.
- Around line 106-113: The backward loop in the backpropagation code uses size_t
for the layer variable, which is an unsigned type, causing an infinite loop when
layer decrements below zero due to unsigned integer underflow. Change the loop
variable from size_t layer to a signed integer type such as int or ssize_t to
allow proper decrement behavior when iterating backward through the layers in
the ndW loop. Alternatively, you could restructure the loop to iterate forward
instead of backward.
- Around line 93-127: The gradient collection loop iterates through blocks in
reverse order (using blocks.rbegin()) and stores gradients in bdW, but the
gradient application loop iterates through blocks in forward order (for (block&
b : this->blocks)), causing gradients to be applied to the wrong blocks. Fix
this by changing the gradient application loop to also iterate in reverse order,
so that bdW[0] (which contains gradients from the last block) is applied to the
last block, not the first block. Use reverse iteration similar to the collection
phase when applying the gradients from bdW to each block in the blocks
container.
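
The four model-level fixes can be sketched together. `Mat`, `oneHotLike`, `reverseOrder`, `sgdUpdate`, and `applyCollected` are hypothetical names; only the shapes (`dist` is 1 × vocab_size), the ordering in which `bdW[0]` comes from the last block, and the `w -= learning_rate * dW` update are taken from the comments above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<float>>;  // row-major, Mat[row][col]

// One-hot target shaped like dist (1 x vocab_size): the hot entry is column `next`.
Mat oneHotLike(const Mat& dist, std::size_t next) {
    assert(!dist.empty() && next < dist[0].size());
    Mat oneHot(dist.size(), std::vector<float>(dist[0].size(), 0.0f));
    oneHot[0][next] = 1.0f;
    return oneHot;
}

// Visit n-1, ..., 0 with an unsigned counter without wrapping around:
// i is tested before it is decremented, so the loop stops after i == 0.
std::vector<std::size_t> reverseOrder(std::size_t n) {
    std::vector<std::size_t> order;
    for (std::size_t i = n; i-- > 0;) order.push_back(i);
    return order;
}

// w -= lr * dW over every element: the inner bound is the column count.
void sgdUpdate(Mat& w, const Mat& dW, float lr) {
    for (std::size_t k = 0; k < w.size(); ++k)
        for (std::size_t l = 0; l < w[k].size(); ++l)
            w[k][l] -= lr * dW[k][l];
}

// Gradients collected back-to-front are stored reversed, so grads[g] belongs to
// weights[weights.size() - 1 - g]; applying them by that mapping keeps the last
// block's gradient off the first block.
void applyCollected(std::vector<Mat>& weights, const std::vector<Mat>& grads, float lr) {
    assert(weights.size() == grads.size());
    for (std::size_t g = 0; g < grads.size(); ++g)
        sgdUpdate(weights[weights.size() - 1 - g], grads[g], lr);
}
```

The `i-- > 0` form keeps the index as `size_t`; switching to a signed `int` index, as the comment suggests, is an equally valid fix.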

In `@include/neural_network.hpp`:
- Around line 58-59: The layerNorm function passes the float values x[start] and
x[end] to the std::vector constructor instead of iterators. Because a float is
not an iterator, the iterator-range constructor is not selected; the call
resolves to the (count, value) constructor and builds (size_t)x[start] copies of
x[end]. That conversion is undefined behavior when x[start] truncates to a value
size_t cannot represent, and x[end] reads out of bounds when end == x.size().
Replace the arguments with x.begin() + start and x.begin() + end so the range
constructor copies the elements in [start, end) into the new vector.
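
A minimal sketch of the iterator-based fix, using a hypothetical `slice` helper in place of the inline construction inside `layerNorm`; `buggySlice` reproduces the original call only to show what it actually builds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy x[start, end) into a new vector. Passing iterators selects the range
// constructor; end is exclusive, matching standard iterator ranges.
std::vector<float> slice(const std::vector<float>& x, std::size_t start, std::size_t end) {
    assert(start <= end && end <= x.size());
    return std::vector<float>(x.begin() + static_cast<std::ptrdiff_t>(start),
                              x.begin() + static_cast<std::ptrdiff_t>(end));
}

// For contrast: with two float arguments the iterator-pair constructor is not
// viable, so this builds (size_t)x[start] copies of x[end], not a slice.
std::vector<float> buggySlice(const std::vector<float>& x, std::size_t start, std::size_t end) {
    return std::vector<float>(x[start], x[end]);
}
```

Treating `end` as exclusive is an assumption about `layerNorm`'s intent; if it is meant to be inclusive, the second iterator would be `x.begin() + end + 1`.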


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb18cb7c-c20d-4f45-89bc-8a081d86ee66

📥 Commits

Reviewing files that changed from the base of the PR and between efc2e21 and dcc57bc.

📒 Files selected for processing (7)

include/block.hpp
include/embedder.hpp
include/layer.hpp
include/model.hpp
include/neural_network.hpp
include/self_attention.hpp
include/utility.hpp

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/embedder.hpp`:
- Line 118: The statement using the unary plus operator on tok_it is a typo that
prevents compilation. Replace the unary plus operator `+` with the pre-increment
operator `++` in the tok_it expression to properly advance the iterator to the
next token in the forward_list. This will allow the code to compile and function
correctly.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ea4bbca9-bf41-4463-9c96-388d6e616d56

📥 Commits

Reviewing files that changed from the base of the PR and between dcc57bc and e0de468.

📒 Files selected for processing (3)

include/embedder.hpp
include/layer.hpp
include/neural_network.hpp

🚧 Files skipped from review as they are similar to previous changes (2)

include/layer.hpp
include/neural_network.hpp

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/documentation.yml:
- Around line 14-15: Replace the mutable action references in the
documentation.yml workflow file with their corresponding full commit SHAs for
improved supply-chain security. Update the actions/checkout@v2 and
actions/setup-python@v2 uses statements (and any other `@v2` or `@v3` refs mentioned
in the file, including at line 30) to pin them to specific commit SHAs instead.
Additionally, add the parameter persist-credentials: false to the
actions/checkout step to prevent credential persistence in the deploy pipeline,
which further hardens the security posture of the publishing workflow.

In @.github/workflows/macos.yml:
- Line 13: Update the `actions/checkout@v2` action reference in the macos.yml
workflow to address security and maintenance concerns. Replace the deprecated v2
reference with v4 pinned to a specific commit hash instead of just the version
tag to prevent supply-chain risks. Additionally, add the `persist-credentials:
false` configuration option to disable credential persistence, which is a
security best practice for public repositories.

In @.github/workflows/pre-commit.yml:
- Around line 13-15: The GitHub Actions in the pre-commit.yml workflow are using
outdated and unpinned versions that pose maintenance and security risks. Update
the actions/checkout action from v2 to v4 and add the persist-credentials: false
parameter to enhance security. Upgrade the actions/setup-python action from v2
to v4. Finally, replace the unpinned pre-commit/action@v2.0.0 with a pinned
commit hash reference based on the v3.0.0 tag from the official
pre-commit/action repository to ensure supply-chain security and consistency.

In @.github/workflows/ubuntu.yml:
- Line 13: Update the actions/checkout action in all GitHub workflow files
(ubuntu.yml, windows.yml, pre-commit.yml, macos.yml, and documentation.yml) from
the outdated v2 to v4, pin it to a specific immutable commit SHA instead of a
version tag, and add the with configuration to set persist-credentials to false.
This prevents credential leakage between workflow runs and ensures you are using
a verified and secure version of the action. Replace each occurrence of
actions/checkout@v2 with actions/checkout@v4 pinned to a commit SHA and include
the persist-credentials configuration in the with block.
- Line 31: The workflow on line 31 uses an unsafe curl piping pattern to execute
a remote Codecov upload script, which presents a supply-chain security risk.
Replace the bash command that pipes curl output directly with the official
Codecov GitHub action, ensuring it is pinned to a specific version tag or full
SHA for reproducibility. Configure the action with appropriate authentication
credentials to enable secure uploads to Codecov.

In @.github/workflows/windows.yml:
- Line 13: Update the `actions/checkout@v2` action to a recent release pinned to
its full commit SHA instead of a version tag, and set `persist-credentials:
false` in the step's `with` block. This prevents the action from persisting the
GitHub token to the local git configuration, reducing the risk of token
exposure.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d05cc6e9-5622-4b48-ad58-cf301a792215

📥 Commits

Reviewing files that changed from the base of the PR and between e0de468 and 5fd98cf.

⛔ Files ignored due to path filters (1)

.DS_Store is excluded by !**/.DS_Store

📒 Files selected for processing (9)

.github/workflows/documentation.yml
.github/workflows/macos.yml
.github/workflows/pre-commit.yml
.github/workflows/ubuntu.yml
.github/workflows/windows.yml
AIUsage.md
include/embedder.hpp
include/layer.hpp
include/model.hpp

✅ Files skipped from review due to trivial changes (1)

AIUsage.md

🚧 Files skipped from review as they are similar to previous changes (3)

include/embedder.hpp
include/layer.hpp
include/model.hpp

Bean91 added 2 commits June 20, 2026 19:09

added gradient finding, need to add the weight changing next.

5e81b42

finished backprop

dcc57bc

coderabbitai Bot reviewed Jun 21, 2026


Bean91 and others added 3 commits June 20, 2026 20:03

Update include/embedder.hpp

d7c0559

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Update include/layer.hpp

d7efe3d

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Update include/neural_network.hpp

e0de468

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai Bot reviewed Jun 21, 2026


Comment thread include/embedder.hpp Outdated

Bean91 added 4 commits June 21, 2026 10:33

bug fixes

f770a6d

Merge branch 'dev' of github.com:Bean91/Open-Chat into dev

cd1d1f1

bug fixes

5fd98cf

removed workflows - uneeded

5654136

coderabbitai Bot reviewed Jun 21, 2026


Bean91 merged commit c3d5287 into main Jun 21, 2026
1 check passed
