Conversation
📝 WalkthroughWalkthroughBackpropagation is added end-to-end across ChangesBackpropagation and Training
CI and Documentation Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@include/embedder.hpp`:
- Around line 111-117: The backward method has two issues: it uses += instead of
-= for gradient updates (inconsistent with layer::changeOne and
selfAttention::changeOne), and it calls std::next(this->toks.begin(), i) inside
the nested loop, causing O(n²) complexity for a forward_list. Fix this by
changing the += operator to -= to match the gradient descent pattern, and
advance a single iterator incrementally before the inner loop starts, reusing it
across loop iterations instead of calling std::next(this->toks.begin(), i)
repeatedly.
In `@include/layer.hpp`:
- Around line 77-82: The feedForward method in the Layer class fails when
processing multi-row input matrices because utility::add requires exact
dimension matching, but the result of utility::dot(x, this->weights) has shape
(x.rows, n_out) while this->biases has shape (1, n_out). Instead of using
utility::add directly on these mismatched dimensions, manually broadcast the
biases across all rows of the dot product result before adding them together,
ensuring the bias vector is applied consistently to each row of the output.
- Around line 84-101: The backward method does not account for the ReLU
activation derivative, which means gradients are not properly masked. Add a
member variable to cache the pre-ReLU activation values during the forward pass,
then in the backward method, create a derivative mask where elements are 1 where
pre-activation values are greater than 0 and 0 elsewhere. Apply this mask to dZ
before computing dW, dX, and db to ensure gradients are zeroed where the
pre-activation was less than or equal to 0.
In `@include/model.hpp`:
- Around line 115-125: The inner loop variable l in all three matrix update
loops is incorrectly bounded by adW[n].rows when it should be bounded by
adW[n].cols. This affects the three changeOne calls for updating the 'q', 'k',
and 'v' matrices in the gradient descent updates. Replace the inner loop
condition for each of the three loops (where k iterates over rows for adW[0],
adW[1], and adW[2]) so that the l variable iterates up to the column count
(.cols) instead of the row count (.rows) to properly traverse all matrix
columns.
- Around line 84-88: The oneHot matrix creation has incorrect dimensions. Since
dist has shape (1, vocab_size), the oneHot matrix should also be created with
shape (1, vocab_size) instead of (1, 1). Change the oneHot initialization from
utility::matrix(dist.rows, 1) to utility::matrix(dist.rows, dist.cols).
Additionally, update the indexing to set the hot value from oneHot[next][0] = 1
to oneHot[0][next] = 1 so the element at column position next (the token index)
is set correctly in the single row matrix.
- Around line 106-113: The backward loop in the backpropagation code uses size_t
for the layer variable, which is an unsigned type, causing an infinite loop when
layer decrements below zero due to unsigned integer underflow. Change the loop
variable from size_t layer to a signed integer type such as int or ssize_t to
allow proper decrement behavior when iterating backward through the layers in
the ndW loop. Alternatively, you could restructure the loop to iterate forward
instead of backward.
- Around line 93-127: The gradient collection loop iterates through blocks in
reverse order (using blocks.rbegin()) and stores gradients in bdW, but the
gradient application loop iterates through blocks in forward order (for (block&
b : this->blocks)), causing gradients to be applied to the wrong blocks. Fix
this by changing the gradient application loop to also iterate in reverse order,
so that bdW[0] (which contains gradients from the last block) is applied to the
last block, not the first block. Use reverse iteration similar to the collection
phase when applying the gradients from bdW to each block in the blocks
container.
In `@include/neural_network.hpp`:
- Around line 58-59: The layerNorm function incorrectly uses the std::vector
range constructor by passing float values x[start] and x[end] instead of
iterators, causing undefined behavior. Replace the vector construction with
proper iterator arithmetic by using x.begin() + start and x.begin() + end as the
iterator arguments to the range constructor, which will correctly create a new
vector containing elements from the specified range.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bb18cb7c-c20d-4f45-89bc-8a081d86ee66
📒 Files selected for processing (7)
include/block.hppinclude/embedder.hppinclude/layer.hppinclude/model.hppinclude/neural_network.hppinclude/self_attention.hppinclude/utility.hpp
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@include/embedder.hpp`:
- Line 118: The statement using the unary plus operator on tok_it is a typo that
prevents compilation. Replace the unary plus operator `+` with the pre-increment
operator `++` in the tok_it expression to properly advance the iterator to the
next token in the forward_list. This will allow the code to compile and function
correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ea4bbca9-bf41-4463-9c96-388d6e616d56
📒 Files selected for processing (3)
include/embedder.hppinclude/layer.hppinclude/neural_network.hpp
🚧 Files skipped from review as they are similar to previous changes (2)
- include/layer.hpp
- include/neural_network.hpp
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/documentation.yml:
- Around line 14-15: Replace the mutable action references in the
documentation.yml workflow file with their corresponding full commit SHAs for
improved supply-chain security. Update the actions/checkout@v2 and
actions/setup-python@v2 uses statements (and any other `@v2` or `@v3` refs mentioned
in the file, including at line 30) to pin them to specific commit SHAs instead.
Additionally, add the parameter persist-credentials: false to the
actions/checkout step to prevent credential persistence in the deploy pipeline,
which further hardens the security posture of the publishing workflow.
In @.github/workflows/macos.yml:
- Line 13: Update the `actions/checkout@v2` action reference in the macos.yml
workflow to address security and maintenance concerns. Replace the deprecated v2
reference with v4 pinned to a specific commit hash instead of just the version
tag to prevent supply-chain risks. Additionally, add the `persist-credentials:
false` configuration option to disable credential persistence, which is a
security best practice for public repositories.
In @.github/workflows/pre-commit.yml:
- Around line 13-15: The GitHub Actions in the pre-commit.yml workflow are using
outdated and unpinned versions that pose maintenance and security risks. Update
the actions/checkout action from v2 to v4 and add the persist-credentials: false
parameter to enhance security. Upgrade the actions/setup-python action from v2
to v4. Finally, replace the unpinned pre-commit/action@v2.0.0 with a pinned
commit hash reference based on the v3.0.0 tag from the official
pre-commit/action repository to ensure supply-chain security and consistency.
In @.github/workflows/ubuntu.yml:
- Line 13: Update the actions/checkout action in all GitHub workflow files
(ubuntu.yml, windows.yml, pre-commit.yml, macos.yml, and documentation.yml) from
the outdated v2 to v4, pin it to a specific immutable commit SHA instead of a
version tag, and add the with configuration to set persist-credentials to false.
This prevents credential leakage between workflow runs and ensures you are using
a verified and secure version of the action. Replace each occurrence of
actions/checkout@v2 with actions/checkout@v4 pinned to a commit SHA and include
the persist-credentials configuration in the with block.
- Line 31: The workflow on line 31 uses an unsafe curl piping pattern to execute
a remote Codecov upload script, which presents a supply-chain security risk.
Replace the bash command that pipes curl output directly with the official
Codecov GitHub action, ensuring it is pinned to a specific version tag or full
SHA for reproducibility. Configure the action with appropriate authentication
credentials to enable secure uploads to Codecov.
In @.github/workflows/windows.yml:
- Line 13: Update the `actions/checkout@v2` action on line 13 to use a recent
pinned version with a commit SHA instead of a version tag, and add the
`persist-credentials: false` parameter to the uses statement. This reduces the
risk of token exposure by preventing the action from persisting the GitHub token
to the local git configuration. Replace the current checkout action with the
latest stable version pinned to its full commit SHA and include the security
parameter in the with block.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d05cc6e9-5622-4b48-ad58-cf301a792215
⛔ Files ignored due to path filters (1)
.DS_Storeis excluded by!**/.DS_Store
📒 Files selected for processing (9)
.github/workflows/documentation.yml.github/workflows/macos.yml.github/workflows/pre-commit.yml.github/workflows/ubuntu.yml.github/workflows/windows.ymlAIUsage.mdinclude/embedder.hppinclude/layer.hppinclude/model.hpp
✅ Files skipped from review due to trivial changes (1)
- AIUsage.md
🚧 Files skipped from review as they are similar to previous changes (3)
- include/embedder.hpp
- include/layer.hpp
- include/model.hpp
Summary by CodeRabbit
Release Notes