
add support for Operators for Generic target needed in MAGIA #193

Open
marchioa wants to merge 16 commits into pulp-platform:devel from FondazioneChipsIT:am/magia-kernels

Conversation

@marchioa
Contributor

@marchioa marchioa commented May 13, 2026

This PR adds support for the operators needed by MAGIA that are not yet available in the Generic target. Along with the operator support, it also adds the corresponding tests.

Added

Changed

Fixed

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the docker was modified, change back its link after review.

@marchioa marchioa force-pushed the am/magia-kernels branch from 95cc3f4 to 3b58bf9 Compare May 14, 2026 09:00
@marchioa marchioa force-pushed the am/magia-kernels branch from c938552 to 801df44 Compare May 14, 2026 16:38
@marchioa marchioa changed the title add support for Ceil operator for Generic target add support for Operators for Generic target needed in MAGIA May 15, 2026
@marchioa
Contributor Author

Every time I push a change, the Snitch CI raises a different error that I cannot reproduce locally (I run pytest -m snitch in a local container from the docker image ghcr.io/pulp-platform/deeploy:main).

The latest CI failure is in test_snitch_kernels[Kernels/Integer/GEMM/Regular_RQPerRow], but I have also seen errors from test_snitch_kernels[Kernels/Integer/iNoNorm]. I cannot reproduce either of them locally.

@marchioa marchioa marked this pull request as ready for review May 16, 2026 09:09
@coderabbitai
Contributor

coderabbitai Bot commented May 16, 2026


📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added support for new generic operators: Ceil, Floor, Clip, Sub, Exp, Sigmoid, Swish, HardSigmoid, HardSwish, InstanceNormalization, GroupNormalization, AveragePool, GlobalAveragePool, GlobalMaxPool.
  • Tests
    • Expanded kernel test coverage to include the newly supported FP32 and integer kernels.

Walkthrough

Adds parser base classes and parsers, codegen templates, bindings, layer op-count models, platform mapper registrations, kernel headers and C implementations, test entries, and changelog entries to support Ceil, Floor, Clip, Sub, Exp, Sigmoid, Swish, HardSigmoid, HardSwish, InstanceNormalization, GroupNormalization, AveragePool (1D/2D), GlobalAveragePool, and GlobalMaxPool on the Generic target.

Changes

Generic target operator support

Layer / File(s) Summary
Parser infrastructure and unary operators
Deeploy/Targets/Generic/Parsers.py
Introduces a UnaryElementWiseParser base class for shared metadata extraction (size, data_in/data_out), refactors ReluParser to inherit from it, adds 8 unary element-wise parsers (Sqrt, Ceil, Floor, Exp, Sigmoid, Swish, HardSigmoid, HardSwish) with optional alpha/beta attribute handling, and adds a SubParser alias.
Clip, normalization, and pooling parsers
Deeploy/Targets/Generic/Parsers.py
Updates ClipParser to validate min/max bounds, adds a NormalizationParser base class and InstanceNorm/GroupNorm implementations capturing epsilon and num_groups, implements an AveragePoolParser with attribute validation and spatial-dimension handling, and introduces AveragePool1DParser/AveragePool2DParser variants and global pooling parsers.
Float template implementations
Deeploy/Targets/Generic/Templates/Float*.py
Defines multiple template classes (CeilTemplate, FloorTemplate, ClipTemplate, ExpTemplate, SigmoidTemplate, SwishTemplate, HardSigmoidTemplate, HardSwishTemplate, Instance/GroupNormTemplate, GlobalAveragePool/GlobalMaxPool, AveragePool 1D/2D) implementing alignToContext to compute type_width and size metadata and embedding parameterized reference code templates.
Subtraction templates
Deeploy/Targets/Generic/Templates/{Sub,FloatSub}Template.py
Defines SubTemplate for fixed-point signed subtraction with offset computation and FloatSubTemplate for element-wise float subtraction.
Type bindings and template wiring
Deeploy/Targets/Generic/Bindings.py
Expands imports to include new template symbols and establishes binding lists wiring type patterns to the corresponding templates (BasicSubBindings, BasicCeil/Floor/Clip/Exp, BasicSigmoid/Swish/HardSigmoid/HardSwish, BasicInstanceNorm/GroupNorm, BasicAveragePool1D/2D/GlobalAveragePool/GlobalMaxPool).
Operation counting layers
Deeploy/Targets/Generic/Layers.py
Adds multiple ONNXLayer subclasses implementing computeOps() from operator representation metadata and creates a SubLayer alias.
Platform integration
Deeploy/Targets/Generic/Platform.py
Updates imports for new bindings/layers/parsers, establishes mapper instances for each new op, and registers all new operators in GenericMapping.
Kernel API headers
TargetLibraries/Generic/inc/kernel/*.h
Adds header files declaring FP32 kernel signatures for pooling, elementwise math, activations, and normalization operations.
Kernel implementations
TargetLibraries/Generic/src/*.c
Implements FP32 kernels using elementwise loops, pooling window scans, and normalization math (ceilf/floorf/expf, mean/variance, etc.).
Header includes and documentation
TargetLibraries/Generic/inc/DeeployBasicMath.h, CHANGELOG.md, DeeployTest/test_generic_config.py
Updates master math header with new kernel includes, documents operator support in CHANGELOG, and registers test kernel paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

Feature

Suggested reviewers

  • Victor-Jung
  • Xeratec
  • lukamac
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding support for multiple ONNX operators (Ceil, Floor, Clip, Sub, Exp, Sigmoid, Swish, HardSigmoid, HardSwish, InstanceNormalization, GroupNormalization, AveragePool, GlobalAveragePool, GlobalMaxPool) required for MAGIA on the Generic target. |
| Description check | ✅ Passed | The description clearly relates to the changeset by listing all operators added, explaining the purpose (support for MAGIA), and indicating that tests are included. The content is directly relevant to the PR changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 13

🧹 Nitpick comments (4)
Deeploy/Targets/Generic/Platform.py (1)

150-151: ⚡ Quick win

Wire HardSigmoid/HardSwish to their dedicated layer classes.

HardSigmoidLayer and HardSwishLayer exist but are not used here. Using SigmoidLayer/SwishLayer is fragile and can drift from op-specific accounting later.

Suggested patch
 from Deeploy.Targets.Generic.Layers import AddLayer, AveragePoolLayer, BatchNormalizationLayer, CeilLayer, ClipLayer, \
     ConcatLayer, ConvLayer, ConvTransposeLayer, DebugPrintLayer, DequantLayer, DivLayer, ExpLayer, FloorLayer, \
-    GatherLayer, GELULayer, GEMMLayer, GlobalAveragePoolLayer, GlobalMaxPoolLayer, GroupNormLayer, InstanceNormLayer, \
+    GatherLayer, GELULayer, GEMMLayer, GlobalAveragePoolLayer, GlobalMaxPoolLayer, GroupNormLayer, HardSigmoidLayer, \
+    HardSwishLayer, InstanceNormLayer, \
     ITAMaxLayer, LayerNormLayer, MatMulLayer, MaxPoolLayer, MulLayer, PadLayer, PowLayer, QuantLayer, ReduceMeanLayer, \
     ReduceSumLayer, ReluLayer, RequantShiftLayer, ReshapeLayer, RQIntegerDivLayer, RQSiGELULayer, SigmoidLayer, \
     SliceLayer, SoftmaxLayer, SqrtLayer, SubLayer, SwishLayer, TransposeLayer
...
-    'HardSigmoid': SigmoidLayer([HardSigmoidMapper]),
-    'HardSwish': SwishLayer([HardSwishMapper]),
+    'HardSigmoid': HardSigmoidLayer([HardSigmoidMapper]),
+    'HardSwish': HardSwishLayer([HardSwishMapper]),
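For context on why the dedicated layers matter: per the ONNX operator definitions, HardSigmoid is max(0, min(1, alpha * x + beta)) (defaults alpha = 0.2, beta = 0.5) and HardSwish is x * max(0, min(1, x / 6 + 0.5)). Both are piecewise-linear with no exponential, so their op accounting plausibly differs from Sigmoid/Swish. A minimal reference sketch follows; the function names and the float32_t typedef are hypothetical stand-ins, not the PR's kernels.

```c
#include <stddef.h>

/* Stand-in for Deeploy's float32_t typedef (assumption). */
typedef float float32_t;

/* Clamp a value to the closed interval [0, 1]. */
static float32_t clamp01(float32_t v) {
  return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Hypothetical reference for ONNX HardSigmoid: max(0, min(1, alpha*x + beta)). */
void hard_sigmoid_ref(const float32_t *in, float32_t *out, float32_t alpha,
                      float32_t beta, size_t size) {
  for (size_t i = 0; i < size; i++) {
    out[i] = clamp01(alpha * in[i] + beta);
  }
}

/* Hypothetical reference for ONNX HardSwish: x * HardSigmoid(x) with
 * alpha = 1/6 and beta = 0.5 fixed by the operator definition. */
void hard_swish_ref(const float32_t *in, float32_t *out, size_t size) {
  const float32_t alpha = 1.0f / 6.0f;
  const float32_t beta = 0.5f;
  for (size_t i = 0; i < size; i++) {
    out[i] = in[i] * clamp01(alpha * in[i] + beta);
  }
}
```

Per element this is one multiply-add and two compares (plus one extra multiply for HardSwish), which is the kind of difference a dedicated computeOps() can capture.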
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Platform.py` around lines 150 - 151, The mapping
currently registers 'HardSigmoid' -> SigmoidLayer and 'HardSwish' -> SwishLayer;
replace those with their dedicated classes HardSigmoidLayer and HardSwishLayer
to ensure op-specific behavior. Update the dictionary entries that reference
SigmoidLayer([HardSigmoidMapper]) and SwishLayer([HardSwishMapper]) to use
HardSigmoidLayer([HardSigmoidMapper]) and HardSwishLayer([HardSwishMapper])
respectively, keeping the mapper lists unchanged so existing mappers
(HardSigmoidMapper, HardSwishMapper) are preserved.
TargetLibraries/Generic/src/Swish_fp32.c (1)

10-16: ⚡ Quick win

Consider using a numerically stable swish formulation.

Similar to the sigmoid implementation, this formulation can encounter numerical issues when alpha * x is a large negative value, causing expf(-alpha * x) to overflow. A more stable approach branches on the sign:

  • For alpha * x >= 0: x / (1 + expf(-alpha * x))
  • For alpha * x < 0: x * expf(alpha * x) / (1 + expf(alpha * x))

This avoids computing very large exponentials and improves precision, especially important since the alpha parameter can amplify input values.

📊 Proposed numerically stable implementation
 void Swish_fp32_fp32(float32_t *data_in, float32_t *data_out, float alpha,
                      int32_t size) {
   for (int i = 0; i < size; i++) {
     float32_t x = data_in[i];
-    data_out[i] = x / (1 + expf(-alpha * x));
+    float32_t ax = alpha * x;
+    if (ax >= 0) {
+      data_out[i] = x / (1.0f + expf(-ax));
+    } else {
+      float32_t exp_ax = expf(ax);
+      data_out[i] = x * exp_ax / (1.0f + exp_ax);
+    }
   }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Generic/src/Swish_fp32.c` around lines 10 - 16, The current
Swish_fp32_fp32 implementation can overflow when computing expf(-alpha * x);
update the loop in Swish_fp32_fp32 to use a numerically stable branch on sign of
t = alpha * x: if t >= 0 compute data_out[i] = x / (1 + expf(-t)), else compute
data_out[i] = x * expf(t) / (1 + expf(t)); keep using float32_t variables (t, x)
and unchanged arguments (data_in, data_out, alpha, size) to avoid large
exponentials and improve precision.
TargetLibraries/Generic/src/Sigmoid_fp32.c (1)

10-14: ⚡ Quick win

Consider using a numerically stable sigmoid formulation.

The current implementation 1/(1+exp(-x)) can encounter numerical issues when x is a large negative value, causing exp(-x) to overflow. IEEE 754 still produces the correct limiting value (1/(1+inf) = 0), but a more robust formulation branches on the sign of x:

  • For x >= 0: 1 / (1 + expf(-x))
  • For x < 0: expf(x) / (1 + expf(x))

This avoids computing very large exponentials and improves precision across the input range.

📊 Proposed numerically stable implementation
 void Sigmoid_fp32_fp32(float32_t *data_in, float32_t *data_out, int32_t size) {
   for (int i = 0; i < size; i++) {
-    data_out[i] = 1 / (1 + expf(-data_in[i]));
+    float32_t x = data_in[i];
+    if (x >= 0) {
+      data_out[i] = 1.0f / (1.0f + expf(-x));
+    } else {
+      float32_t exp_x = expf(x);
+      data_out[i] = exp_x / (1.0f + exp_x);
+    }
   }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Generic/src/Sigmoid_fp32.c` around lines 10 - 14, The
Sigmoid_fp32_fp32 implementation uses 1/(1+expf(-x)) which can compute huge
expf(-x) for large negative x; change the loop in Sigmoid_fp32_fp32 to use a
numerically stable branch: for each element x = data_in[i] if x >= 0 compute
data_out[i] = 1.0f / (1.0f + expf(-x)) else compute data_out[i] = expf(x) /
(1.0f + expf(x)); keep the same float32_t types and indexing and preserve the
for-loop over size to avoid overflow and improve precision.
Deeploy/Targets/Generic/Parsers.py (1)

2973-2978: 💤 Low value

Remove duplicate assignment of data_in.

Line 2977 duplicates the assignment from line 2974. This redundancy should be removed.

♻️ Proposed fix
     def parseNodeCtxt(self,
                       ctxt: NetworkContext,
                       node: gs.Node,
                       channels_first: bool = True) -> Tuple[NetworkContext, bool]:
         data_in = ctxt.lookup(node.inputs[0].name)
         self.operatorRepresentation['data_in'] = data_in.name
         self.operatorRepresentation['scale'] = ctxt.lookup(node.inputs[1].name).name
         self.operatorRepresentation['bias'] = ctxt.lookup(node.inputs[2].name).name
-        self.operatorRepresentation['data_in'] = data_in.name
         self.operatorRepresentation['data_out'] = ctxt.lookup(node.outputs[0].name).name
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Parsers.py` around lines 2973 - 2978, The code
assigns self.operatorRepresentation['data_in'] twice for the same node input;
remove the duplicate assignment (the second
self.operatorRepresentation['data_in'] = data_in.name) so only the initial
assignment using data_in (from ctxt.lookup(node.inputs[0].name)) remains; locate
this in the parser code where data_in is set alongside 'scale', 'bias', and
'data_out' (look for the data_in variable and self.operatorRepresentation
assignments) and delete the redundant line.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 19: Update the mismatched PR URL in CHANGELOG.md so the reference [`#193`]
points to the correct PR URL by replacing the `pull/189` part with `pull/193` in
the line containing "Add support for Operators for Generic target needed in
MAGIA [`#193`]( https://github.com/pulp-platform/Deeploy/pull/189)"; ensure the
displayed number and the URL both refer to PR 193.

In `@Deeploy/Targets/Generic/Layers.py`:
- Around line 744-758: Comments in computeOps use confusable Unicode characters
(σ, α) which trigger lint RUF003; update those comments in the methods
SwishLayer.computeOps and HardSigmoidLayer.computeOps (and any adjacent comment
using σ or α) to use plain ASCII words or names (e.g., "sigma" or "sigmoid"
instead of "σ", and "alpha" instead of "α") while preserving the original
meaning (e.g., "sigma(x) = 1 / (1 + exp(-x))" and "alpha·x + beta" or "alpha * x
+ beta"); ensure the rest of the file still references the same functions/class
names and keep comment content otherwise unchanged.

In `@Deeploy/Targets/Generic/Parsers.py`:
- Line 3022: The code is reading node.attrs with a misspelled key
'count_include_pad ' (trailing space) so the lookup always falls back to 0;
update the lookup in the parser where count_include_pad is set (the line using
node.attrs.get('count_include_pad ', 0)) to use the correct key
'count_include_pad' so the parser honors user-specified values; keep the default
0 behavior if the attribute is absent.
- Around line 2911-2914: The code that sets
self.operatorRepresentation['min_val'] and ['max_val'] directly calls
.values.item() on node.inputs[1] and node.inputs[2] without checking their type;
update the Clip parsing logic (the block handling node.inputs in the parser
where 'min_val' and 'max_val' are assigned) to first verify each input is a
constant (e.g., isinstance(node.inputs[1], gs.Constant)) before calling
.values.item(), and if not a constant (variable/dynamic tensor) simply skip
assigning min_val/max_val (or leave them unset) to avoid AttributeError when
inputs are produced by other ops. Ensure you reference the same keys ('min_val',
'max_val') and input indices (1 and 2) so the change is localized to that Clip
input handling.

In `@Deeploy/Targets/Generic/Templates/SubTemplate.py`:
- Around line 17-27: Adjust the sign of the computed quantization offsets so Sub
produces q1 - q2 - z1 + z2 + zout: invert the sign of input_1_offset and remove
the leading negatives on input_2_offset and output_offset. Specifically, in
SubTemplate.py change the input_1_offset assignment that currently uses
(data_in_1._signed == 0) * int(data_in_1.nLevels / 2) to a negative version, and
change input_2_offset and output_offset to use positive (data_in_2._signed == 0)
* int(...) and (data_out._signed == 0) * int(...) respectively, then keep
operatorRepresentation['offset'] = input_1_offset + input_2_offset +
output_offset.
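The sign change can be checked with a short derivation. Assuming an unsigned tensor with nLevels levels stores the real value q - nLevels/2 and a signed tensor stores q directly (this convention is an assumption and should be confirmed against the existing Add template), (q1 - z1) - (q2 - z2) = qout - zout gives qout = q1 - q2 + (-z1 + z2 + zout). A hypothetical C helper with that offset (rounding and saturation omitted):

```c
#include <stdint.h>

/* Zero point under the assumed convention: nLevels/2 for unsigned, 0 for signed. */
static int32_t zero_point(int is_signed, int32_t n_levels) {
  return is_signed ? 0 : n_levels / 2;
}

/* Hypothetical mirror of the offset proposed for SubTemplate.py. */
int32_t sub_offset(int in1_signed, int32_t in1_levels, int in2_signed,
                   int32_t in2_levels, int out_signed, int32_t out_levels) {
  int32_t z1 = zero_point(in1_signed, in1_levels);
  int32_t z2 = zero_point(in2_signed, in2_levels);
  int32_t zout = zero_point(out_signed, out_levels);
  return -z1 + z2 + zout;
}

/* Element-wise subtraction in the quantized domain (no clamping). */
int32_t sub_quantized(int32_t q1, int32_t q2, int32_t offset) {
  return q1 - q2 + offset;
}
```

For example, with three uint8 tensors (z = 128), q1 = 200 and q2 = 100 encode 72 and -28, and qout = 228 encodes 100 = 72 - (-28).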

In `@TargetLibraries/Generic/inc/kernel/Clip.h`:
- Around line 16-18: The section banner in the header incorrectly reads "Ceil"
while this file defines clipping utilities (Clip); update the comment block
above the Clip declarations to read "Clip" instead of "Ceil" so the banner
matches the symbol being defined (look for the banner block with "Ceil" near the
Clip.h top and change it to "Clip").

In `@TargetLibraries/Generic/inc/kernel/GlobalMaxPool.h`:
- Around line 7-9: The include guard macro in GlobalMaxPool.h is incorrectly
using __DEEPLOY_BASIC_MATH_GLOBALAVERAGEPOOL_KERNEL_HEADER_ (collides with
GlobalAveragePool.h); update the guard macro to a unique name like
__DEEPLOY_BASIC_MATH_GLOBALMAXPOOL_KERNEL_HEADER_ at the `#ifndef` and `#define`
lines and update the corresponding `#endif` comment if present so the header no
longer conflicts with GlobalAveragePool.h.

In `@TargetLibraries/Generic/src/AveragePool_fp32.c`:
- Around line 17-20: The AveragePool2d_fp32_fp32 function currently misses
checks for kernel_h == 0 and kernel_w == 0 and divides by variable count without
ensuring count > 0; update the initial guard (the early-return condition around
N/C/stride/pad/kernel size) to also return when kernel_h == 0 or kernel_w == 0,
and in the pooling loop where the output is computed (the place that divides by
count), ensure you protect against count == 0 by skipping the division or
returning a safe value (e.g., set output to 0 or continue) when count is zero;
reference AveragePool2d_fp32_fp32 and the local variable count to locate the
changes.
- Around line 61-66: The initial validity check incorrectly rejects cases where
padding makes the effective length sufficient and also risks divide-by-zero when
computing averages; change the condition from "if (N == 0 || C == 0 || L <
kernel_len || stride == 0)" to check the padded length: "if (N == 0 || C == 0 ||
(L + pad_left + pad_right) < kernel_len || stride == 0)". Then before dividing
by count in the averaging loop (the division at the use of variable count),
ensure count is > 0 (skip division or continue when count == 0) to prevent
divide-by-zero, updating the code that computes L_out and the averaging step
accordingly (use L_out = (L + pad_left + pad_right - kernel_len) / stride + 1 as
shown and guard the division by checking count).
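A sketch of a 1D average pool that applies both checks: the shape guard uses the padded length, and the divisor is never zero. It also shows how the ONNX count_include_pad attribute (see the Parsers.py typo above) changes the divisor. The function name, signature, and [N, C, L] layout are assumptions, not the PR's AveragePool_fp32.c API.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Deeploy's float32_t typedef (assumption). */
typedef float float32_t;

/* Hypothetical 1D average pool over [N, C, L] with implicit zero padding.
 * count_include_pad != 0: divide by kernel_len (padding counts as zeros).
 * count_include_pad == 0: divide by the number of in-bounds samples.
 * Returns -1 without writing dst for invalid shapes. */
int avgpool1d_ref(const float32_t *src, float32_t *dst, uint32_t N, uint32_t C,
                  uint32_t L, uint32_t kernel_len, uint32_t stride,
                  uint32_t pad_left, uint32_t pad_right, int count_include_pad) {
  if (N == 0 || C == 0 || kernel_len == 0 || stride == 0 ||
      L + pad_left + pad_right < kernel_len) {
    return -1;
  }
  uint32_t L_out = (L + pad_left + pad_right - kernel_len) / stride + 1;
  for (uint32_t nc = 0; nc < N * C; nc++) {
    const float32_t *x = src + (size_t)nc * L;
    float32_t *y = dst + (size_t)nc * L_out;
    for (uint32_t o = 0; o < L_out; o++) {
      /* Window start in input coordinates; negative inside the left pad. */
      int64_t start = (int64_t)o * stride - pad_left;
      float32_t sum = 0.0f;
      uint32_t count = 0;
      for (uint32_t k = 0; k < kernel_len; k++) {
        int64_t idx = start + k;
        if (idx >= 0 && idx < (int64_t)L) {
          sum += x[idx];
          count++;
        }
      }
      uint32_t divisor = count_include_pad ? kernel_len : count;
      /* A window lying entirely in padding yields count == 0. */
      y[o] = divisor > 0 ? sum / (float32_t)divisor : 0.0f;
    }
  }
  return 0;
}
```

Note that L = 2, kernel_len = 3, pad_left = 1 is accepted here, while the original L < kernel_len check would reject it. The 2D kernel needs the same guards on kernel_h/kernel_w and count.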

In `@TargetLibraries/Generic/src/GlobalAveragePool_fp32.c`:
- Around line 19-23: Add an explicit guard for spatial_size == 0 before
performing the reduction in GlobalAveragePool_fp32.c: inside the loop over
channels (the code using spatial_size, sum, x, dst, n, C, c), check if
spatial_size is zero and handle it (e.g., set dst[n * C + c] = 0.0f or another
defined fallback) and continue/skip the summation loop to avoid division by
zero; ensure the early branch replaces the sum/division path so no
sum/spatial_size division occurs when spatial_size == 0.
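A minimal sketch of the guarded reduction, assuming a [N, C, spatial_size] input and a [N, C] output; global_avgpool_ref and the float32_t typedef are hypothetical. Without the guard, 0.0f / 0 evaluates to NaN rather than a defined value.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Deeploy's float32_t typedef (assumption). */
typedef float float32_t;

/* Hypothetical reference: one mean per (n, c) over the spatial axis. */
void global_avgpool_ref(const float32_t *src, float32_t *dst, uint32_t N,
                        uint32_t C, uint32_t spatial_size) {
  for (uint32_t n = 0; n < N; n++) {
    for (uint32_t c = 0; c < C; c++) {
      size_t nc = (size_t)n * C + c;
      if (spatial_size == 0) {
        dst[nc] = 0.0f; /* defined fallback instead of 0.0f / 0 */
        continue;
      }
      const float32_t *x = src + nc * spatial_size;
      float32_t sum = 0.0f;
      for (uint32_t i = 0; i < spatial_size; i++) {
        sum += x[i];
      }
      dst[nc] = sum / (float32_t)spatial_size;
    }
  }
}
```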

In `@TargetLibraries/Generic/src/GlobalMaxPool_fp32.c`:
- Around line 17-20: The code dereferences x[0] unconditionally in
GlobalMaxPool_fp32 (variable x and float32_t max) which will read out-of-bounds
when spatial_size == 0; fix by checking spatial_size before accessing x[0]
(e.g., if spatial_size == 0 handle that case — set max to a safe value or skip
this channel — and continue) and only run the for-loop and initial assignment
when spatial_size > 0, ensuring no reads occur when spatial_size is zero.
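One way to apply the fix is to reject the empty case up front, so x[0] is only read when at least one element exists. In this sketch global_maxpool_ref, its int return code, and the float32_t typedef are hypothetical; the actual kernel returns void, so it would simply return early as in the inline suggestion.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Deeploy's float32_t typedef (assumption). */
typedef float float32_t;

/* Hypothetical reference over [N, C, spatial_size] -> [N, C].
 * Returns -1 without touching dst when spatial_size == 0. */
int global_maxpool_ref(const float32_t *src, float32_t *dst, uint32_t N,
                       uint32_t C, uint32_t spatial_size) {
  if (spatial_size == 0) {
    return -1;
  }
  for (size_t nc = 0; nc < (size_t)N * C; nc++) {
    const float32_t *x = src + nc * spatial_size;
    float32_t max = x[0]; /* safe: spatial_size >= 1 */
    for (uint32_t i = 1; i < spatial_size; i++) {
      if (x[i] > max) {
        max = x[i];
      }
    }
    dst[nc] = max;
  }
  return 0;
}
```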

In `@TargetLibraries/Generic/src/GroupNormalization_fp32.c`:
- Around line 17-39: Before computing group statistics add validation to guard
against invalid shapes: check that num_groups > 0, that num_channels %
num_groups == 0, and that spatial > 0 (so group_elements > 0) before calculating
channels_per_group, group_elements, or entering the batch/group loops; if any
check fails, return an error code or early-fail (consistent with the surrounding
API) rather than proceeding to the mean/variance loops. Update the pre-loop
logic around the variables channels_per_group, group_elements, slice and the
outer loops (referencing num_groups, num_channels, spatial, batch_size,
group_elements) to perform these checks and handle the error path.

In `@TargetLibraries/Generic/src/InstanceNormalization_fp32.c`:
- Around line 27-37: The mean/variance calculation divides by the variable
spatial without guarding against spatial == 0; update the reduction in
InstanceNormalization_fp32.c to check spatial before dividing: if spatial is
zero, set mean and var to 0 (or appropriate neutral values) and skip the loops,
otherwise perform the existing summation and divide by spatial. Ensure you
reference and update the computations that assign mean (from sum / spatial) and
var (from var / spatial) and avoid any division when spatial == 0.

---

Nitpick comments:
In `@Deeploy/Targets/Generic/Parsers.py`:
- Around line 2973-2978: The code assigns self.operatorRepresentation['data_in']
twice for the same node input; remove the duplicate assignment (the second
self.operatorRepresentation['data_in'] = data_in.name) so only the initial
assignment using data_in (from ctxt.lookup(node.inputs[0].name)) remains; locate
this in the parser code where data_in is set alongside 'scale', 'bias', and
'data_out' (look for the data_in variable and self.operatorRepresentation
assignments) and delete the redundant line.

In `@Deeploy/Targets/Generic/Platform.py`:
- Around line 150-151: The mapping currently registers 'HardSigmoid' ->
SigmoidLayer and 'HardSwish' -> SwishLayer; replace those with their dedicated
classes HardSigmoidLayer and HardSwishLayer to ensure op-specific behavior.
Update the dictionary entries that reference SigmoidLayer([HardSigmoidMapper])
and SwishLayer([HardSwishMapper]) to use HardSigmoidLayer([HardSigmoidMapper])
and HardSwishLayer([HardSwishMapper]) respectively, keeping the mapper lists
unchanged so existing mappers (HardSigmoidMapper, HardSwishMapper) are
preserved.

In `@TargetLibraries/Generic/src/Sigmoid_fp32.c`:
- Around line 10-14: The Sigmoid_fp32_fp32 implementation uses 1/(1+expf(-x))
which can compute huge expf(-x) for large negative x; change the loop in
Sigmoid_fp32_fp32 to use a numerically stable branch: for each element x =
data_in[i] if x >= 0 compute data_out[i] = 1.0f / (1.0f + expf(-x)) else compute
data_out[i] = expf(x) / (1.0f + expf(x)); keep the same float32_t types and
indexing and preserve the for-loop over size to avoid overflow and improve
precision.

In `@TargetLibraries/Generic/src/Swish_fp32.c`:
- Around line 10-16: The current Swish_fp32_fp32 implementation can overflow
when computing expf(-alpha * x); update the loop in Swish_fp32_fp32 to use a
numerically stable branch on sign of t = alpha * x: if t >= 0 compute
data_out[i] = x / (1 + expf(-t)), else compute data_out[i] = x * expf(t) / (1 +
expf(t)); keep using float32_t variables (t, x) and unchanged arguments
(data_in, data_out, alpha, size) to avoid large exponentials and improve
precision.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c4cc84a6-6698-4450-b7df-b2d472dcd8be

📥 Commits

Reviewing files that changed from the base of the PR and between c4870e1 and bbaffb7.

📒 Files selected for processing (96)
  • CHANGELOG.md
  • Deeploy/Targets/Generic/Bindings.py
  • Deeploy/Targets/Generic/Layers.py
  • Deeploy/Targets/Generic/Parsers.py
  • Deeploy/Targets/Generic/Platform.py
  • Deeploy/Targets/Generic/Templates/FloatAveragePoolTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatCeilTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatClipTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatExpTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatFloorTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatGlobalAveragePoolTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatGlobalMaxPoolTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatGroupNormTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatHardSigmoidTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatHardSwishTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatInstanceNormTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatSigmoidTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatSubTemplate.py
  • Deeploy/Targets/Generic/Templates/FloatSwishTemplate.py
  • Deeploy/Targets/Generic/Templates/SubTemplate.py
  • DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_1D/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_1D/network.onnx
  • DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_1D/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_2D/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_2D/network.onnx
  • DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_2D/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/Ceil/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/Ceil/network.onnx
  • DeeployTest/Tests/Kernels/FP32/Ceil/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/Clip/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/Clip/network.onnx
  • DeeployTest/Tests/Kernels/FP32/Clip/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/Exp/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/Exp/network.onnx
  • DeeployTest/Tests/Kernels/FP32/Exp/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/Floor/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/Floor/network.onnx
  • DeeployTest/Tests/Kernels/FP32/Floor/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/GlobalAveragePool/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/GlobalAveragePool/network.onnx
  • DeeployTest/Tests/Kernels/FP32/GlobalAveragePool/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/GlobalMaxPool/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/GlobalMaxPool/network.onnx
  • DeeployTest/Tests/Kernels/FP32/GlobalMaxPool/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/GroupNorm/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/GroupNorm/network.onnx
  • DeeployTest/Tests/Kernels/FP32/GroupNorm/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/HardSigmoid/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/HardSigmoid/network.onnx
  • DeeployTest/Tests/Kernels/FP32/HardSigmoid/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/HardSwish/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/HardSwish/network.onnx
  • DeeployTest/Tests/Kernels/FP32/HardSwish/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/InstanceNorm/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/InstanceNorm/network.onnx
  • DeeployTest/Tests/Kernels/FP32/InstanceNorm/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/Sigmoid/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/Sigmoid/network.onnx
  • DeeployTest/Tests/Kernels/FP32/Sigmoid/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/Sub/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/Sub/network.onnx
  • DeeployTest/Tests/Kernels/FP32/Sub/outputs.npz
  • DeeployTest/Tests/Kernels/FP32/Swish/inputs.npz
  • DeeployTest/Tests/Kernels/FP32/Swish/network.onnx
  • DeeployTest/Tests/Kernels/FP32/Swish/outputs.npz
  • DeeployTest/Tests/Kernels/Integer/Sub/inputs.npz
  • DeeployTest/Tests/Kernels/Integer/Sub/network.onnx
  • DeeployTest/Tests/Kernels/Integer/Sub/outputs.npz
  • DeeployTest/test_generic_config.py
  • TargetLibraries/Generic/inc/DeeployBasicMath.h
  • TargetLibraries/Generic/inc/kernel/AveragePool.h
  • TargetLibraries/Generic/inc/kernel/Ceil.h
  • TargetLibraries/Generic/inc/kernel/Clip.h
  • TargetLibraries/Generic/inc/kernel/Exp.h
  • TargetLibraries/Generic/inc/kernel/Floor.h
  • TargetLibraries/Generic/inc/kernel/GlobalAveragePool.h
  • TargetLibraries/Generic/inc/kernel/GlobalMaxPool.h
  • TargetLibraries/Generic/inc/kernel/GroupNorm.h
  • TargetLibraries/Generic/inc/kernel/HardSigmoid.h
  • TargetLibraries/Generic/inc/kernel/HardSwish.h
  • TargetLibraries/Generic/inc/kernel/InstanceNorm.h
  • TargetLibraries/Generic/inc/kernel/Sigmoid.h
  • TargetLibraries/Generic/inc/kernel/Swish.h
  • TargetLibraries/Generic/src/AveragePool_fp32.c
  • TargetLibraries/Generic/src/Ceil_fp32.c
  • TargetLibraries/Generic/src/Clip_fp32.c
  • TargetLibraries/Generic/src/Exp_fp32.c
  • TargetLibraries/Generic/src/Floor_fp32.c
  • TargetLibraries/Generic/src/GlobalAveragePool_fp32.c
  • TargetLibraries/Generic/src/GlobalMaxPool_fp32.c
  • TargetLibraries/Generic/src/GroupNormalization_fp32.c
  • TargetLibraries/Generic/src/HardSigmoid_fp32.c
  • TargetLibraries/Generic/src/HardSwish_fp32.c
  • TargetLibraries/Generic/src/InstanceNormalization_fp32.c
  • TargetLibraries/Generic/src/Sigmoid_fp32.c
  • TargetLibraries/Generic/src/Swish_fp32.c

Comment thread CHANGELOG.md Outdated
Comment thread Deeploy/Targets/Generic/Layers.py Outdated
Comment thread Deeploy/Targets/Generic/Parsers.py Outdated
Comment thread Deeploy/Targets/Generic/Parsers.py Outdated
Comment thread Deeploy/Targets/Generic/Templates/SubTemplate.py
Comment thread TargetLibraries/Generic/src/AveragePool_fp32.c Outdated
Comment thread TargetLibraries/Generic/src/GlobalAveragePool_fp32.c
Comment on lines +17 to +20
const float32_t *x = src + (n * C + c) * spatial_size;

float32_t max = x[0];
for (uint32_t i = 1; i < spatial_size; i++) {
Contributor


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Prevent out-of-bounds read when spatial_size == 0.

At line 19, x[0] is dereferenced unconditionally. If spatial_size is zero, there is no element to read, so this is an out-of-bounds memory access.

Proposed fix
 void GlobalMaxPool_fp32_fp32(float32_t const *__restrict__ src,
                              float32_t *__restrict__ dst, uint32_t N,
                              uint32_t C, uint32_t spatial_size) {
+  if (spatial_size == 0) {
+    return; // invalid shape for max pooling
+  }

   for (uint32_t n = 0; n < N; n++) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Generic/src/GlobalMaxPool_fp32.c` around lines 17-20: the
code dereferences x[0] unconditionally in GlobalMaxPool_fp32 (variable x and
float32_t max) which will read out-of-bounds when spatial_size == 0; fix by
checking spatial_size before accessing x[0] (e.g., if spatial_size == 0 handle
that case — set max to a safe value or skip this channel — and continue) and
only run the for-loop and initial assignment when spatial_size > 0, ensuring no
reads occur when spatial_size is zero.
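For checking the kernel on the host, a pure-Python reference with the same loop structure can help. This is a minimal sketch, not code from the PR. The name global_max_pool_ref is hypothetical, and it assumes the flat NCHW layout implied by the offset (n * C + c) * spatial_size in the C snippet. It raises on spatial_size == 0 rather than returning silently. That is a design choice for a test reference: the proposed C fix returns early and leaves dst unwritten, so the caller cannot tell that nothing was computed.

```python
from typing import List, Sequence


def global_max_pool_ref(src: Sequence[float], N: int, C: int,
                        spatial_size: int) -> List[float]:
    """Hypothetical reference: one max per (n, c) over a flat NCHW buffer."""
    # Guard first: with spatial_size == 0 there is no x[0] to seed the running max.
    if spatial_size == 0:
        raise ValueError("GlobalMaxPool needs spatial_size > 0")
    if len(src) != N * C * spatial_size:
        raise ValueError("src length does not match N * C * spatial_size")

    dst: List[float] = []
    for n in range(N):
        for c in range(C):
            base = (n * C + c) * spatial_size  # same offset as the C kernel
            current = src[base]  # safe: spatial_size >= 1 here
            for i in range(1, spatial_size):
                if src[base + i] > current:
                    current = src[base + i]
            dst.append(current)
    return dst
```

For example, with N = 1, C = 2, spatial_size = 3 and src = [1.0, 5.0, 2.0, -1.0, -3.0, -2.0], the two channel maxima are 5.0 and -1.0.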

Comment thread TargetLibraries/Generic/src/GroupNormalization_fp32.c
Comment thread TargetLibraries/Generic/src/InstanceNormalization_fp32.c
coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Deeploy/Targets/Generic/Parsers.py`:
- Around lines 2979-2981: The code currently assumes NCHW when deriving
batch_size/num_channels/spatial from data_in; update the logic that sets
operatorRepresentation['batch_size'], ['num_channels'], ['spatial'] (and any
length/height/width or spatial_size calculations) to respect a channels_first
boolean: always set batch_size = data_in.shape[0]; set channel_axis = 1 if
channels_first else -1; set num_channels = data_in.shape[channel_axis]; set
spatial_axes = tuple(range(2, data_in.ndim)) if channels_first else
tuple(range(1, data_in.ndim-1)); compute spatial = np.prod([data_in.shape[i] for
i in spatial_axes]); apply the same fix to the other identical blocks that set
these fields (the blocks around the other occurrences noted).
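The axis-selection rule described in this prompt can be sketched as a small standalone helper. This is an illustration under assumptions, not Deeploy code: layout_metadata is a hypothetical name, and it operates on a plain shape tuple rather than on the buffer returned by ctxt.lookup. It uses math.prod instead of np.prod so that a 2-D (N, C) input yields the integer 1 for spatial; np.prod over an empty shape returns the float 1.0.

```python
import math
from typing import Sequence, Tuple


def layout_metadata(shape: Sequence[int],
                    channels_first: bool = True) -> Tuple[int, int, int]:
    """Hypothetical helper: (batch_size, num_channels, spatial) for either layout."""
    if len(shape) < 2:
        raise ValueError("expected at least (N, C)")
    ndim = len(shape)
    # Channel axis is 1 for N, C, spatial... and the last axis for N, spatial..., C.
    channel_axis = 1 if channels_first else -1
    spatial_axes = range(2, ndim) if channels_first else range(1, ndim - 1)

    batch_size = shape[0]
    num_channels = shape[channel_axis]
    # math.prod of an empty iterable is 1, so (N, C) inputs get spatial == 1.
    spatial = math.prod(shape[i] for i in spatial_axes)
    return batch_size, num_channels, spatial
```

With this rule, an NCHW shape (2, 3, 4, 5) and the equivalent NHWC shape (2, 4, 5, 3) both map to batch 2, 3 channels and a spatial size of 20.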
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d251f5e-c829-41fa-8db7-6e6bf6f276f0

📥 Commits

Reviewing files that changed from the base of the PR and between bbaffb7 and 5678b47.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • Deeploy/Targets/Generic/Layers.py
  • Deeploy/Targets/Generic/Parsers.py
  • Deeploy/Targets/Generic/Templates/SubTemplate.py
  • TargetLibraries/Generic/src/AveragePool_fp32.c
  • TargetLibraries/Generic/src/GlobalAveragePool_fp32.c
  • TargetLibraries/Generic/src/GroupNormalization_fp32.c
  • TargetLibraries/Generic/src/InstanceNormalization_fp32.c
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • TargetLibraries/Generic/src/GlobalAveragePool_fp32.c
  • Deeploy/Targets/Generic/Templates/SubTemplate.py
  • TargetLibraries/Generic/src/AveragePool_fp32.c
  • TargetLibraries/Generic/src/GroupNormalization_fp32.c
  • TargetLibraries/Generic/src/InstanceNormalization_fp32.c
  • Deeploy/Targets/Generic/Layers.py

Comment on lines +2979 to +2981
self.operatorRepresentation['batch_size'] = data_in.shape[0]
self.operatorRepresentation['num_channels'] = data_in.shape[1]
self.operatorRepresentation['spatial'] = np.prod(data_in.shape[2:])

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Respect channels_first when deriving channel/spatial metadata.

These new parser paths always assume NCHW layout, reading the channel count from shape[1] and the spatial extent from shape[2:]. When channels_first=False (channels-last, e.g. NHWC), num_channels, spatial, length/height/width, and spatial_size are taken from the wrong axes, which can mis-parameterize the generated kernels.

Proposed fix
 class NormalizationParser(NodeParser):
@@
     def parseNodeCtxt(self,
                       ctxt: NetworkContext,
                       node: gs.Node,
                       channels_first: bool = True) -> Tuple[NetworkContext, bool]:
         data_in = ctxt.lookup(node.inputs[0].name)
@@
-        self.operatorRepresentation['batch_size'] = data_in.shape[0]
-        self.operatorRepresentation['num_channels'] = data_in.shape[1]
-        self.operatorRepresentation['spatial'] = np.prod(data_in.shape[2:])
+        self.operatorRepresentation['batch_size'] = data_in.shape[0]
+        if channels_first:
+            self.operatorRepresentation['num_channels'] = data_in.shape[1]
+            spatial_shape = data_in.shape[2:]
+        else:
+            self.operatorRepresentation['num_channels'] = data_in.shape[-1]
+            spatial_shape = data_in.shape[1:-1]
+        self.operatorRepresentation['spatial'] = np.prod(spatial_shape)
         return ctxt, True
@@
 class AveragePoolParser(NodeParser):
@@
     def parseNodeCtxt(self,
                       ctxt: NetworkContext,
                       node: gs.Node,
                       channels_first: bool = True) -> Tuple[NetworkContext, bool]:
@@
-        self.operatorRepresentation['batch_size'] = data_in.shape[0]
-        self.operatorRepresentation['num_channels'] = data_in.shape[1]
+        self.operatorRepresentation['batch_size'] = data_in.shape[0]
+        self.operatorRepresentation['num_channels'] = data_in.shape[1] if channels_first else data_in.shape[-1]
         self.operatorRepresentation['data_out_size'] = int(np.prod(data_out.shape))
 
-        spatial_shape = data_in.shape[2:]
+        spatial_shape = data_in.shape[2:] if channels_first else data_in.shape[1:-1]
         if len(self.operatorRepresentation['kernel_shape']) != len(spatial_shape):
             return ctxt, False
@@
 class GlobalPoolParser(NodeParser):
@@
     def parseNodeCtxt(self,
                       ctxt: NetworkContext,
                       node: gs.Node,
                       channels_first: bool = True) -> Tuple[NetworkContext, bool]:
@@
-        self.operatorRepresentation['batch_size'] = data_in.shape[0]
-        self.operatorRepresentation['num_channels'] = data_in.shape[1]
-        self.operatorRepresentation['spatial_size'] = np.prod(data_in.shape[2:])
+        self.operatorRepresentation['batch_size'] = data_in.shape[0]
+        if channels_first:
+            self.operatorRepresentation['num_channels'] = data_in.shape[1]
+            spatial_shape = data_in.shape[2:]
+        else:
+            self.operatorRepresentation['num_channels'] = data_in.shape[-1]
+            spatial_shape = data_in.shape[1:-1]
+        self.operatorRepresentation['spatial_size'] = np.prod(spatial_shape)

Also applies to: 3059-3071, 3104-3106
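The AveragePoolParser part of the proposed fix can be exercised in isolation to see why the NCHW assumption goes unnoticed. This is a hedged sketch: avgpool_channels_and_spatial is a hypothetical function that mirrors only the channel/spatial selection and the kernel-rank check from the diff, with None standing in for the parser's `return ctxt, False`.

```python
from typing import Optional, Sequence, Tuple


def avgpool_channels_and_spatial(
        shape: Sequence[int], kernel_shape: Sequence[int],
        channels_first: bool = True) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Hypothetical mirror of the parser logic; None means "parse rejected"."""
    shape = tuple(shape)
    if channels_first:
        num_channels, spatial_shape = shape[1], shape[2:]
    else:
        num_channels, spatial_shape = shape[-1], shape[1:-1]
    # Same check as the parser: kernel rank must match the number of spatial dims.
    if len(kernel_shape) != len(spatial_shape):
        return None
    return num_channels, spatial_shape
```

For a 4-D NHWC input such as (1, 8, 8, 3) with a 2x2 kernel, the layout-aware path gives 3 channels and spatial shape (8, 8). Reading the same tensor as NCHW gives 8 channels and spatial shape (8, 3), and it still passes the rank check because both readings have two spatial dimensions. The rank check alone therefore does not catch the layout mistake, which surfaces only as wrong kernel parameters.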

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Parsers.py` around lines 2979-2981: the code
currently assumes NCHW when deriving batch_size/num_channels/spatial from
data_in; update the logic that sets operatorRepresentation['batch_size'],
['num_channels'], ['spatial'] (and any length/height/width or spatial_size
calculations) to respect a channels_first boolean: always set batch_size =
data_in.shape[0]; set channel_axis = 1 if channels_first else -1; set
num_channels = data_in.shape[channel_axis]; set spatial_axes = tuple(range(2,
data_in.ndim)) if channels_first else tuple(range(1, data_in.ndim-1)); compute
spatial = np.prod([data_in.shape[i] for i in spatial_axes]); apply the same fix
to the other identical blocks that set these fields (the blocks around the other
occurrences noted).
