add support for Operators for Generic target needed in MAGIA by marchioa · Pull Request #193 · pulp-platform/Deeploy

marchioa · 2026-05-13T10:06:06Z

This PR wants add the support for Operators needed for MAGIA that are not available in the Generic target. Together with the operators supports this PR also adds the correspondant tests.

Added

Changed

Fixed

PR Merge Checklist

The PR is rebased on the latest devel commit and pointing to devel.
Your PR reviewed and approved.
All checks are passing.
The CHANGELOG.md file has been updated.
If the docker was modified, change back its link after review.

…th other targets

…c target

…rators for Generic target

marchioa · 2026-05-16T09:09:21Z

Everytime I push a change, snitch CI raises a different error I cannot replicate locally (I run pytest -m snitch in a local container from the docker image ghcr.io/pulp-platform/deeploy:main).

This last CI error is test_snitch_kernels[Kernels/Integer/GEMM/Regular_RQPerRow] but I have also seen errrors from test_snitch_kernels[Kernels/Integer/iNoNorm]. However, I am not able to replicate those errors.

coderabbitai · 2026-05-16T09:15:40Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added support for new generic operators: Ceil, Floor, Clip, Sub, Exp, Sigmoid, Swish, HardSigmoid, HardSwish, InstanceNormalization, GroupNormalization, AveragePool, GlobalAveragePool, GlobalMaxPool.
Tests
- Expanded kernel test coverage to include the newly supported FP32 and integer kernels.

Walkthrough

Adds parser base classes and parsers, codegen templates, bindings, layer op-count models, platform mapper registrations, kernel headers and C implementations, test entries, and changelog entries to support Ceil, Floor, Clip, Sub, Exp, Sigmoid, Swish, HardSigmoid, HardSwish, InstanceNormalization, GroupNormalization, AveragePool (1D/2D), GlobalAveragePool, and GlobalMaxPool on the Generic target.

Changes

Generic target operator support

Layer / File(s)	Summary
Parser infrastructure and unary operators `Deeploy/Targets/Generic/Parsers.py`	Introduces `UnaryElementWiseParser` base class for shared metadata extraction (size, data_in/data_out), refactors `ReluParser` to inherit from it, adds 8 unary element-wise parsers (Sqrt, Ceil, Floor, Exp, Sigmoid, Swish, HardSigmoid, HardSwish) with optional alpha/beta attribute handling, and establishes `SubParser` alias.
Clip, normalization, and pooling parsers `Deeploy/Targets/Generic/Parsers.py`	Updates `ClipParser` to validate min/max bounds, adds `NormalizationParser` base class and InstanceNorm/GroupNorm implementations capturing epsilon and num_groups, implements comprehensive `AveragePoolParser` with attribute validation and spatial dimension handling, and introduces `AveragePool1DParser`/`AveragePool2DParser` variants and global pooling parsers.
Float template implementations `Deeploy/Targets/Generic/Templates/Float*.py`	Defines multiple template classes (CeilTemplate, FloorTemplate, ClipTemplate, ExpTemplate, SigmoidTemplate, SwishTemplate, HardSigmoidTemplate, HardSwishTemplate, Instance/GroupNormTemplate, GlobalAveragePool/GlobalMaxPool, AveragePool 1D/2D) implementing `alignToContext` to compute type_width and size metadata and embedding parameterized reference code templates.
Subtraction templates `Deeploy/Targets/Generic/Templates/{Sub,FloatSub}Template.py`	Defines `SubTemplate` for fixed-point signed subtraction with offset computation and `FloatSubTemplate` for element-wise float subtraction.
Type bindings and template wiring `Deeploy/Targets/Generic/Bindings.py`	Expands imports to include new template symbols and establishes binding lists wiring type patterns to the corresponding templates (BasicSubBindings, BasicCeil/Floor/Clip/Exp, BasicSigmoid/Swish/HardSigmoid/HardSwish, BasicInstanceNorm/GroupNorm, BasicAveragePool1D/2D/GlobalAveragePool/GlobalMaxPool).
Operation counting layers `Deeploy/Targets/Generic/Layers.py`	Adds multiple `ONNXLayer` subclasses implementing `computeOps()` from operator representation metadata and creates `SubLayer` alias.
Platform integration `Deeploy/Targets/Generic/Platform.py`	Updates imports for new bindings/layers/parsers, establishes mapper instances for each new op, and registers all new operators in `GenericMapping`.
Kernel API headers `TargetLibraries/Generic/inc/kernel/*.h`	Adds header files declaring FP32 kernel signatures for pooling, elementwise math, activations, and normalization operations.
Kernel implementations `TargetLibraries/Generic/src/*.c`	Implements FP32 kernels using elementwise loops, pooling window scans, and normalization math (ceilf/floorf/expf, mean/variance, etc.).
Header includes and documentation `TargetLibraries/Generic/inc/DeeployBasicMath.h`, `CHANGELOG.md`, `DeeployTest/test_generic_config.py`	Updates master math header with new kernel includes, documents operator support in CHANGELOG, and registers test kernel paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

Feature

Suggested reviewers

Victor-Jung
Xeratec
lukamac

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: adding support for multiple ONNX operators (Ceil, Floor, Clip, Sub, Exp, Sigmoid, Swish, HardSigmoid, HardSwish, InstanceNormalization, GroupNormalization, AveragePool, GlobalAveragePool, GlobalMaxPool) required for MAGIA on the Generic target.
Description check	✅ Passed	The description clearly relates to the changeset by listing all operators added, explaining the purpose (support for MAGIA), and indicating that tests are included. The content is directly relevant to the PR changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 13

🧹 Nitpick comments (4)

Deeploy/Targets/Generic/Platform.py (1)

150-151: ⚡ Quick win

Wire HardSigmoid/HardSwish to their dedicated layer classes.

HardSigmoidLayer and HardSwishLayer exist but are not used here. Using SigmoidLayer/SwishLayer is fragile and can drift from op-specific accounting later.

Suggested patch

-from Deeploy.Targets.Generic.Layers import AddLayer, AveragePoolLayer, BatchNormalizationLayer, CeilLayer, ClipLayer, \
+from Deeploy.Targets.Generic.Layers import AddLayer, AveragePoolLayer, BatchNormalizationLayer, CeilLayer, ClipLayer, \
     ConcatLayer, ConvLayer, ConvTransposeLayer, DebugPrintLayer, DequantLayer, DivLayer, ExpLayer, FloorLayer, \
-    GatherLayer, GELULayer, GEMMLayer, GlobalAveragePoolLayer, GlobalMaxPoolLayer, GroupNormLayer, InstanceNormLayer, \
+    GatherLayer, GELULayer, GEMMLayer, GlobalAveragePoolLayer, GlobalMaxPoolLayer, GroupNormLayer, HardSigmoidLayer, \
+    HardSwishLayer, InstanceNormLayer, \
     ITAMaxLayer, LayerNormLayer, MatMulLayer, MaxPoolLayer, MulLayer, PadLayer, PowLayer, QuantLayer, ReduceMeanLayer, \
     ReduceSumLayer, ReluLayer, RequantShiftLayer, ReshapeLayer, RQIntegerDivLayer, RQSiGELULayer, SigmoidLayer, \
     SliceLayer, SoftmaxLayer, SqrtLayer, SubLayer, SwishLayer, TransposeLayer
...
-    'HardSigmoid': SigmoidLayer([HardSigmoidMapper]),
-    'HardSwish': SwishLayer([HardSwishMapper]),
+    'HardSigmoid': HardSigmoidLayer([HardSigmoidMapper]),
+    'HardSwish': HardSwishLayer([HardSwishMapper]),

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Platform.py` around lines 150 - 151, The mapping
currently registers 'HardSigmoid' -> SigmoidLayer and 'HardSwish' -> SwishLayer;
replace those with their dedicated classes HardSigmoidLayer and HardSwishLayer
to ensure op-specific behavior. Update the dictionary entries that reference
SigmoidLayer([HardSigmoidMapper]) and SwishLayer([HardSwishMapper]) to use
HardSigmoidLayer([HardSigmoidMapper]) and HardSwishLayer([HardSwishMapper])
respectively, keeping the mapper lists unchanged so existing mappers
(HardSigmoidMapper, HardSwishMapper) are preserved.

TargetLibraries/Generic/src/Swish_fp32.c (1)

10-16: ⚡ Quick win

Consider using a numerically stable swish formulation.

Similar to the sigmoid implementation, this formulation can encounter numerical issues when alpha * x is a large negative value, causing expf(-alpha * x) to overflow. A more stable approach branches on the sign:

For alpha * x >= 0: x / (1 + expf(-alpha * x))

For alpha * x < 0: x * expf(alpha * x) / (1 + expf(alpha * x))

This avoids computing very large exponentials and improves precision, especially important since the alpha parameter can amplify input values.
📊 Proposed numerically stable implementation
 void Swish_fp32_fp32(float32_t *data_in, float32_t *data_out, float alpha,
                      int32_t size) {
   for (int i = 0; i < size; i++) {
     float32_t x = data_in[i];
-    data_out[i] = x / (1 + expf(-alpha * x));
+    float32_t ax = alpha * x;
+    if (ax >= 0) {
+      data_out[i] = x / (1.0f + expf(-ax));
+    } else {
+      float32_t exp_ax = expf(ax);
+      data_out[i] = x * exp_ax / (1.0f + exp_ax);
+    }
   }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Generic/src/Swish_fp32.c` around lines 10 - 16, The current
Swish_fp32_fp32 implementation can overflow when computing expf(-alpha * x);
update the loop in Swish_fp32_fp32 to use a numerically stable branch on sign of
t = alpha * x: if t >= 0 compute data_out[i] = x / (1 + expf(-t)), else compute
data_out[i] = x * expf(t) / (1 + expf(t)); keep using float32_t variables (t, x)
and unchanged arguments (data_in, data_out, alpha, size) to avoid large
exponentials and improve precision.

TargetLibraries/Generic/src/Sigmoid_fp32.c (1)

10-14: ⚡ Quick win

Consider using a numerically stable sigmoid formulation.

The current implementation 1/(1+exp(-x)) can encounter numerical issues when x is a large negative value, causing exp(-x) to overflow. While IEEE 754 produces the correct limiting value (0), a more stable and efficient formulation branches on the sign of x:

For x >= 0: 1 / (1 + expf(-x))

For x < 0: expf(x) / (1 + expf(x))

This avoids computing very large exponentials and improves precision across the input range.
📊 Proposed numerically stable implementation
 void Sigmoid_fp32_fp32(float32_t *data_in, float32_t *data_out, int32_t size) {
   for (int i = 0; i < size; i++) {
-    data_out[i] = 1 / (1 + expf(-data_in[i]));
+    float32_t x = data_in[i];
+    if (x >= 0) {
+      data_out[i] = 1.0f / (1.0f + expf(-x));
+    } else {
+      float32_t exp_x = expf(x);
+      data_out[i] = exp_x / (1.0f + exp_x);
+    }
   }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@TargetLibraries/Generic/src/Sigmoid_fp32.c` around lines 10 - 14, The
Sigmoid_fp32_fp32 implementation uses 1/(1+expf(-x)) which can compute huge
expf(-x) for large negative x; change the loop in Sigmoid_fp32_fp32 to use a
numerically stable branch: for each element x = data_in[i] if x >= 0 compute
data_out[i] = 1.0f / (1.0f + expf(-x)) else compute data_out[i] = expf(x) /
(1.0f + expf(x)); keep the same float32_t types and indexing and preserve the
for-loop over size to avoid overflow and improve precision.

Deeploy/Targets/Generic/Parsers.py (1)

2973-2978: 💤 Low value

Remove duplicate assignment of data_in.

Line 2977 duplicates the assignment from line 2974. This redundancy should be removed.

♻️ Proposed fix

     def parseNodeCtxt(self,
                       ctxt: NetworkContext,
                       node: gs.Node,
                       channels_first: bool = True) -> Tuple[NetworkContext, bool]:
         data_in = ctxt.lookup(node.inputs[0].name)
         self.operatorRepresentation['data_in'] = data_in.name
         self.operatorRepresentation['scale'] = ctxt.lookup(node.inputs[1].name).name
         self.operatorRepresentation['bias'] = ctxt.lookup(node.inputs[2].name).name
-        self.operatorRepresentation['data_in'] = data_in.name
         self.operatorRepresentation['data_out'] = ctxt.lookup(node.outputs[0].name).name

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Deeploy/Targets/Generic/Parsers.py` around lines 2973 - 2978, The code
assigns self.operatorRepresentation['data_in'] twice for the same node input;
remove the duplicate assignment (the second
self.operatorRepresentation['data_in'] = data_in.name) so only the initial
assignment using data_in (from ctxt.lookup(node.inputs[0].name)) remains; locate
this in the parser code where data_in is set alongside 'scale', 'bias', and
'data_out' (look for the data_in variable and self.operatorRepresentation
assignments) and delete the redundant line.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 19: Update the mismatched PR URL in CHANGELOG.md so the reference [`#193`]
points to the correct PR URL by replacing the `pull/189` part with `pull/193` in
the line containing "Add support for Operators for Generic target needed in
MAGIA [`#193`]( https://github.com/pulp-platform/Deeploy/pull/189)"; ensure the
displayed number and the URL both refer to PR 193.

In `@Deeploy/Targets/Generic/Layers.py`:
- Around line 744-758: Comments in computeOps use confusable Unicode characters
(σ, α) which trigger lint RUF003; update those comments in the methods
SwishLayer.computeOps and HardSigmoidLayer.computeOps (and any adjacent comment
using σ or α) to use plain ASCII words or names (e.g., "sigma" or "sigmoid"
instead of "σ", and "alpha" instead of "α") while preserving the original
meaning (e.g., "sigma(x) = 1 / (1 + exp(-x))" and "alpha·x + beta" or "alpha * x
+ beta"); ensure the rest of the file still references the same functions/class
names and keep comment content otherwise unchanged.

In `@Deeploy/Targets/Generic/Parsers.py`:
- Line 3022: The code is reading node.attrs with a misspelled key
'count_include_pad ' (trailing space) so the lookup always falls back to 0;
update the lookup in the parser where count_include_pad is set (the line using
node.attrs.get('count_include_pad ', 0)) to use the correct key
'count_include_pad' so the parser honors user-specified values; keep the default
0 behavior if the attribute is absent.
- Around line 2911-2914: The code that sets
self.operatorRepresentation['min_val'] and ['max_val'] directly calls
.values.item() on node.inputs[1] and node.inputs[2] without checking their type;
update the Clip parsing logic (the block handling node.inputs in the parser
where 'min_val' and 'max_val' are assigned) to first verify each input is a
constant (e.g., isinstance(node.inputs[1], gs.Constant)) before calling
.values.item(), and if not a constant (variable/dynamic tensor) simply skip
assigning min_val/max_val (or leave them unset) to avoid AttributeError when
inputs are produced by other ops. Ensure you reference the same keys ('min_val',
'max_val') and input indices (1 and 2) so the change is localized to that Clip
input handling.

In `@Deeploy/Targets/Generic/Templates/SubTemplate.py`:
- Around line 17-27: Adjust the sign of the computed quantization offsets so Sub
produces q1 - q2 - z1 + z2 + zout: invert the sign of input_1_offset and remove
the leading negatives on input_2_offset and output_offset. Specifically, in
SubTemplate.py change the input_1_offset assignment that currently uses
(data_in_1._signed == 0) * int(data_in_1.nLevels / 2) to a negative version, and
change input_2_offset and output_offset to use positive (data_in_2._signed == 0)
* int(...) and (data_out._signed == 0) * int(...) respectively, then keep
operatorRepresentation['offset'] = input_1_offset + input_2_offset +
output_offset.

In `@TargetLibraries/Generic/inc/kernel/Clip.h`:
- Around line 16-18: The section banner in the header incorrectly reads "Ceil"
while this file defines clipping utilities (Clip); update the comment block
above the Clip declarations to read "Clip" instead of "Ceil" so the banner
matches the symbol being defined (look for the banner block with "Ceil" near the
Clip.h top and change it to "Clip").

In `@TargetLibraries/Generic/inc/kernel/GlobalMaxPool.h`:
- Around line 7-9: The include guard macro in GlobalMaxPool.h is incorrectly
using __DEEPLOY_BASIC_MATH_GLOBALAVERAGEPOOL_KERNEL_HEADER_ (collides with
GlobalAveragePool.h); update the guard macro to a unique name like
__DEEPLOY_BASIC_MATH_GLOBALMAXPOOL_KERNEL_HEADER_ at the `#ifndef` and `#define`
lines and update the corresponding `#endif` comment if present so the header no
longer conflicts with GlobalAveragePool.h.

In `@TargetLibraries/Generic/src/AveragePool_fp32.c`:
- Around line 17-20: The AveragePool2d_fp32_fp32 function currently misses
checks for kernel_h == 0 and kernel_w == 0 and divides by variable count without
ensuring count > 0; update the initial guard (the early-return condition around
N/C/stride/pad/kernel size) to also return when kernel_h == 0 or kernel_w == 0,
and in the pooling loop where the output is computed (the place that divides by
count), ensure you protect against count == 0 by skipping the division or
returning a safe value (e.g., set output to 0 or continue) when count is zero;
reference AveragePool2d_fp32_fp32 and the local variable count to locate the
changes.
- Around line 61-66: The initial validity check incorrectly rejects cases where
padding makes the effective length sufficient and also risks divide-by-zero when
computing averages; change the condition from "if (N == 0 || C == 0 || L <
kernel_len || stride == 0)" to check the padded length: "if (N == 0 || C == 0 ||
(L + pad_left + pad_right) < kernel_len || stride == 0)". Then before dividing
by count in the averaging loop (the division at the use of variable count),
ensure count is > 0 (skip division or continue when count == 0) to prevent
divide-by-zero, updating the code that computes L_out and the averaging step
accordingly (use L_out = (L + pad_left + pad_right - kernel_len) / stride + 1 as
shown and guard the division by checking count).

In `@TargetLibraries/Generic/src/GlobalAveragePool_fp32.c`:
- Around line 19-23: Add an explicit guard for spatial_size == 0 before
performing the reduction in GlobalAveragePool_fp32.c: inside the loop over
channels (the code using spatial_size, sum, x, dst, n, C, c), check if
spatial_size is zero and handle it (e.g., set dst[n * C + c] = 0.0f or another
defined fallback) and continue/skip the summation loop to avoid division by
zero; ensure the early branch replaces the sum/division path so no
sum/spatial_size division occurs when spatial_size == 0.

In `@TargetLibraries/Generic/src/GlobalMaxPool_fp32.c`:
- Around line 17-20: The code dereferences x[0] unconditionally in
GlobalMaxPool_fp32 (variable x and float32_t max) which will read out-of-bounds
when spatial_size == 0; fix by checking spatial_size before accessing x[0]
(e.g., if spatial_size == 0 handle that case — set max to a safe value or skip
this channel — and continue) and only run the for-loop and initial assignment
when spatial_size > 0, ensuring no reads occur when spatial_size is zero.

In `@TargetLibraries/Generic/src/GroupNormalization_fp32.c`:
- Around line 17-39: Before computing group statistics add validation to guard
against invalid shapes: check that num_groups > 0, that num_channels %
num_groups == 0, and that spatial > 0 (so group_elements > 0) before calculating
channels_per_group, group_elements, or entering the batch/group loops; if any
check fails, return an error code or early-fail (consistent with the surrounding
API) rather than proceeding to the mean/variance loops. Update the pre-loop
logic around the variables channels_per_group, group_elements, slice and the
outer loops (referencing num_groups, num_channels, spatial, batch_size,
group_elements) to perform these checks and handle the error path.

In `@TargetLibraries/Generic/src/InstanceNormalization_fp32.c`:
- Around line 27-37: The mean/variance calculation divides by the variable
spatial without guarding against spatial == 0; update the reduction in
InstanceNormalization_fp32.c to check spatial before dividing: if spatial is
zero, set mean and var to 0 (or appropriate neutral values) and skip the loops,
otherwise perform the existing summation and divide by spatial. Ensure you
reference and update the computations that assign mean (from sum / spatial) and
var (from var / spatial) and avoid any division when spatial == 0.

---

Nitpick comments:
In `@Deeploy/Targets/Generic/Parsers.py`:
- Around line 2973-2978: The code assigns self.operatorRepresentation['data_in']
twice for the same node input; remove the duplicate assignment (the second
self.operatorRepresentation['data_in'] = data_in.name) so only the initial
assignment using data_in (from ctxt.lookup(node.inputs[0].name)) remains; locate
this in the parser code where data_in is set alongside 'scale', 'bias', and
'data_out' (look for the data_in variable and self.operatorRepresentation
assignments) and delete the redundant line.

In `@Deeploy/Targets/Generic/Platform.py`:
- Around line 150-151: The mapping currently registers 'HardSigmoid' ->
SigmoidLayer and 'HardSwish' -> SwishLayer; replace those with their dedicated
classes HardSigmoidLayer and HardSwishLayer to ensure op-specific behavior.
Update the dictionary entries that reference SigmoidLayer([HardSigmoidMapper])
and SwishLayer([HardSwishMapper]) to use HardSigmoidLayer([HardSigmoidMapper])
and HardSwishLayer([HardSwishMapper]) respectively, keeping the mapper lists
unchanged so existing mappers (HardSigmoidMapper, HardSwishMapper) are
preserved.

In `@TargetLibraries/Generic/src/Sigmoid_fp32.c`:
- Around line 10-14: The Sigmoid_fp32_fp32 implementation uses 1/(1+expf(-x))
which can compute huge expf(-x) for large negative x; change the loop in
Sigmoid_fp32_fp32 to use a numerically stable branch: for each element x =
data_in[i] if x >= 0 compute data_out[i] = 1.0f / (1.0f + expf(-x)) else compute
data_out[i] = expf(x) / (1.0f + expf(x)); keep the same float32_t types and
indexing and preserve the for-loop over size to avoid overflow and improve
precision.

In `@TargetLibraries/Generic/src/Swish_fp32.c`:
- Around line 10-16: The current Swish_fp32_fp32 implementation can overflow
when computing expf(-alpha * x); update the loop in Swish_fp32_fp32 to use a
numerically stable branch on sign of t = alpha * x: if t >= 0 compute
data_out[i] = x / (1 + expf(-t)), else compute data_out[i] = x * expf(t) / (1 +
expf(t)); keep using float32_t variables (t, x) and unchanged arguments
(data_in, data_out, alpha, size) to avoid large exponentials and improve
precision.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c4cc84a6-6698-4450-b7df-b2d472dcd8be

📥 Commits

Reviewing files that changed from the base of the PR and between c4870e1 and bbaffb7.

📒 Files selected for processing (96)

CHANGELOG.md
Deeploy/Targets/Generic/Bindings.py
Deeploy/Targets/Generic/Layers.py
Deeploy/Targets/Generic/Parsers.py
Deeploy/Targets/Generic/Platform.py
Deeploy/Targets/Generic/Templates/FloatAveragePoolTemplate.py
Deeploy/Targets/Generic/Templates/FloatCeilTemplate.py
Deeploy/Targets/Generic/Templates/FloatClipTemplate.py
Deeploy/Targets/Generic/Templates/FloatExpTemplate.py
Deeploy/Targets/Generic/Templates/FloatFloorTemplate.py
Deeploy/Targets/Generic/Templates/FloatGlobalAveragePoolTemplate.py
Deeploy/Targets/Generic/Templates/FloatGlobalMaxPoolTemplate.py
Deeploy/Targets/Generic/Templates/FloatGroupNormTemplate.py
Deeploy/Targets/Generic/Templates/FloatHardSigmoidTemplate.py
Deeploy/Targets/Generic/Templates/FloatHardSwishTemplate.py
Deeploy/Targets/Generic/Templates/FloatInstanceNormTemplate.py
Deeploy/Targets/Generic/Templates/FloatSigmoidTemplate.py
Deeploy/Targets/Generic/Templates/FloatSubTemplate.py
Deeploy/Targets/Generic/Templates/FloatSwishTemplate.py
Deeploy/Targets/Generic/Templates/SubTemplate.py
DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_1D/inputs.npz
DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_1D/network.onnx
DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_1D/outputs.npz
DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_2D/inputs.npz
DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_2D/network.onnx
DeeployTest/Tests/Kernels/FP32/AveragePool/Regular_2D/outputs.npz
DeeployTest/Tests/Kernels/FP32/Ceil/inputs.npz
DeeployTest/Tests/Kernels/FP32/Ceil/network.onnx
DeeployTest/Tests/Kernels/FP32/Ceil/outputs.npz
DeeployTest/Tests/Kernels/FP32/Clip/inputs.npz
DeeployTest/Tests/Kernels/FP32/Clip/network.onnx
DeeployTest/Tests/Kernels/FP32/Clip/outputs.npz
DeeployTest/Tests/Kernels/FP32/Exp/inputs.npz
DeeployTest/Tests/Kernels/FP32/Exp/network.onnx
DeeployTest/Tests/Kernels/FP32/Exp/outputs.npz
DeeployTest/Tests/Kernels/FP32/Floor/inputs.npz
DeeployTest/Tests/Kernels/FP32/Floor/network.onnx
DeeployTest/Tests/Kernels/FP32/Floor/outputs.npz
DeeployTest/Tests/Kernels/FP32/GlobalAveragePool/inputs.npz
DeeployTest/Tests/Kernels/FP32/GlobalAveragePool/network.onnx
DeeployTest/Tests/Kernels/FP32/GlobalAveragePool/outputs.npz
DeeployTest/Tests/Kernels/FP32/GlobalMaxPool/inputs.npz
DeeployTest/Tests/Kernels/FP32/GlobalMaxPool/network.onnx
DeeployTest/Tests/Kernels/FP32/GlobalMaxPool/outputs.npz
DeeployTest/Tests/Kernels/FP32/GroupNorm/inputs.npz
DeeployTest/Tests/Kernels/FP32/GroupNorm/network.onnx
DeeployTest/Tests/Kernels/FP32/GroupNorm/outputs.npz
DeeployTest/Tests/Kernels/FP32/HardSigmoid/inputs.npz
DeeployTest/Tests/Kernels/FP32/HardSigmoid/network.onnx
DeeployTest/Tests/Kernels/FP32/HardSigmoid/outputs.npz
DeeployTest/Tests/Kernels/FP32/HardSwish/inputs.npz
DeeployTest/Tests/Kernels/FP32/HardSwish/network.onnx
DeeployTest/Tests/Kernels/FP32/HardSwish/outputs.npz
DeeployTest/Tests/Kernels/FP32/InstanceNorm/inputs.npz
DeeployTest/Tests/Kernels/FP32/InstanceNorm/network.onnx
DeeployTest/Tests/Kernels/FP32/InstanceNorm/outputs.npz
DeeployTest/Tests/Kernels/FP32/Sigmoid/inputs.npz
DeeployTest/Tests/Kernels/FP32/Sigmoid/network.onnx
DeeployTest/Tests/Kernels/FP32/Sigmoid/outputs.npz
DeeployTest/Tests/Kernels/FP32/Sub/inputs.npz
DeeployTest/Tests/Kernels/FP32/Sub/network.onnx
DeeployTest/Tests/Kernels/FP32/Sub/outputs.npz
DeeployTest/Tests/Kernels/FP32/Swish/inputs.npz
DeeployTest/Tests/Kernels/FP32/Swish/network.onnx
DeeployTest/Tests/Kernels/FP32/Swish/outputs.npz
DeeployTest/Tests/Kernels/Integer/Sub/inputs.npz
DeeployTest/Tests/Kernels/Integer/Sub/network.onnx
DeeployTest/Tests/Kernels/Integer/Sub/outputs.npz
DeeployTest/test_generic_config.py
TargetLibraries/Generic/inc/DeeployBasicMath.h
TargetLibraries/Generic/inc/kernel/AveragePool.h
TargetLibraries/Generic/inc/kernel/Ceil.h
TargetLibraries/Generic/inc/kernel/Clip.h
TargetLibraries/Generic/inc/kernel/Exp.h
TargetLibraries/Generic/inc/kernel/Floor.h
TargetLibraries/Generic/inc/kernel/GlobalAveragePool.h
TargetLibraries/Generic/inc/kernel/GlobalMaxPool.h
TargetLibraries/Generic/inc/kernel/GroupNorm.h
TargetLibraries/Generic/inc/kernel/HardSigmoid.h
TargetLibraries/Generic/inc/kernel/HardSwish.h
TargetLibraries/Generic/inc/kernel/InstanceNorm.h
TargetLibraries/Generic/inc/kernel/Sigmoid.h
TargetLibraries/Generic/inc/kernel/Swish.h
TargetLibraries/Generic/src/AveragePool_fp32.c
TargetLibraries/Generic/src/Ceil_fp32.c
TargetLibraries/Generic/src/Clip_fp32.c
TargetLibraries/Generic/src/Exp_fp32.c
TargetLibraries/Generic/src/Floor_fp32.c
TargetLibraries/Generic/src/GlobalAveragePool_fp32.c
TargetLibraries/Generic/src/GlobalMaxPool_fp32.c
TargetLibraries/Generic/src/GroupNormalization_fp32.c
TargetLibraries/Generic/src/HardSigmoid_fp32.c
TargetLibraries/Generic/src/HardSwish_fp32.c
TargetLibraries/Generic/src/InstanceNormalization_fp32.c
TargetLibraries/Generic/src/Sigmoid_fp32.c
TargetLibraries/Generic/src/Swish_fp32.c

coderabbitai · 2026-05-16T09:15:44Z

+      const float32_t *x = src + (n * C + c) * spatial_size;
+
+      float32_t max = x[0];
+      for (uint32_t i = 1; i < spatial_size; i++) {


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Prevent out-of-bounds read when spatial_size == 0.

At Line 19, x[0] is dereferenced unconditionally. If spatial_size is zero, this is invalid memory access.

Proposed fix

void GlobalMaxPool_fp32_fp32(float32_t const *__restrict__ src, float32_t *__restrict__ dst, uint32_t N, uint32_t C, uint32_t spatial_size) { + if (spatial_size == 0) { + return; // invalid shape for max pooling + } for (uint32_t n = 0; n < N; n++) {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TargetLibraries/Generic/src/GlobalMaxPool_fp32.c` around lines 17 - 20, The code dereferences x[0] unconditionally in GlobalMaxPool_fp32 (variable x and float32_t max) which will read out-of-bounds when spatial_size == 0; fix by checking spatial_size before accessing x[0] (e.g., if spatial_size == 0 handle that case — set max to a safe value or skip this channel — and continue) and only run the for-loop and initial assignment when spatial_size > 0, ensuring no reads occur when spatial_size is zero.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Deeploy/Targets/Generic/Parsers.py`:
- Around line 2979-2981: The code currently assumes NCHW when deriving
batch_size/num_channels/spatial from data_in; update the logic that sets
operatorRepresentation['batch_size'], ['num_channels'], ['spatial'] (and any
length/height/width or spatial_size calculations) to respect a channels_first
boolean: always set batch_size = data_in.shape[0]; set channel_axis = 1 if
channels_first else -1; set num_channels = data_in.shape[channel_axis]; set
spatial_axes = tuple(range(2, data_in.ndim)) if channels_first else
tuple(range(1, data_in.ndim-1)); compute spatial = np.prod([data_in.shape[i] for
i in spatial_axes]); apply the same fix to the other identical blocks that set
these fields (the blocks around the other occurrences noted).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d251f5e-c829-41fa-8db7-6e6bf6f276f0

📥 Commits

Reviewing files that changed from the base of the PR and between bbaffb7 and 5678b47.

📒 Files selected for processing (8)

CHANGELOG.md
Deeploy/Targets/Generic/Layers.py
Deeploy/Targets/Generic/Parsers.py
Deeploy/Targets/Generic/Templates/SubTemplate.py
TargetLibraries/Generic/src/AveragePool_fp32.c
TargetLibraries/Generic/src/GlobalAveragePool_fp32.c
TargetLibraries/Generic/src/GroupNormalization_fp32.c
TargetLibraries/Generic/src/InstanceNormalization_fp32.c

✅ Files skipped from review due to trivial changes (1)

CHANGELOG.md

🚧 Files skipped from review as they are similar to previous changes (6)

TargetLibraries/Generic/src/GlobalAveragePool_fp32.c
Deeploy/Targets/Generic/Templates/SubTemplate.py
TargetLibraries/Generic/src/AveragePool_fp32.c
TargetLibraries/Generic/src/GroupNormalization_fp32.c
TargetLibraries/Generic/src/InstanceNormalization_fp32.c
Deeploy/Targets/Generic/Layers.py

coderabbitai · 2026-05-16T11:07:35Z

+        self.operatorRepresentation['batch_size'] = data_in.shape[0]
+        self.operatorRepresentation['num_channels'] = data_in.shape[1]
+        self.operatorRepresentation['spatial'] = np.prod(data_in.shape[2:])


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Respect channels_first when deriving channel/spatial metadata.

These new parser paths always read channels/spatial as NCHW (shape[1], shape[2:]). When channels_first=False, num_channels, spatial, length/height/width, and spatial_size become wrong, which can mis-parameterize generated kernels.

Proposed fix

class NormalizationParser(NodeParser): @@ def parseNodeCtxt(self, ctxt: NetworkContext, node: gs.Node, channels_first: bool = True) -> Tuple[NetworkContext, bool]: data_in = ctxt.lookup(node.inputs[0].name) @@ - self.operatorRepresentation['batch_size'] = data_in.shape[0] - self.operatorRepresentation['num_channels'] = data_in.shape[1] - self.operatorRepresentation['spatial'] = np.prod(data_in.shape[2:]) + self.operatorRepresentation['batch_size'] = data_in.shape[0] + if channels_first: + self.operatorRepresentation['num_channels'] = data_in.shape[1] + spatial_shape = data_in.shape[2:] + else: + self.operatorRepresentation['num_channels'] = data_in.shape[-1] + spatial_shape = data_in.shape[1:-1] + self.operatorRepresentation['spatial'] = np.prod(spatial_shape) return ctxt, True @@ class AveragePoolParser(NodeParser): @@ def parseNodeCtxt(self, ctxt: NetworkContext, node: gs.Node, channels_first: bool = True) -> Tuple[NetworkContext, bool]: @@ - self.operatorRepresentation['batch_size'] = data_in.shape[0] - self.operatorRepresentation['num_channels'] = data_in.shape[1] + self.operatorRepresentation['batch_size'] = data_in.shape[0] + self.operatorRepresentation['num_channels'] = data_in.shape[1] if channels_first else data_in.shape[-1] self.operatorRepresentation['data_out_size'] = int(np.prod(data_out.shape)) - spatial_shape = data_in.shape[2:] + spatial_shape = data_in.shape[2:] if channels_first else data_in.shape[1:-1] if len(self.operatorRepresentation['kernel_shape']) != len(spatial_shape): return ctxt, False @@ class GlobalPoolParser(NodeParser): @@ def parseNodeCtxt(self, ctxt: NetworkContext, node: gs.Node, channels_first: bool = True) -> Tuple[NetworkContext, bool]: @@ - self.operatorRepresentation['batch_size'] = data_in.shape[0] - self.operatorRepresentation['num_channels'] = data_in.shape[1] - self.operatorRepresentation['spatial_size'] = np.prod(data_in.shape[2:]) + self.operatorRepresentation['batch_size'] = data_in.shape[0] + if channels_first: + self.operatorRepresentation['num_channels'] = data_in.shape[1] + spatial_shape = data_in.shape[2:] + else: + self.operatorRepresentation['num_channels'] = data_in.shape[-1] + spatial_shape = data_in.shape[1:-1] + self.operatorRepresentation['spatial_size'] = np.prod(spatial_shape)

Also applies to: 3059-3071, 3104-3106

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Generic/Parsers.py` around lines 2979 - 2981, The code currently assumes NCHW when deriving batch_size/num_channels/spatial from data_in; update the logic that sets operatorRepresentation['batch_size'], ['num_channels'], ['spatial'] (and any length/height/width or spatial_size calculations) to respect a channels_first boolean: always set batch_size = data_in.shape[0]; set channel_axis = 1 if channels_first else -1; set num_channels = data_in.shape[channel_axis]; set spatial_axes = tuple(range(2, data_in.ndim)) if channels_first else tuple(range(1, data_in.ndim-1)); compute spatial = np.prod([data_in.shape[i] for i in spatial_axes]); apply the same fix to the other identical blocks that set these fields (the blocks around the other occurrences noted).

marchioa added 3 commits May 13, 2026 11:58

add support for Ceil operator for Generic target

fc26fa2

add support for Clip operator for Generic target

f1929a4

move math.h out from Generic DeeployBasicMath.h to avoid conflicts wi…

3b58bf9

…th other targets

marchioa force-pushed the am/magia-kernels branch from 95cc3f4 to 3b58bf9 Compare May 14, 2026 09:00

marchioa added 4 commits May 14, 2026 10:45

add support for Floor operator for Generic target

ecdbe88

add support for Sub operator for Generic target

52c1e73

add support for Exp operator for Generic target

6bfface

add support for Sigmoid operator for Generic target

801df44

marchioa force-pushed the am/magia-kernels branch from c938552 to 801df44 Compare May 14, 2026 16:38

add support for Swish operator for Generic target

e3f0ba6

marchioa changed the title ~~add support for Ceil operator for Generic target~~ add support for Operators for Generic target needed in MAGIA May 15, 2026

marchioa added 7 commits May 15, 2026 08:26

add support for HardSigmoid and HardSwish operator for Generic target

bc73051

in Swish parser make alpha an optional attribute

6eaa19f

add support for Instance and Group Normalization operators for Generi…

cf7d2ca

…c target

add support for AveragePool, GlobalAveragePool, and GlobalMaxPool ope…

f83550f

…rators for Generic target

add computation of operation to new supported layers

aac6eb4

fix minor type issue in AveragePool kernel

e0d0e73

update CHANGELOG

bbaffb7

marchioa marked this pull request as ready for review May 16, 2026 09:09

marchioa requested review from Victor-Jung, Xeratec and runwangdl as code owners May 16, 2026 09:09

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

minor fixes (implementing coderubbit suggestions)

5678b47

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

Conversation

marchioa commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Added

Changed

Fixed

PR Merge Checklist

Uh oh!

marchioa commented May 16, 2026

Uh oh!

coderabbitai Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

marchioa commented May 13, 2026 •

edited

Loading

coderabbitai Bot commented May 16, 2026 •

edited

Loading