add support for Operators for Generic target needed in MAGIA#193
add support for Operators for Generic target needed in MAGIA#193marchioa wants to merge 16 commits into
Conversation
95cc3f4 to
3b58bf9
Compare
c938552 to
801df44
Compare
…rators for Generic target
|
Everytime I push a change, snitch CI raises a different error I cannot replicate locally (I run This last CI error is |
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds parser base classes and parsers, codegen templates, bindings, layer op-count models, platform mapper registrations, kernel headers and C implementations, test entries, and changelog entries to support Ceil, Floor, Clip, Sub, Exp, Sigmoid, Swish, HardSigmoid, HardSwish, InstanceNormalization, GroupNormalization, AveragePool (1D/2D), GlobalAveragePool, and GlobalMaxPool on the Generic target. ChangesGeneric target operator support
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
🧹 Nitpick comments (4)
Deeploy/Targets/Generic/Platform.py (1)
150-151: ⚡ Quick winWire HardSigmoid/HardSwish to their dedicated layer classes.
HardSigmoidLayerandHardSwishLayerexist but are not used here. UsingSigmoidLayer/SwishLayeris fragile and can drift from op-specific accounting later.Suggested patch
-from Deeploy.Targets.Generic.Layers import AddLayer, AveragePoolLayer, BatchNormalizationLayer, CeilLayer, ClipLayer, \ +from Deeploy.Targets.Generic.Layers import AddLayer, AveragePoolLayer, BatchNormalizationLayer, CeilLayer, ClipLayer, \ ConcatLayer, ConvLayer, ConvTransposeLayer, DebugPrintLayer, DequantLayer, DivLayer, ExpLayer, FloorLayer, \ - GatherLayer, GELULayer, GEMMLayer, GlobalAveragePoolLayer, GlobalMaxPoolLayer, GroupNormLayer, InstanceNormLayer, \ + GatherLayer, GELULayer, GEMMLayer, GlobalAveragePoolLayer, GlobalMaxPoolLayer, GroupNormLayer, HardSigmoidLayer, \ + HardSwishLayer, InstanceNormLayer, \ ITAMaxLayer, LayerNormLayer, MatMulLayer, MaxPoolLayer, MulLayer, PadLayer, PowLayer, QuantLayer, ReduceMeanLayer, \ ReduceSumLayer, ReluLayer, RequantShiftLayer, ReshapeLayer, RQIntegerDivLayer, RQSiGELULayer, SigmoidLayer, \ SliceLayer, SoftmaxLayer, SqrtLayer, SubLayer, SwishLayer, TransposeLayer ... - 'HardSigmoid': SigmoidLayer([HardSigmoidMapper]), - 'HardSwish': SwishLayer([HardSwishMapper]), + 'HardSigmoid': HardSigmoidLayer([HardSigmoidMapper]), + 'HardSwish': HardSwishLayer([HardSwishMapper]),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Generic/Platform.py` around lines 150 - 151, The mapping currently registers 'HardSigmoid' -> SigmoidLayer and 'HardSwish' -> SwishLayer; replace those with their dedicated classes HardSigmoidLayer and HardSwishLayer to ensure op-specific behavior. Update the dictionary entries that reference SigmoidLayer([HardSigmoidMapper]) and SwishLayer([HardSwishMapper]) to use HardSigmoidLayer([HardSigmoidMapper]) and HardSwishLayer([HardSwishMapper]) respectively, keeping the mapper lists unchanged so existing mappers (HardSigmoidMapper, HardSwishMapper) are preserved.TargetLibraries/Generic/src/Swish_fp32.c (1)
10-16: ⚡ Quick winConsider using a numerically stable swish formulation.
Similar to the sigmoid implementation, this formulation can encounter numerical issues when
alpha * xis a large negative value, causingexpf(-alpha * x)to overflow. A more stable approach branches on the sign:
- For
alpha * x >= 0:x / (1 + expf(-alpha * x))- For
alpha * x < 0:x * expf(alpha * x) / (1 + expf(alpha * x))This avoids computing very large exponentials and improves precision, especially important since the
alphaparameter can amplify input values.📊 Proposed numerically stable implementation
void Swish_fp32_fp32(float32_t *data_in, float32_t *data_out, float alpha, int32_t size) { for (int i = 0; i < size; i++) { float32_t x = data_in[i]; - data_out[i] = x / (1 + expf(-alpha * x)); + float32_t ax = alpha * x; + if (ax >= 0) { + data_out[i] = x / (1.0f + expf(-ax)); + } else { + float32_t exp_ax = expf(ax); + data_out[i] = x * exp_ax / (1.0f + exp_ax); + } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TargetLibraries/Generic/src/Swish_fp32.c` around lines 10 - 16, The current Swish_fp32_fp32 implementation can overflow when computing expf(-alpha * x); update the loop in Swish_fp32_fp32 to use a numerically stable branch on sign of t = alpha * x: if t >= 0 compute data_out[i] = x / (1 + expf(-t)), else compute data_out[i] = x * expf(t) / (1 + expf(t)); keep using float32_t variables (t, x) and unchanged arguments (data_in, data_out, alpha, size) to avoid large exponentials and improve precision.TargetLibraries/Generic/src/Sigmoid_fp32.c (1)
10-14: ⚡ Quick winConsider using a numerically stable sigmoid formulation.
The current implementation
1/(1+exp(-x))can encounter numerical issues whenxis a large negative value, causingexp(-x)to overflow. While IEEE 754 produces the correct limiting value (0), a more stable and efficient formulation branches on the sign ofx:
- For
x >= 0:1 / (1 + expf(-x))- For
x < 0:expf(x) / (1 + expf(x))This avoids computing very large exponentials and improves precision across the input range.
📊 Proposed numerically stable implementation
void Sigmoid_fp32_fp32(float32_t *data_in, float32_t *data_out, int32_t size) { for (int i = 0; i < size; i++) { - data_out[i] = 1 / (1 + expf(-data_in[i])); + float32_t x = data_in[i]; + if (x >= 0) { + data_out[i] = 1.0f / (1.0f + expf(-x)); + } else { + float32_t exp_x = expf(x); + data_out[i] = exp_x / (1.0f + exp_x); + } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@TargetLibraries/Generic/src/Sigmoid_fp32.c` around lines 10 - 14, The Sigmoid_fp32_fp32 implementation uses 1/(1+expf(-x)) which can compute huge expf(-x) for large negative x; change the loop in Sigmoid_fp32_fp32 to use a numerically stable branch: for each element x = data_in[i] if x >= 0 compute data_out[i] = 1.0f / (1.0f + expf(-x)) else compute data_out[i] = expf(x) / (1.0f + expf(x)); keep the same float32_t types and indexing and preserve the for-loop over size to avoid overflow and improve precision.Deeploy/Targets/Generic/Parsers.py (1)
2973-2978: 💤 Low valueRemove duplicate assignment of
data_in.Line 2977 duplicates the assignment from line 2974. This redundancy should be removed.
♻️ Proposed fix
def parseNodeCtxt(self, ctxt: NetworkContext, node: gs.Node, channels_first: bool = True) -> Tuple[NetworkContext, bool]: data_in = ctxt.lookup(node.inputs[0].name) self.operatorRepresentation['data_in'] = data_in.name self.operatorRepresentation['scale'] = ctxt.lookup(node.inputs[1].name).name self.operatorRepresentation['bias'] = ctxt.lookup(node.inputs[2].name).name - self.operatorRepresentation['data_in'] = data_in.name self.operatorRepresentation['data_out'] = ctxt.lookup(node.outputs[0].name).name🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Deeploy/Targets/Generic/Parsers.py` around lines 2973 - 2978, The code assigns self.operatorRepresentation['data_in'] twice for the same node input; remove the duplicate assignment (the second self.operatorRepresentation['data_in'] = data_in.name) so only the initial assignment using data_in (from ctxt.lookup(node.inputs[0].name)) remains; locate this in the parser code where data_in is set alongside 'scale', 'bias', and 'data_out' (look for the data_in variable and self.operatorRepresentation assignments) and delete the redundant line.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Line 19: Update the mismatched PR URL in CHANGELOG.md so the reference [`#193`]
points to the correct PR URL by replacing the `pull/189` part with `pull/193` in
the line containing "Add support for Operators for Generic target needed in
MAGIA [`#193`]( https://github.com/pulp-platform/Deeploy/pull/189)"; ensure the
displayed number and the URL both refer to PR 193.
In `@Deeploy/Targets/Generic/Layers.py`:
- Around line 744-758: Comments in computeOps use confusable Unicode characters
(σ, α) which trigger lint RUF003; update those comments in the methods
SwishLayer.computeOps and HardSigmoidLayer.computeOps (and any adjacent comment
using σ or α) to use plain ASCII words or names (e.g., "sigma" or "sigmoid"
instead of "σ", and "alpha" instead of "α") while preserving the original
meaning (e.g., "sigma(x) = 1 / (1 + exp(-x))" and "alpha·x + beta" or "alpha * x
+ beta"); ensure the rest of the file still references the same functions/class
names and keep comment content otherwise unchanged.
In `@Deeploy/Targets/Generic/Parsers.py`:
- Line 3022: The code is reading node.attrs with a misspelled key
'count_include_pad ' (trailing space) so the lookup always falls back to 0;
update the lookup in the parser where count_include_pad is set (the line using
node.attrs.get('count_include_pad ', 0)) to use the correct key
'count_include_pad' so the parser honors user-specified values; keep the default
0 behavior if the attribute is absent.
- Around line 2911-2914: The code that sets
self.operatorRepresentation['min_val'] and ['max_val'] directly calls
.values.item() on node.inputs[1] and node.inputs[2] without checking their type;
update the Clip parsing logic (the block handling node.inputs in the parser
where 'min_val' and 'max_val' are assigned) to first verify each input is a
constant (e.g., isinstance(node.inputs[1], gs.Constant)) before calling
.values.item(), and if not a constant (variable/dynamic tensor) simply skip
assigning min_val/max_val (or leave them unset) to avoid AttributeError when
inputs are produced by other ops. Ensure you reference the same keys ('min_val',
'max_val') and input indices (1 and 2) so the change is localized to that Clip
input handling.
In `@Deeploy/Targets/Generic/Templates/SubTemplate.py`:
- Around line 17-27: Adjust the sign of the computed quantization offsets so Sub
produces q1 - q2 - z1 + z2 + zout: invert the sign of input_1_offset and remove
the leading negatives on input_2_offset and output_offset. Specifically, in
SubTemplate.py change the input_1_offset assignment that currently uses
(data_in_1._signed == 0) * int(data_in_1.nLevels / 2) to a negative version, and
change input_2_offset and output_offset to use positive (data_in_2._signed == 0)
* int(...) and (data_out._signed == 0) * int(...) respectively, then keep
operatorRepresentation['offset'] = input_1_offset + input_2_offset +
output_offset.
In `@TargetLibraries/Generic/inc/kernel/Clip.h`:
- Around line 16-18: The section banner in the header incorrectly reads "Ceil"
while this file defines clipping utilities (Clip); update the comment block
above the Clip declarations to read "Clip" instead of "Ceil" so the banner
matches the symbol being defined (look for the banner block with "Ceil" near the
Clip.h top and change it to "Clip").
In `@TargetLibraries/Generic/inc/kernel/GlobalMaxPool.h`:
- Around line 7-9: The include guard macro in GlobalMaxPool.h is incorrectly
using __DEEPLOY_BASIC_MATH_GLOBALAVERAGEPOOL_KERNEL_HEADER_ (collides with
GlobalAveragePool.h); update the guard macro to a unique name like
__DEEPLOY_BASIC_MATH_GLOBALMAXPOOL_KERNEL_HEADER_ at the `#ifndef` and `#define`
lines and update the corresponding `#endif` comment if present so the header no
longer conflicts with GlobalAveragePool.h.
In `@TargetLibraries/Generic/src/AveragePool_fp32.c`:
- Around line 17-20: The AveragePool2d_fp32_fp32 function currently misses
checks for kernel_h == 0 and kernel_w == 0 and divides by variable count without
ensuring count > 0; update the initial guard (the early-return condition around
N/C/stride/pad/kernel size) to also return when kernel_h == 0 or kernel_w == 0,
and in the pooling loop where the output is computed (the place that divides by
count), ensure you protect against count == 0 by skipping the division or
returning a safe value (e.g., set output to 0 or continue) when count is zero;
reference AveragePool2d_fp32_fp32 and the local variable count to locate the
changes.
- Around line 61-66: The initial validity check incorrectly rejects cases where
padding makes the effective length sufficient and also risks divide-by-zero when
computing averages; change the condition from "if (N == 0 || C == 0 || L <
kernel_len || stride == 0)" to check the padded length: "if (N == 0 || C == 0 ||
(L + pad_left + pad_right) < kernel_len || stride == 0)". Then before dividing
by count in the averaging loop (the division at the use of variable count),
ensure count is > 0 (skip division or continue when count == 0) to prevent
divide-by-zero, updating the code that computes L_out and the averaging step
accordingly (use L_out = (L + pad_left + pad_right - kernel_len) / stride + 1 as
shown and guard the division by checking count).
In `@TargetLibraries/Generic/src/GlobalAveragePool_fp32.c`:
- Around line 19-23: Add an explicit guard for spatial_size == 0 before
performing the reduction in GlobalAveragePool_fp32.c: inside the loop over
channels (the code using spatial_size, sum, x, dst, n, C, c), check if
spatial_size is zero and handle it (e.g., set dst[n * C + c] = 0.0f or another
defined fallback) and continue/skip the summation loop to avoid division by
zero; ensure the early branch replaces the sum/division path so no
sum/spatial_size division occurs when spatial_size == 0.
In `@TargetLibraries/Generic/src/GlobalMaxPool_fp32.c`:
- Around line 17-20: The code dereferences x[0] unconditionally in
GlobalMaxPool_fp32 (variable x and float32_t max) which will read out-of-bounds
when spatial_size == 0; fix by checking spatial_size before accessing x[0]
(e.g., if spatial_size == 0 handle that case — set max to a safe value or skip
this channel — and continue) and only run the for-loop and initial assignment
when spatial_size > 0, ensuring no reads occur when spatial_size is zero.
In `@TargetLibraries/Generic/src/GroupNormalization_fp32.c`:
- Around line 17-39: Before computing group statistics add validation to guard
against invalid shapes: check that num_groups > 0, that num_channels %
num_groups == 0, and that spatial > 0 (so group_elements > 0) before calculating
channels_per_group, group_elements, or entering the batch/group loops; if any
check fails, return an error code or early-fail (consistent with the surrounding
API) rather than proceeding to the mean/variance loops. Update the pre-loop
logic around the variables channels_per_group, group_elements, slice and the
outer loops (referencing num_groups, num_channels, spatial, batch_size,
group_elements) to perform these checks and handle the error path.
In `@TargetLibraries/Generic/src/InstanceNormalization_fp32.c`:
- Around line 27-37: The mean/variance calculation divides by the variable
spatial without guarding against spatial == 0; update the reduction in
InstanceNormalization_fp32.c to check spatial before dividing: if spatial is
zero, set mean and var to 0 (or appropriate neutral values) and skip the loops,
otherwise perform the existing summation and divide by spatial. Ensure you
reference and update the computations that assign mean (from sum / spatial) and
var (from var / spatial) and avoid any division when spatial == 0.
---
Nitpick comments:
In `@Deeploy/Targets/Generic/Parsers.py`:
- Around line 2973-2978: The code assigns self.operatorRepresentation['data_in']
twice for the same node input; remove the duplicate assignment (the second
self.operatorRepresentation['data_in'] = data_in.name) so only the initial
assignment using data_in (from ctxt.lookup(node.inputs[0].name)) remains; locate
this in the parser code where data_in is set alongside 'scale', 'bias', and
'data_out' (look for the data_in variable and self.operatorRepresentation
assignments) and delete the redundant line.
In `@Deeploy/Targets/Generic/Platform.py`:
- Around line 150-151: The mapping currently registers 'HardSigmoid' ->
SigmoidLayer and 'HardSwish' -> SwishLayer; replace those with their dedicated
classes HardSigmoidLayer and HardSwishLayer to ensure op-specific behavior.
Update the dictionary entries that reference SigmoidLayer([HardSigmoidMapper])
and SwishLayer([HardSwishMapper]) to use HardSigmoidLayer([HardSigmoidMapper])
and HardSwishLayer([HardSwishMapper]) respectively, keeping the mapper lists
unchanged so existing mappers (HardSigmoidMapper, HardSwishMapper) are
preserved.
In `@TargetLibraries/Generic/src/Sigmoid_fp32.c`:
- Around line 10-14: The Sigmoid_fp32_fp32 implementation uses 1/(1+expf(-x))
which can compute huge expf(-x) for large negative x; change the loop in
Sigmoid_fp32_fp32 to use a numerically stable branch: for each element x =
data_in[i] if x >= 0 compute data_out[i] = 1.0f / (1.0f + expf(-x)) else compute
data_out[i] = expf(x) / (1.0f + expf(x)); keep the same float32_t types and
indexing and preserve the for-loop over size to avoid overflow and improve
precision.
In `@TargetLibraries/Generic/src/Swish_fp32.c`:
- Around line 10-16: The current Swish_fp32_fp32 implementation can overflow
when computing expf(-alpha * x); update the loop in Swish_fp32_fp32 to use a
numerically stable branch on sign of t = alpha * x: if t >= 0 compute
data_out[i] = x / (1 + expf(-t)), else compute data_out[i] = x * expf(t) / (1 +
expf(t)); keep using float32_t variables (t, x) and unchanged arguments
(data_in, data_out, alpha, size) to avoid large exponentials and improve
precision.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c4cc84a6-6698-4450-b7df-b2d472dcd8be
📒 Files selected for processing (96)
CHANGELOG.mdDeeploy/Targets/Generic/Bindings.pyDeeploy/Targets/Generic/Layers.pyDeeploy/Targets/Generic/Parsers.pyDeeploy/Targets/Generic/Platform.pyDeeploy/Targets/Generic/Templates/FloatAveragePoolTemplate.pyDeeploy/Targets/Generic/Templates/FloatCeilTemplate.pyDeeploy/Targets/Generic/Templates/FloatClipTemplate.pyDeeploy/Targets/Generic/Templates/FloatExpTemplate.pyDeeploy/Targets/Generic/Templates/FloatFloorTemplate.pyDeeploy/Targets/Generic/Templates/FloatGlobalAveragePoolTemplate.pyDeeploy/Targets/Generic/Templates/FloatGlobalMaxPoolTemplate.pyDeeploy/Targets/Generic/Templates/FloatGroupNormTemplate.pyDeeploy/Targets/Generic/Templates/FloatHardSigmoidTemplate.pyDeeploy/Targets/Generic/Templates/FloatHardSwishTemplate.pyDeeploy/Targets/Generic/Templates/FloatInstanceNormTemplate.pyDeeploy/Targets/Generic/Templates/FloatSigmoidTemplate.pyDeeploy/Targets/Generic/Templates/FloatSubTemplate.pyDeeploy/Targets/Generic/Templates/FloatSwishTemplate.pyDeeploy/Targets/Generic/Templates/SubTemplate.pyDeeployTest/Tests/Kernels/FP32/AveragePool/Regular_1D/inputs.npzDeeployTest/Tests/Kernels/FP32/AveragePool/Regular_1D/network.onnxDeeployTest/Tests/Kernels/FP32/AveragePool/Regular_1D/outputs.npzDeeployTest/Tests/Kernels/FP32/AveragePool/Regular_2D/inputs.npzDeeployTest/Tests/Kernels/FP32/AveragePool/Regular_2D/network.onnxDeeployTest/Tests/Kernels/FP32/AveragePool/Regular_2D/outputs.npzDeeployTest/Tests/Kernels/FP32/Ceil/inputs.npzDeeployTest/Tests/Kernels/FP32/Ceil/network.onnxDeeployTest/Tests/Kernels/FP32/Ceil/outputs.npzDeeployTest/Tests/Kernels/FP32/Clip/inputs.npzDeeployTest/Tests/Kernels/FP32/Clip/network.onnxDeeployTest/Tests/Kernels/FP32/Clip/outputs.npzDeeployTest/Tests/Kernels/FP32/Exp/inputs.npzDeeployTest/Tests/Kernels/FP32/Exp/network.onnxDeeployTest/Tests/Kernels/FP32/Exp/outputs.npzDeeployTest/Tests/Kernels/FP32/Floor/inputs.npzDeeployTest/Tests/Kernels/FP32/Floor/network.onnxDeeployTest/Tests/Kernels/FP32/Floor/outputs.npzDeeployTest/Tests/Kernels/FP32/GlobalAveragePool/inputs.npzDeeployTest/Tests/Kernels/FP32/GlobalAveragePool/network.onnxDeeployTest/Tests/Kernels/FP32/GlobalAveragePool/outputs.npzDeeployTest/Tests/Kernels/FP32/GlobalMaxPool/inputs.npzDeeployTest/Tests/Kernels/FP32/GlobalMaxPool/network.onnxDeeployTest/Tests/Kernels/FP32/GlobalMaxPool/outputs.npzDeeployTest/Tests/Kernels/FP32/GroupNorm/inputs.npzDeeployTest/Tests/Kernels/FP32/GroupNorm/network.onnxDeeployTest/Tests/Kernels/FP32/GroupNorm/outputs.npzDeeployTest/Tests/Kernels/FP32/HardSigmoid/inputs.npzDeeployTest/Tests/Kernels/FP32/HardSigmoid/network.onnxDeeployTest/Tests/Kernels/FP32/HardSigmoid/outputs.npzDeeployTest/Tests/Kernels/FP32/HardSwish/inputs.npzDeeployTest/Tests/Kernels/FP32/HardSwish/network.onnxDeeployTest/Tests/Kernels/FP32/HardSwish/outputs.npzDeeployTest/Tests/Kernels/FP32/InstanceNorm/inputs.npzDeeployTest/Tests/Kernels/FP32/InstanceNorm/network.onnxDeeployTest/Tests/Kernels/FP32/InstanceNorm/outputs.npzDeeployTest/Tests/Kernels/FP32/Sigmoid/inputs.npzDeeployTest/Tests/Kernels/FP32/Sigmoid/network.onnxDeeployTest/Tests/Kernels/FP32/Sigmoid/outputs.npzDeeployTest/Tests/Kernels/FP32/Sub/inputs.npzDeeployTest/Tests/Kernels/FP32/Sub/network.onnxDeeployTest/Tests/Kernels/FP32/Sub/outputs.npzDeeployTest/Tests/Kernels/FP32/Swish/inputs.npzDeeployTest/Tests/Kernels/FP32/Swish/network.onnxDeeployTest/Tests/Kernels/FP32/Swish/outputs.npzDeeployTest/Tests/Kernels/Integer/Sub/inputs.npzDeeployTest/Tests/Kernels/Integer/Sub/network.onnxDeeployTest/Tests/Kernels/Integer/Sub/outputs.npzDeeployTest/test_generic_config.pyTargetLibraries/Generic/inc/DeeployBasicMath.hTargetLibraries/Generic/inc/kernel/AveragePool.hTargetLibraries/Generic/inc/kernel/Ceil.hTargetLibraries/Generic/inc/kernel/Clip.hTargetLibraries/Generic/inc/kernel/Exp.hTargetLibraries/Generic/inc/kernel/Floor.hTargetLibraries/Generic/inc/kernel/GlobalAveragePool.hTargetLibraries/Generic/inc/kernel/GlobalMaxPool.hTargetLibraries/Generic/inc/kernel/GroupNorm.hTargetLibraries/Generic/inc/kernel/HardSigmoid.hTargetLibraries/Generic/inc/kernel/HardSwish.hTargetLibraries/Generic/inc/kernel/InstanceNorm.hTargetLibraries/Generic/inc/kernel/Sigmoid.hTargetLibraries/Generic/inc/kernel/Swish.hTargetLibraries/Generic/src/AveragePool_fp32.cTargetLibraries/Generic/src/Ceil_fp32.cTargetLibraries/Generic/src/Clip_fp32.cTargetLibraries/Generic/src/Exp_fp32.cTargetLibraries/Generic/src/Floor_fp32.cTargetLibraries/Generic/src/GlobalAveragePool_fp32.cTargetLibraries/Generic/src/GlobalMaxPool_fp32.cTargetLibraries/Generic/src/GroupNormalization_fp32.cTargetLibraries/Generic/src/HardSigmoid_fp32.cTargetLibraries/Generic/src/HardSwish_fp32.cTargetLibraries/Generic/src/InstanceNormalization_fp32.cTargetLibraries/Generic/src/Sigmoid_fp32.cTargetLibraries/Generic/src/Swish_fp32.c
| const float32_t *x = src + (n * C + c) * spatial_size; | ||
|
|
||
| float32_t max = x[0]; | ||
| for (uint32_t i = 1; i < spatial_size; i++) { |
There was a problem hiding this comment.
Prevent out-of-bounds read when spatial_size == 0.
At Line 19, x[0] is dereferenced unconditionally. If spatial_size is zero, this is invalid memory access.
Proposed fix
void GlobalMaxPool_fp32_fp32(float32_t const *__restrict__ src,
float32_t *__restrict__ dst, uint32_t N,
uint32_t C, uint32_t spatial_size) {
+ if (spatial_size == 0) {
+ return; // invalid shape for max pooling
+ }
for (uint32_t n = 0; n < N; n++) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@TargetLibraries/Generic/src/GlobalMaxPool_fp32.c` around lines 17 - 20, The
code dereferences x[0] unconditionally in GlobalMaxPool_fp32 (variable x and
float32_t max) which will read out-of-bounds when spatial_size == 0; fix by
checking spatial_size before accessing x[0] (e.g., if spatial_size == 0 handle
that case — set max to a safe value or skip this channel — and continue) and
only run the for-loop and initial assignment when spatial_size > 0, ensuring no
reads occur when spatial_size is zero.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Deeploy/Targets/Generic/Parsers.py`:
- Around line 2979-2981: The code currently assumes NCHW when deriving
batch_size/num_channels/spatial from data_in; update the logic that sets
operatorRepresentation['batch_size'], ['num_channels'], ['spatial'] (and any
length/height/width or spatial_size calculations) to respect a channels_first
boolean: always set batch_size = data_in.shape[0]; set channel_axis = 1 if
channels_first else -1; set num_channels = data_in.shape[channel_axis]; set
spatial_axes = tuple(range(2, data_in.ndim)) if channels_first else
tuple(range(1, data_in.ndim-1)); compute spatial = np.prod([data_in.shape[i] for
i in spatial_axes]); apply the same fix to the other identical blocks that set
these fields (the blocks around the other occurrences noted).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4d251f5e-c829-41fa-8db7-6e6bf6f276f0
📒 Files selected for processing (8)
CHANGELOG.mdDeeploy/Targets/Generic/Layers.pyDeeploy/Targets/Generic/Parsers.pyDeeploy/Targets/Generic/Templates/SubTemplate.pyTargetLibraries/Generic/src/AveragePool_fp32.cTargetLibraries/Generic/src/GlobalAveragePool_fp32.cTargetLibraries/Generic/src/GroupNormalization_fp32.cTargetLibraries/Generic/src/InstanceNormalization_fp32.c
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (6)
- TargetLibraries/Generic/src/GlobalAveragePool_fp32.c
- Deeploy/Targets/Generic/Templates/SubTemplate.py
- TargetLibraries/Generic/src/AveragePool_fp32.c
- TargetLibraries/Generic/src/GroupNormalization_fp32.c
- TargetLibraries/Generic/src/InstanceNormalization_fp32.c
- Deeploy/Targets/Generic/Layers.py
| self.operatorRepresentation['batch_size'] = data_in.shape[0] | ||
| self.operatorRepresentation['num_channels'] = data_in.shape[1] | ||
| self.operatorRepresentation['spatial'] = np.prod(data_in.shape[2:]) |
There was a problem hiding this comment.
Respect channels_first when deriving channel/spatial metadata.
These new parser paths always read channels/spatial as NCHW (shape[1], shape[2:]). When channels_first=False, num_channels, spatial, length/height/width, and spatial_size become wrong, which can mis-parameterize generated kernels.
Proposed fix
class NormalizationParser(NodeParser):
@@
def parseNodeCtxt(self,
ctxt: NetworkContext,
node: gs.Node,
channels_first: bool = True) -> Tuple[NetworkContext, bool]:
data_in = ctxt.lookup(node.inputs[0].name)
@@
- self.operatorRepresentation['batch_size'] = data_in.shape[0]
- self.operatorRepresentation['num_channels'] = data_in.shape[1]
- self.operatorRepresentation['spatial'] = np.prod(data_in.shape[2:])
+ self.operatorRepresentation['batch_size'] = data_in.shape[0]
+ if channels_first:
+ self.operatorRepresentation['num_channels'] = data_in.shape[1]
+ spatial_shape = data_in.shape[2:]
+ else:
+ self.operatorRepresentation['num_channels'] = data_in.shape[-1]
+ spatial_shape = data_in.shape[1:-1]
+ self.operatorRepresentation['spatial'] = np.prod(spatial_shape)
return ctxt, True
@@
class AveragePoolParser(NodeParser):
@@
def parseNodeCtxt(self,
ctxt: NetworkContext,
node: gs.Node,
channels_first: bool = True) -> Tuple[NetworkContext, bool]:
@@
- self.operatorRepresentation['batch_size'] = data_in.shape[0]
- self.operatorRepresentation['num_channels'] = data_in.shape[1]
+ self.operatorRepresentation['batch_size'] = data_in.shape[0]
+ self.operatorRepresentation['num_channels'] = data_in.shape[1] if channels_first else data_in.shape[-1]
self.operatorRepresentation['data_out_size'] = int(np.prod(data_out.shape))
- spatial_shape = data_in.shape[2:]
+ spatial_shape = data_in.shape[2:] if channels_first else data_in.shape[1:-1]
if len(self.operatorRepresentation['kernel_shape']) != len(spatial_shape):
return ctxt, False
@@
class GlobalPoolParser(NodeParser):
@@
def parseNodeCtxt(self,
ctxt: NetworkContext,
node: gs.Node,
channels_first: bool = True) -> Tuple[NetworkContext, bool]:
@@
- self.operatorRepresentation['batch_size'] = data_in.shape[0]
- self.operatorRepresentation['num_channels'] = data_in.shape[1]
- self.operatorRepresentation['spatial_size'] = np.prod(data_in.shape[2:])
+ self.operatorRepresentation['batch_size'] = data_in.shape[0]
+ if channels_first:
+ self.operatorRepresentation['num_channels'] = data_in.shape[1]
+ spatial_shape = data_in.shape[2:]
+ else:
+ self.operatorRepresentation['num_channels'] = data_in.shape[-1]
+ spatial_shape = data_in.shape[1:-1]
+ self.operatorRepresentation['spatial_size'] = np.prod(spatial_shape)Also applies to: 3059-3071, 3104-3106
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Deeploy/Targets/Generic/Parsers.py` around lines 2979 - 2981, The code
currently assumes NCHW when deriving batch_size/num_channels/spatial from
data_in; update the logic that sets operatorRepresentation['batch_size'],
['num_channels'], ['spatial'] (and any length/height/width or spatial_size
calculations) to respect a channels_first boolean: always set batch_size =
data_in.shape[0]; set channel_axis = 1 if channels_first else -1; set
num_channels = data_in.shape[channel_axis]; set spatial_axes = tuple(range(2,
data_in.ndim)) if channels_first else tuple(range(1, data_in.ndim-1)); compute
spatial = np.prod([data_in.shape[i] for i in spatial_axes]); apply the same fix
to the other identical blocks that set these fields (the blocks around the other
occurrences noted).
This PR wants add the support for Operators needed for MAGIA that are not available in the Generic target. Together with the operators supports this PR also adds the correspondant tests.
Added
Changed
Fixed
PR Merge Checklist
develcommit and pointing todevel.CHANGELOG.mdfile has been updated.