FP32 ReduceMean operator improvement #137

diaconuccalin · 2025-12-10T14:29:07Z

This PR fixes the tiling implementation for the PULPOpen FP32 ReduceMean operator (Issue #134) and adds parallelization support. It also extends ReduceMean to support ONNX opset >= 18, reorganizes the FP32 TinyViT kernel test suite, and adds extensive ReduceMean test coverage.

Added

Support for unknown number of data dimensions in the tiler
Parallelization support for the FP32 ReduceMean operator on PULPOpen
Extensive testing for the ReduceMean operator
Pass to remove ReduceMean operators that don't change data content, but only its shape

Changed

Structure of Tests subdir for improved ordering
Structure of .gitignore file for improved ordering

Fixed

Fixed ReduceMean parallelization and tiling issues described in Issue #134.

PR Merge Checklist

The PR is rebased on the latest devel commit and pointing to devel.
Your PR reviewed and approved.
All checks are passing.
The CHANGELOG.md file has been updated.
If the docker was modified, change back its link after review.

coderabbitai · 2025-12-10T14:32:44Z

📝 Walkthrough

Walkthrough

Adds opset-aware ReduceMean support (parsers, bindings), a PULPOpen-specific ReduceMean parser and parallel FloatReduceMean template, a pass to remove ReduceMean over only-singleton axes, tiler/constraint updates and wiring for PULPOpen, plus broad CI workflows migrated to hierarchical Kernels/Models/Others test paths.

Changes

Cohort / File(s)	Summary
CI workflows `/.github/workflows/` `*/.github/workflows/ci-platform-.yml`, `*/.github/workflows/ci-deeploy.yml`, `**/.github/workflows/infra-generate-ccache.yml`	Replaced many flat test-name lists with hierarchical path-based identifiers (Kernels/FP32, Kernels/Integer, Models, Others) and updated test targets across CI jobs and matrices.
Project metadata `/.gitignore`, `CHANGELOG.md`	Reorganized `.gitignore` patterns and added Unreleased CHANGELOG entries documenting ReduceMean, tiler, and testing changes.
Code transformation `Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py`, `Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py`	Guarded closure-argument cast to run only when closure args exist; extended introspective extraction to include mako `pageargs[...]` keyed expressions.
Lowering / optimizer passes `Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py`, `Deeploy/Targets/Generic/Platform.py`, `Deeploy/Targets/PULPOpen/Platform.py`	Added RemoveOnlySingletonReduceMeanPass to delete ReduceMean nodes that only reduce singleton dims; registered it in Generic and PULPOpen optimizer sequences.
Generic ReduceMean parsing & bindings `Deeploy/Targets/Generic/Parsers.py`, `Deeploy/Targets/Generic/Bindings.py`, `Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py`	Make ReduceMeanParser opset-aware (accept 1- or 2-input forms), normalize axes to numpy arrays, add opset-specific bindings, and refactor generic FloatReduceMeanTemplate to be axis-aware with explicit reduceLength/restDims/offset handling.
PULPOpen integration `Deeploy/Targets/PULPOpen/Parsers.py`, `Deeploy/Targets/PULPOpen/Bindings.py`, `Deeploy/Targets/PULPOpen/Platform.py`	Added `PULPReduceMeanParser` (injects `dim_in_<axis>` entries), moved/adjusted FloatReduceMeanTemplate import, switched PULP ReduceMean transformer to `ForkTransformer`, wired parser/template into PULPOpen pipeline, and registered the optimizer pass.
PULPOpen templates & tiler constraints `Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py`, `Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py`, `Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py`	Added parallel PULPOpen FloatReduceMean template with per-core chunking; ReduceMeanTileConstraint exposes per-axis `dim_in_<axis>`, implements `addPolicyConstraint`, and serializes per-axis replacements; DWConv constraint received clarifying comments.
Minor edits & tooling `Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py`, `.vscode/launch.json`	Small comment/test-path updates and launch.json labelTransform adjustment.

Sequence Diagram(s)

sequenceDiagram
    participant ONNX as ONNX Node
    participant GenParser as ReduceMeanParser (Generic)
    participant PulpParser as PULPReduceMeanParser
    participant Optimizer as RemoveOnlySingletonReduceMeanPass
    participant Bindings as ReduceMeanBindings
    participant Template as FloatReduceMeanTemplate
    participant Tiler as ReduceMeanTileConstraint

    ONNX->>GenParser: parseNode (opset-aware: 1 or 2 inputs)
    GenParser->>GenParser: normalize axes -> operatorRepresentation (numpy axes)
    GenParser->>PulpParser: hand off ctxt & operatorRepresentation
    PulpParser->>PulpParser: add dim_in_<axis> entries for non-reduced axes
    PulpParser->>Optimizer: lowering/optimizations run
    Optimizer->>Optimizer: inspect axes & input shapes
    alt all reduced dims == 1
        Optimizer-->>Bindings: remove ReduceMean node
    else
        Optimizer-->>Bindings: keep ReduceMean node
        Bindings->>Template: select opset-specific template & transformer
        Template->>Tiler: expose restDims/reduceLength/offsets
        Tiler->>Tiler: addPolicyConstraint & serialize per-axis replacements
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Demo TinyViT compatibility with tiled Siracusa #124 — Overlapping ReduceMean tiling/parsing/templates and PULPOpen tiler/bindings changes.
Bug fixes, API Cleanup and Reduce Compiler Warning on PULP #112 — Related edits touching code-transformation and parser areas that intersect with this PR.
Split CI Workflows by Platform and Task, Improve Formatting and Linting Reliability #108 — Prior CI workflow reorganizations that this PR continues/refines.

Suggested labels

Bug

Suggested reviewers

Victor-Jung
runwangdl

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'FP32 ReduceMean operator improvement' is directly related to the main changes in the PR, which focus on fixing and improving the ReduceMean operator for FP32 with parallelization support and tiling fixes.
Description check	✅ Passed	The PR description clearly relates to the changeset, covering ReduceMean operator fixes, parallelization, ONNX opset support, test reorganization, and .gitignore updates.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (2)

Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)
64-89: Policy constraint correctly limits tiling to largest non-reduced dimension.

The implementation properly identifies the biggest non-reduced dimension and constrains tiling to that axis only. The edge case where all dimensions are reduced (biggestNonReducedDim = -1) is handled by constraining all dimensions to their full size, which effectively disables tiling—a sensible fallback.

Regarding the unused ctxt parameter flagged by static analysis: this is likely kept for API consistency with other TileConstraint methods. Consider adding a comment or using _ = ctxt to explicitly acknowledge this.
     @staticmethod
     def addPolicyConstraint(tilerModel: TilerModel, parseDict: Dict, ctxt: NetworkContext) -> TilerModel:
+        _ = ctxt  # Unused but kept for API consistency
         # ===== GET NECESSARY INFORMATION =====
Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1)
34-36: Clarify intent of dimension key access loop.

This loop accesses dim_in_ keys but discards the result. If the intent is to validate key existence, consider adding a comment or using an explicit check. The current pattern is non-obvious.
         for ax in range(len(operatorRepresentation['data_in_shape'])):
             if ax not in operatorRepresentation['axes']:
-                _ = operatorRepresentation['dim_in_' + str(ax)]
+                # Ensure non-reduced dimension keys exist for tiling
+                assert 'dim_in_' + str(ax) in operatorRepresentation

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 363b7d5 and 849dc04.

📒 Files selected for processing (19)

.github/workflows/_runner-siracusa-tiled-sequential.yml (1 hunks)
.github/workflows/ci-platform-generic.yml (1 hunks)
.github/workflows/ci-platform-siracusa-tiled.yml (5 hunks)
.github/workflows/ci-platform-siracusa.yml (1 hunks)
.gitignore (1 hunks)
CHANGELOG.md (4 hunks)
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py (1 hunks)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1 hunks)
Deeploy/Targets/Generic/Bindings.py (1 hunks)
Deeploy/Targets/Generic/Parsers.py (2 hunks)
Deeploy/Targets/Generic/Platform.py (2 hunks)
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (2 hunks)
Deeploy/Targets/PULPOpen/Bindings.py (3 hunks)
Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
Deeploy/Targets/PULPOpen/Platform.py (5 hunks)
Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1 hunks)
Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (1 hunks)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (6 hunks)

🧰 Additional context used

🧠 Learnings (3)

📚 Learning: 2025-09-24T11:43:47.236Z

Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

.github/workflows/_runner-siracusa-tiled-sequential.yml

📚 Learning: 2025-09-09T15:58:06.454Z

Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.

Applied to files:

CHANGELOG.md

📚 Learning: 2025-12-02T13:54:22.700Z

Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 69
File: Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py:36-38
Timestamp: 2025-12-02T13:54:22.700Z
Learning: In Deeploy templates (Python files in Deeploy/Targets/PULPOpen/Templates/), always use explicit bitwidth types (e.g., `float${...type.referencedType.typeWidth}_t*`) instead of hardcoded types (e.g., `float*`) to ensure type consistency with templated kernel calls.

Applied to files:

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py
Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py
Deeploy/Targets/PULPOpen/Bindings.py

🧬 Code graph analysis (7)

Deeploy/Targets/Generic/Platform.py (1)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

RemoveEmptyConvBiasPass (518-523)

RemoveOnlySingletonReduceMeanPass (547-552)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

Deeploy/DeeployTypes.py (1)

inputs (2503-2520)

Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)

deleteNode (67-86)

contextagnostic (285-298)

ReplaceSequentialPatternPass (265-282)

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)

Deeploy/AbstractDataTypes.py (1)

nLevels (194-195)

Deeploy/Targets/PULPOpen/Platform.py (2)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

RemoveEmptyConvBiasPass (518-523)

RemoveOnlySingletonReduceMeanPass (547-552)

Deeploy/Targets/PULPOpen/Parsers.py (2)

PULPReduceMeanParser (467-487)

PULPTallGEMMParser (443-464)

Deeploy/Targets/Generic/Bindings.py (3)

Deeploy/DeeployTypes.py (1)

NodeBinding (1512-1657)

Deeploy/Targets/Generic/TypeCheckers.py (1)

ReduceMeanChecker (310-324)

Deeploy/AbstractDataTypes.py (1)

PointerClass (536-559)

Deeploy/Targets/Generic/Parsers.py (1)

Deeploy/DeeployTypes.py (3)

inputs (2503-2520)

outputs (2522-2539)

lookup (720-752)

Deeploy/Targets/PULPOpen/Parsers.py (1)

Deeploy/Targets/Generic/Parsers.py (16)

ReduceMeanParser (526-599)

parseNodeCtxt (29-41)

parseNodeCtxt (60-76)

parseNodeCtxt (114-155)

parseNodeCtxt (171-185)

parseNodeCtxt (209-221)

parseNodeCtxt (245-263)

parseNodeCtxt (295-325)

parseNodeCtxt (353-365)

parseNodeCtxt (387-414)

parseNodeCtxt (436-461)

parseNodeCtxt (475-488)

parseNodeCtxt (509-523)

parseNodeCtxt (549-599)

parseNodeCtxt (613-619)

parseNodeCtxt (632-649)

🪛 Ruff (0.14.8)

Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py

65-65: Unused static method argument: ctxt

(ARG004)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py

526-526: Unused function argument: name

(ARG001)

🔇 Additional comments (25)

Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (1)

242-243: LGTM! Good documentation practice.

The added comment clearly documents the inheritance of policy constraints from the parent Conv2D implementation, improving code readability without any functional impact.

Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py (1)

145-162: Verify pageargs string parsing robustness.

The string extraction logic assumes simple quoted string literals without handling escaped quotes (e.g., pageargs["key\"with\"quotes"]), string concatenation (e.g., pageargs["prefix" + suffix]), or non-string keys. If templates only use simple string literals for pageargs keys, this limitation may be acceptable. However, if any of these patterns exist in actual template usage, the parsing will fail or produce incorrect results. Consider either: confirming that templates restrict pageargs to simple quoted strings, or implementing more robust parsing using ast.literal_eval().

Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1)

23-26: LGTM! Proper guard for empty closure arguments.

The conditional check prevents generating an unused args variable when there are no closure struct arguments, avoiding potential compiler warnings and improving code cleanliness.

.github/workflows/ci-platform-siracusa.yml (1)

38-74: Well-organized test expansion for FP32 kernels.

The test matrix restructuring follows a clear hierarchical naming convention and provides comprehensive coverage for ReduceMean with various axis configurations and keepdims settings. The organization with blank lines between categories improves maintainability.

CHANGELOG.md (1)

7-7: Changelog entries are comprehensive and well-documented.

All additions accurately reflect the PR scope: tiler flexibility, parallelization support, test coverage expansion, and the fix for Issue #134. The entries follow the existing formatting conventions.

Also applies to: 31-34, 86-88, 128-128

.gitignore (1)

5-54: Clean reorganization with section headers.

The categorical organization with comment headers improves readability and makes future modifications easier. The negation pattern for docs/_templates/* correctly preserves template files while ignoring other HTML files.

Deeploy/Targets/Generic/Parsers.py (3)

576-582: Thoughtful handling of singleton dimension reduction.

The logic to skip reduction over singleton dimensions while preserving at least one axis is a good optimization. This avoids unnecessary computation when reducing dimensions that don't change the output.

554-556: Clean ONNX opset >= 18 handling.

The parser correctly identifies the opset version by checking for the absence of the axes attribute and properly marks the axes tensor as non-deployable since it's only needed at compile-time for template generation.

Also applies to: 561-566

568-572: Verify axis normalization before sorting.

Negative axis indices should be normalized to positive values before sorting. ONNX allows negative indices (e.g., -1 for last dimension), and sorting negative integers produces incorrect ordering (they sort before positive values). Ensure axes are converted to positive indices relative to the tensor rank before calling sort(), or confirm that the input axes are already normalized at this point.

Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (2)

20-24: Helpful docstring clarifying optimization scope.

The warning about TinyViT-specific optimization (49 elements in reduced axis) is valuable for future maintainers who might need different strategies for larger reduction sizes.

146-153: Per-axis dimension tracking aligns with parser changes.

The replacement scheme now correctly tracks dim_in_<axis> for each non-reduced axis, matching the PULPReduceMeanParser that populates these in the operator representation. Using uint32_t pointers is appropriate for dimension values that may exceed 16-bit limits.

Also applies to: 165-168

.github/workflows/ci-platform-generic.yml (1)

40-74: Comprehensive ReduceMean test coverage added.

The new test entries provide thorough coverage of ReduceMean scenarios with both KeepDims and NoKeepDims variants, covering various axis configurations (single axis, multiple axes, all axes). The test organization follows a clear hierarchical structure.

.github/workflows/ci-platform-siracusa-tiled.yml (2)

40-71: Well-structured FP32 kernel test configurations for tiled execution.

The test configurations include appropriate L1 memory allocations for various ReduceMean scenarios. Tests with multiple L1 values (e.g., [5000, 50000]) allow testing different memory pressure scenarios.

185-186: TinyViT ReduceMean layer test added to model matrix.

The L1 values [200, 40000] provide good coverage for both constrained and relaxed memory scenarios.

Deeploy/Targets/Generic/Platform.py (2)

5-6: Proper integration of RemoveOnlySingletonReduceMeanPass.

The new optimization pass is correctly imported and will remove ReduceMean operations that only reduce over singleton dimensions, which is a no-op.

166-168: Pass ordering looks correct.

Placing RemoveOnlySingletonReduceMeanPass after RemoveEmptyConvBiasPass is appropriate since both are cleanup passes that simplify the graph before downstream processing.

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)

546-551: Pass implementation follows established patterns.

The class follows the same structure as RemoveEmptyConvBiasPass and uses the @contextagnostic decorator appropriately.

Deeploy/Targets/PULPOpen/Parsers.py (1)

467-487: Clean extension of ReduceMeanParser for PULP-specific needs.

The parser correctly extends the generic ReduceMeanParser and adds dimension information for non-reduced axes. This enables the tiler to handle variable-dimension inputs properly.

The dynamic key generation (dim_in_ + str(ax)) allows flexible handling of tensors with different numbers of dimensions.

Deeploy/Targets/Generic/Bindings.py (1)

211-220: Correct ONNX opset version handling for ReduceMean bindings.

The distinction between OPSET < 18 (axes as integer constant input) and OPSET >= 18 (axes as optional input or attribute) is properly handled with separate binding configurations.

Deeploy/Targets/PULPOpen/Bindings.py (2)

296-304: ForkTransformer enables parallelization for FP32 ReduceMean.

Changing from ClusterTransformer to ForkTransformer enables the parallel execution path via pi_cl_team_fork, which aligns with the PR objective of adding parallelization support for the FP32 ReduceMean operator.

30-35: Verify FloatReduceMeanTemplate uses explicit bitwidth types in PULPOpen implementation.

The import has been moved from Generic.Templates to PULPOpen.Templates for platform-specific implementation. Per template best practices, ensure the FloatReduceMeanTemplate uses explicit bitwidth types (e.g., float${...type.referencedType.typeWidth}_t*) rather than hardcoded types like float* to maintain type consistency with templated kernel calls.

Deeploy/Targets/PULPOpen/Platform.py (2)

8-9: LGTM! Clean integration of PULPReduceMeanParser and RemoveOnlySingletonReduceMeanPass.

The import updates and mapper changes correctly wire up the new ReduceMean parsing pathway for PULPOpen. The PULPReduceMeanParser extends the base parser with dimension information needed for proper tiling.

Also applies to: 36-36, 68-68

237-237: LGTM! Optimization pass added to correct location.

Adding RemoveOnlySingletonReduceMeanPass() to PULPOptimizer is appropriate for removing no-op ReduceMean operations that only change shape.

Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1)

10-15: Helpful parallelization caveat documented.

The warning about TinyViT optimization (49 elements in reduced axis) is useful context for future maintainers.

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)

84-88: Verify output_offset asymmetry in keepdims condition.

output_offset is only applied when keepdims is True. Determine whether this asymmetry is intentional or if it should apply consistently in both branches. Compare with the corresponding PULPOpen template to clarify the intended behavior.

.github/workflows/_runner-siracusa-tiled-sequential.yml

Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py

...CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py

Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)
33-58: Make restDims iteration deterministic and consider a small rank‑1 safeguard

Two minor template-preparation points:

restDims is built as a set and later iterated via list(restDims), which makes the loop nest order depend on Python’s set iteration order. This doesn’t change numerical results but can make generated C code non‑reproducible between runs. Consider sorting once when you compute it:
-## Compute the remaining dimensions after reduction
-restDims = set(list(range(len(data_in_shape)))).difference(set(axes))
+## Compute the remaining dimensions after reduction (deterministic order)
+restDims = sorted(set(range(len(data_in_shape))).difference(set(axes)))
For rank‑1 inputs, data_in_shape[1:] is empty, so shapeStr stays '' and the cast becomes (${data_in_type.referencedType.typeName} (*) )${data_in}. Most compilers will accept this, but if you ever target rank‑1 ReduceMean heavily it may be clearer to special‑case that rank (e.g., cast to ${data_in_type.referencedType.typeName}* instead).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 849dc04 and 195ed2d.

📒 Files selected for processing (2)

Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py (1 hunks)
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-12-02T13:54:22.700Z

Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 69
File: Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py:36-38
Timestamp: 2025-12-02T13:54:22.700Z
Learning: In Deeploy templates (Python files in Deeploy/Targets/PULPOpen/Templates/), always use explicit bitwidth types (e.g., `float${...type.referencedType.typeWidth}_t*`) instead of hardcoded types (e.g., `float*`) to ensure type consistency with templated kernel calls.

Applied to files:

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py

🧬 Code graph analysis (1)

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (2)

Deeploy/AbstractDataTypes.py (1)

nLevels (194-195)

Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1)

_FloatReduceMeanTemplate (10-38)

🔇 Additional comments (1)

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)

21-27: Input/output offset computation in alignToContext looks correct now

Using data_in for input_offset and data_out for output_offset with the _signed/nLevels guards fixes the earlier bug and correctly derives offsets from the respective tensor types. No further changes needed here.

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py

Xeratec

Thanks for this work! Very valuable.

I only have one comment regarding the restructuring of the tests. I would either consistently restructure the tests or remove it from this PR and do it in another one.
If you decide to do the restructuring here, please also apply the attached patch to improve selecting the correct test in VS Code. Please let me know how you want to proceed.

launch.patch

.github/workflows/ci-platform-generic.yml

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py (1)

50-55: Comment-only change looks correct; consider avoiding hardcoded test paths in templates.

Updating the example command to -t=Tests/Others/Transformer matches the test-suite reorg and is low-risk. If this pattern appears across templates, consider referencing a canonical doc/README snippet (or a single shared constant) to reduce future comment drift.

.github/workflows/ci-deeploy.yml (1)

62-66: Minor: normalize -t path style (Tests/... vs ./Tests/...) for consistency.
These run after cd DeeployTest, so both likely work, but standardizing reduces future footguns if the working directory changes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 195ed2d and 7d0d72d.

📒 Files selected for processing (13)

.github/workflows/ci-deeploy.yml (6 hunks)
.github/workflows/ci-platform-chimera.yml (1 hunks)
.github/workflows/ci-platform-cortexm.yml (2 hunks)
.github/workflows/ci-platform-generic.yml (2 hunks)
.github/workflows/ci-platform-mempool.yml (2 hunks)
.github/workflows/ci-platform-siracusa-neureka-tiled.yml (6 hunks)
.github/workflows/ci-platform-siracusa-tiled.yml (5 hunks)
.github/workflows/ci-platform-siracusa.yml (2 hunks)
.github/workflows/ci-platform-snitch-tiled.yml (1 hunks)
.github/workflows/ci-platform-snitch.yml (1 hunks)
.github/workflows/ci-platform-softhier.yml (1 hunks)
.github/workflows/infra-generate-ccache.yml (1 hunks)
Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-24T11:43:47.236Z

Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

.github/workflows/ci-platform-cortexm.yml
.github/workflows/ci-deeploy.yml
.github/workflows/ci-platform-siracusa.yml
.github/workflows/ci-platform-snitch.yml
.github/workflows/ci-platform-mempool.yml
.github/workflows/infra-generate-ccache.yml
.github/workflows/ci-platform-generic.yml

🔇 Additional comments (18)

.github/workflows/infra-generate-ccache.yml (1)

37-45: Verify that the cache generation scope aligns with the PR's ReduceMean focus.

The cache generation workflow now runs only IntKernels/Add/Regular tests across all test runners. However, this PR adds extensive ReduceMean test coverage and fixes ReduceMean tiling/parallelization. Given that the cache is refreshed daily to support CI workflows, running only Add tests may not provide optimal coverage for ReduceMean-related changes and subsequent CI runs.

Consider whether the cache generation should include ReduceMean test paths (e.g., ./Tests/FP32Kernels/ReduceMean/Regular or similar) to ensure the cache benefits the newly added ReduceMean functionality.

Can you confirm:

Is IntKernels/Add/Regular intentionally chosen as a lightweight representative test, or is this an oversight?

Should ReduceMean test paths be included in the cache generation?

If only Add is needed, is there documentation explaining why Add alone is sufficient for all workflows?

.github/workflows/ci-deeploy.yml (5)

84-88: LGTM: state-equality tests now target the reorganized Others/SimpleRegression path.
(Assuming the directory move is part of this PR’s test reorg.)

106-110: LGTM: memory-level extension tests updated to Others/SimpleRegression.

128-136: LGTM: tiler-extension coverage broadened; just ensure all referenced Tests/... directories exist.

154-161: LGTM: memory-allocation-extension coverage broadened; consider deduping if runtime becomes an issue.

197-201: LGTM: debug print tests now point at Others/SimpleRegression.

.github/workflows/ci-platform-siracusa-tiled.yml (3)

104-155: Same ask for the doublebuffer tests-config: parse JSON + ensure directories exist.

164-199: Double-check casing/renames for newly referenced model paths (TinyViT ReduceMean, microLlama, CCT FP32).
These strings are effectively filesystem paths; please ensure the directory names match exactly (notably FP32Kernels/Add/large vs .../Large conventions, and any regular/Regular mismatches).

Also applies to: 213-239, 255-285

39-95: JSON syntax in both tests-config blocks is valid.

Both embedded JSON arrays parse successfully:

Block 1 (lines 39-95): 45 entries ✓

Block 2 (lines 105-155): 41 entries ✓

No syntax errors detected (missing commas, malformed objects, etc.).

.github/workflows/ci-platform-cortexm.yml (2)

57-59: The test directories referenced in the workflow exist and are correctly specified.

38-49: All listed Cortex-M kernel test paths are valid and correctly formatted. Verification confirms all 11 directories exist under DeeployTest/Tests/ with the exact casing shown in the workflow, including the lowercase variants IntKernels/MatMul/regular and IntKernels/MatMul/add. No corrections needed.

.github/workflows/ci-platform-mempool.yml (1)

75-85: All MemPool model test directories verified to exist under DeeployTest/Tests/ with correct case sensitivity. No action needed.

.github/workflows/ci-platform-chimera.yml (1)

38-40: The test path DeeployTest/Tests/IntKernels/Add/Regular exists and is correctly referenced in the workflow.

.github/workflows/ci-platform-siracusa.yml (2)

55-73: Extensive ReduceMean test coverage with both KeepDims and NoKeepDims variants.

Lines 55–73 add comprehensive ReduceMean coverage with 18 test entries spanning both keepdims modes and multiple axis configurations (AllAxes, Axes1_2_3, Axes1_3, Axes2_1, Axis0, Axis2, and composition tests like Add_ReduceMean). This directly addresses the PR objective of adding extensive ReduceMean test coverage and aligns with the FP32 ReduceMean operator improvements described in Issue #134.

39-110: Hierarchical test path restructuring is consistent and comprehensive.

The restructuring from granular legacy test names to structured FP32Kernels/IntKernels/Models/Others categorization is uniformly applied across both siracusa-kernels (lines 39–110) and siracusa-models (lines 119–135) jobs. All listed test paths correspond to actual directories in DeeployTest/Tests/, including the extensive ReduceMean variants. Visual grouping with blank lines aids readability, and the naming scheme is consistent with other CI workflows and aligns with past feedback requesting comprehensive test reorganization.

.github/workflows/ci-platform-generic.yml (2)

62-80: ReduceMean test coverage mirrors Siracusa workflow with identical structure.

The FP32Kernels/ReduceMean entries in lines 62–80 are structurally identical to those in ci-platform-siracusa.yml (lines 55–73), providing the same comprehensive coverage of both KeepDims and NoKeepDims variants with multiple axis configurations. This consistency reinforces the systematic addition of extensive ReduceMean test coverage across platforms.

39-120: Generic workflow test restructuring extends beyond Siracusa with additional IntKernels coverage.

The generic-kernels job (lines 39–120) includes all hierarchical FP32Kernels paths plus additional entries for IntKernels/ReduceMean, IntKernels/ReduceSum, IntKernels/Slice (lines 108–110), and FP32Kernels/Pow/Scalar, FP32Kernels/Pow/Vector variants (lines 59–60), appropriately expanding test coverage for the generic platform. All referenced test directories exist in DeeployTest/Tests/. The generic-models job (lines 128–145) follows the same hierarchical organization with Models and Others categories.

.github/workflows/ci-platform-siracusa-neureka-tiled.yml (1)

40-43: Test paths are valid and consistent across all Siracusa Neureka Tiled workflow jobs.

All referenced test directories exist in DeeployTest/Tests/: Others/RequantizedLinear, IntKernels/Conv/2D/PW/* variants, Models/miniMobileNet, IntKernels/Attention, Others/Transformer, and Models/microLlama/microLlama1. L1 memory configurations are preserved uniformly across the six workflow jobs (lines 40-43, 55-58, 70-73, 91-93, 115-118, 129-132). The commented-out Transformer test in wmem-doublebuffer-L3 (line 131) aligns with the noted LMACAN memory constraint.

.github/workflows/ci-platform-snitch-tiled.yml

.github/workflows/ci-platform-snitch.yml

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

.vscode/launch.json (1)

69-72: Keep JSON style consistent ("find": spacing) if the repo enforces formatting.
Not functional, but this file often gets auto-formatted; matching surrounding style reduces noisy diffs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d336243 and bb1ae8a.

📒 Files selected for processing (2)

.github/workflows/ci-platform-snitch-tiled.yml (1 hunks)
.vscode/launch.json (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/ci-platform-snitch-tiled.yml

🔇 Additional comments (1)

.vscode/launch.json (1)
68-72: Remove find/replace and use apply array — labelTransform API structure is incorrect.

The labelTransform object cannot have direct find and replace fields. According to the Command Variable extension documentation, the correct structure uses an apply array:
"labelTransform": {
  "text": "${fileDirname}",
  "apply": [
    { "find": "...", "replace": "", "flags": "g" }
  ]
}
Also, find must be a regex pattern (not a literal path string). If the goal is to strip ${workspaceFolder}/DeeployTest/Tests/ from the label, you'll need an escaped regex pattern. To handle Windows path separators correctly, include a separate regex step: { "find": "\\\\", "replace": "/", "flags": "g" } before the path-stripping operation.

Likely an incorrect or invalid review comment.

…axes parameter). Added ReduceMean tests. Other fixes

… fix

… and related tests

…ing the existent layer in TinyViT 5M

…float one

…r ReduceMean PULPOpen FP operation

…straints. Added FP32 ReduceMean tests for Siracusa single buffer. Changed ReduceMean TinyViT test location. Decreased size of some ReduceMean tests

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)

44-44: Non-deterministic iteration order due to unordered set.

As discussed in the previous review, while the set difference operation is correct, iterating over restDims (lines 68, 89) can produce different loop orderings across Python runs, leading to non-reproducible generated C code. Consider using sorted() for deterministic output, matching the PULPOpen template pattern.

🧹 Nitpick comments (2)

Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (2)
34-36: Clarify intent: accessing without assignment.

The loop accesses operatorRepresentation['dim_in_' + str(ax)] but doesn't use the returned value. If this is meant to validate that these keys exist (raising KeyError if missing), consider making the intent explicit:
 for ax in range(len(operatorRepresentation['data_in_shape'])):
     if ax not in operatorRepresentation['axes']:
-        _ = operatorRepresentation['dim_in_' + str(ax)]
+        # Validate that dimension metadata exists for non-reduced axes
+        assert f'dim_in_{ax}' in operatorRepresentation, \
+            f"Missing dimension metadata for axis {ax}"
If validation isn't the goal, please clarify the purpose of this access.

11-15: Consider tracking optimization scope for future work.

The warning indicates this implementation is optimized for TinyViT's specific use case (49 elements in the reduced axis). For broader applicability, you might want to track investigation of alternative parallelization strategies for larger reduction sizes.

Would you like me to open a follow-up issue to track investigation of parallelization strategies for larger reduction dimensions?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d776090 and 4fca212.

📒 Files selected for processing (30)

.github/workflows/ci-deeploy-testing.yml (1 hunks)
.github/workflows/ci-deeploy.yml (6 hunks)
.github/workflows/ci-platform-chimera.yml (1 hunks)
.github/workflows/ci-platform-cortexm.yml (2 hunks)
.github/workflows/ci-platform-generic.yml (2 hunks)
.github/workflows/ci-platform-mempool.yml (2 hunks)
.github/workflows/ci-platform-siracusa-neureka-tiled.yml (6 hunks)
.github/workflows/ci-platform-siracusa-tiled.yml (5 hunks)
.github/workflows/ci-platform-siracusa.yml (2 hunks)
.github/workflows/ci-platform-snitch-tiled.yml (1 hunks)
.github/workflows/ci-platform-snitch.yml (1 hunks)
.github/workflows/ci-platform-softhier.yml (1 hunks)
.github/workflows/infra-generate-ccache.yml (1 hunks)
.gitignore (1 hunks)
.vscode/launch.json (1 hunks)
CHANGELOG.md (4 hunks)
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py (1 hunks)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1 hunks)
Deeploy/Targets/Generic/Bindings.py (1 hunks)
Deeploy/Targets/Generic/Parsers.py (2 hunks)
Deeploy/Targets/Generic/Platform.py (2 hunks)
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1 hunks)
Deeploy/Targets/PULPOpen/Bindings.py (3 hunks)
Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
Deeploy/Targets/PULPOpen/Platform.py (4 hunks)
Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1 hunks)
Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py (1 hunks)
Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (1 hunks)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (6 hunks)

🚧 Files skipped from review as they are similar to previous changes (9)

Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py
Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py
.github/workflows/infra-generate-ccache.yml
CHANGELOG.md
.github/workflows/ci-platform-siracusa-neureka-tiled.yml
Deeploy/Targets/Generic/Parsers.py
.github/workflows/ci-deeploy.yml
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
Deeploy/Targets/Generic/Bindings.py

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-09-24T11:43:47.236Z

Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

.github/workflows/ci-platform-snitch.yml
.github/workflows/ci-platform-siracusa.yml
.github/workflows/ci-platform-mempool.yml
.github/workflows/ci-platform-cortexm.yml
.github/workflows/ci-deeploy-testing.yml
.github/workflows/ci-platform-generic.yml

📚 Learning: 2025-12-02T13:54:22.700Z

Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 69
File: Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py:36-38
Timestamp: 2025-12-02T13:54:22.700Z
Learning: In Deeploy templates (Python files in Deeploy/Targets/PULPOpen/Templates/), always use explicit bitwidth types (e.g., `float${...type.referencedType.typeWidth}_t*`) instead of hardcoded types (e.g., `float*`) to ensure type consistency with templated kernel calls.

Applied to files:

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py
Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py
Deeploy/Targets/PULPOpen/Bindings.py

🧬 Code graph analysis (8)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

Deeploy/DeeployTypes.py (1)

inputs (2503-2520)

Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)

deleteNode (67-86)

contextagnostic (285-298)

ReplaceSequentialPatternPass (265-282)

Deeploy/Targets/PULPOpen/Parsers.py (2)

Deeploy/Targets/Generic/Parsers.py (16)

ReduceMeanParser (526-599)

parseNodeCtxt (29-41)

parseNodeCtxt (60-76)

parseNodeCtxt (114-155)

parseNodeCtxt (171-185)

parseNodeCtxt (209-221)

parseNodeCtxt (245-263)

parseNodeCtxt (295-325)

parseNodeCtxt (353-365)

parseNodeCtxt (387-414)

parseNodeCtxt (436-461)

parseNodeCtxt (475-488)

parseNodeCtxt (509-523)

parseNodeCtxt (549-599)

parseNodeCtxt (613-619)

parseNodeCtxt (632-649)

Deeploy/DeeployTypes.py (3)

parseNodeCtxt (1051-1076)

NetworkContext (508-1020)

lookup (720-752)

Deeploy/Targets/Generic/Platform.py (1)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

RemoveEmptyConvBiasPass (518-523)

RemoveOnlySingletonReduceMeanPass (547-552)

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (2)

Deeploy/AbstractDataTypes.py (1)

nLevels (194-195)

Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1)

_FloatReduceMeanTemplate (10-38)

Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (4)

Deeploy/DeeployTypes.py (2)

NetworkContext (508-1020)

NodeTemplate (87-229)

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (2)

_FloatReduceMeanTemplate (10-29)

alignToContext (15-29)

Deeploy/Targets/Generic/Templates/ReduceMeanTemplate.py (1)

alignToContext (15-27)

Deeploy/AbstractDataTypes.py (1)

nLevels (194-195)

Deeploy/Targets/PULPOpen/Platform.py (3)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

RemoveEmptyConvBiasPass (518-523)

RemoveOnlySingletonReduceMeanPass (547-552)

Deeploy/Targets/PULPOpen/Parsers.py (2)

PULPReduceMeanParser (467-487)

PULPTallGEMMParser (443-464)

Deeploy/DeeployTypes.py (1)

NodeMapper (1660-1816)

Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (5)

Deeploy/CommonExtensions/DataTypes.py (1)

uint32_t (48-51)

Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (3)

addPolicyConstraint (85-124)

addPolicyConstraint (241-256)

constructSymbolicNodeRep (127-138)

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

addPolicyConstraint (77-93)

Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (4)

addPolicyConstraint (78-116)

addPolicyConstraint (337-386)

constructSymbolicNodeRep (119-130)

constructSymbolicNodeRep (389-407)

Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (1)

constructSymbolicNodeRep (62-66)

Deeploy/Targets/PULPOpen/Bindings.py (2)

Deeploy/Targets/Snitch/Templates/FloatSoftmaxTemplate.py (1)

FloatSoftmaxTemplate (10-24)

Deeploy/Targets/Generic/Templates/RQAddTemplate.py (1)

RQAddTemplate (10-25)

🪛 Ruff (0.14.8)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py

526-526: Unused function argument: name

(ARG001)

Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py

65-65: Unused static method argument: ctxt

(ARG004)

🔇 Additional comments (38)

.vscode/launch.json (1)

70-70: LGTM—aligns with test path reorganization.

The update from a regex pattern to an absolute path selector is appropriate for the new hierarchical test structure. This improves clarity and maintainability of the debug configuration's file picker labels.

.github/workflows/ci-platform-mempool.yml (3)

38-85: Improved test organization and ReduceMean test addition align with PR objectives.

The restructuring from flat test names to hierarchical, path-based test selection improves maintainability and clarity. The addition of IntKernels/ReduceMean (line 58) is consistent with the PR's objective to extend ReduceMean support and add extensive test coverage. The blank-line grouping further enhances readability without changing test logic.

75-85: All model test paths are valid and correspond to existing directories.

Verification confirms that all seven test paths listed in the workflow file (Models/CCT/Int/ICCT, Models/CCT/Int/ICCT_8, Models/CCT/Int/ICCT_ITA, Models/miniMobileNet, Models/miniMobileNetv2, Models/simpleCNN, and Others/SimpleRegression) exist as actual directories under DeeployTest/Tests/.

38-67: All hierarchical test paths are correctly mapped to actual directories.

The test path migration is verified. All referenced directories under DeeployTest/Tests/ exist and match the workflow configuration: IntKernels paths (Add, Conv, GEMM, MatMul, MaxPool, Pad, ReduceMean, ReduceSum, Slice) and Others paths (RequantizedConv2D, RequantizedDWConv, RQConv, RQGEMM, RQMatMul) are properly structured and accessible for test execution.

Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py (1)

53-53: LGTM!

Comment correctly updated to reflect the new hierarchical test path structure (Tests/Others/Transformer), consistent with the PR's test directory reorganization.

Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (4)

20-24: LGTM!

The docstring appropriately warns about the optimization being tailored for TinyViT ReduceMean layers (49 elements in the reduced axis), which helps maintainers understand when to revisit this implementation.

64-89: LGTM!

The policy constraint implementation correctly:

Identifies the biggest non-reduced dimension for tiling

Constrains all other dimensions to their full size (no tiling)

Handles the edge case where all dimensions are reduced (biggestNonReducedDim = -1) by disabling tiling entirely

The unused ctxt parameter follows the established interface pattern used by other TileConstraint subclasses (e.g., MatMulTileConstraint.addPolicyConstraint, ConvTileConstraint.addPolicyConstraint).

91-102: LGTM!

The symbolic node representation correctly exposes dim_in_<axis> variables for non-reduced axes, enabling the tiler to work with dynamic dimension values during tiling.

146-168: LGTM!

The serialization correctly:

Uses uint32_t pointers for dimension replacements (supporting larger dimension values)

Builds replacement entries only for non-reduced axes

Populates the replacement values from computed input cube dimensions

This aligns well with the dim_in_<axis> pattern established in PULPReduceMeanParser and constructSymbolicNodeRep.

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (2)

10-29: LGTM!

The alignToContext method correctly:

Initializes offsets to 0 for float tensors

Only computes non-zero offsets when quantization attributes (_signed, nLevels) exist

Uses data_out.nLevels for output_offset (previously fixed bug)

Based on learnings, explicit bitwidth types should be used in templates, which is correctly done here via ${data_out_type.referencedType.typeName}.

62-91: LGTM!

The template correctly implements the ReduceMean operation:

Nested loops iterate over non-reduced dimensions first, then reduced dimensions

Accumulator is properly initialized with input_offset * reduceLength

Mean is computed by dividing by reduceLength

Output is written with a dummy pointer that auto-increments

For FP32 tensors (the target of this template), offsets are always 0, so the keepdims branching is effectively a no-op but maintains consistency with the quantized code path.

Deeploy/Targets/PULPOpen/Parsers.py (2)

11-12: LGTM!

Import correctly updated to include ReduceMeanParser for the new PULPReduceMeanParser class.

467-487: LGTM!

The PULPReduceMeanParser correctly extends ReduceMeanParser to add dim_in_<axis> entries for non-reduced axes:

Delegates to parent parseNodeCtxt for standard parsing

Retrieves the original input shape from context

Adds dim_in_<axis> for each non-reduced axis, matching the pattern expected by ReduceMeanTileConstraint.constructSymbolicNodeRep and serializeTilingSolution

This enables the tiler to work with per-axis dimension information for flexible tiling strategies.

Deeploy/Targets/Generic/Platform.py (2)

5-6: LGTM!

Import correctly updated to include RemoveOnlySingletonReduceMeanPass alongside existing RemoveEmptyConvBiasPass.

167-167: LGTM!

The RemoveOnlySingletonReduceMeanPass is appropriately placed in the optimizer pass sequence after RemoveEmptyConvBiasPass. This pass will remove ReduceMean operators that only reduce over singleton dimensions (size 1), which don't change data content—just shape—as described in the PR objectives.

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

526-543: LGTM! Solid implementation of singleton ReduceMean removal.

The optimization pass correctly removes ReduceMean operations that only reduce over singleton dimensions (size 1), which don't change data content. The logic properly:

Preserves single-node graphs

Handles both attribute-based and input-based axis specifications

Guards against missing shape information

Uses the existing deleteNode helper

Note: The unused name parameter flagged by static analysis is part of the expected signature for ReplaceSequentialPatternPass replacement functions and can be ignored.

546-552: LGTM! Properly structured optimization pass.

The class follows the established pattern for ReplaceSequentialPatternPass subclasses in this codebase and is correctly decorated with @contextagnostic.

Deeploy/Targets/PULPOpen/Bindings.py (2)

17-18: LGTM! Import reorganization supports PULPOpen-specific template.

The import change correctly moves FloatReduceMeanTemplate from the generic template path to the PULPOpen-specific implementation, enabling platform-specific optimizations and parallelization.

Also applies to: 32-35

300-304: LGTM! Transformer change enables parallelization.

Switching from ClusterTransformer to ForkTransformer is appropriate for the new parallelized ReduceMean implementation. ForkTransformer provides the necessary fork-based parallel execution support (pi_cl_team_fork) that the new template requires.

Deeploy/Targets/PULPOpen/Platform.py (1)

9-9: LGTM! Proper integration of PULPOpen-specific ReduceMean support.

All changes are consistent and correctly integrate the new ReduceMean functionality:

Imports the new optimization pass and PULPOpen-specific parser

Updates the ReduceMeanMapper to use PULPReduceMeanParser()

Adds the singleton removal pass to the optimization pipeline in a logical position

Also applies to: 37-37, 71-71, 243-243

Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (2)

42-91: LGTM! Well-structured precomputation logic.

The Mako precomputation block correctly:

Updates shapes based on tiling constraints

Normalizes negative axis indices

Computes the reduction length and remaining dimensions

Sorts dimensions heuristically for efficient parallelization

Builds C array access patterns

93-142: LGTM! Solid parallelization implementation with proper type handling.

The template correctly implements core-based parallelization:

Divides work into chunks across NUM_CORES

Handles both compile-time constants and runtime dimensions (via pointer dereference)

Properly initializes accumulators with offset correction

Uses type introspection (${data_in_type.referencedType.typeName}) for explicit bitwidth types

The conditional output_offset application based on keepdims was confirmed as intentional in past reviews.

Based on learnings, the template correctly uses explicit bitwidth types via referencedType.typeName introspection.

.github/workflows/ci-platform-chimera.yml (1)

38-40: Test path restructuring aligns with hierarchical naming convention.

The change from Adder to IntKernels/Add/Regular follows the standardized path-based test-naming pattern across the PR. Based on learnings, test names correspond to DeeployTest/Tests/ directory paths.

Verify that the test directory DeeployTest/Tests/IntKernels/Add/Regular exists in the repository.

.github/workflows/ci-platform-softhier.yml (1)

38-39: Test path restructuring aligns with hierarchical naming convention.

The change from Adder to IntKernels/Add/Regular mirrors the standardized pattern across the PR.

Verify that the test directory DeeployTest/Tests/IntKernels/Add/Regular exists in the repository.

.github/workflows/ci-deeploy-testing.yml (1)

43-43: Consistent test path updates for type inference testing.

All three test entries (fail-input0, fail-input2, pass) are updated from testTypeInferenceDifferentTypes to Others/TypeInference, maintaining consistency. The path construction at line 74 (./Tests/${{ matrix.test }}) properly integrates the updated paths.

Verify that the test directory DeeployTest/Tests/Others/TypeInference exists in the repository.

Also applies to: 49-49, 55-55

.github/workflows/ci-platform-siracusa.yml (2)

38-111: Comprehensive hierarchical test restructuring with extensive ReduceMean coverage.

The siracusa-kernels job expands test coverage significantly with 70+ hierarchical test paths organized into FP32Kernels, IntKernels, Models, and Others categories. Notably:

ReduceMean receives comprehensive testing: 18 variants (9 KeepDims + 9 NoKeepDims) covering different axes and compositions

Proper visual grouping with blank-line separators between categories

Consistent indentation and path formatting throughout

Verify that all hierarchical test paths exist in DeeployTest/Tests/, especially the extensive ReduceMean variants (FP32Kernels/ReduceMean/KeepDims/* and FP32Kernels/ReduceMean/NoKeepDims/*).

120-137: Models section aligned with hierarchical structure.

The siracusa-models job properly includes the specific TinyViT ReduceMean test (line 102) and maintains consistent hierarchical paths for model-level tests.

.github/workflows/ci-platform-cortexm.yml (2)

38-49: Cortex-M kernel tests properly restructured with hierarchical paths.

The test-names list for cortexm-kernels follows the standardized path-based naming, with appropriate mix of IntKernels and Others categories. The inclusion of IntKernels/ReduceMean and IntKernels/Slice aligns with the PR's focus on ReduceMean and related operator improvements.

Verify that all paths exist in DeeployTest/Tests/, particularly IntKernels/ReduceMean and IntKernels/Slice.

57-59: Cortex-M models tests aligned with platform focus.

Models section includes appropriate test cases for the platform.

.github/workflows/ci-platform-snitch.yml (1)

38-52: Snitch-kernel test paths properly restructured with visual category grouping.

The test-names list uses hierarchical paths with blank-line separators for readability, covering FP32Kernels Softmax, IntKernels Add/Activations/MatMul, and Others categories. Per the past review comment regarding whitespace-only lines, the formatting appears corrected here.

Verify that blank lines in the test-names block contain no trailing whitespace, as the bash loop condition [[ -n "$testName" ]] will treat whitespace-only strings as non-empty, causing malformed test-path errors. Run a verification check to ensure all blank lines in this section are truly empty.

.github/workflows/ci-platform-generic.yml (2)

38-122: Comprehensive hierarchical test restructuring across generic-kernels.

The generic-kernels job implements the extensive test restructuring requested in the past review, organizing 80+ test paths into standardized FP32Kernels, IntKernels, and Others categories. Notably:

ReduceMean receives broad coverage with 18 test variants across KeepDims and NoKeepDims modes

All major operator categories represented (Activations, Conv2D, GEMM, MatMul, Norm, ReduceMean, etc.)

Consistent blank-line grouping for readability

Aligns with Xeratec's request for comprehensive, hierarchical test restructuring across all test categories

Verify that all 80+ hierarchical test paths exist in DeeployTest/Tests/, particularly the extensive ReduceMean variants and FP32Kernels subcategories.

129-145: Generic-models test paths properly structured with ReduceMean coverage.

The generic-models job includes comprehensive model testing (Autoencoder1D, CCT variants, miniMobileNet, TinyViT, WaveFormer) with dedicated TinyViT ReduceMean testing (line 113), aligning with PR objectives.

.gitignore (1)

5-54: Thoughtful reorganization of .gitignore with improved section clarity.

The restructuring groups related patterns into labeled sections (Editor/OS, Python, Build, Node, Documentation, DeeployTest) for better maintainability. Key observations:

CHANGELOG_GEN.md explicitly tracked (line 54), aligning with PR's changelog workflow updates

Build artifacts (compile_commands.json, dist, build) properly exposed for IDE/CI integration

Python caches correctly grouped under Python section

DeeployTest patterns remain appropriately restrictive for generated artifacts

Confirm that CHANGELOG_GEN.md is a generated artifact from a changelog management tool used in the project (e.g., git-cliff, changelog automation) and should be tracked in version control.

.github/workflows/ci-platform-siracusa-tiled.yml (5)

105-156: Double-buffer L2 job tests-config mirrors singlebuffer variant with appropriately scaled L1 memory values.

The tests-config for siracusa-kernels-tiled-doublebuffer-L2 (lines 105–156) maintains consistent test coverage and path naming, with L1 allocations appropriately increased relative to the singlebuffer variant (e.g., 50000 → 100000 for AllAxes ReduceMean tests). JSON syntax is valid.

160-207: Matrix-based model test organization with extensive TinyViT/ReduceMean coverage aligns with PR objectives.

The matrix strategy for singlebuffer-L2 model tests introduces a clean path-based organization (Models/CCT, Models/microLlama, Models/miniMobileNet, Models/MLPerT, Models/TinyViT) and explicitly adds TinyViT/5M/Layers/FP32/ReduceMean testing (lines 191–192), directly supporting the PR's goal of improving ReduceMean test coverage.

213-250: Singlebuffer-L3 job maintains consistent test structure with L3 memory configuration.

The singlebuffer-L3 matrix job (lines 213–250) preserves test coverage and path naming consistency, sets default-memory-level to L3, and includes the same TinyViT/ReduceMean test configuration as the L2 variant (lines 230–231).

256-298: Double-buffer L3 job extends coverage to double-buffer + L3 configuration.

The doublebuffer-L3 matrix job (lines 256–298) follows the established organizational pattern, sets both double-buffer and L3 memory level configurations, and includes TinyViT/ReduceMean testing (lines 276–277) consistent with other variants.

38-96: Hierarchical test reorganization with comprehensive ReduceMean coverage is well-structured.

The refactored tests-config for the singlebuffer-L2 job properly transitions from per-test names to hierarchical paths (FP32Kernels, IntKernels, Others), and the extensive ReduceMean test additions (lines 58–76) with both KeepDims and NoKeepDims variants address the PR's goal of improving ReduceMean operator testing. JSON syntax is valid throughout.

Test path verification (e.g., FP32Kernels/ReduceMean/KeepDims/Axes1_3, FP32Kernels/ReduceMean/NoKeepDims/AllAxes) requires access to the test suite definition to confirm paths are defined and resolvable at runtime.

.github/workflows/ci-platform-snitch-tiled.yml

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

Deeploy/Targets/PULPOpen/Platform.py (1)

224-243: New ReduceMean optimization pass placement + minor pass-list cleanup opportunity

Adding RemoveOnlySingletonReduceMeanPass() immediately after RemoveEmptyConvBiasPass() is a sensible place for a shape-only ReduceMean elimination: it keeps these “no-op on data” cleanups together and ensures such ReduceMean nodes are gone before PULPOpen-specific mapping/tiling.

Two follow-ups to consider:

Please confirm that this pass ordering matches your expectations w.r.t. quantization and ReduceMean lowering (i.e., graphs still contain the relevant ReduceMean nodes at this stage so the pass can fire, and it does not interfere with later ReduceMean-specific tiling/parallelization logic).

The two consecutive SkipUnityRequantPass(previous_op_regex = "Reshape|Transpose", num_inputs = 1) entries at lines 229–230 are identical; one of them is likely redundant and can be removed without behavior change, simplifying the optimizer configuration.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fca212 and 5b8afcf.

📒 Files selected for processing (3)

.github/workflows/ci-platform-siracusa-tiled.yml (5 hunks)
.github/workflows/ci-platform-siracusa.yml (2 hunks)
Deeploy/Targets/PULPOpen/Platform.py (5 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-24T11:43:47.236Z

Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

.github/workflows/ci-platform-siracusa.yml

🧬 Code graph analysis (1)

Deeploy/Targets/PULPOpen/Platform.py (2)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

RemoveEmptyConvBiasPass (518-523)

RemoveOnlySingletonReduceMeanPass (547-552)

Deeploy/Targets/PULPOpen/Parsers.py (1)

PULPReduceMeanParser (467-487)

🔇 Additional comments (9)

.github/workflows/ci-platform-siracusa.yml (2)

38-111: Test structure is correctly organized and all referenced directories exist.

The hierarchical test paths are properly formatted and the test coverage is well-structured across the four main categories: FP32Kernels, IntKernels, Models, and Others. The ReduceMean test coverage is particularly comprehensive, with both KeepDims/NoKeepDims variants across various axis configurations (Axis0, Axis2, AllAxes, Axes1_2_3, Axes1_3, Axes2_1) as well as compound tests, which aligns well with the PR's objective of improving ReduceMean operator support.

120-135: All referenced test directories exist and paths are correctly formatted.

The siracusa-models job test-names are valid. All paths have been verified to exist in the test hierarchy: IntKernels/Attention, Models/CCT/FP32 subdirectories (CCT_1_16_16_8, CCT_2_32_32_128_Opset20), Models/miniMobileNet and miniMobileNetv2, Models/MLPerf subdirectories (KeywordSpotting, ImageClassification, AnomalyDetection), Models/TinyViT/Demo, and Others/SimpleRegression. The TinyViT ReduceMean test (Models/TinyViT/5M/Layers/FP32/ReduceMean) in the first job is also present.

Deeploy/Targets/PULPOpen/Platform.py (4)

8-9: Wiring in RemoveOnlySingletonReduceMeanPass looks correct

Importing RemoveOnlySingletonReduceMeanPass alongside RemoveEmptyConvBiasPass keeps lowering passes centralized and consistent with how they’re used later in the optimizer. No issues here from a wiring perspective.

21-26: Cleanup of unused ReduceMeanParser import

Dropping ReduceMeanParser from the generic parsers import list is consistent with switching to PULPReduceMeanParser and avoids unused imports. Nothing else in this file appears to rely on the generic parser.

34-36: PULP-specific ReduceMean parser import is consistent with design

Importing PULPReduceMeanParser here matches its usage in ReduceMeanMapper and aligns with the pattern of using PULP-specific parsers for ops that need extra tiling metadata. The dependency direction looks clean and local to PULPOpen.

70-71: Check that PULPReduceMeanParser safely handles both ReduceMean and IntegerMean usages

Switching ReduceMeanMapper to use PULPReduceMeanParser is the right hook to feed per-dimension metadata into PULPReduceMeanTilingReadyBindings. Note that the same ReduceMeanMapper is used for both 'ReduceMean' and 'IntegerMean' in PULPMapping (lines 124 and 127), so PULPReduceMeanParser must correctly parse both cases (e.g., operatorRepresentation['data_in'] and 'axes' need to exist for both).

Please double-check that:

PULPReduceMeanParser is compatible with nodes created for 'IntegerMean' in your IR, and

the additional dim_in_* fields don’t break any existing downstream bindings that assume the older representation.

.github/workflows/ci-platform-siracusa-tiled.yml (3)

58-76: ReduceMean test coverage expansion aligns well with PR objectives.

The workflow now includes 18 ReduceMean test cases across two categories (KeepDims and NoKeepDims), with 9 variants each. This expansion (Add_ReduceMean, AllAxes, Axes1_2_3, Axes1_3, Axes2_1, Axis0, Axis2, ReduceMean_Add, and composite tests) directly supports the PR objective of adding extensive ReduceMean test coverage to address Issue #134.

Also applies to: 123-141

166-198: All referenced model test paths exist and are correctly configured. Verification confirms:

FP32-specific paths (Models/CCT/FP32/, Models/TinyViT/5M/Layers/FP32/ReduceMean) correspond to actual test directories

All model names are consistently used across singlebuffer-L2, singlebuffer-L3, and doublebuffer-L3 jobs

Directory structure properly reflects FP32 vs. non-FP32 model organization

40-96: Document the intentional L1 memory allocation differences between singlebuffer and doublebuffer configurations.

The L1 allocations (max 220,000 bytes) fit comfortably within Siracusa's 256 KB L1 cache. However, the different L1 values between singlebuffer and doublebuffer test suites lack explanatory comments. For example:

"FP32Kernels/Add/large" (220K) appears only in singlebuffer

"FP32Kernels/Activations/ReLU" differs: 2000 (singlebuffer) vs 20 (doublebuffer)

"FP32Kernels/ReduceMean/KeepDims/AllAxes" differs: 50000 vs 100000

Add comments explaining why these values differ (e.g., buffer strategy implications, kernel-specific tuning) to clarify the intentional configuration split.

Xeratec · 2025-12-21T13:35:39Z

Thanks for restructuring the other tests. I would suggest a few modifications, but I can make them, and you can review them once you have time.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0cb405 and 9c7da3a.

📒 Files selected for processing (13)

.github/workflows/ci-deeploy.yml (6 hunks)
.github/workflows/ci-platform-chimera.yml (1 hunks)
.github/workflows/ci-platform-cortexm.yml (2 hunks)
.github/workflows/ci-platform-generic.yml (2 hunks)
.github/workflows/ci-platform-mempool.yml (2 hunks)
.github/workflows/ci-platform-siracusa-neureka-tiled.yml (6 hunks)
.github/workflows/ci-platform-siracusa-tiled.yml (5 hunks)
.github/workflows/ci-platform-siracusa.yml (2 hunks)
.github/workflows/ci-platform-snitch-tiled.yml (1 hunks)
.github/workflows/ci-platform-snitch.yml (1 hunks)
.github/workflows/ci-platform-softhier.yml (1 hunks)
.github/workflows/infra-generate-ccache.yml (1 hunks)
Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (5)

.github/workflows/ci-platform-snitch.yml
.github/workflows/ci-platform-cortexm.yml
.github/workflows/ci-platform-siracusa-neureka-tiled.yml
.github/workflows/ci-platform-snitch-tiled.yml
Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-24T11:43:47.236Z

Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

.github/workflows/infra-generate-ccache.yml
.github/workflows/ci-deeploy.yml
.github/workflows/ci-platform-siracusa.yml
.github/workflows/ci-platform-mempool.yml
.github/workflows/ci-platform-generic.yml

🔇 Additional comments (13)

.github/workflows/infra-generate-ccache.yml (1)

37-45: Update README.md to reference the new test path structure.

The workflow file has been correctly updated with the new test directory paths. However, README.md still references the old DeeployTest/Tests/Adder path in the documentation. Update these references to point to DeeployTest/Tests/Kernels/Integer/Add/Regular to reflect the new hierarchical test structure.

.github/workflows/ci-platform-chimera.yml (1)

38-40: LGTM!

The test name update to the hierarchical path-based structure is consistent with the broader test reorganization. Based on learnings, these paths correspond to directory names under DeeployTest/Tests/.

.github/workflows/ci-platform-softhier.yml (1)

38-39: LGTM!

Consistent with the hierarchical path-based test structure introduced across CI workflows.

.github/workflows/ci-platform-mempool.yml (1)

75-85: LGTM!

The models section follows the hierarchical path structure consistently with clear grouping by model family.

.github/workflows/ci-deeploy.yml (2)

63-66: LGTM!

The test paths are correctly updated to the new hierarchical structure with appropriate memory allocation settings.

129-136: LGTM!

Good coverage with both passing and --shouldFail test cases for tiler extension validation.

.github/workflows/ci-platform-siracusa-tiled.yml (2)

58-76: Comprehensive ReduceMean test coverage added.

The extensive ReduceMean tests covering both KeepDims and NoKeepDims variants across multiple axis configurations align well with the PR objective of improving ReduceMean operator support.

191-194: TinyViT ReduceMean layer test inclusion.

Good addition of model-level ReduceMean testing via Models/TinyViT/5M/Layers/FP32/ReduceMean to validate the operator in a realistic context.

.github/workflows/ci-platform-siracusa.yml (3)

61-79: LGTM!

Comprehensive ReduceMean test coverage with both KeepDims and NoKeepDims variants, covering various axis configurations including edge cases like AllAxes, combined operations (Add_ReduceMean), and individual axis reductions.

102-102: Model-layer test placement note.

Models/TinyViT/5M/Layers/FP32/ReduceMean is included in the kernels job rather than the models job. This appears intentional since it tests a specific layer/kernel rather than a full model, but verify this is the desired grouping.

121-135: LGTM!

The models section is well-organized with clear groupings by model family (CCT, miniMobileNet, MLPerf, TinyViT, CNN_Linear).
.github/workflows/ci-platform-generic.yml (2)
39-146: Excellent restructuring of test organization!

The hierarchical path-based structure significantly improves maintainability and clarity:

Clear categorization into Kernels/FP32, Kernels/Integer, Kernels/Mixed, and Models

Comprehensive ReduceMean test coverage (18 test variants) aligns perfectly with the PR objectives

Consistent with the discussed structure from past reviews

This makes it much easier to understand test coverage at a glance and navigate the test suite.

Based on learnings, these paths correspond to directory structures under DeeployTest/Tests/.

119-121: Remove duplicate test entry.

Kernels/Integer/Conv/Regular_2D_RQ appears twice (lines 119 and 121), causing unnecessary duplicate test execution.
🔎 Proposed fix

Remove the duplicate entry on line 121:
         Kernels/Integer/Conv/Regular_2D_RQ
         Kernels/Integer/Conv/DW_2D_RQ
-        Kernels/Integer/Conv/Regular_2D_RQ
         Kernels/Integer/MatMul/Regular_RQ
Likely an incorrect or invalid review comment.

.github/workflows/ci-platform-generic.yml

.github/workflows/ci-platform-mempool.yml

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

.github/workflows/ci-platform-generic.yml (1)

113-120: Test paths verified—organization is intentional.

All test paths exist and the CI checks pass. The grouping of model tests with related kernel tests in the generic-kernels job appears intentional: the quantization-aware kernel tests (*_RQ variants) are grouped with Models/Transformer_DeepQuant and Models/TinyViT/5M/Layers/FP32/ReduceMean to test these operations in both isolation and model context. If consistency with job naming is a priority, consider moving the two model tests to generic-models, but the current organization is functional and the grouping may be deliberate.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c7da3a and 6cd45a7.

📒 Files selected for processing (4)

.github/workflows/ci-platform-generic.yml (2 hunks)
.github/workflows/ci-platform-siracusa.yml (2 hunks)
.github/workflows/ci-platform-snitch-tiled.yml (1 hunks)
.github/workflows/ci-platform-snitch.yml (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

.github/workflows/ci-platform-snitch.yml
.github/workflows/ci-platform-snitch-tiled.yml

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-24T11:43:47.236Z

Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

.github/workflows/ci-platform-generic.yml
.github/workflows/ci-platform-siracusa.yml

🔇 Additional comments (3)

.github/workflows/ci-platform-siracusa.yml (2)

38-111: Test paths are valid and correctly organized.

All 57 test paths referenced in the siracusa-kernels job exist as directories under DeeployTest/Tests/. The reorganization from flat test names to hierarchical categories (Kernels/FP32, Kernels/Integer, Models, Others) is properly structured and will execute without path-related failures.

120-136: All model test paths are valid and exist in the repository.

Verification confirms that all 10 test paths in the siracusa-models job configuration are correctly defined as directories in DeeployTest/Tests: Kernels/Integer/Attention, Models/CCT/FP32/CCT_1_16_16_8, Models/CCT/FP32/CCT_2_32_32_128_Opset20, Models/miniMobileNet, Models/miniMobileNetv2, Models/MLPerf/KeywordSpotting, Models/MLPerf/ImageClassification, Models/MLPerf/AnomalyDetection, Models/TinyViT/Demo, and Models/CNN_Linear2. No CI failures should result from missing test paths.

.github/workflows/ci-platform-generic.yml (1)

39-111: Excellent restructuring of kernel tests!

The hierarchical organization into Kernels/FP32/* and Kernels/Integer/* is clear and consistent, exactly as requested by Xeratec. The extensive ReduceMean test coverage (18 test cases covering KeepDims/NoKeepDims variants with multiple axis combinations) directly addresses the PR's objective.

Also, the duplicate Kernels/Integer/Slice entry from the previous review has been resolved.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

.github/workflows/ci-platform-siracusa-tiled.yml (1)

58-76: L1 memory configurations look reasonable.

The varying L1 memory sizes (e.g., single values like [8000] vs. multiple configurations like [5000,50000]) appear intentional for testing different tiling scenarios. Double-buffer tests appropriately use higher memory values. If these specific L1 values are critical for triggering specific tiling behaviors mentioned in Issue #134, consider adding comments documenting the reasoning.

Also applies to: 123-141

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6cd45a7 and 3cff86f.

📒 Files selected for processing (12)

.github/workflows/ci-platform-mempool.yml
.github/workflows/ci-platform-siracusa-neureka-tiled.yml
.github/workflows/ci-platform-siracusa-tiled.yml
.github/workflows/ci-platform-snitch-tiled.yml
.github/workflows/ci-platform-snitch.yml
DeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerColumn/activations.npz
DeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerColumn/inputs.npz
DeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerColumn/network.onnx
DeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerColumn/outputs.npz
DeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerRow/inputs.npz
DeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerRow/network.onnx
DeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerRow/outputs.npz

🚧 Files skipped from review as they are similar to previous changes (2)

.github/workflows/ci-platform-snitch.yml
.github/workflows/ci-platform-mempool.yml

🔇 Additional comments (7)

.github/workflows/ci-platform-snitch-tiled.yml (1)

40-50: LGTM! Test configuration restructuring is correct.

The updated hierarchical test paths follow the new "Kernels/{Integer|FP32}/" convention consistently, and the JSON syntax is now valid with proper comma separation and no trailing comma.

.github/workflows/ci-platform-siracusa-neureka-tiled.yml (3)

121-121: LGTM: neureka-wmem configuration properly added.

The neureka-wmem flag is correctly added to the relevant job configurations and properly passed through to the reusable workflow. The syntax is appropriate for both direct parameters (line 121) and matrix parameters (line 136).

Also applies to: 136-136, 146-146

131-131: Add comment explaining why Transformer test is disabled for wmem configuration.

The Models/Transformer test is commented out only in the doublebuffer-L3-wmem job (line 131), while remaining active in the standard doublebuffer configuration (line 93) and singlebuffer configuration (line 72). Without an explanatory comment, it's unclear whether this is a known incompatibility that needs resolution or a temporary exclusion. If intentional, add a comment documenting the reason.

40-43: Hierarchical test path restructuring approved.

The migration from granular test names to hierarchical path-based identifiers (e.g., Kernels/Integer/GEMM/Regular_RQPerColumn, Models/miniMobileNet) improves organization and aligns with restructuring the Tests subdirectory. All referenced paths exist in the test directory structure, and changes are consistently applied across all job configurations.

Also applies to: 55-58, 70-73, 91-93, 115-118, 129-132

.github/workflows/ci-platform-siracusa-tiled.yml (3)

160-198: Model test organization is consistent and comprehensive.

The hierarchical reorganization extends cleanly to model tests (Models/CCT/FP32/*, Models/TinyViT/*, etc.), and the inclusion of TinyViT ReduceMean layer tests across different memory configurations ensures thorough validation of the ReduceMean improvements.

Also applies to: 207-246, 248-294

26-294: Well-structured workflow with clear job organization.

The separation between kernel tests (using sequential runner) and model tests (using matrix strategy) is appropriate. The four job configurations (single/double-buffer × L2/L3) provide comprehensive coverage of different memory and buffering scenarios for the ReduceMean improvements.

38-96: LGTM: Excellent test coverage and improved organization.

The migration to hierarchical test paths (e.g., Kernels/FP32/ReduceMean/KeepDims/Axis0) significantly improves maintainability. The extensive ReduceMean test coverage with multiple axis combinations, both KeepDims/NoKeepDims variants, and integration with other operators aligns well with the PR objectives.

Xeratec

I organized the tests a bit differently. For example, the RQ kernels are actually integer kernels. Otherwise, everything stayed the same. @diaconuccalin, if it is also ok for you, I would merge this PR.

diaconuccalin · 2025-12-23T15:54:31Z

I organized the tests a bit differently. For example, the RQ kernels are actually integer kernels. Otherwise, everything stayed the same. @diaconuccalin, if it is also ok for you, I would merge this PR.

Hey @Xeratec, I will look into the changes today or tomorrow and let you know if everything seems fine.

diaconuccalin

Other than the 2 small mentions, LGTM! If the test was removed on purpose, then imo it's ok to merge without solving the small duplicate issue.

.github/workflows/ci-platform-mempool.yml

DeeployTest/Tests/test2DRequantizedConv/activations.npz

diaconuccalin requested review from Victor-Jung and Xeratec as code owners December 10, 2025 14:29

coderabbitai bot reviewed Dec 10, 2025

View reviewed changes

Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py Show resolved Hide resolved

Xeratec requested changes Dec 12, 2025

View reviewed changes

.github/workflows/ci-platform-generic.yml Outdated Show resolved Hide resolved

Xeratec assigned diaconuccalin Dec 12, 2025

Xeratec added the Feature Addition of new features label Dec 12, 2025

Xeratec added this to Deeploy Dec 12, 2025

Xeratec moved this to In progress in Deeploy Dec 12, 2025

Xeratec added this to the Release 0.2.1 milestone Dec 12, 2025

Xeratec linked an issue Dec 12, 2025 that may be closed by this pull request

[BUG] Improper tiling implementation in PULPOpen for Slice and ReduceMean operators #134

Open

coderabbitai bot reviewed Dec 14, 2025

View reviewed changes

.github/workflows/ci-platform-snitch-tiled.yml Show resolved Hide resolved

.github/workflows/ci-platform-snitch.yml Outdated Show resolved Hide resolved

coderabbitai bot reviewed Dec 14, 2025

View reviewed changes

diaconuccalin requested a review from Xeratec December 15, 2025 08:26

diaconuccalin added 15 commits December 15, 2025 13:38

Reordered some float kernel tests. Clean-up gitignore file

05fc8aa

Add various test cases for FP32 ReduceMean

4079ccd

Extended ReduceMean binder to support all-axis reduction (so missing …

8417a54

…axes parameter). Added ReduceMean tests. Other fixes

ReduceMean parser fixes

6853ac5

Implement reduction of singleton reduced sizes in ReduceMean. Warning…

b66ca26

… fix

Added ReduceMean pass to remove if only singleton dimension reduction…

a568ff9

… and related tests

Improved ReduceMean template readability. Added ReduceMean test match…

a0b82c0

…ing the existent layer in TinyViT 5M

Reverted changes from regular reduce mean template and moved them to …

ab1ad52

…float one

Added parallelism to PULPOpen FP32 ReduceMean kernel

5ce3574

FP ReduceMean PULPOpen template fix

26070c7

Implement changes for tiling an uninitialized number of dimensions fo…

12b4c07

…r ReduceMean PULPOpen FP operation

Applied fixes to the PULPOpen FP32 ReduceMean template and tiling con…

81eb8a1

…straints. Added FP32 ReduceMean tests for Siracusa single buffer. Changed ReduceMean TinyViT test location. Decreased size of some ReduceMean tests

Fixed test naming issues. Reduced dimensions of 2 ReduceMean tests

c601384

Added CI tests for double buffering and L3 for PULPOpen FP32 ReduceMean

49c590e

Improved comments

0199be1

diaconuccalin added 7 commits December 15, 2025 13:40

Reorganized tests

a9e09ae

Fixed test locations in text

e3830ad

Relocated softmax

dd8973e

Quickfix

2be214c

Quickfix

d2762ad

Quickfix

9898f37

Fix formatting

4fca212

diaconuccalin force-pushed the 134-bug-improper-tiling-implementation-in-pulpopen-for-slice-and-reducemean-operators branch from 1e0826e to 4fca212 Compare December 15, 2025 13:45

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

.github/workflows/ci-platform-snitch-tiled.yml Show resolved Hide resolved

diaconuccalin added 2 commits December 15, 2025 13:56

New test reordering and post-merge fixes

3fcc048

Format fix

5b8afcf

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

Fix formatting

a0cb405

Xeratec added 3 commits December 21, 2025 21:13

Move Tests

19c8ed4

Adapt workflows

d88860b

Update ReadMe

9c7da3a

coderabbitai bot reviewed Dec 21, 2025

View reviewed changes

.github/workflows/ci-platform-generic.yml Outdated Show resolved Hide resolved

.github/workflows/ci-platform-mempool.yml Outdated Show resolved Hide resolved

Fix errors

6cd45a7

coderabbitai bot reviewed Dec 21, 2025

View reviewed changes

Fix errors

3cff86f

coderabbitai bot reviewed Dec 22, 2025

View reviewed changes

Xeratec approved these changes Dec 22, 2025

View reviewed changes

diaconuccalin commented Dec 23, 2025

View reviewed changes

.github/workflows/ci-platform-mempool.yml Outdated Show resolved Hide resolved

DeeployTest/Tests/test2DRequantizedConv/activations.npz Outdated Show resolved Hide resolved

Remove duplicated test

a7d0483

Xeratec merged commit 6db943f into pulp-platform:devel Dec 24, 2025
142 checks passed

github-project-automation bot moved this from In progress to Done in Deeploy Dec 24, 2025

diaconuccalin deleted the 134-bug-improper-tiling-implementation-in-pulpopen-for-slice-and-reducemean-operators branch December 24, 2025 00:47

coderabbitai bot mentioned this pull request Jan 12, 2026

PyTest Migration #144

Open

5 tasks

FP32 ReduceMean operator improvement #137

FP32 ReduceMean operator improvement #137

Uh oh!

Conversation

diaconuccalin commented Dec 10, 2025 • edited by Xeratec Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Added

Changed

Fixed

PR Merge Checklist

Uh oh!

coderabbitai bot commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Xeratec left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Xeratec commented Dec 21, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Xeratec left a comment

Choose a reason for hiding this comment

Uh oh!

diaconuccalin commented Dec 23, 2025

Uh oh!

diaconuccalin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

diaconuccalin commented Dec 10, 2025 •

edited by Xeratec

Loading

coderabbitai bot commented Dec 10, 2025 •

edited

Loading