-
Notifications
You must be signed in to change notification settings - Fork 33
FP32 ReduceMean operator improvement #137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FP32 ReduceMean operator improvement #137
Conversation
📝 WalkthroughWalkthroughAdds opset-aware ReduceMean support (parsers, bindings), a PULPOpen-specific ReduceMean parser and parallel FloatReduceMean template, a pass to remove ReduceMean over only-singleton axes, tiler/constraint updates and wiring for PULPOpen, plus broad CI workflows migrated to hierarchical Kernels/Models/Others test paths. Changes
Sequence Diagram(s)sequenceDiagram
participant ONNX as ONNX Node
participant GenParser as ReduceMeanParser (Generic)
participant PulpParser as PULPReduceMeanParser
participant Optimizer as RemoveOnlySingletonReduceMeanPass
participant Bindings as ReduceMeanBindings
participant Template as FloatReduceMeanTemplate
participant Tiler as ReduceMeanTileConstraint
ONNX->>GenParser: parseNode (opset-aware: 1 or 2 inputs)
GenParser->>GenParser: normalize axes -> operatorRepresentation (numpy axes)
GenParser->>PulpParser: hand off ctxt & operatorRepresentation
PulpParser->>PulpParser: add dim_in_<axis> entries for non-reduced axes
PulpParser->>Optimizer: lowering/optimizations run
Optimizer->>Optimizer: inspect axes & input shapes
alt all reduced dims == 1
Optimizer-->>Bindings: remove ReduceMean node
else
Optimizer-->>Bindings: keep ReduceMean node
Bindings->>Template: select opset-specific template & transformer
Template->>Tiler: expose restDims/reduceLength/offsets
Tiler->>Tiler: addPolicyConstraint & serialize per-axis replacements
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (2)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)
64-89: Policy constraint correctly limits tiling to largest non-reduced dimension.The implementation properly identifies the biggest non-reduced dimension and constrains tiling to that axis only. The edge case where all dimensions are reduced (
biggestNonReducedDim = -1) is handled by constraining all dimensions to their full size, which effectively disables tiling—a sensible fallback.Regarding the unused
ctxtparameter flagged by static analysis: this is likely kept for API consistency with otherTileConstraintmethods. Consider adding a comment or using_ = ctxtto explicitly acknowledge this.@staticmethod def addPolicyConstraint(tilerModel: TilerModel, parseDict: Dict, ctxt: NetworkContext) -> TilerModel: + _ = ctxt # Unused but kept for API consistency # ===== GET NECESSARY INFORMATION =====Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1)
34-36: Clarify intent of dimension key access loop.This loop accesses
dim_in_keys but discards the result. If the intent is to validate key existence, consider adding a comment or using an explicit check. The current pattern is non-obvious.for ax in range(len(operatorRepresentation['data_in_shape'])): if ax not in operatorRepresentation['axes']: - _ = operatorRepresentation['dim_in_' + str(ax)] + # Ensure non-reduced dimension keys exist for tiling + assert 'dim_in_' + str(ax) in operatorRepresentation
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
.github/workflows/_runner-siracusa-tiled-sequential.yml(1 hunks).github/workflows/ci-platform-generic.yml(1 hunks).github/workflows/ci-platform-siracusa-tiled.yml(5 hunks).github/workflows/ci-platform-siracusa.yml(1 hunks).gitignore(1 hunks)CHANGELOG.md(4 hunks)Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py(1 hunks)Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py(1 hunks)Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py(1 hunks)Deeploy/Targets/Generic/Bindings.py(1 hunks)Deeploy/Targets/Generic/Parsers.py(2 hunks)Deeploy/Targets/Generic/Platform.py(2 hunks)Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py(2 hunks)Deeploy/Targets/PULPOpen/Bindings.py(3 hunks)Deeploy/Targets/PULPOpen/Parsers.py(2 hunks)Deeploy/Targets/PULPOpen/Platform.py(5 hunks)Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py(1 hunks)Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py(1 hunks)Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py(6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/_runner-siracusa-tiled-sequential.yml
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
Applied to files:
CHANGELOG.md
📚 Learning: 2025-12-02T13:54:22.700Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 69
File: Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py:36-38
Timestamp: 2025-12-02T13:54:22.700Z
Learning: In Deeploy templates (Python files in Deeploy/Targets/PULPOpen/Templates/), always use explicit bitwidth types (e.g., `float${...type.referencedType.typeWidth}_t*`) instead of hardcoded types (e.g., `float*`) to ensure type consistency with templated kernel calls.
Applied to files:
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.pyDeeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.pyDeeploy/Targets/PULPOpen/Bindings.py
🧬 Code graph analysis (7)
Deeploy/Targets/Generic/Platform.py (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
RemoveEmptyConvBiasPass(518-523)RemoveOnlySingletonReduceMeanPass(547-552)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/DeeployTypes.py (1)
inputs(2503-2520)Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
deleteNode(67-86)contextagnostic(285-298)ReplaceSequentialPatternPass(265-282)
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)
Deeploy/AbstractDataTypes.py (1)
nLevels(194-195)
Deeploy/Targets/PULPOpen/Platform.py (2)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
RemoveEmptyConvBiasPass(518-523)RemoveOnlySingletonReduceMeanPass(547-552)Deeploy/Targets/PULPOpen/Parsers.py (2)
PULPReduceMeanParser(467-487)PULPTallGEMMParser(443-464)
Deeploy/Targets/Generic/Bindings.py (3)
Deeploy/DeeployTypes.py (1)
NodeBinding(1512-1657)Deeploy/Targets/Generic/TypeCheckers.py (1)
ReduceMeanChecker(310-324)Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
inputs(2503-2520)outputs(2522-2539)lookup(720-752)
Deeploy/Targets/PULPOpen/Parsers.py (1)
Deeploy/Targets/Generic/Parsers.py (16)
ReduceMeanParser(526-599)parseNodeCtxt(29-41)parseNodeCtxt(60-76)parseNodeCtxt(114-155)parseNodeCtxt(171-185)parseNodeCtxt(209-221)parseNodeCtxt(245-263)parseNodeCtxt(295-325)parseNodeCtxt(353-365)parseNodeCtxt(387-414)parseNodeCtxt(436-461)parseNodeCtxt(475-488)parseNodeCtxt(509-523)parseNodeCtxt(549-599)parseNodeCtxt(613-619)parseNodeCtxt(632-649)
🪛 Ruff (0.14.8)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py
65-65: Unused static method argument: ctxt
(ARG004)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
526-526: Unused function argument: name
(ARG001)
🔇 Additional comments (25)
Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (1)
242-243: LGTM! Good documentation practice.The added comment clearly documents the inheritance of policy constraints from the parent Conv2D implementation, improving code readability without any functional impact.
Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py (1)
145-162: Verify pageargs string parsing robustness.The string extraction logic assumes simple quoted string literals without handling escaped quotes (e.g.,
pageargs["key\"with\"quotes"]), string concatenation (e.g.,pageargs["prefix" + suffix]), or non-string keys. If templates only use simple string literals for pageargs keys, this limitation may be acceptable. However, if any of these patterns exist in actual template usage, the parsing will fail or produce incorrect results. Consider either: confirming that templates restrict pageargs to simple quoted strings, or implementing more robust parsing usingast.literal_eval().Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1)
23-26: LGTM! Proper guard for empty closure arguments.The conditional check prevents generating an unused
argsvariable when there are no closure struct arguments, avoiding potential compiler warnings and improving code cleanliness..github/workflows/ci-platform-siracusa.yml (1)
38-74: Well-organized test expansion for FP32 kernels.The test matrix restructuring follows a clear hierarchical naming convention and provides comprehensive coverage for ReduceMean with various axis configurations and keepdims settings. The organization with blank lines between categories improves maintainability.
CHANGELOG.md (1)
7-7: Changelog entries are comprehensive and well-documented.All additions accurately reflect the PR scope: tiler flexibility, parallelization support, test coverage expansion, and the fix for Issue #134. The entries follow the existing formatting conventions.
Also applies to: 31-34, 86-88, 128-128
.gitignore (1)
5-54: Clean reorganization with section headers.The categorical organization with comment headers improves readability and makes future modifications easier. The negation pattern for
docs/_templates/*correctly preserves template files while ignoring other HTML files.Deeploy/Targets/Generic/Parsers.py (3)
576-582: Thoughtful handling of singleton dimension reduction.The logic to skip reduction over singleton dimensions while preserving at least one axis is a good optimization. This avoids unnecessary computation when reducing dimensions that don't change the output.
554-556: Clean ONNX opset >= 18 handling.The parser correctly identifies the opset version by checking for the absence of the
axesattribute and properly marks the axes tensor as non-deployable since it's only needed at compile-time for template generation.Also applies to: 561-566
568-572: Verify axis normalization before sorting.Negative axis indices should be normalized to positive values before sorting. ONNX allows negative indices (e.g.,
-1for last dimension), and sorting negative integers produces incorrect ordering (they sort before positive values). Ensure axes are converted to positive indices relative to the tensor rank before callingsort(), or confirm that the input axes are already normalized at this point.Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (2)
20-24: Helpful docstring clarifying optimization scope.The warning about TinyViT-specific optimization (49 elements in reduced axis) is valuable for future maintainers who might need different strategies for larger reduction sizes.
146-153: Per-axis dimension tracking aligns with parser changes.The replacement scheme now correctly tracks
dim_in_<axis>for each non-reduced axis, matching thePULPReduceMeanParserthat populates these in the operator representation. Usinguint32_tpointers is appropriate for dimension values that may exceed 16-bit limits.Also applies to: 165-168
.github/workflows/ci-platform-generic.yml (1)
40-74: Comprehensive ReduceMean test coverage added.The new test entries provide thorough coverage of ReduceMean scenarios with both
KeepDimsandNoKeepDimsvariants, covering various axis configurations (single axis, multiple axes, all axes). The test organization follows a clear hierarchical structure..github/workflows/ci-platform-siracusa-tiled.yml (2)
40-71: Well-structured FP32 kernel test configurations for tiled execution.The test configurations include appropriate L1 memory allocations for various ReduceMean scenarios. Tests with multiple L1 values (e.g.,
[5000, 50000]) allow testing different memory pressure scenarios.
185-186: TinyViT ReduceMean layer test added to model matrix.The L1 values
[200, 40000]provide good coverage for both constrained and relaxed memory scenarios.Deeploy/Targets/Generic/Platform.py (2)
5-6: Proper integration of RemoveOnlySingletonReduceMeanPass.The new optimization pass is correctly imported and will remove ReduceMean operations that only reduce over singleton dimensions, which is a no-op.
166-168: Pass ordering looks correct.Placing
RemoveOnlySingletonReduceMeanPassafterRemoveEmptyConvBiasPassis appropriate since both are cleanup passes that simplify the graph before downstream processing.Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
546-551: Pass implementation follows established patterns.The class follows the same structure as
RemoveEmptyConvBiasPassand uses the@contextagnosticdecorator appropriately.Deeploy/Targets/PULPOpen/Parsers.py (1)
467-487: Clean extension of ReduceMeanParser for PULP-specific needs.The parser correctly extends the generic
ReduceMeanParserand adds dimension information for non-reduced axes. This enables the tiler to handle variable-dimension inputs properly.The dynamic key generation (
dim_in_+ str(ax)) allows flexible handling of tensors with different numbers of dimensions.Deeploy/Targets/Generic/Bindings.py (1)
211-220: Correct ONNX opset version handling for ReduceMean bindings.The distinction between OPSET < 18 (axes as integer constant input) and OPSET >= 18 (axes as optional input or attribute) is properly handled with separate binding configurations.
Deeploy/Targets/PULPOpen/Bindings.py (2)
296-304: ForkTransformer enables parallelization for FP32 ReduceMean.Changing from
ClusterTransformertoForkTransformerenables the parallel execution path viapi_cl_team_fork, which aligns with the PR objective of adding parallelization support for the FP32 ReduceMean operator.
30-35: Verify FloatReduceMeanTemplate uses explicit bitwidth types in PULPOpen implementation.The import has been moved from Generic.Templates to PULPOpen.Templates for platform-specific implementation. Per template best practices, ensure the FloatReduceMeanTemplate uses explicit bitwidth types (e.g.,
float${...type.referencedType.typeWidth}_t*) rather than hardcoded types likefloat*to maintain type consistency with templated kernel calls.Deeploy/Targets/PULPOpen/Platform.py (2)
8-9: LGTM! Clean integration of PULPReduceMeanParser and RemoveOnlySingletonReduceMeanPass.The import updates and mapper changes correctly wire up the new ReduceMean parsing pathway for PULPOpen. The PULPReduceMeanParser extends the base parser with dimension information needed for proper tiling.
Also applies to: 36-36, 68-68
237-237: LGTM! Optimization pass added to correct location.Adding
RemoveOnlySingletonReduceMeanPass()toPULPOptimizeris appropriate for removing no-op ReduceMean operations that only change shape.Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1)
10-15: Helpful parallelization caveat documented.The warning about TinyViT optimization (49 elements in reduced axis) is useful context for future maintainers.
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)
84-88: Verifyoutput_offsetasymmetry in keepdims condition.
output_offsetis only applied whenkeepdimsisTrue. Determine whether this asymmetry is intentional or if it should apply consistently in both branches. Compare with the corresponding PULPOpen template to clarify the intended behavior.
Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py
Show resolved
Hide resolved
...CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
Show resolved
Hide resolved
...CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)
33-58: MakerestDimsiteration deterministic and consider a small rank‑1 safeguardTwo minor template-preparation points:
restDimsis built as asetand later iterated vialist(restDims), which makes the loop nest order depend on Python’s set iteration order. This doesn’t change numerical results but can make generated C code non‑reproducible between runs. Consider sorting once when you compute it:-## Compute the remaining dimensions after reduction -restDims = set(list(range(len(data_in_shape)))).difference(set(axes)) +## Compute the remaining dimensions after reduction (deterministic order) +restDims = sorted(set(range(len(data_in_shape))).difference(set(axes)))
- For rank‑1 inputs,
data_in_shape[1:]is empty, soshapeStrstays''and the cast becomes(${data_in_type.referencedType.typeName} (*) )${data_in}. Most compilers will accept this, but if you ever target rank‑1 ReduceMean heavily it may be clearer to special‑case that rank (e.g., cast to${data_in_type.referencedType.typeName}*instead).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py(1 hunks)Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-02T13:54:22.700Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 69
File: Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py:36-38
Timestamp: 2025-12-02T13:54:22.700Z
Learning: In Deeploy templates (Python files in Deeploy/Targets/PULPOpen/Templates/), always use explicit bitwidth types (e.g., `float${...type.referencedType.typeWidth}_t*`) instead of hardcoded types (e.g., `float*`) to ensure type consistency with templated kernel calls.
Applied to files:
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py
🧬 Code graph analysis (1)
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (2)
Deeploy/AbstractDataTypes.py (1)
nLevels(194-195)Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1)
_FloatReduceMeanTemplate(10-38)
🔇 Additional comments (1)
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)
21-27: Input/output offset computation inalignToContextlooks correct nowUsing
data_inforinput_offsetanddata_outforoutput_offsetwith the_signed/nLevelsguards fixes the earlier bug and correctly derives offsets from the respective tensor types. No further changes needed here.
Xeratec
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this work! Very valuable.
I only have one comment regarding the restructuring of the tests. I would either consistently restructure the tests or remove it from this PR and do it in another one.
If you decide to do the restructuring here, please also apply the attached patch to improve selecting the correct test in VS Code. Please let me know how you want to proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py (1)
50-55: Comment-only change looks correct; consider avoiding hardcoded test paths in templates.Updating the example command to
-t=Tests/Others/Transformermatches the test-suite reorg and is low-risk. If this pattern appears across templates, consider referencing a canonical doc/README snippet (or a single shared constant) to reduce future comment drift..github/workflows/ci-deeploy.yml (1)
62-66: Minor: normalize-tpath style (Tests/...vs./Tests/...) for consistency.
These run aftercd DeeployTest, so both likely work, but standardizing reduces future footguns if the working directory changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
.github/workflows/ci-deeploy.yml(6 hunks).github/workflows/ci-platform-chimera.yml(1 hunks).github/workflows/ci-platform-cortexm.yml(2 hunks).github/workflows/ci-platform-generic.yml(2 hunks).github/workflows/ci-platform-mempool.yml(2 hunks).github/workflows/ci-platform-siracusa-neureka-tiled.yml(6 hunks).github/workflows/ci-platform-siracusa-tiled.yml(5 hunks).github/workflows/ci-platform-siracusa.yml(2 hunks).github/workflows/ci-platform-snitch-tiled.yml(1 hunks).github/workflows/ci-platform-snitch.yml(1 hunks).github/workflows/ci-platform-softhier.yml(1 hunks).github/workflows/infra-generate-ccache.yml(1 hunks)Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/ci-platform-cortexm.yml.github/workflows/ci-deeploy.yml.github/workflows/ci-platform-siracusa.yml.github/workflows/ci-platform-snitch.yml.github/workflows/ci-platform-mempool.yml.github/workflows/infra-generate-ccache.yml.github/workflows/ci-platform-generic.yml
🔇 Additional comments (18)
.github/workflows/infra-generate-ccache.yml (1)
37-45: Verify that the cache generation scope aligns with the PR's ReduceMean focus.The cache generation workflow now runs only
IntKernels/Add/Regulartests across all test runners. However, this PR adds extensive ReduceMean test coverage and fixes ReduceMean tiling/parallelization. Given that the cache is refreshed daily to support CI workflows, running only Add tests may not provide optimal coverage for ReduceMean-related changes and subsequent CI runs.Consider whether the cache generation should include ReduceMean test paths (e.g.,
./Tests/FP32Kernels/ReduceMean/Regularor similar) to ensure the cache benefits the newly added ReduceMean functionality.Can you confirm:
- Is
IntKernels/Add/Regularintentionally chosen as a lightweight representative test, or is this an oversight?- Should ReduceMean test paths be included in the cache generation?
- If only Add is needed, is there documentation explaining why Add alone is sufficient for all workflows?
.github/workflows/ci-deeploy.yml (5)
84-88: LGTM: state-equality tests now target the reorganizedOthers/SimpleRegressionpath.
(Assuming the directory move is part of this PR’s test reorg.)
106-110: LGTM: memory-level extension tests updated toOthers/SimpleRegression.
128-136: LGTM: tiler-extension coverage broadened; just ensure all referencedTests/...directories exist.
154-161: LGTM: memory-allocation-extension coverage broadened; consider deduping if runtime becomes an issue.
197-201: LGTM: debug print tests now point atOthers/SimpleRegression..github/workflows/ci-platform-siracusa-tiled.yml (3)
104-155: Same ask for the doublebuffertests-config: parse JSON + ensure directories exist.
164-199: Double-check casing/renames for newly referenced model paths (TinyViT ReduceMean, microLlama, CCT FP32).
These strings are effectively filesystem paths; please ensure the directory names match exactly (notablyFP32Kernels/Add/largevs.../Largeconventions, and anyregular/Regularmismatches).Also applies to: 213-239, 255-285
39-95: JSON syntax in bothtests-configblocks is valid.Both embedded JSON arrays parse successfully:
- Block 1 (lines 39-95): 45 entries ✓
- Block 2 (lines 105-155): 41 entries ✓
No syntax errors detected (missing commas, malformed objects, etc.).
.github/workflows/ci-platform-cortexm.yml (2)
57-59: The test directories referenced in the workflow exist and are correctly specified.
38-49: All listed Cortex-M kernel test paths are valid and correctly formatted. Verification confirms all 11 directories exist underDeeployTest/Tests/with the exact casing shown in the workflow, including the lowercase variantsIntKernels/MatMul/regularandIntKernels/MatMul/add. No corrections needed..github/workflows/ci-platform-mempool.yml (1)
75-85: All MemPool model test directories verified to exist underDeeployTest/Tests/with correct case sensitivity. No action needed..github/workflows/ci-platform-chimera.yml (1)
38-40: The test pathDeeployTest/Tests/IntKernels/Add/Regularexists and is correctly referenced in the workflow..github/workflows/ci-platform-siracusa.yml (2)
55-73: Extensive ReduceMean test coverage with both KeepDims and NoKeepDims variants.Lines 55–73 add comprehensive ReduceMean coverage with 18 test entries spanning both keepdims modes and multiple axis configurations (AllAxes, Axes1_2_3, Axes1_3, Axes2_1, Axis0, Axis2, and composition tests like Add_ReduceMean). This directly addresses the PR objective of adding extensive ReduceMean test coverage and aligns with the FP32 ReduceMean operator improvements described in Issue #134.
39-110: Hierarchical test path restructuring is consistent and comprehensive.The restructuring from granular legacy test names to structured FP32Kernels/IntKernels/Models/Others categorization is uniformly applied across both siracusa-kernels (lines 39–110) and siracusa-models (lines 119–135) jobs. All listed test paths correspond to actual directories in DeeployTest/Tests/, including the extensive ReduceMean variants. Visual grouping with blank lines aids readability, and the naming scheme is consistent with other CI workflows and aligns with past feedback requesting comprehensive test reorganization.
.github/workflows/ci-platform-generic.yml (2)
62-80: ReduceMean test coverage mirrors Siracusa workflow with identical structure.The FP32Kernels/ReduceMean entries in lines 62–80 are structurally identical to those in ci-platform-siracusa.yml (lines 55–73), providing the same comprehensive coverage of both KeepDims and NoKeepDims variants with multiple axis configurations. This consistency reinforces the systematic addition of extensive ReduceMean test coverage across platforms.
39-120: Generic workflow test restructuring extends beyond Siracusa with additional IntKernels coverage.The generic-kernels job (lines 39–120) includes all hierarchical FP32Kernels paths plus additional entries for IntKernels/ReduceMean, IntKernels/ReduceSum, IntKernels/Slice (lines 108–110), and FP32Kernels/Pow/Scalar, FP32Kernels/Pow/Vector variants (lines 59–60), appropriately expanding test coverage for the generic platform. All referenced test directories exist in DeeployTest/Tests/. The generic-models job (lines 128–145) follows the same hierarchical organization with Models and Others categories.
.github/workflows/ci-platform-siracusa-neureka-tiled.yml (1)
40-43: Test paths are valid and consistent across all Siracusa Neureka Tiled workflow jobs.All referenced test directories exist in DeeployTest/Tests/: Others/RequantizedLinear, IntKernels/Conv/2D/PW/* variants, Models/miniMobileNet, IntKernels/Attention, Others/Transformer, and Models/microLlama/microLlama1. L1 memory configurations are preserved uniformly across the six workflow jobs (lines 40-43, 55-58, 70-73, 91-93, 115-118, 129-132). The commented-out Transformer test in wmem-doublebuffer-L3 (line 131) aligns with the noted LMACAN memory constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.vscode/launch.json (1)
69-72: Keep JSON style consistent ("find":spacing) if the repo enforces formatting.
Not functional, but this file often gets auto-formatted; matching surrounding style reduces noisy diffs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/ci-platform-snitch-tiled.yml(1 hunks).vscode/launch.json(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/ci-platform-snitch-tiled.yml
🔇 Additional comments (1)
.vscode/launch.json (1)
68-72: Removefind/replaceand useapplyarray —labelTransformAPI structure is incorrect.The
labelTransformobject cannot have directfindandreplacefields. According to the Command Variable extension documentation, the correct structure uses anapplyarray:"labelTransform": { "text": "${fileDirname}", "apply": [ { "find": "...", "replace": "", "flags": "g" } ] }Also,
findmust be a regex pattern (not a literal path string). If the goal is to strip${workspaceFolder}/DeeployTest/Tests/from the label, you'll need an escaped regex pattern. To handle Windows path separators correctly, include a separate regex step:{ "find": "\\\\", "replace": "/", "flags": "g" }before the path-stripping operation.Likely an incorrect or invalid review comment.
…axes parameter). Added ReduceMean tests. Other fixes
… and related tests
…ing the existent layer in TinyViT 5M
…r ReduceMean PULPOpen FP operation
…straints. Added FP32 ReduceMean tests for Siracusa single buffer. Changed ReduceMean TinyViT test location. Decreased size of some ReduceMean tests
1e0826e to
4fca212
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (1)
44-44: Non-deterministic iteration order due to unorderedset.As discussed in the previous review, while the set difference operation is correct, iterating over
restDims(lines 68, 89) can produce different loop orderings across Python runs, leading to non-reproducible generated C code. Consider usingsorted()for deterministic output, matching the PULPOpen template pattern.
🧹 Nitpick comments (2)
Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (2)
34-36: Clarify intent: accessing without assignment.The loop accesses
operatorRepresentation['dim_in_' + str(ax)]but doesn't use the returned value. If this is meant to validate that these keys exist (raisingKeyErrorif missing), consider making the intent explicit:for ax in range(len(operatorRepresentation['data_in_shape'])): if ax not in operatorRepresentation['axes']: - _ = operatorRepresentation['dim_in_' + str(ax)] + # Validate that dimension metadata exists for non-reduced axes + assert f'dim_in_{ax}' in operatorRepresentation, \ + f"Missing dimension metadata for axis {ax}"If validation isn't the goal, please clarify the purpose of this access.
11-15: Consider tracking optimization scope for future work.The warning indicates this implementation is optimized for TinyViT's specific use case (49 elements in the reduced axis). For broader applicability, you might want to track investigation of alternative parallelization strategies for larger reduction sizes.
Would you like me to open a follow-up issue to track investigation of parallelization strategies for larger reduction dimensions?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
.github/workflows/ci-deeploy-testing.yml(1 hunks).github/workflows/ci-deeploy.yml(6 hunks).github/workflows/ci-platform-chimera.yml(1 hunks).github/workflows/ci-platform-cortexm.yml(2 hunks).github/workflows/ci-platform-generic.yml(2 hunks).github/workflows/ci-platform-mempool.yml(2 hunks).github/workflows/ci-platform-siracusa-neureka-tiled.yml(6 hunks).github/workflows/ci-platform-siracusa-tiled.yml(5 hunks).github/workflows/ci-platform-siracusa.yml(2 hunks).github/workflows/ci-platform-snitch-tiled.yml(1 hunks).github/workflows/ci-platform-snitch.yml(1 hunks).github/workflows/ci-platform-softhier.yml(1 hunks).github/workflows/infra-generate-ccache.yml(1 hunks).gitignore(1 hunks).vscode/launch.json(1 hunks)CHANGELOG.md(4 hunks)Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py(1 hunks)Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py(1 hunks)Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py(1 hunks)Deeploy/Targets/Generic/Bindings.py(1 hunks)Deeploy/Targets/Generic/Parsers.py(2 hunks)Deeploy/Targets/Generic/Platform.py(2 hunks)Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py(1 hunks)Deeploy/Targets/PULPOpen/Bindings.py(3 hunks)Deeploy/Targets/PULPOpen/Parsers.py(2 hunks)Deeploy/Targets/PULPOpen/Platform.py(4 hunks)Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py(1 hunks)Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py(1 hunks)Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py(1 hunks)Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py
- Deeploy/CommonExtensions/CodeTransformationPasses/IntrospectiveCodeTransformation.py
- .github/workflows/infra-generate-ccache.yml
- CHANGELOG.md
- .github/workflows/ci-platform-siracusa-neureka-tiled.yml
- Deeploy/Targets/Generic/Parsers.py
- .github/workflows/ci-deeploy.yml
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
- Deeploy/Targets/Generic/Bindings.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/ci-platform-snitch.yml.github/workflows/ci-platform-siracusa.yml.github/workflows/ci-platform-mempool.yml.github/workflows/ci-platform-cortexm.yml.github/workflows/ci-deeploy-testing.yml.github/workflows/ci-platform-generic.yml
📚 Learning: 2025-12-02T13:54:22.700Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 69
File: Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py:36-38
Timestamp: 2025-12-02T13:54:22.700Z
Learning: In Deeploy templates (Python files in Deeploy/Targets/PULPOpen/Templates/), always use explicit bitwidth types (e.g., `float${...type.referencedType.typeWidth}_t*`) instead of hardcoded types (e.g., `float*`) to ensure type consistency with templated kernel calls.
Applied to files:
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.pyDeeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.pyDeeploy/Targets/PULPOpen/Bindings.py
🧬 Code graph analysis (8)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/DeeployTypes.py (1)
inputs(2503-2520)Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
deleteNode(67-86)contextagnostic(285-298)ReplaceSequentialPatternPass(265-282)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/Targets/Generic/Parsers.py (16)
ReduceMeanParser(526-599)parseNodeCtxt(29-41)parseNodeCtxt(60-76)parseNodeCtxt(114-155)parseNodeCtxt(171-185)parseNodeCtxt(209-221)parseNodeCtxt(245-263)parseNodeCtxt(295-325)parseNodeCtxt(353-365)parseNodeCtxt(387-414)parseNodeCtxt(436-461)parseNodeCtxt(475-488)parseNodeCtxt(509-523)parseNodeCtxt(549-599)parseNodeCtxt(613-619)parseNodeCtxt(632-649)Deeploy/DeeployTypes.py (3)
parseNodeCtxt(1051-1076)NetworkContext(508-1020)lookup(720-752)
Deeploy/Targets/Generic/Platform.py (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
RemoveEmptyConvBiasPass(518-523)RemoveOnlySingletonReduceMeanPass(547-552)
Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (2)
Deeploy/AbstractDataTypes.py (1)
nLevels(194-195)Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (1)
_FloatReduceMeanTemplate(10-38)
Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (4)
Deeploy/DeeployTypes.py (2)
NetworkContext(508-1020)NodeTemplate(87-229)Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (2)
_FloatReduceMeanTemplate(10-29)alignToContext(15-29)Deeploy/Targets/Generic/Templates/ReduceMeanTemplate.py (1)
alignToContext(15-27)Deeploy/AbstractDataTypes.py (1)
nLevels(194-195)
Deeploy/Targets/PULPOpen/Platform.py (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
RemoveEmptyConvBiasPass(518-523)RemoveOnlySingletonReduceMeanPass(547-552)Deeploy/Targets/PULPOpen/Parsers.py (2)
PULPReduceMeanParser(467-487)PULPTallGEMMParser(443-464)Deeploy/DeeployTypes.py (1)
NodeMapper(1660-1816)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (5)
Deeploy/CommonExtensions/DataTypes.py (1)
uint32_t(48-51)Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (3)
addPolicyConstraint(85-124)addPolicyConstraint(241-256)constructSymbolicNodeRep(127-138)Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
addPolicyConstraint(77-93)Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (4)
addPolicyConstraint(78-116)addPolicyConstraint(337-386)constructSymbolicNodeRep(119-130)constructSymbolicNodeRep(389-407)Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (1)
constructSymbolicNodeRep(62-66)
Deeploy/Targets/PULPOpen/Bindings.py (2)
Deeploy/Targets/Snitch/Templates/FloatSoftmaxTemplate.py (1)
FloatSoftmaxTemplate(10-24)Deeploy/Targets/Generic/Templates/RQAddTemplate.py (1)
RQAddTemplate(10-25)
🪛 Ruff (0.14.8)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
526-526: Unused function argument: name
(ARG001)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py
65-65: Unused static method argument: ctxt
(ARG004)
🔇 Additional comments (38)
.vscode/launch.json (1)
70-70: LGTM—aligns with test path reorganization.The update from a regex pattern to an absolute path selector is appropriate for the new hierarchical test structure. This improves clarity and maintainability of the debug configuration's file picker labels.
.github/workflows/ci-platform-mempool.yml (3)
38-85: Improved test organization and ReduceMean test addition align with PR objectives.The restructuring from flat test names to hierarchical, path-based test selection improves maintainability and clarity. The addition of
IntKernels/ReduceMean(line 58) is consistent with the PR's objective to extend ReduceMean support and add extensive test coverage. The blank-line grouping further enhances readability without changing test logic.
75-85: All model test paths are valid and correspond to existing directories.Verification confirms that all seven test paths listed in the workflow file (Models/CCT/Int/ICCT, Models/CCT/Int/ICCT_8, Models/CCT/Int/ICCT_ITA, Models/miniMobileNet, Models/miniMobileNetv2, Models/simpleCNN, and Others/SimpleRegression) exist as actual directories under DeeployTest/Tests/.
38-67: All hierarchical test paths are correctly mapped to actual directories.The test path migration is verified. All referenced directories under
DeeployTest/Tests/exist and match the workflow configuration: IntKernels paths (Add, Conv, GEMM, MatMul, MaxPool, Pad, ReduceMean, ReduceSum, Slice) and Others paths (RequantizedConv2D, RequantizedDWConv, RQConv, RQGEMM, RQMatMul) are properly structured and accessible for test execution.Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py (1)
53-53: LGTM!Comment correctly updated to reflect the new hierarchical test path structure (
Tests/Others/Transformer), consistent with the PR's test directory reorganization.Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (4)
20-24: LGTM!The docstring appropriately warns about the optimization being tailored for TinyViT ReduceMean layers (49 elements in the reduced axis), which helps maintainers understand when to revisit this implementation.
64-89: LGTM!The policy constraint implementation correctly:
- Identifies the biggest non-reduced dimension for tiling
- Constrains all other dimensions to their full size (no tiling)
- Handles the edge case where all dimensions are reduced (
biggestNonReducedDim = -1) by disabling tiling entirelyThe unused
ctxtparameter follows the established interface pattern used by otherTileConstraintsubclasses (e.g.,MatMulTileConstraint.addPolicyConstraint,ConvTileConstraint.addPolicyConstraint).
91-102: LGTM!The symbolic node representation correctly exposes
dim_in_<axis>variables for non-reduced axes, enabling the tiler to work with dynamic dimension values during tiling.
146-168: LGTM!The serialization correctly:
- Uses
uint32_tpointers for dimension replacements (supporting larger dimension values)- Builds replacement entries only for non-reduced axes
- Populates the replacement values from computed input cube dimensions
This aligns well with the
dim_in_<axis>pattern established inPULPReduceMeanParserandconstructSymbolicNodeRep.Deeploy/Targets/Generic/Templates/FloatReduceMeanTemplate.py (2)
10-29: LGTM!The
alignToContextmethod correctly:
- Initializes offsets to 0 for float tensors
- Only computes non-zero offsets when quantization attributes (
_signed,nLevels) exist- Uses
data_out.nLevelsforoutput_offset(previously fixed bug)Based on learnings, explicit bitwidth types should be used in templates, which is correctly done here via
${data_out_type.referencedType.typeName}.
62-91: LGTM!The template correctly implements the ReduceMean operation:
- Nested loops iterate over non-reduced dimensions first, then reduced dimensions
- Accumulator is properly initialized with
input_offset * reduceLength- Mean is computed by dividing by
reduceLength- Output is written with a dummy pointer that auto-increments
For FP32 tensors (the target of this template), offsets are always 0, so the
keepdimsbranching is effectively a no-op but maintains consistency with the quantized code path.Deeploy/Targets/PULPOpen/Parsers.py (2)
11-12: LGTM!Import correctly updated to include
ReduceMeanParserfor the newPULPReduceMeanParserclass.
467-487: LGTM!The
PULPReduceMeanParsercorrectly extendsReduceMeanParserto adddim_in_<axis>entries for non-reduced axes:
- Delegates to parent
parseNodeCtxtfor standard parsing- Retrieves the original input shape from context
- Adds
dim_in_<axis>for each non-reduced axis, matching the pattern expected byReduceMeanTileConstraint.constructSymbolicNodeRepandserializeTilingSolutionThis enables the tiler to work with per-axis dimension information for flexible tiling strategies.
Deeploy/Targets/Generic/Platform.py (2)
5-6: LGTM!Import correctly updated to include
RemoveOnlySingletonReduceMeanPassalongside existingRemoveEmptyConvBiasPass.
167-167: LGTM!The
RemoveOnlySingletonReduceMeanPassis appropriately placed in the optimizer pass sequence afterRemoveEmptyConvBiasPass. This pass will remove ReduceMean operators that only reduce over singleton dimensions (size 1), which don't change data content—just shape—as described in the PR objectives.Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
526-543: LGTM! Solid implementation of singleton ReduceMean removal.The optimization pass correctly removes ReduceMean operations that only reduce over singleton dimensions (size 1), which don't change data content. The logic properly:
- Preserves single-node graphs
- Handles both attribute-based and input-based axis specifications
- Guards against missing shape information
- Uses the existing
deleteNodehelperNote: The unused
nameparameter flagged by static analysis is part of the expected signature forReplaceSequentialPatternPassreplacement functions and can be ignored.
546-552: LGTM! Properly structured optimization pass.The class follows the established pattern for
ReplaceSequentialPatternPasssubclasses in this codebase and is correctly decorated with@contextagnostic.Deeploy/Targets/PULPOpen/Bindings.py (2)
17-18: LGTM! Import reorganization supports PULPOpen-specific template.The import change correctly moves
FloatReduceMeanTemplatefrom the generic template path to the PULPOpen-specific implementation, enabling platform-specific optimizations and parallelization.Also applies to: 32-35
300-304: LGTM! Transformer change enables parallelization.Switching from
ClusterTransformertoForkTransformeris appropriate for the new parallelized ReduceMean implementation.ForkTransformerprovides the necessary fork-based parallel execution support (pi_cl_team_fork) that the new template requires.Deeploy/Targets/PULPOpen/Platform.py (1)
9-9: LGTM! Proper integration of PULPOpen-specific ReduceMean support.All changes are consistent and correctly integrate the new ReduceMean functionality:
- Imports the new optimization pass and PULPOpen-specific parser
- Updates the ReduceMeanMapper to use
PULPReduceMeanParser()- Adds the singleton removal pass to the optimization pipeline in a logical position
Also applies to: 37-37, 71-71, 243-243
Deeploy/Targets/PULPOpen/Templates/FloatReduceMeanTemplate.py (2)
42-91: LGTM! Well-structured precomputation logic.The Mako precomputation block correctly:
- Updates shapes based on tiling constraints
- Normalizes negative axis indices
- Computes the reduction length and remaining dimensions
- Sorts dimensions heuristically for efficient parallelization
- Builds C array access patterns
93-142: LGTM! Solid parallelization implementation with proper type handling.The template correctly implements core-based parallelization:
- Divides work into chunks across
NUM_CORES- Handles both compile-time constants and runtime dimensions (via pointer dereference)
- Properly initializes accumulators with offset correction
- Uses type introspection (
${data_in_type.referencedType.typeName}) for explicit bitwidth typesThe conditional
output_offsetapplication based onkeepdimswas confirmed as intentional in past reviews.Based on learnings, the template correctly uses explicit bitwidth types via
referencedType.typeNameintrospection..github/workflows/ci-platform-chimera.yml (1)
38-40: Test path restructuring aligns with hierarchical naming convention.The change from
AddertoIntKernels/Add/Regularfollows the standardized path-based test-naming pattern across the PR. Based on learnings, test names correspond toDeeployTest/Tests/directory paths.Verify that the test directory
DeeployTest/Tests/IntKernels/Add/Regularexists in the repository..github/workflows/ci-platform-softhier.yml (1)
38-39: Test path restructuring aligns with hierarchical naming convention.The change from
AddertoIntKernels/Add/Regularmirrors the standardized pattern across the PR.Verify that the test directory
DeeployTest/Tests/IntKernels/Add/Regularexists in the repository..github/workflows/ci-deeploy-testing.yml (1)
43-43: Consistent test path updates for type inference testing.All three test entries (fail-input0, fail-input2, pass) are updated from
testTypeInferenceDifferentTypestoOthers/TypeInference, maintaining consistency. The path construction at line 74 (./Tests/${{ matrix.test }}) properly integrates the updated paths.Verify that the test directory
DeeployTest/Tests/Others/TypeInferenceexists in the repository.Also applies to: 49-49, 55-55
.github/workflows/ci-platform-siracusa.yml (2)
38-111: Comprehensive hierarchical test restructuring with extensive ReduceMean coverage.The siracusa-kernels job expands test coverage significantly with 70+ hierarchical test paths organized into FP32Kernels, IntKernels, Models, and Others categories. Notably:
- ReduceMean receives comprehensive testing: 18 variants (9 KeepDims + 9 NoKeepDims) covering different axes and compositions
- Proper visual grouping with blank-line separators between categories
- Consistent indentation and path formatting throughout
Verify that all hierarchical test paths exist in
DeeployTest/Tests/, especially the extensive ReduceMean variants (FP32Kernels/ReduceMean/KeepDims/* and FP32Kernels/ReduceMean/NoKeepDims/*).
120-137: Models section aligned with hierarchical structure.The siracusa-models job properly includes the specific TinyViT ReduceMean test (line 102) and maintains consistent hierarchical paths for model-level tests.
.github/workflows/ci-platform-cortexm.yml (2)
38-49: Cortex-M kernel tests properly restructured with hierarchical paths.The test-names list for cortexm-kernels follows the standardized path-based naming, with appropriate mix of IntKernels and Others categories. The inclusion of IntKernels/ReduceMean and IntKernels/Slice aligns with the PR's focus on ReduceMean and related operator improvements.
Verify that all paths exist in
DeeployTest/Tests/, particularly IntKernels/ReduceMean and IntKernels/Slice.
57-59: Cortex-M models tests aligned with platform focus.Models section includes appropriate test cases for the platform.
.github/workflows/ci-platform-snitch.yml (1)
38-52: Snitch-kernel test paths properly restructured with visual category grouping.The test-names list uses hierarchical paths with blank-line separators for readability, covering FP32Kernels Softmax, IntKernels Add/Activations/MatMul, and Others categories. Per the past review comment regarding whitespace-only lines, the formatting appears corrected here.
Verify that blank lines in the test-names block contain no trailing whitespace, as the bash loop condition
[[ -n "$testName" ]]will treat whitespace-only strings as non-empty, causing malformed test-path errors. Run a verification check to ensure all blank lines in this section are truly empty..github/workflows/ci-platform-generic.yml (2)
38-122: Comprehensive hierarchical test restructuring across generic-kernels.The generic-kernels job implements the extensive test restructuring requested in the past review, organizing 80+ test paths into standardized FP32Kernels, IntKernels, and Others categories. Notably:
- ReduceMean receives broad coverage with 18 test variants across KeepDims and NoKeepDims modes
- All major operator categories represented (Activations, Conv2D, GEMM, MatMul, Norm, ReduceMean, etc.)
- Consistent blank-line grouping for readability
- Aligns with Xeratec's request for comprehensive, hierarchical test restructuring across all test categories
Verify that all 80+ hierarchical test paths exist in
DeeployTest/Tests/, particularly the extensive ReduceMean variants and FP32Kernels subcategories.
129-145: Generic-models test paths properly structured with ReduceMean coverage.The generic-models job includes comprehensive model testing (Autoencoder1D, CCT variants, miniMobileNet, TinyViT, WaveFormer) with dedicated TinyViT ReduceMean testing (line 113), aligning with PR objectives.
.gitignore (1)
5-54: Thoughtful reorganization of .gitignore with improved section clarity.The restructuring groups related patterns into labeled sections (Editor/OS, Python, Build, Node, Documentation, DeeployTest) for better maintainability. Key observations:
- CHANGELOG_GEN.md explicitly tracked (line 54), aligning with PR's changelog workflow updates
- Build artifacts (compile_commands.json, dist, build) properly exposed for IDE/CI integration
- Python caches correctly grouped under Python section
- DeeployTest patterns remain appropriately restrictive for generated artifacts
Confirm that CHANGELOG_GEN.md is a generated artifact from a changelog management tool used in the project (e.g.,
git-cliff, changelog automation) and should be tracked in version control..github/workflows/ci-platform-siracusa-tiled.yml (5)
105-156: Double-buffer L2 job tests-config mirrors singlebuffer variant with appropriately scaled L1 memory values.The tests-config for siracusa-kernels-tiled-doublebuffer-L2 (lines 105–156) maintains consistent test coverage and path naming, with L1 allocations appropriately increased relative to the singlebuffer variant (e.g., 50000 → 100000 for AllAxes ReduceMean tests). JSON syntax is valid.
160-207: Matrix-based model test organization with extensive TinyViT/ReduceMean coverage aligns with PR objectives.The matrix strategy for singlebuffer-L2 model tests introduces a clean path-based organization (Models/CCT, Models/microLlama, Models/miniMobileNet, Models/MLPerT, Models/TinyViT) and explicitly adds TinyViT/5M/Layers/FP32/ReduceMean testing (lines 191–192), directly supporting the PR's goal of improving ReduceMean test coverage.
213-250: Singlebuffer-L3 job maintains consistent test structure with L3 memory configuration.The singlebuffer-L3 matrix job (lines 213–250) preserves test coverage and path naming consistency, sets default-memory-level to L3, and includes the same TinyViT/ReduceMean test configuration as the L2 variant (lines 230–231).
256-298: Double-buffer L3 job extends coverage to double-buffer + L3 configuration.The doublebuffer-L3 matrix job (lines 256–298) follows the established organizational pattern, sets both double-buffer and L3 memory level configurations, and includes TinyViT/ReduceMean testing (lines 276–277) consistent with other variants.
38-96: Hierarchical test reorganization with comprehensive ReduceMean coverage is well-structured.The refactored tests-config for the singlebuffer-L2 job properly transitions from per-test names to hierarchical paths (FP32Kernels, IntKernels, Others), and the extensive ReduceMean test additions (lines 58–76) with both KeepDims and NoKeepDims variants address the PR's goal of improving ReduceMean operator testing. JSON syntax is valid throughout.
Test path verification (e.g.,
FP32Kernels/ReduceMean/KeepDims/Axes1_3,FP32Kernels/ReduceMean/NoKeepDims/AllAxes) requires access to the test suite definition to confirm paths are defined and resolvable at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/Platform.py (1)
224-243: New ReduceMean optimization pass placement + minor pass-list cleanup opportunityAdding
RemoveOnlySingletonReduceMeanPass()immediately afterRemoveEmptyConvBiasPass()is a sensible place for a shape-only ReduceMean elimination: it keeps these “no-op on data” cleanups together and ensures such ReduceMean nodes are gone before PULPOpen-specific mapping/tiling.Two follow-ups to consider:
- Please confirm that this pass ordering matches your expectations w.r.t. quantization and ReduceMean lowering (i.e., graphs still contain the relevant
ReduceMeannodes at this stage so the pass can fire, and it does not interfere with later ReduceMean-specific tiling/parallelization logic).- The two consecutive
SkipUnityRequantPass(previous_op_regex = "Reshape|Transpose", num_inputs = 1)entries at lines 229–230 are identical; one of them is likely redundant and can be removed without behavior change, simplifying the optimizer configuration.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/ci-platform-siracusa-tiled.yml(5 hunks).github/workflows/ci-platform-siracusa.yml(2 hunks)Deeploy/Targets/PULPOpen/Platform.py(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/ci-platform-siracusa.yml
🧬 Code graph analysis (1)
Deeploy/Targets/PULPOpen/Platform.py (2)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
RemoveEmptyConvBiasPass(518-523)RemoveOnlySingletonReduceMeanPass(547-552)Deeploy/Targets/PULPOpen/Parsers.py (1)
PULPReduceMeanParser(467-487)
🔇 Additional comments (9)
.github/workflows/ci-platform-siracusa.yml (2)
38-111: Test structure is correctly organized and all referenced directories exist.The hierarchical test paths are properly formatted and the test coverage is well-structured across the four main categories: FP32Kernels, IntKernels, Models, and Others. The ReduceMean test coverage is particularly comprehensive, with both KeepDims/NoKeepDims variants across various axis configurations (Axis0, Axis2, AllAxes, Axes1_2_3, Axes1_3, Axes2_1) as well as compound tests, which aligns well with the PR's objective of improving ReduceMean operator support.
120-135: All referenced test directories exist and paths are correctly formatted.The siracusa-models job test-names are valid. All paths have been verified to exist in the test hierarchy: IntKernels/Attention, Models/CCT/FP32 subdirectories (CCT_1_16_16_8, CCT_2_32_32_128_Opset20), Models/miniMobileNet and miniMobileNetv2, Models/MLPerf subdirectories (KeywordSpotting, ImageClassification, AnomalyDetection), Models/TinyViT/Demo, and Others/SimpleRegression. The TinyViT ReduceMean test (Models/TinyViT/5M/Layers/FP32/ReduceMean) in the first job is also present.
Deeploy/Targets/PULPOpen/Platform.py (4)
8-9: Wiring inRemoveOnlySingletonReduceMeanPasslooks correctImporting
RemoveOnlySingletonReduceMeanPassalongsideRemoveEmptyConvBiasPasskeeps lowering passes centralized and consistent with how they’re used later in the optimizer. No issues here from a wiring perspective.
21-26: Cleanup of unusedReduceMeanParserimportDropping
ReduceMeanParserfrom the generic parsers import list is consistent with switching toPULPReduceMeanParserand avoids unused imports. Nothing else in this file appears to rely on the generic parser.
34-36: PULP-specific ReduceMean parser import is consistent with designImporting
PULPReduceMeanParserhere matches its usage inReduceMeanMapperand aligns with the pattern of using PULP-specific parsers for ops that need extra tiling metadata. The dependency direction looks clean and local to PULPOpen.
70-71: Check thatPULPReduceMeanParsersafely handles bothReduceMeanandIntegerMeanusagesSwitching
ReduceMeanMapperto usePULPReduceMeanParseris the right hook to feed per-dimension metadata intoPULPReduceMeanTilingReadyBindings. Note that the sameReduceMeanMapperis used for both'ReduceMean'and'IntegerMean'inPULPMapping(lines 124 and 127), soPULPReduceMeanParsermust correctly parse both cases (e.g.,operatorRepresentation['data_in']and'axes'need to exist for both).Please double-check that:
PULPReduceMeanParseris compatible with nodes created for'IntegerMean'in your IR, and- the additional
dim_in_*fields don’t break any existing downstream bindings that assume the older representation..github/workflows/ci-platform-siracusa-tiled.yml (3)
58-76: ReduceMean test coverage expansion aligns well with PR objectives.The workflow now includes 18 ReduceMean test cases across two categories (KeepDims and NoKeepDims), with 9 variants each. This expansion (Add_ReduceMean, AllAxes, Axes1_2_3, Axes1_3, Axes2_1, Axis0, Axis2, ReduceMean_Add, and composite tests) directly supports the PR objective of adding extensive ReduceMean test coverage to address Issue #134.
Also applies to: 123-141
166-198: All referenced model test paths exist and are correctly configured. Verification confirms:
- FP32-specific paths (
Models/CCT/FP32/,Models/TinyViT/5M/Layers/FP32/ReduceMean) correspond to actual test directories- All model names are consistently used across singlebuffer-L2, singlebuffer-L3, and doublebuffer-L3 jobs
- Directory structure properly reflects FP32 vs. non-FP32 model organization
40-96: Document the intentional L1 memory allocation differences between singlebuffer and doublebuffer configurations.The L1 allocations (max 220,000 bytes) fit comfortably within Siracusa's 256 KB L1 cache. However, the different L1 values between singlebuffer and doublebuffer test suites lack explanatory comments. For example:
- "FP32Kernels/Add/large" (220K) appears only in singlebuffer
- "FP32Kernels/Activations/ReLU" differs: 2000 (singlebuffer) vs 20 (doublebuffer)
- "FP32Kernels/ReduceMean/KeepDims/AllAxes" differs: 50000 vs 100000
Add comments explaining why these values differ (e.g., buffer strategy implications, kernel-specific tuning) to clarify the intentional configuration split.
|
Thanks for restructuring the other tests. I would suggest a few modifications, but I can make them, and you can review them once you have time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
.github/workflows/ci-deeploy.yml(6 hunks).github/workflows/ci-platform-chimera.yml(1 hunks).github/workflows/ci-platform-cortexm.yml(2 hunks).github/workflows/ci-platform-generic.yml(2 hunks).github/workflows/ci-platform-mempool.yml(2 hunks).github/workflows/ci-platform-siracusa-neureka-tiled.yml(6 hunks).github/workflows/ci-platform-siracusa-tiled.yml(5 hunks).github/workflows/ci-platform-siracusa.yml(2 hunks).github/workflows/ci-platform-snitch-tiled.yml(1 hunks).github/workflows/ci-platform-snitch.yml(1 hunks).github/workflows/ci-platform-softhier.yml(1 hunks).github/workflows/infra-generate-ccache.yml(1 hunks)Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- .github/workflows/ci-platform-snitch.yml
- .github/workflows/ci-platform-cortexm.yml
- .github/workflows/ci-platform-siracusa-neureka-tiled.yml
- .github/workflows/ci-platform-snitch-tiled.yml
- Deeploy/Targets/PULPOpen/Templates/GEMMTemplate.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/infra-generate-ccache.yml.github/workflows/ci-deeploy.yml.github/workflows/ci-platform-siracusa.yml.github/workflows/ci-platform-mempool.yml.github/workflows/ci-platform-generic.yml
🔇 Additional comments (13)
.github/workflows/infra-generate-ccache.yml (1)
37-45: Update README.md to reference the new test path structure.The workflow file has been correctly updated with the new test directory paths. However, README.md still references the old
DeeployTest/Tests/Adderpath in the documentation. Update these references to point toDeeployTest/Tests/Kernels/Integer/Add/Regularto reflect the new hierarchical test structure..github/workflows/ci-platform-chimera.yml (1)
38-40: LGTM!The test name update to the hierarchical path-based structure is consistent with the broader test reorganization. Based on learnings, these paths correspond to directory names under
DeeployTest/Tests/..github/workflows/ci-platform-softhier.yml (1)
38-39: LGTM!Consistent with the hierarchical path-based test structure introduced across CI workflows.
.github/workflows/ci-platform-mempool.yml (1)
75-85: LGTM!The models section follows the hierarchical path structure consistently with clear grouping by model family.
.github/workflows/ci-deeploy.yml (2)
63-66: LGTM!The test paths are correctly updated to the new hierarchical structure with appropriate memory allocation settings.
129-136: LGTM!Good coverage with both passing and
--shouldFailtest cases for tiler extension validation..github/workflows/ci-platform-siracusa-tiled.yml (2)
58-76: Comprehensive ReduceMean test coverage added.The extensive ReduceMean tests covering both
KeepDimsandNoKeepDimsvariants across multiple axis configurations align well with the PR objective of improving ReduceMean operator support.
191-194: TinyViT ReduceMean layer test inclusion.Good addition of model-level ReduceMean testing via
Models/TinyViT/5M/Layers/FP32/ReduceMeanto validate the operator in a realistic context..github/workflows/ci-platform-siracusa.yml (3)
61-79: LGTM!Comprehensive ReduceMean test coverage with both
KeepDimsandNoKeepDimsvariants, covering various axis configurations including edge cases likeAllAxes, combined operations (Add_ReduceMean), and individual axis reductions.
102-102: Model-layer test placement note.
Models/TinyViT/5M/Layers/FP32/ReduceMeanis included in the kernels job rather than the models job. This appears intentional since it tests a specific layer/kernel rather than a full model, but verify this is the desired grouping.
121-135: LGTM!The models section is well-organized with clear groupings by model family (CCT, miniMobileNet, MLPerf, TinyViT, CNN_Linear).
.github/workflows/ci-platform-generic.yml (2)
39-146: Excellent restructuring of test organization!The hierarchical path-based structure significantly improves maintainability and clarity:
- Clear categorization into Kernels/FP32, Kernels/Integer, Kernels/Mixed, and Models
- Comprehensive ReduceMean test coverage (18 test variants) aligns perfectly with the PR objectives
- Consistent with the discussed structure from past reviews
This makes it much easier to understand test coverage at a glance and navigate the test suite.
Based on learnings, these paths correspond to directory structures under DeeployTest/Tests/.
119-121: Remove duplicate test entry.
Kernels/Integer/Conv/Regular_2D_RQappears twice (lines 119 and 121), causing unnecessary duplicate test execution.🔎 Proposed fix
Remove the duplicate entry on line 121:
Kernels/Integer/Conv/Regular_2D_RQ Kernels/Integer/Conv/DW_2D_RQ - Kernels/Integer/Conv/Regular_2D_RQ Kernels/Integer/MatMul/Regular_RQLikely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/ci-platform-generic.yml (1)
113-120: Test paths verified—organization is intentional.All test paths exist and the CI checks pass. The grouping of model tests with related kernel tests in the
generic-kernelsjob appears intentional: the quantization-aware kernel tests (*_RQvariants) are grouped withModels/Transformer_DeepQuantandModels/TinyViT/5M/Layers/FP32/ReduceMeanto test these operations in both isolation and model context. If consistency with job naming is a priority, consider moving the two model tests togeneric-models, but the current organization is functional and the grouping may be deliberate.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.github/workflows/ci-platform-generic.yml(2 hunks).github/workflows/ci-platform-siracusa.yml(2 hunks).github/workflows/ci-platform-snitch-tiled.yml(1 hunks).github/workflows/ci-platform-snitch.yml(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/ci-platform-snitch.yml
- .github/workflows/ci-platform-snitch-tiled.yml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/ci-platform-generic.yml.github/workflows/ci-platform-siracusa.yml
🔇 Additional comments (3)
.github/workflows/ci-platform-siracusa.yml (2)
38-111: Test paths are valid and correctly organized.All 57 test paths referenced in the siracusa-kernels job exist as directories under
DeeployTest/Tests/. The reorganization from flat test names to hierarchical categories (Kernels/FP32, Kernels/Integer, Models, Others) is properly structured and will execute without path-related failures.
120-136: All model test paths are valid and exist in the repository.Verification confirms that all 10 test paths in the siracusa-models job configuration are correctly defined as directories in DeeployTest/Tests: Kernels/Integer/Attention, Models/CCT/FP32/CCT_1_16_16_8, Models/CCT/FP32/CCT_2_32_32_128_Opset20, Models/miniMobileNet, Models/miniMobileNetv2, Models/MLPerf/KeywordSpotting, Models/MLPerf/ImageClassification, Models/MLPerf/AnomalyDetection, Models/TinyViT/Demo, and Models/CNN_Linear2. No CI failures should result from missing test paths.
.github/workflows/ci-platform-generic.yml (1)
39-111: Excellent restructuring of kernel tests!The hierarchical organization into
Kernels/FP32/*andKernels/Integer/*is clear and consistent, exactly as requested by Xeratec. The extensive ReduceMean test coverage (18 test cases covering KeepDims/NoKeepDims variants with multiple axis combinations) directly addresses the PR's objective.Also, the duplicate
Kernels/Integer/Sliceentry from the previous review has been resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/ci-platform-siracusa-tiled.yml (1)
58-76: L1 memory configurations look reasonable.The varying L1 memory sizes (e.g., single values like
[8000]vs. multiple configurations like[5000,50000]) appear intentional for testing different tiling scenarios. Double-buffer tests appropriately use higher memory values. If these specific L1 values are critical for triggering specific tiling behaviors mentioned in Issue #134, consider adding comments documenting the reasoning.Also applies to: 123-141
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
.github/workflows/ci-platform-mempool.yml.github/workflows/ci-platform-siracusa-neureka-tiled.yml.github/workflows/ci-platform-siracusa-tiled.yml.github/workflows/ci-platform-snitch-tiled.yml.github/workflows/ci-platform-snitch.ymlDeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerColumn/activations.npzDeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerColumn/inputs.npzDeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerColumn/network.onnxDeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerColumn/outputs.npzDeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerRow/inputs.npzDeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerRow/network.onnxDeeployTest/Tests/Kernels/Integer/GEMM/Regular_RQPerRow/outputs.npz
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/ci-platform-snitch.yml
- .github/workflows/ci-platform-mempool.yml
🔇 Additional comments (7)
.github/workflows/ci-platform-snitch-tiled.yml (1)
40-50: LGTM! Test configuration restructuring is correct.The updated hierarchical test paths follow the new "Kernels/{Integer|FP32}/" convention consistently, and the JSON syntax is now valid with proper comma separation and no trailing comma.
.github/workflows/ci-platform-siracusa-neureka-tiled.yml (3)
121-121: LGTM: neureka-wmem configuration properly added.The
neureka-wmemflag is correctly added to the relevant job configurations and properly passed through to the reusable workflow. The syntax is appropriate for both direct parameters (line 121) and matrix parameters (line 136).Also applies to: 136-136, 146-146
131-131: Add comment explaining why Transformer test is disabled for wmem configuration.The
Models/Transformertest is commented out only in thedoublebuffer-L3-wmemjob (line 131), while remaining active in the standard doublebuffer configuration (line 93) and singlebuffer configuration (line 72). Without an explanatory comment, it's unclear whether this is a known incompatibility that needs resolution or a temporary exclusion. If intentional, add a comment documenting the reason.
40-43: Hierarchical test path restructuring approved.The migration from granular test names to hierarchical path-based identifiers (e.g.,
Kernels/Integer/GEMM/Regular_RQPerColumn,Models/miniMobileNet) improves organization and aligns with restructuring the Tests subdirectory. All referenced paths exist in the test directory structure, and changes are consistently applied across all job configurations.Also applies to: 55-58, 70-73, 91-93, 115-118, 129-132
.github/workflows/ci-platform-siracusa-tiled.yml (3)
160-198: Model test organization is consistent and comprehensive.The hierarchical reorganization extends cleanly to model tests (
Models/CCT/FP32/*,Models/TinyViT/*, etc.), and the inclusion of TinyViT ReduceMean layer tests across different memory configurations ensures thorough validation of the ReduceMean improvements.Also applies to: 207-246, 248-294
26-294: Well-structured workflow with clear job organization.The separation between kernel tests (using sequential runner) and model tests (using matrix strategy) is appropriate. The four job configurations (single/double-buffer × L2/L3) provide comprehensive coverage of different memory and buffering scenarios for the ReduceMean improvements.
38-96: LGTM: Excellent test coverage and improved organization.The migration to hierarchical test paths (e.g.,
Kernels/FP32/ReduceMean/KeepDims/Axis0) significantly improves maintainability. The extensive ReduceMean test coverage with multiple axis combinations, both KeepDims/NoKeepDims variants, and integration with other operators aligns well with the PR objectives.
Xeratec
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I organized the tests a bit differently. For example, the RQ kernels are actually integer kernels. Otherwise, everything stayed the same. @diaconuccalin, if it is also ok for you, I would merge this PR.
Hey @Xeratec, I will look into the changes today or tomorrow and let you know if everything seems fine. |
diaconuccalin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the 2 small mentions, LGTM! If the test was removed on purpose, then imo it's ok to merge without solving the small duplicate issue.
This PR fixes the tiling implementation for the PULPOpen FP32 ReduceMean operator (Issue #134) and adds parallelization support. It also extends ReduceMean to support ONNX opset >= 18, reorganizes the FP32 TinyViT kernel test suite, and adds extensive ReduceMean test coverage.
Added
Changed
Fixed
PR Merge Checklist
develcommit and pointing todevel.CHANGELOG.mdfile has been updated.