[Feature] Tensor::view overload that reduces rank by squeezing dropped axes #785

@Hzfengsy

Summary

Add a Tensor::view overload that produces a lower-rank view by squeezing
out size-1 axes (the result of a rank-reducing slice). The current
view(shapes[], offsets[]) inherits ndims from the parent, so it cannot
express "slice + drop axis" in a single step, and view(...).reshape(...)
does not compose for sliced views.

Motivation / Use Case

PyPTO RFC #1338 / PR #1343 added a drop_dims operand to tensor.slice so
that numpy-style indexing (C[i], C[i, j], C[i, j, :, :], ...) can
produce a lower-rank Tensor. The orchestration codegen
(src/codegen/tensor_op_codegen.cpp, REGISTER_ORCHESTRATION_OP(tensor_slice))
needs to emit a runtime Tensor at the reduced rank so that downstream
kernel-call bindings see the correct ndims. Without this, a kernel that
takes a sub-tensor via numpy-style indexing generates wrong-rank code. See
PyPTO follow-up issue: hw-native-sys/pypto#1349.

The composition view(...).reshape(...) does not work because:

  1. view(view_shapes[], view_offsets[]) (tensor.h:331) inherits
    ndims = other.ndims (init_with_view at tensor.h:233). The view
    stays at the parent's rank with size-1 entries in dropped positions.
  2. reshape(new_shapes[], new_ndims) (tensor.h:358) hard-asserts
    is_contiguous() (tensor.h:360). A rank-reducing slice produces
    shapes[i]=1, raw_shapes[i]=B for any non-leading dropped axis, which
    fails is_contiguous() (tensor.h:341 only allows divergence in dim 0
    via is_raw_eq_shapes).
  3. Even when the contiguity check passes, reshape clobbers the per-dim
    offsets that view just installed by setting
    result.is_all_offset_zero = true and result.is_raw_eq_shapes = true
    (tensor.h:364-365). The data offset from the slice is lost.
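
To make point 2 concrete, here is a minimal sketch of the contiguity rule as this issue describes it. The function name is_contiguous_model is hypothetical, and the body is a simplified model inferred from the description above, not the actual check in tensor.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of the rule described above: the view may be narrower
// than the underlying buffer only in dim 0. This is an assumption drawn
// from the issue text, not a copy of the code in tensor.h.
bool is_contiguous_model(const std::vector<uint32_t>& shapes,
                         const std::vector<uint32_t>& raw_shapes) {
    assert(shapes.size() == raw_shapes.size());
    for (std::size_t d = 1; d < shapes.size(); ++d) {
        if (shapes[d] != raw_shapes[d]) return false;  // divergence past dim 0
    }
    return true;
}
```

For a parent with raw_shapes {4, 8, 16}, the slice C[1] gives shapes {1, 8, 16} and passes under this model, while C[:, 3, :] gives shapes {4, 1, 16} and fails, so reshape cannot be used to drop the middle axis.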

Proposed API / Behavior

Add an overload (under runtime/src/{a2a3,a5}/runtime/tensormap_and_ringbuffer/runtime/tensor.h):

```cpp
// Slice-and-squeeze: produce a view at a lower rank by collapsing size-1
// axes listed in drop_dims. view_shapes / view_offsets are at the
// parent's rank; view_shapes[d] for d in drop_dims must equal 1.
Tensor view(
    const uint32_t view_shapes[],
    const uint32_t view_offsets[],
    const uint32_t drop_dims[],
    uint32_t num_drop_dims,
    bool manual_dep = false) const;
```

Semantics:

  • The result's ndims is parent.ndims - num_drop_dims.
  • The result's shapes[] and raw_shapes[] are view_shapes[] and the
    parent's raw shapes (parent_raw[]), respectively, with the entries at
    indices in drop_dims removed.
  • The element offset contributed by the dropped axes
    (sum(view_offsets[d] * stride(d)) for d in drop_dims) is folded into
    start_offset (or equivalently into offsets[] of the surviving
    leading axis), so element addressing in the lower-rank view points at
    the same memory as the parent slice.
  • Surviving axes keep their view_offsets[] as-is.

Constraint: every d in drop_dims must have view_shapes[d] == 1.
This matches what a rank-reducing slice produces and lets us collapse the
dimension without needing a real reshape (no contiguity requirement).
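
The semantics above can be sketched on a simplified stand-in for the runtime Tensor. Everything here (MiniView, row_major_strides, slice_and_squeeze, flat_index) is hypothetical and only models the addressing. In particular, the sketch stores per-axis strides explicitly so that surviving axes keep the parent's strides; how the real Tensor derives strides is not shown in this issue, so treat that as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the runtime Tensor (not tensor.h).
struct MiniView {
    std::vector<uint32_t> shapes;      // logical extent per surviving axis
    std::vector<uint32_t> raw_shapes;  // parent raw extent per surviving axis
    std::vector<uint32_t> offsets;     // per-axis offset of surviving axes
    std::vector<uint64_t> strides;     // parent element strides (assumption)
    uint64_t start_offset = 0;         // offset folded in from dropped axes

    // Flat element index in the parent buffer for a logical index.
    uint64_t flat_index(const std::vector<uint32_t>& idx) const {
        assert(idx.size() == shapes.size());
        uint64_t flat = start_offset;
        for (std::size_t k = 0; k < idx.size(); ++k)
            flat += uint64_t(idx[k] + offsets[k]) * strides[k];
        return flat;
    }
};

// Row-major element strides of the parent's raw buffer.
std::vector<uint64_t> row_major_strides(const std::vector<uint32_t>& raw) {
    std::vector<uint64_t> s(raw.size(), 1);
    for (int k = int(raw.size()) - 2; k >= 0; --k) s[k] = s[k + 1] * raw[k + 1];
    return s;
}

MiniView slice_and_squeeze(const std::vector<uint32_t>& parent_raw,
                           const std::vector<uint32_t>& view_shapes,
                           const std::vector<uint32_t>& view_offsets,
                           const std::vector<uint32_t>& drop_dims) {
    const std::vector<uint64_t> strides = row_major_strides(parent_raw);
    std::vector<bool> dropped(parent_raw.size(), false);
    for (uint32_t d : drop_dims) {
        assert(d < parent_raw.size());
        assert(view_shapes[d] == 1);  // constraint: dropped axes have size 1
        dropped[d] = true;
    }
    MiniView out;
    for (std::size_t d = 0; d < parent_raw.size(); ++d) {
        if (dropped[d]) {
            // Fold the dropped axis's offset into the flat start offset.
            out.start_offset += uint64_t(view_offsets[d]) * strides[d];
        } else {
            out.shapes.push_back(view_shapes[d]);
            out.raw_shapes.push_back(parent_raw[d]);
            out.offsets.push_back(view_offsets[d]);  // kept as-is
            out.strides.push_back(strides[d]);
        }
    }
    return out;
}
```

With a parent of raw shape {4, 8, 16}, calling slice_and_squeeze({4, 8, 16}, {4, 1, 16}, {0, 3, 0}, {1}) models C[:, 3, :]: the result has rank 2, shapes {4, 16}, and start_offset 3 * 16 = 48, so element (2, 5) of the view maps to the same flat index as C[2, 3, 5] in the parent.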

Alternatives Considered

Additional Context

  • Same change is needed in both src/a2a3/.../tensor.h and
    src/a5/.../tensor.h so the orchestration codegen path is uniform
    across architectures.
  • PyPTO codegen call site that will adopt the new API:
    src/codegen/tensor_op_codegen.cpp, REGISTER_ORCHESTRATION_OP(tensor_slice)
    (around line 209). It already constructs _shapes and _offsets arrays;
    it would emit an additional _drop_dims array and call the new overload.
  • The tile.slice rank-reducing path in PyPTO is purely codegen-side and
    does not need runtime support.
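
As a rough illustration of what the call site would need to emit, the sketch below maps an index of the form C[i0, i1, ...] (integers on the leading axes, full range on the rest) to the three arrays the new overload takes. SliceArgs and args_for_leading_indices are hypothetical names for this example; the actual codegen in tensor_op_codegen.cpp is not shown in this issue and may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the arrays passed to the proposed overload.
struct SliceArgs {
    std::vector<uint32_t> shapes;     // view_shapes, at the parent's rank
    std::vector<uint32_t> offsets;    // view_offsets, at the parent's rank
    std::vector<uint32_t> drop_dims;  // axes to squeeze out
};

// Integer-indexed leading axes become size-1 slices at that index and are
// listed in drop_dims; remaining axes keep their full extent at offset 0.
SliceArgs args_for_leading_indices(const std::vector<uint32_t>& parent_shape,
                                   const std::vector<uint32_t>& indices) {
    assert(indices.size() <= parent_shape.size());
    SliceArgs a;
    for (std::size_t d = 0; d < parent_shape.size(); ++d) {
        if (d < indices.size()) {
            assert(indices[d] < parent_shape[d]);
            a.shapes.push_back(1);
            a.offsets.push_back(indices[d]);
            a.drop_dims.push_back(static_cast<uint32_t>(d));
        } else {
            a.shapes.push_back(parent_shape[d]);
            a.offsets.push_back(0);
        }
    }
    return a;
}
```

For a parent of shape {4, 8, 16}, C[2, 5] yields shapes {1, 1, 16}, offsets {2, 5, 0}, and drop_dims {0, 1}, which describes a rank-1 result of length 16.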

Labels: enhancement
