Questions about The Implicit Reuse #59

@gychen233

I sincerely thank the organizers for addressing our comments, but I am still confused about several points.

As a follow-up to #15 and #37, I would like to ask a few more questions about implicit reuse.

Does 'implicit reuse' only trigger between adjacent executions, or can I arbitrarily retain data in the fast memory to achieve better implicit reuse?

Does the latency calculation assume an optimal memory scheduling strategy by default?

Specifically, does it use Bélády's optimal algorithm?

To achieve this strategy, when freeing up space, must we evict an entire data block at once, or can we evict partial slices?
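
To make the scheduling question concrete, here is a minimal sketch of what I mean by Bélády's policy: on a miss with fast memory full, evict the resident block whose next use is farthest in the future. It assumes whole-block eviction and equally sized blocks, which is exactly the part I am unsure about. The function name `belady_misses` and the block ids are hypothetical, and this is not the organizers' evaluator.

```python
def belady_misses(accesses, capacity):
    """Count loads into fast memory under Belady's policy.

    accesses: sequence of block ids in execution order.
    capacity: number of whole blocks that fit in fast memory (assumption:
              all blocks have the same size and are evicted as a unit).
    """
    resident = set()
    misses = 0
    for i, block in enumerate(accesses):
        if block in resident:
            continue  # implicit reuse: block is already in fast memory
        misses += 1
        if len(resident) >= capacity:
            def next_use(b):
                # Index of the next access to b, or infinity if never used again.
                for j in range(i + 1, len(accesses)):
                    if accesses[j] == b:
                        return j
                return float("inf")
            # Evict the block that is needed farthest in the future.
            resident.remove(max(resident, key=next_use))
        resident.add(block)
    return misses


trace = ["A", "B", "C", "A", "B", "D", "A", "B"]
print(belady_misses(trace, capacity=3))
print(belady_misses(trace, capacity=2))
```

If partial slices can be evicted instead, the same idea would apply per slice rather than per block, which could change the resulting latency.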

Additionally, if a MatMul operation takes the same Tensor as both of its inputs, the fast memory needs to load both a horizontal strip and a vertical strip of that Tensor. Can we assume that the overlapping intersection of these strips only consumes a single copy of capacity within the fast memory limit?
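
As a worked example of the two possible accounting rules, the sketch below counts the fast-memory footprint in elements for one horizontal strip plus one vertical strip of the same tensor. The tensor shape and strip sizes are hypothetical values chosen for illustration, and `strip_footprint` is not a name from the problem statement.

```python
def strip_footprint(rows, cols, strip_h, strip_w, dedupe_overlap):
    """Elements held in fast memory for a row strip and a column strip.

    The horizontal strip covers strip_h full rows, the vertical strip
    covers strip_w full columns, so they always intersect in a
    strip_h x strip_w block.
    """
    horizontal = strip_h * cols
    vertical = rows * strip_w
    overlap = strip_h * strip_w
    total = horizontal + vertical
    # Assumption under question: is the intersection charged once or twice?
    return total - overlap if dedupe_overlap else total


# Hypothetical 128 x 128 tensor with 32-wide strips.
print(strip_footprint(128, 128, 32, 32, dedupe_overlap=True))   # 7168
print(strip_footprint(128, 128, 32, 32, dedupe_overlap=False))  # 8192
```

The difference between the two results is the `strip_h * strip_w` intersection, so the answer affects which tile sizes fit within the capacity limit.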

Finally, if our calculated subgraph_latencies are incorrect, will our submission receive a score of zero?
