Many thanks to the organizers for addressing our comments, but I still have a few points of confusion.
As a follow-up to #15 and #37, I would like to ask a few more questions regarding Implicit Reuse.
Does 'implicit reuse' trigger only between adjacent executions, or can we deliberately retain data in fast memory across later executions to achieve better implicit reuse?
Does the latency calculation assume an optimal memory scheduling strategy by default?
Specifically, does it use Bélády's optimal algorithm?
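To clarify what we mean by Bélády's algorithm, here is a minimal sketch of the policy (block names, the access trace, and the block-granularity assumption are all made up for illustration; this is our understanding, not the contest's implementation):

```python
# Bélády's (MIN) eviction: when fast memory is full, evict the resident
# block whose next use lies furthest in the future.

def belady_evict(resident, future_accesses):
    """Pick the resident block to evict under Bélády's optimal policy."""
    def next_use(block):
        try:
            return future_accesses.index(block)
        except ValueError:
            return float("inf")  # never used again -> best eviction candidate
    return max(resident, key=next_use)

def simulate(trace, capacity):
    """Count misses (slow-memory loads) for a block-access trace."""
    resident, misses = set(), 0
    for i, block in enumerate(trace):
        if block not in resident:
            misses += 1
            if len(resident) >= capacity:
                resident.remove(belady_evict(resident, trace[i + 1:]))
            resident.add(block)
    return misses
```

For example, `simulate(['A', 'B', 'C', 'A', 'B', 'D', 'A'], capacity=2)` incurs 5 misses, which no online policy can beat on that trace. Is this whole-block, clairvoyant model the one the latency calculation assumes?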
When freeing up space under such a strategy, must we evict an entire data block at once, or can we evict partial slices of it?
Additionally, if a MatMul operation takes the same Tensor as both of its inputs, fast memory needs to load both a horizontal strip and a vertical strip of that Tensor. Can we assume that the overlapping intersection of these strips consumes only a single copy's worth of capacity within the fast-memory limit?
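To make the capacity question concrete, here is the arithmetic for the two accounting conventions we can imagine (all tile sizes and the 2-byte element width are hypothetical numbers, not values from the contest spec):

```python
# C = matmul(T, T): a horizontal strip (rows of T) and a vertical strip
# (columns of T) are both resident in fast memory at once.
rows, cols = 128, 4096          # horizontal strip: 128 x 4096
vrows, vcols = 4096, 128        # vertical strip:   4096 x 128
dtype_bytes = 2                 # e.g. fp16 (assumption)

h_bytes = rows * cols * dtype_bytes
v_bytes = vrows * vcols * dtype_bytes
overlap_bytes = rows * vcols * dtype_bytes  # the 128 x 128 intersection

naive = h_bytes + v_bytes                    # intersection counted twice
dedup = h_bytes + v_bytes - overlap_bytes    # intersection counted once
```

The two conventions differ by `overlap_bytes` (32 KiB in this made-up example), which determines whether a given tiling fits under the fast-memory limit. Which convention does the latency model use?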
Finally, if our calculated `subgraph_latencies` are incorrect, will our submission receive a score of zero?