Many thanks to the organizers for addressing our comments, but I still have a few points of confusion.
As a follow-up to #15 and #37, I would like to ask a few more questions regarding Implicit Reuse.
Does 'implicit reuse' trigger only between adjacent executions, or can we deliberately retain data in fast memory across later executions to achieve better implicit reuse?
Does the latency calculation assume an optimal memory scheduling strategy by default?
Specifically, does it use Bélády's optimal algorithm?
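To clarify what we mean by Bélády's algorithm, here is a minimal sketch of the policy (block names, the access trace, and the block-granularity assumption are all made up for illustration; this is our understanding, not the contest's implementation):

```python
# Bélády's (MIN) eviction: when fast memory is full, evict the resident
# block whose next use lies furthest in the future.

def belady_evict(resident, future_accesses):
    """Pick the resident block to evict under Bélády's optimal policy."""
    def next_use(block):
        try:
            return future_accesses.index(block)
        except ValueError:
            return float("inf")  # never used again -> best eviction candidate
    return max(resident, key=next_use)

def simulate(trace, capacity):
    """Count misses (slow-memory loads) for a block-access trace."""
    resident, misses = set(), 0
    for i, block in enumerate(trace):
        if block not in resident:
            misses += 1
            if len(resident) >= capacity:
                resident.remove(belady_evict(resident, trace[i + 1:]))
            resident.add(block)
    return misses
```

For example, `simulate(['A', 'B', 'C', 'A', 'B', 'D', 'A'], capacity=2)` incurs 5 misses, which no online policy can beat on that trace. Is this whole-block, clairvoyant model the one the latency calculation assumes?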
When freeing up space under such a strategy, must we evict an entire data block at once, or can we evict partial slices of it?
Additionally, if a MatMul operation takes the same Tensor as both of its inputs, fast memory needs to load both a horizontal strip and a vertical strip of that Tensor. Can we assume that the overlapping intersection of these strips consumes only a single copy's worth of capacity within the fast-memory limit?
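To make the capacity question concrete, here is the arithmetic for the two accounting conventions we can imagine (all tile sizes and the 2-byte element width are hypothetical numbers, not values from the contest spec):

```python
# C = matmul(T, T): a horizontal strip (rows of T) and a vertical strip
# (columns of T) are both resident in fast memory at once.
rows, cols = 128, 4096          # horizontal strip: 128 x 4096
vrows, vcols = 4096, 128        # vertical strip:   4096 x 128
dtype_bytes = 2                 # e.g. fp16 (assumption)

h_bytes = rows * cols * dtype_bytes
v_bytes = vrows * vcols * dtype_bytes
overlap_bytes = rows * vcols * dtype_bytes  # the 128 x 128 intersection

naive = h_bytes + v_bytes                    # intersection counted twice
dedup = h_bytes + v_bytes - overlap_bytes    # intersection counted once
```

The two conventions differ by `overlap_bytes` (32 KiB in this made-up example), which determines whether a given tiling fits under the fast-memory limit. Which convention does the latency model use?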
Finally, if our calculated `subgraph_latencies` are incorrect, will our submission receive a score of zero?