[Buffer Placement] Constraining Channels to Contain 0 Buffer Slots

I think that the buffering constraints do not behave as expected on reconvergent paths when some edges are constrained to contain zero buffers. Let's have a look at the following example.

<img width="462" height="663" alt="Image" src="https://github.com/user-attachments/assets/0b6788bf-8080-489b-9b5b-eceed0cee2e4" />

We have three primitives, a, b, and c, and b is a primitive with 1-cycle latency while the others are combinational. The dotted edge represents the virtual edge inserted to account for the latency of b. 

In general, the throughput of a channel c connecting primitives u -> v and assuming c is not a back edge, must respect the following constraints:

```math
\begin{align*}
  \Theta_c &= r_v - r_u \\
  \Theta_c &\leq \# \text{buffer slots on channel c}
\end{align*}
```

Therefore, if we set the slot constraint to 0 on a channel, we prevent retiming in a channel by setting the source and destination retiming variables to the same value. On the example above, if we want to constrain the edge with associated throughput $\Theta_0$ to not contain any buffer, we are essentially setting $r(a) == r(c)$. We can then write the throughput constraints for the three channels on the path $a \rightarrow b_{in} \rightarrow b_{out} \rightarrow c$:

```math
\begin{align*}
  \Theta_1 &= r_{b_{in}} - r_a \\
  \Theta_2 &= r_{b_{out}} - r_{b_{in}} \\
  \Theta_3 &= r_c - r_{b_{out}}
\end{align*}
```

And since all throughput variables are positive (and again we do not have a back edge), we get:

```math
r_c \geq r_{b_{out}} \geq r_{b_{in}} \geq r_a
```

However, since $r(a) == r(c)$, the only solution to the equations above is to set all retiming variables to the same value, thereby preventing retiming across the pipelined unit. However, the pipelined unit must satisfy:

```math
\begin{align*}
\Theta_2 &= r_{b_{out}} - r_{b_{in}} \\
\Theta_{CFDFC}  \times L_b &\leq \Theta_2
\end{align*}
```

And therefore, the maximum throughput of the ILP becomes 0 in that case. As a result, presolve would optimize away all throughput equations, and I suppose the optimizer would pick the initial buffer allocation, placing buffers on back edges. As a result, the ILP could fail to optimize circuit altogether. For example, we can have a look at the following example with two re-convergent paths. 

<img width="681" height="1112" alt="Image" src="https://github.com/user-attachments/assets/bb45ab85-e875-4134-b7af-6e91a01f7b7f" />

The top part of the graph is identical to the example at the start of this issue. The bottom part contains a primitive e with a latency of 4 and d is combinational. In the optimal case, the ILP would place (transparent) buffers on both edges $a \rightarrow c$ and $c \rightarrow d$, mirroring the latencies of the pipelined units, and achieve an II of 1. If we set a 0 slot constraint on the edge from $a \rightarrow c$, the optimal II is 2 since the output of a needs to stay valid for 2 cycles. However, due to the 0 slot constraint on $a \rightarrow c$, the ILP would not place any buffers and achieve an II of 5: no (transparent) buffers on the second part of the graph would imply that c needs to wait for 4 cycles before d can consume it.

I created a simple toy example to reproduce the issue on the branch _custom_constraints_debug_. The example is in [custom_constraints](https://github.com/EPFL-LAP/dynamatic/blob/custom_constraints_debug/integration-test/custom_constraints/) and contains a simple for loop with a few multipliers inside. You can run it with the regular commands:

```
./bin/dynamatic
set-src integration-test/custom_constraints/custom_constraints.c
compile --buffer-algorithm fpl22
write-hdl --hdl vhdl
cat integration-test/custom_constraints/out/sim/report.txt | grep "Simulation done! Latency"
cat integration-test/custom_constraints/out/comp/buffer-placement/custom_constraints/placement.log | grep "Throughput of CFDFC"
```

a) Without constraint, the program latency is **2032** cycles and the ILP achieves a CFDFC throughput of **1/2**.

b) With constraints on the path from the fork to the second multiplier (see bufProps in [handshake_transformed_annos](https://github.com/EPFL-LAP/dynamatic/blob/custom_constraints_debug/integration-test/custom_constraints/handshake_transformed_annos.mlir)), the latency becomes **25009** cycles, and the CFDFC throughput is now **0**.

c) However, it is easy to extend the above solution with one forced buffer (see bufProps in [handshake_transformed_annos_correct](https://github.com/EPFL-LAP/dynamatic/blob/custom_constraints_debug/integration-test/custom_constraints/handshake_transformed_annos_correct.mlir)) and get a program latency of **13512** cycles with a CFDFC throughput still of **0**. I tried to come up with something closer to the performance of (a), but I did not manage to place a TEHB of depth 12, only an OEHB of depth 1 and a TEHB of depth 11, unfortunately...  

To try the different buffer constraints, you can comment/uncomment selectively the cp lines in [compile.sh:263](https://github.com/EPFL-LAP/dynamatic/blob/custom_constraints_debug/tools/dynamatic/scripts/compile.sh#L263).

Could we rewrite the buffer constraints so we constrain the buffer placement boolean variables (both for the OEHBs and for the TEHBs so the [signal variables](https://github.com/EPFL-LAP/dynamatic/blob/custom_constraints_debug/include/dynamatic/Transforms/BufferPlacement/BufferPlacementMILP.h#L65)) to 0 and not the number of slots? In that case, the retiming variables would no longer be affected by the 0-slot constraints, since in the optimal case, no retiming should be placed on channels with no buffers, since that would only decrease the amount of retiming available for other channels. Although maybe this has some other effect which also makes it incorrect...  

Let me know what you think,
Thanks,

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Buffer Placement] Constraining Channels to Contain 0 Buffer Slots #660

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Buffer Placement] Constraining Channels to Contain 0 Buffer Slots #660

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions