README.md: 13 additions & 33 deletions
````diff
@@ -91,56 +91,36 @@ flowchart TB
     emb --> L1 --> L2 --> LN --> head
 ```

-### 2. Conv1D UNet Transformer
-A U-Net style architecture that uses Conv1D for downsampling and ConvTranspose1D for upsampling. It progressively reduces sequence length while increasing hidden dimension, allowing the model to process information at hierarchically different resolutions.
+### 2. Transformer UNet
+A U-Net architecture that uses skip connections between encoder and decoder layers, but maintains the same sequence length and hidden dimension throughout (no downsampling). This allows the model to mix features from early and late layers.
````
````diff
 ```mermaid
 flowchart TB
     subgraph Input
-        emb[Embedding Layer<br/>seq_len x hidden_size]
+        emb[Embedding Layer]
     end
-
+
     subgraph Encoder[Encoder Path]
-        e0[TransformerBlock 0<br/>FULL RESOLUTION<br/>1024 x 768]
-        down1[Conv1D Downsample]
-        e1[TransformerBlock 1<br/>512 x 832]
-        down2[Conv1D Downsample]
-        e2[TransformerBlock 2<br/>256 x 896]
-        down3[...]
-        eN[MLP Block if seq=1]
+        e1[TransformerBlock 1]
+        e2[TransformerBlock 2]
     end
-
+
     subgraph Decoder[Decoder Path]
-        dN[MLP Block if seq=1]
-        up1[ConvTranspose1D Upsample]
-        d2[TransformerBlock + Skip<br/>256 x 896]
-        up2[ConvTranspose1D Upsample]
-        d1[TransformerBlock + Skip<br/>512 x 832]
-        up3[ConvTranspose1D Upsample]
-        d0[TransformerBlock N<br/>FULL RESOLUTION<br/>1024 x 768]
````
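The skip-connection scheme this diff introduces (mirrored encoder/decoder layers at a constant sequence length) can be sketched as follows. This is a minimal illustration assuming a PyTorch implementation; the class and the learned-linear merge of skip features are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

class TransformerUNet(nn.Module):
    """Sketch: U-Net skip connections, no downsampling anywhere."""
    def __init__(self, hidden_size=64, n_layers=2, n_heads=4):
        super().__init__()
        def block():
            return nn.TransformerEncoderLayer(
                d_model=hidden_size, nhead=n_heads, batch_first=True)
        self.encoder = nn.ModuleList([block() for _ in range(n_layers)])
        self.decoder = nn.ModuleList([block() for _ in range(n_layers)])
        # One learned merge per decoder layer to mix the skip features in.
        self.merge = nn.ModuleList(
            [nn.Linear(2 * hidden_size, hidden_size) for _ in range(n_layers)])

    def forward(self, x):                       # x: (B, L, H)
        skips = []
        for layer in self.encoder:
            x = layer(x)
            skips.append(x)                     # saved for the mirrored decoder layer
        for layer, merge, skip in zip(self.decoder, self.merge, reversed(skips)):
            x = merge(torch.cat([x, skip], dim=-1))  # mix early + late features
            x = layer(x)
        return x                                # shape unchanged: (B, L, H)

x = torch.randn(2, 16, 64)
y = TransformerUNet()(x)
assert y.shape == x.shape                       # same seq_len and hidden_size throughout
```

The point of the sketch is the pairing: encoder layer *i* feeds decoder layer *n − 1 − i*, and because no layer changes the sequence length, the skip tensors line up without any up/downsampling plumbing.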
```diff
-An optimized U-Net architecture designed for speed. It uses "Patch Merging" (concatenating adjacent tokens) for downsampling instead of convolutions, which is faster and cleaner. It operates on batched inputs `(B, L)` and efficiently handles document boundaries and padding without complex dynamic shape logic.
+An optimized U-Net architecture designed for speed. It uses "Patch Merging" (concatenating adjacent tokens) for downsampling, which is faster and cleaner than convolutions. It operates on batched inputs `(B, L)` and efficiently handles document boundaries and padding without complex dynamic shape logic.
```
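"Patch Merging" as described above can be sketched in a few lines: adjacent token pairs are concatenated along the feature axis (halving the sequence length with a plain reshape) and projected by a linear layer, with the inverse used on the decoder path. A minimal sketch assuming PyTorch; class names are hypothetical and padding `L` to an even length is elided.

```python
import torch
import torch.nn as nn

class PatchMerge(nn.Module):
    """Downsample L -> L/2 by concatenating adjacent tokens, then project."""
    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, x):                       # x: (B, L, H), L assumed even
        B, L, H = x.shape
        x = x.reshape(B, L // 2, 2 * H)         # each row = tokens 2l and 2l+1
        return self.proj(x)                     # (B, L/2, H)

class PatchSplit(nn.Module):
    """Inverse for the decoder path: project up, then split L/2 -> L."""
    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 2 * hidden_size)

    def forward(self, x):                       # x: (B, L/2, H)
        B, L2, H = x.shape
        return self.proj(x).reshape(B, 2 * L2, H)

x = torch.randn(2, 8, 16)
down = PatchMerge(16)(x)
assert down.shape == (2, 4, 16)                 # sequence halved, hidden kept
up = PatchSplit(16)(down)
assert up.shape == x.shape                      # restored to (2, 8, 16)
```

Because the merge is a reshape plus a matmul, it avoids the stride/padding bookkeeping of Conv1D/ConvTranspose1D downsampling, which is the speed and simplicity claim made above.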