Commit 8d6c90b
feat(compute): add AllocDeviceFloat32 and CopyToDevice to FusedEncoderProvider
Enable callers to allocate persistent GPU buffers and upload weight data
for the fused encoder kernel. Without this, CPU-backed weight tensors
have no device pointer and the fused path always falls back to per-op.
- AllocDeviceFloat32: pool-managed GPU allocation
- CopyToDevice: host-to-device memcpy for float32 arrays
1 parent 716bbd6 commit 8d6c90b
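The allocate-then-upload flow described in the commit message can be sketched as follows. This is a minimal illustration, not the code from this commit: the `DeviceUploader` interface, the `DevicePtr` handle type, the method signatures, the `fakeDevice` type, and the `UploadWeights` helper are all assumptions, since the commit message does not show the actual signatures. The fake device keeps "device" buffers in host memory so the example runs without a GPU.

```go
package main

import (
	"errors"
	"fmt"
)

// DevicePtr is a hypothetical opaque handle standing in for a device pointer.
type DevicePtr uintptr

// DeviceUploader is a hypothetical stand-in for the two methods this commit
// adds to FusedEncoderProvider. The real signatures are assumptions here.
type DeviceUploader interface {
	AllocDeviceFloat32(n int) (DevicePtr, error)
	CopyToDevice(dst DevicePtr, src []float32) error
}

// fakeDevice simulates device memory with host slices (no GPU required).
type fakeDevice struct {
	next    DevicePtr
	buffers map[DevicePtr][]float32
}

func newFakeDevice() *fakeDevice {
	return &fakeDevice{next: 1, buffers: make(map[DevicePtr][]float32)}
}

// AllocDeviceFloat32 reserves a buffer of n float32 elements and returns its handle.
func (d *fakeDevice) AllocDeviceFloat32(n int) (DevicePtr, error) {
	if n <= 0 {
		return 0, errors.New("alloc: element count must be positive")
	}
	ptr := d.next
	d.next++
	d.buffers[ptr] = make([]float32, n)
	return ptr, nil
}

// CopyToDevice copies src into the buffer behind dst, checking bounds first.
func (d *fakeDevice) CopyToDevice(dst DevicePtr, src []float32) error {
	buf, ok := d.buffers[dst]
	if !ok {
		return errors.New("copy: unknown device pointer")
	}
	if len(src) > len(buf) {
		return fmt.Errorf("copy: source has %d elements, buffer holds %d", len(src), len(buf))
	}
	copy(buf, src)
	return nil
}

// UploadWeights allocates a buffer sized for weights and copies them into it.
func UploadWeights(p DeviceUploader, weights []float32) (DevicePtr, error) {
	ptr, err := p.AllocDeviceFloat32(len(weights))
	if err != nil {
		return 0, err
	}
	if err := p.CopyToDevice(ptr, weights); err != nil {
		return 0, err
	}
	return ptr, nil
}

func main() {
	dev := newFakeDevice()
	ptr, err := UploadWeights(dev, []float32{0.5, -1.25, 2})
	if err != nil {
		// Without a device pointer, a caller would fall back to the per-op path.
		fmt.Println("upload failed:", err)
		return
	}
	fmt.Println("uploaded", len(dev.buffers[ptr]), "weights")
}
```

Keeping allocation and copy as separate calls, as the commit message lists them, would let a caller allocate a persistent buffer once and refresh its contents later without reallocating.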
2 files changed: +20 -0 lines changed
File 1 (name not shown): 7 lines added at new lines 20-26, after original line 19.
File 2 (name not shown): 1 line added at new line 8, and 12 lines added at new lines 72-83, after original line 70.