feat/fix(kernel): integrate multi kernel and fix some bugs by flashzxi · Pull Request #19 · bitzyz/InfiniTensor_v2.0

flashzxi · 2026-03-15T06:48:21Z

Operators added

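[New operator: clip] Supports clip (torch.clamp); broadcasting of min/max is not supported (InfiniCore limitation)
[New operator: conv] Supports Conv (torch.conv1d, torch.conv2d, torch.conv3d)
[New operator: LayerNorm] Supports LayerNorm; the input x must have rank of at least 3 (InfiniCore limitation)
[New operator: LogSoftmax] Supports LogSoftmax; the dim argument is not supported
[New operator: Softmax] Supports Softmax, GPU only
[New operator: LpNorm] Supports LpNorm, GPU only
[New operator: RmsNorm] Supports RmsNorm, CPU and GPU
[New operator: Unary] Supports relu, sigmoid, silu, gelu, softplus, tanh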
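For reference when reading test failures, the element-wise operators above can be written as scalar functions. This is a minimal sketch, assuming the kernels follow the standard PyTorch definitions (in particular, gelu is shown in its exact erf form, which is PyTorch's default, and softplus uses PyTorch's default beta=1, threshold=20); it is not taken from the InfiniTensor or InfiniCore sources.

```python
import math

# Scalar reference definitions of the element-wise operators listed above.
# Assumption: semantics match PyTorch defaults (gelu uses the exact erf form).


def relu(x: float) -> float:
    return max(x, 0.0)


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x: float) -> float:
    # SiLU (swish): x * sigmoid(x)
    return x * sigmoid(x)


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softplus(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    # Like torch.nn.functional.softplus: returns x unchanged once beta * x
    # exceeds the threshold, where log(1 + exp(beta * x)) / beta ~= x.
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta


def tanh(x: float) -> float:
    return math.tanh(x)


def clip(x: float, lo: float, hi: float) -> float:
    # torch.clamp semantics with scalar bounds only, consistent with the
    # "no min/max broadcasting" limitation noted for the clip operator.
    return min(max(x, lo), hi)
```

Keeping these as plain scalar functions makes the expected value for any single element easy to compute by hand when a kernel output disagrees with the framework reference.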

Bug fixes

Several bugs that affected the tests were found and have been fixed; see Bugs.md for details.

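Issues inside InfiniCore itself have not been fixed.
For example:
the logsoftmax kernel requests too many threads and therefore fails to launch, so the corresponding tests do not check whether the computed results are correct.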

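The printed output shows that the maximum supported number of threads is 256, but the kernel requests BLOCK_SIZE = 1024 threads per block, so the kernel launch fails.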
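The failure above has two parts: the launch configuration exceeds the device limit, and, as a consequence, the results are never compared. A minimal sketch of both pieces follows, under stated assumptions: `pick_block_size` and `allclose` are hypothetical helpers (not InfiniCore API) that clamp the requested block size to the runtime limit, and `log_softmax` is a reference that assumes normalization over the last dimension, since the operator does not accept a dim argument.

```python
import math
from typing import List


# Hypothetical helper (not InfiniCore API): choose a launch block size that
# never exceeds the thread limit the device reports at runtime.
def pick_block_size(requested: int, device_max_threads: int) -> int:
    limit = min(requested, device_max_threads)
    if limit < 1:
        raise ValueError("thread limits must be positive")
    # Round down to a power of two, since block-wide tree reductions in
    # softmax-style kernels commonly assume one.
    size = 1
    while size * 2 <= limit:
        size *= 2
    return size


# Row-wise log-softmax reference over the last dimension (an assumption),
# for comparing kernel output once the kernel launches. Subtracting the
# row max first keeps exp() from overflowing.
def log_softmax(row: List[float]) -> List[float]:
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum_exp for v in row]


# Hypothetical tolerance check between kernel output and the reference.
def allclose(a: List[float], b: List[float], atol: float = 1e-5) -> bool:
    return len(a) == len(b) and all(abs(x - y) <= atol for x, y in zip(a, b))
```

With the numbers from the printed output, `pick_block_size(1024, 256)` returns 256. In a CUDA kernel where BLOCK_SIZE is a compile-time template parameter, acting on this value would mean dispatching to an instantiation with the smaller block size rather than changing a runtime variable.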

Screenshots of passing CUDA tests:

test

test-front

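* [New operator: clip] Support clip (torch.clamp); broadcasting of min/max is not supported (InfiniCore limitation)
* [New operator: conv] Support Conv (torch.conv1d, torch.conv2d, torch.conv3d)
* [New operator: LayerNorm] Support LayerNorm; the input x must have rank of at least 3 (InfiniCore limitation)
* [New operator: LogSoftmax] Support LogSoftmax; the dim argument is not supported
* Fix many pre-existing bugs
* [New operator: Softmax] Support Softmax, GPU only
* [New operator: LpNorm] Support LpNorm, GPU only
* [New operator: RmsNorm] Support RmsNorm, CPU and GPU
* [New operator: Unary] Support relu, sigmoid, silu, gelu, softplus, tanh
* CPU and GPU tests pass
* format all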

flashzxi wants to merge 1 commit into bitzyz:main from flashzxi:main