tensor-cores

Here are 24 public repositories matching this topic...

jetson-orin-matmul-analysis

CUDA matrix multiplication benchmarking on Jetson Orin Nano. Four implementations, three power modes, five matrix sizes. 99.5% mathematical validation. C++/CUDA and Python.

  • Updated Apr 2, 2026
  • Python

🎓 CUDA HPC Kernel Optimization Lab: Progressive GEMM, FlashAttention, Tensor Core & CUDA 13 Features | A CUDA high-performance kernel optimization lab, from naive to Tensor Core

  • Updated Apr 22, 2026
  • Cuda

The MNIST classification problem is a fundamental machine learning task: recognizing handwritten digits (0-9) from a dataset of 70,000 grayscale images (28x28 pixels each). It serves as a benchmark for evaluating machine learning models, particularly neural networks.

  • Updated Sep 12, 2025
  • Cuda

🔍 Analyze CUDA matrix multiplication performance and power consumption on NVIDIA Jetson Orin Nano across multiple implementations and settings.

  • Updated Apr 22, 2026
  • Python

CUDA GEMM Optimization Learning Project: 7-Level Progressive Optimization from Naive to ~89% cuBLAS Performance

  • Updated Apr 22, 2026
  • C++
