sm121

Here are 8 public repositories matching this topic...

albond / DGX_Spark_Qwen3.5-122B-A10B-AR-INT4

Qwen3.5-122B-A10B on DGX Spark: 28.3 → 51 tok/s (+80%)

cuda lossless mtp speedup performance-optimization vllm autoround dgx-spark qwen3-5 sm121 qwen3-5-122b-a10b

Updated May 9, 2026
Python

Sggin1 / DGX-SPARK

DGX Spark research and tests - containers, benchmarks, and investigation notes for running models on GB10 (SM 12.1)

aarch64 blackwell kv-cache vllm nvfp4 dgx-spark mamba-ssm sm121 turboquant

Updated May 25, 2026
Python

albond / DGX_Spark_Unsloth_Lossless_Speedup

7.67× LoRA / 8.35× Full FT speedup for Qwen3.5 (0.8B–27B) on NVIDIA DGX Spark — wall-clock parity with rented H100. Lossless within BF16. Three-command interactive wizard handles model picker, data validator, training, and merge.

cuda transformers pytorch nvidia triton lora fine-tuning peft multimodal blackwell qwen unsloth gb10 dgx-spark qwen3-5 sm121

Updated May 19, 2026
Python

Logos-Flux / optimized-CUDA-GB10

Optimized CUDA kernels for NVIDIA GB10 Blackwell (sm_121, DGX Spark). RMSNorm + GELU. First sm_121 kernel on HuggingFace Kernel Hub.

gpu cuda pytorch nvidia kernels gelu huggingface blackwell rmsnorm gb10 dgx-spark sm121

Updated May 3, 2026
Cuda

idonati / spark-vllm-docker-festr2

Patches + recipe to deploy festr2/MiMo-V2.5-Pro-NVFP4-MXFP8-attn-TP8 on 8-node DGX Spark sm_121 (Ray + vLLM, TP=8). Fixes the fused-qkv loader bug that mis-slotted Q values as K/V on 7 of 8 ranks.

moe ray quantization mimo huggingface vllm gb10 nvfp4 dgx-spark mxfp8 sm121 tensor-parallel

Updated May 19, 2026
Python

leap21ai / autospark

DGX Spark (GB10/SM121) platform support for Meta's KernelAgent — auto-detect, hardware constraints, safe Triton configs

cuda nvidia triton gpu-optimization gb10 dgx-spark sm121 kernel-agent

Updated Mar 14, 2026
Python

parallelArchitect / gb10-kernel-probe

Empirical kernel scheduling characterization for NVIDIA GB10 (SM121a). Sweeps GEMM tile configurations, classifies PTX instruction paths, and captures hardware telemetry.

benchmark gpu cuda nvidia empirical performance-analysis profiling cutlass gemm ptx black-box-testing unified-memory kernel-scheduling nvidia-tools gb10 dgx-spark sm121

Updated May 10, 2026
C++

ogulcanaydogan / dgx-spark-llm-stack

Pre-built PyTorch wheels and build scripts for NVIDIA DGX Spark (GB10, sm_121, Blackwell, CUDA 13.0, ARM64)

machine-learning deep-learning gpu cuda inference pytorch nvidia arm64 aarch64 fine-tuning blackwell llm gb10 dgx-spark grace-blackwell sm121 cuda-13 pre-built-wheels

Updated May 28, 2026
Shell

