mohit (Mog9)

20, ml systems

gpt2-inference (Public)

A GPT-2 inference engine written from scratch in CUDA and C++. Implements custom CUDA kernels for tiled matrix multiplication, LayerNorm, fused attention, transformer blocks, KV cache management, a…

Cuda, 39 stars, 1 fork

Memory-Allocator (Public)

Custom memory allocator in C++, built from scratch using mmap. Allocates a 1 MB memory pool upfront and carves blocks from it to keep all allocations contiguous. Implements malloc, free, block reuse …

C++, 36 stars, 2 forks
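
The pool idea in the description above can be sketched roughly as follows. This is not the repository's code (which is C++): it is a minimal Python illustration that assumes a first-fit free list over one anonymous mmap region. The class name `PoolAllocator`, the offset-based interface, and the merging of adjacent free blocks are assumptions made for the example, not details taken from the project.

```python
import mmap

POOL_SIZE = 1 << 20  # 1 MB pool, matching the size named in the description


class PoolAllocator:
    """First-fit allocator over one anonymous mmap region (illustrative only)."""

    def __init__(self, size=POOL_SIZE):
        self.pool = mmap.mmap(-1, size)   # anonymous mapping, not backed by a file
        self.size = size
        self.free_blocks = [(0, size)]    # (offset, length), kept sorted by offset
        self.used = {}                    # offset -> length of each live block

    def malloc(self, nbytes):
        """Return an offset into the pool, or None if no free block fits."""
        if nbytes <= 0:
            return None
        for i, (off, length) in enumerate(self.free_blocks):
            if length >= nbytes:
                if length == nbytes:
                    del self.free_blocks[i]
                else:
                    # Carve the request off the front of this free block.
                    self.free_blocks[i] = (off + nbytes, length - nbytes)
                self.used[off] = nbytes
                return off
        return None

    def free(self, off):
        """Release a block and merge it with adjacent free neighbours (assumed behaviour)."""
        length = self.used.pop(off)
        self.free_blocks.append((off, length))
        self.free_blocks.sort()
        merged = [self.free_blocks[0]]
        for o, l in self.free_blocks[1:]:
            prev_off, prev_len = merged[-1]
            if prev_off + prev_len == o:
                merged[-1] = (prev_off, prev_len + l)
            else:
                merged.append((o, l))
        self.free_blocks = merged


alloc = PoolAllocator()
a = alloc.malloc(64)
b = alloc.malloc(128)
alloc.pool[a:a + 5] = b"hello"   # write through the mapping
alloc.free(a)
c = alloc.malloc(32)             # reuses the freed block at the start of the pool
```

Because every block is an offset into the same mapping, all allocations stay inside one contiguous region, which is the property the description emphasises.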

tri-sds (Public)

Triton-based EAGLE speculative decoding engine for Qwen3-4B to Qwen3-32B on AMD MI300X. Matches SGLang's acceptance speedup ratios (1.56–2.49×) with fully custom Triton kernels (prefill, GQA decode…

Python, 4 stars

Adaptive-ViT (Public)

An adaptive Vision Transformer inference system that avoids unnecessary high-resolution computation, achieving ~3× faster inference than static high-res ViT by selectively escalating only when needed.

Python, 5 stars
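
The description does not say how the system decides when to escalate. One common pattern, shown here purely as an assumption, is to run a cheap low-resolution pass first and fall back to the high-resolution model only when the top class probability is below a threshold. The function `adaptive_predict`, the `threshold` value, and the stand-in models are hypothetical and exist only for illustration.

```python
def adaptive_predict(image, low_res_model, high_res_model, threshold=0.8):
    """Run the cheap model first; escalate only when its top probability is below threshold."""
    probs = low_res_model(image)
    if max(probs) >= threshold:
        return probs.index(max(probs)), "low"
    # Low-resolution pass was not confident enough: pay for the expensive pass.
    probs = high_res_model(image)
    return probs.index(max(probs)), "high"


# Stand-in models returning fixed class probabilities (hypothetical).
def confident_low(image):
    return [0.05, 0.9, 0.05]


def unsure_low(image):
    return [0.4, 0.35, 0.25]


def high(image):
    return [0.1, 0.1, 0.8]


easy = adaptive_predict(None, confident_low, high)   # stays on the low-res path
hard = adaptive_predict(None, unsure_low, high)      # escalates to the high-res path
```

Under this assumption, the overall speedup depends on how often inputs clear the threshold on the first pass.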

KV-Compression (Public)

Implements and benchmarks KV cache compression methods for LLM inference in Triton, featuring optimized kernels for KIVI and TurboQuant.

Python, 5 stars, 1 fork

research-papers (Public)

Research implementations focused on inference efficiency and model optimization. Includes custom Triton kernels, LoRA, knowledge distillation pipelines, and more.

Python, 12 stars, 2 forks