danisereb

Deep Learning Inference @NVIDIA

2 followers · 1 following

NVIDIA

Achievements

Popular repositories

vllm (Public)

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python

flashinfer (Public)

Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Python

Model-Optimizer (Public)

Forked from NVIDIA/Model-Optimizer

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …

Python