40x faster AI inference: ONNX to TensorRT optimization with FP16/INT8 quantization, multi-GPU support, and deployment
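As a rough illustration of the ONNX-to-TensorRT path described above, here is a minimal FP16 build sketch assuming the TensorRT 8.x Python API; the model path is hypothetical, and the repo's INT8 calibration and multi-GPU deployment steps are not shown.

```python
# Minimal sketch: parse an ONNX model and build an FP16 TensorRT engine.
# Assumes TensorRT 8.x Python bindings; "model.onnx" is an illustrative path.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp16_engine(onnx_path: str, engine_path: str) -> None:
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB workspace
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)  # use FP16 kernels where the GPU supports them

    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)

if __name__ == "__main__":
    build_fp16_engine("model.onnx", "model_fp16.plan")
```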
Comprehensive guide for tuning Linux network stack buffers (socket, TCP, qdisc, NIC rings) on RHEL/OEL 8. Includes detailed documentation, RTT-based buffer calculations, tuning profiles for low-latency and high-throughput scenarios, and production-ready shell scripts for validation and monitoring.
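The RTT-based buffer calculation mentioned above comes down to the bandwidth-delay product (BDP). A minimal Python sketch follows; the link speed, RTT, headroom factor, and sysctl targets are illustrative, not the repo's actual tuning profiles.

```python
# Minimal sketch of RTT-based socket buffer sizing: compute the bandwidth-delay
# product and print candidate sysctl values. All figures are illustrative.
def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return int(bandwidth_gbps * 1e9 / 8 * rtt_ms / 1e3)

if __name__ == "__main__":
    bdp = bdp_bytes(bandwidth_gbps=10, rtt_ms=2)   # 10 GbE at 2 ms RTT -> 2.5 MB
    buf = 2 * bdp                                  # headroom for bursts and autotuning
    print(f"net.core.rmem_max = {buf}")
    print(f"net.core.wmem_max = {buf}")
    print(f"net.ipv4.tcp_rmem = 4096 131072 {buf}")
    print(f"net.ipv4.tcp_wmem = 4096 16384 {buf}")
```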
XRDrone streams live drone video to VR, runs real-time object detection, and overlays visual effects in 3D.
This repo focuses on latency-aware resource optimization for Kubernetes.
A distributed Java system that dynamically allocates computational tasks based on real-time latency and client performance, using AVL-based scheduling, RSA-secured communication, and asynchronous task execution to boost efficiency in mid-scale clusters.
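As a rough sketch of the latency-based allocation idea above (substituting a heap for the repo's AVL tree, with hypothetical worker and dispatch names):

```python
# Minimal sketch of latency-driven task allocation: always dispatch to the
# currently fastest worker and re-rank it from the observed latency.
# A heap stands in for the repo's AVL tree; names are hypothetical.
import heapq
import time

class LatencyScheduler:
    def __init__(self, workers):
        # entries are (observed latency in seconds, worker id); all start equal
        self.heap = [(0.0, w) for w in workers]
        heapq.heapify(self.heap)

    def dispatch(self, task, run):
        """Send task to the fastest-known worker via run(worker, task), then re-rank it."""
        _, worker = heapq.heappop(self.heap)
        start = time.monotonic()
        result = run(worker, task)                      # e.g. an RPC to the worker
        latency = time.monotonic() - start
        heapq.heappush(self.heap, (latency, worker))    # re-insert with fresh measurement
        return result
```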
Request hedging for tail latency reduction in distributed systems
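A minimal asyncio sketch of the hedging pattern: fire a backup request when the primary has not returned within a hedge delay and take whichever finishes first. The fetch callable, replica list, and delay value are assumptions, not the repo's API.

```python
# Minimal sketch of request hedging to cut tail latency: launch backups after a
# hedge delay, return the first completed response, cancel the rest.
import asyncio

async def hedged_request(fetch, replicas, hedge_delay=0.05):
    """fetch(replica) -> awaitable result; hedge_delay in seconds (e.g. near p95 latency)."""
    tasks = [asyncio.create_task(fetch(replicas[0]))]
    try:
        # Launch one backup per remaining replica until some request completes.
        for replica in replicas[1:]:
            done, _ = await asyncio.wait(tasks, timeout=hedge_delay,
                                         return_when=asyncio.FIRST_COMPLETED)
            if done:
                return done.pop().result()
            tasks.append(asyncio.create_task(fetch(replica)))
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        for t in tasks:
            t.cancel()  # cancel whichever requests are still outstanding
```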