[ICLR 2026] The official repo of "MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs"
Code for "Prefilled responses enhance zero-shot detection of AI-generated images"
[AAAIW 2026] Implementation for ViInfographicVQA: A Benchmark for Single and Multi-Image Visual Question Answering on Vietnamese Infographics
An implementation of FastVLM, LLaVA, or any LLM/VLM using FastAPI (backend) and React (frontend), with an Action/Caption mode and frame control
A curated, builder-first list of Vision Language Models (VLMs), local runtimes, document AI tools, UI agents, robotics vision stacks, datasets, benchmarks, and production resources.
This project demonstrates parameter-efficient fine-tuning of large Vision-Language Models (VLMs), specifically Qwen2-VL-7B-Instruct, using LoRA (Low-Rank Adaptation) and 4-bit quantization.
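As a minimal sketch of what such a LoRA + 4-bit fine-tuning setup typically looks like with the Hugging Face `transformers`/`peft`/`bitsandbytes` stack (the hyperparameters and target modules below are illustrative assumptions, not the project's actual configuration):

```python
# Hypothetical QLoRA-style setup for Qwen2-VL-7B-Instruct.
# All hyperparameters here are illustrative, not the repo's real config.
import torch
from transformers import Qwen2VLForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization config: loads base weights in NF4 to cut memory ~4x
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA config: train small low-rank adapters instead of full weights
lora_config = LoraConfig(
    r=16,                                  # adapter rank (assumed value)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically <1% of parameters trainable
```

The quantized base model stays frozen; only the LoRA adapters receive gradients, which is what makes fine-tuning a 7B VLM feasible on a single consumer GPU.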