# Releases: elixir-image/image_vision

## Image Vision v0.2.0 ([0.2.0] 2026-05-02)
### Added

- `Image.FaceDetection` — fast face detection with bounding boxes, confidence scores, and the five canonical facial landmarks (right eye, left eye, nose tip, right mouth corner, left mouth corner). Default model is YuNet 2023-March hosted at `opencv/face_detection_yunet` — MIT licensed, ~340 KB on disk, real-time on CPU. Functions: `detect/2`, `boxes/2`, `crop_largest/2`, `draw_boxes/3`. The `crop_largest/2` helper is the wire-in point for face-aware crop bias used by the sibling `image_plug` library (`gravity: :face`, ImageKit `z-`, Cloudflare `face-zoom`).
- `Image.Background` — class-agnostic foreground/background separation. `remove/2` returns the input image with the background made transparent (alpha mask applied); `mask/2` returns the foreground mask alone for custom compositing. Default model is BiRefNet lite (MIT, ~210 MB), powered by Ortex.
- `Image.Captioning` — natural-language description of an image. `caption/2` returns a string like `"a man riding a horse with a bird of prey"`. Default model is BLIP base (BSD-3-Clause, ~990 MB), powered by Bumblebee. Heavy enough that it is not autostarted by default; configure `autostart: true` or add the child spec to your supervisor.
- `Image.ZeroShot` — classify an image against arbitrary labels you supply at call time, no retraining. `classify/3` returns `[%{label, score}]` sorted descending; `label/3` returns just the best label; `similarity/3` computes CLIP-space cosine similarity between two images. Default model is OpenAI CLIP ViT-B/32 (MIT, ~600 MB), powered by Bumblebee. The default prompt template `"a photo of {label}"` boosts accuracy on bare-noun labels; override or disable it as needed.
- New flags `--background`, `--caption`, and `--zero-shot` for `mix image_vision.download_models` to pre-fetch the new defaults.
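The four new modules slot into the same pipeline style as the existing tasks. A minimal sketch, assuming a local `input.jpg` and the function names documented above; return shapes and option formats beyond those stated in these notes are assumptions, and default models download on first call unless pre-fetched:

```elixir
# Optional: pre-fetch the new default models before first use:
#   mix image_vision.download_models --background --caption --zero-shot

image = Image.open!("input.jpg")

# Face detection: boxes, confidence scores, and 5-point landmarks (YuNet default).
faces = Image.FaceDetection.detect(image)

# Crop to the largest detected face, the wire-in point for face-aware crop bias.
face = Image.FaceDetection.crop_largest(image)

# Background removal: the input image back, with the background made transparent.
cutout = Image.Background.remove(image)

# Captioning (BLIP base). Not autostarted by default: enable `autostart: true`
# in config or add the child spec to your supervision tree first.
caption = Image.Captioning.caption(image)

# Zero-shot classification against labels chosen at call time (CLIP default).
# Returns a list of %{label, score} maps sorted by score, descending.
scores = Image.ZeroShot.classify(image, ["falconer", "jockey", "farmer"])
```

Each call falls back to the documented default model, so the sketch needs no model configuration; swap models via options or app config as described in the guides.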
### Changed

- The `:files` list in `mix.exs` now ships `logo.jpg` so the docs render the project logo on hexdocs.pm.
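The change amounts to adding `logo.jpg` to the Hex package file list. A hypothetical excerpt of a `package/0` function in `mix.exs`; the surrounding entries (other files, license, links) are illustrative, not the project's actual metadata:

```elixir
defp package do
  [
    # logo.jpg is now shipped so hexdocs.pm can render the project logo.
    files: ~w(lib mix.exs README.md CHANGELOG.md logo.jpg),
    links: %{"GitHub" => "https://github.com/elixir-image/image_vision"}
  ]
end
```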
See the README for the full feature list, and the background, captioning, and zero-shot guides for details.
## Image Vision v0.1.0
`image_vision` is a thin, opinionated wrapper around the Elixir ML ecosystem (Bumblebee, Ortex, Nx) that sits next to the `image` library. It exposes three vision tasks through a small API designed for developers who are not ML experts: pass a `t:Vix.Vips.Image.t/0` in, get useful results out. Strong, permissively-licensed defaults handle model selection, backend configuration, and weight downloads automatically.
### Highlights
- Image classification via `Image.Classification.classify/2` and `Image.Classification.labels/2` — returns ImageNet-1k labels with confidence scores. Default model is `facebook/convnext-tiny-224` (Apache 2.0, ~110 MB), powered by Bumblebee.
- Image embeddings via `Image.Classification.embed/2` — returns a 768-dim feature vector suitable for similarity search, clustering, or as input to a downstream classifier. Default model is `facebook/dinov2-base` (Apache 2.0, ~340 MB).
- Object detection via `Image.Detection.detect/2` — returns bounding boxes with class labels and scores across the 80 COCO classes. Default model is `onnx-community/rtdetr_r50vd` (Apache 2.0, ~175 MB), an NMS-free real-time transformer detector that beats YOLOv8 on COCO without YOLO's AGPL licensing constraints.
- Promptable segmentation via `Image.Segmentation.segment/2` — point, box, or multi-point prompts produce precise pixel masks via SAM 2. Default model is `SharpAI/sam2-hiera-tiny-onnx` (Apache 2.0, ~150 MB encoder + decoder).
- Panoptic segmentation via `Image.Segmentation.segment_panoptic/2` — every region in the image gets a class label across 133 COCO panoptic categories (things and stuff). Default model is `Xenova/detr-resnet-50-panoptic` (Apache 2.0, ~175 MB). Includes a baked-in canonical COCO panoptic id→label map so common stuff classes resolve correctly even on repos with incomplete `config.json` entries.
- Result composition helpers that return `t:Vix.Vips.Image.t/0` directly: `Image.Detection.draw_bbox_with_labels/3` (configurable opacity, stroke width, font size, palette), `Image.Segmentation.compose_overlay/3` (colour-coded overlay of all panoptic segments), and `Image.Segmentation.apply_mask/2` (mask as alpha channel for cutouts).
- Automatic model weight management via `ImageVision.ModelCache` — ONNX weights download from HuggingFace on first call and cache on disk. The cache directory is configurable via `config :image_vision, :cache_dir, ...`; it defaults to an XDG-compliant per-user cache. Bumblebee weights use Bumblebee's own HF cache.
- A `mix image_vision.download_models` task pre-fetches every default model so first-call latency is eliminated and the library can run offline. Pass `--classify`, `--detect`, or `--segment` to limit scope. Honours user overrides for the Bumblebee classifier and embedder.
- Optional ML dependencies — `:bumblebee`, `:nx`, and `:ortex` are all `optional: true` in `mix.exs`. The library compiles cleanly without them; each task module is compile-time gated on its underlying runtime, so you only pay for what you use.
- Strong, opinionated defaults chosen for permissive licensing (Apache 2.0 / MIT only — no AGPL/GPL, no non-commercial), reasonable size (<500 MB), broad applicability, and proven quality. Power users can override every default through options or app config.
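The highlights above compose into a short pipeline. A sketch using the function names and defaults listed here; the prompt keyword for `segment/2` and the option keys shown are assumptions for illustration, not documented API:

```elixir
image = Image.open!("photo.jpg")

# ImageNet-1k classification (ConvNeXt-Tiny default, via Bumblebee).
labels = Image.Classification.labels(image)

# 768-dim DINOv2 embedding for similarity search or clustering.
embedding = Image.Classification.embed(image)

# COCO object detection (RT-DETR default), then draw the results back
# onto the image as labelled bounding boxes.
detections = Image.Detection.detect(image)
annotated = Image.Detection.draw_bbox_with_labels(image, detections, opacity: 0.6)

# Promptable segmentation with SAM 2: a single point prompt, whose mask
# is then applied as an alpha channel to produce a cutout.
mask = Image.Segmentation.segment(image, points: [{450, 300}])
cutout = Image.Segmentation.apply_mask(image, mask)

# Panoptic segmentation: a colour-coded overlay of every labelled region.
panoptic = Image.Segmentation.segment_panoptic(image)
overlay = Image.Segmentation.compose_overlay(image, panoptic)

# The on-disk ONNX cache location can be overridden in config/config.exs:
#   config :image_vision, :cache_dir, "/var/cache/image_vision"
```

On first call each function fetches its default weights into the model cache; run `mix image_vision.download_models` beforehand to avoid that latency or to work offline.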
See the README for installation, prerequisites (toolchain, disk space, Livebook Desktop), and quick-start examples. The classification, detection, and segmentation guides cover each task in depth, including how to swap in alternative models.