# Releases: elixir-image/image_vision

## Image Vision v0.2.0 ([0.2.0] 2026-05-02)
### Added

- `Image.FaceDetection` — fast face detection with bounding boxes, confidence scores, and the five canonical facial landmarks (right eye, left eye, nose tip, right mouth corner, left mouth corner). Default model is YuNet 2023-March hosted at `opencv/face_detection_yunet` — MIT licensed, ~340 KB on disk, real-time on CPU. Functions: `detect/2`, `boxes/2`, `crop_largest/2`, `draw_boxes/3`. The `crop_largest/2` helper is the wire-in point for face-aware crop bias used by the sibling `image_plug` library (`gravity: :face`, ImageKit `z-`, Cloudflare `face-zoom`).
- `Image.Background` — class-agnostic foreground/background separation. `remove/2` returns the input image with the background made transparent (alpha mask applied); `mask/2` returns the foreground mask alone for custom compositing. Default model is BiRefNet lite (MIT, ~210 MB), powered by Ortex.
- `Image.Captioning` — natural-language description of an image. `caption/2` returns a string like `"a man riding a horse with a bird of prey"`. Default model is BLIP base (BSD-3-Clause, ~990 MB), powered by Bumblebee. Heavy enough that it is not autostarted by default; configure `autostart: true` or add the child spec to your supervisor.
- `Image.ZeroShot` — classify an image against arbitrary labels you supply at call time, no retraining. `classify/3` returns `[%{label, score}]` sorted descending; `label/3` returns just the best label; `similarity/3` computes CLIP-space cosine similarity between two images. Default model is OpenAI CLIP ViT-B/32 (MIT, ~600 MB), powered by Bumblebee. The default prompt template `"a photo of {label}"` boosts accuracy on bare-noun labels; override or disable it as needed.
- New flags `--background`, `--caption`, and `--zero-shot` for `mix image_vision.download_models` to pre-fetch the new defaults.
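The four new modules slot into the same pipeline style as the existing tasks. A minimal sketch, assuming a local `input.jpg` and the function names documented above; return shapes and option formats beyond those stated in these notes are assumptions, and default models download on first call unless pre-fetched:

```elixir
# Optional: pre-fetch the new default models before first use:
#   mix image_vision.download_models --background --caption --zero-shot

image = Image.open!("input.jpg")

# Face detection: boxes, confidence scores, and 5-point landmarks (YuNet default).
faces = Image.FaceDetection.detect(image)

# Crop to the largest detected face, the wire-in point for face-aware crop bias.
face = Image.FaceDetection.crop_largest(image)

# Background removal: the input image back, with the background made transparent.
cutout = Image.Background.remove(image)

# Captioning (BLIP base). Not autostarted by default: enable `autostart: true`
# in config or add the child spec to your supervision tree first.
caption = Image.Captioning.caption(image)

# Zero-shot classification against labels chosen at call time (CLIP default).
# Returns a list of %{label, score} maps sorted by score, descending.
scores = Image.ZeroShot.classify(image, ["falconer", "jockey", "farmer"])
```

Each call falls back to the documented default model, so the sketch needs no model configuration; swap models via options or app config as described in the guides.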
### Changed

- The `:files` list in `mix.exs` now ships `logo.jpg` so the docs render the project logo on hexdocs.pm.
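The change amounts to adding `logo.jpg` to the Hex package file list. A hypothetical excerpt of a `package/0` function in `mix.exs`; the surrounding entries (other files, license, links) are illustrative, not the project's actual metadata:

```elixir
defp package do
  [
    # logo.jpg is now shipped so hexdocs.pm can render the project logo.
    files: ~w(lib mix.exs README.md CHANGELOG.md logo.jpg),
    links: %{"GitHub" => "https://github.com/elixir-image/image_vision"}
  ]
end
```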
See the README for the full feature list, and the background, captioning, and zero-shot guides for details.
## Image Vision v0.1.0
`image_vision` is a thin, opinionated wrapper around the Elixir ML ecosystem (Bumblebee, Ortex, Nx) that sits next to the `image` library. It exposes three vision tasks through a small API designed for developers who are not ML experts: pass a `t:Vix.Vips.Image.t/0` in, get useful results out. Strong, permissively-licensed defaults handle model selection, backend configuration, and weight downloads automatically.
### Highlights
- Image classification via `Image.Classification.classify/2` and `Image.Classification.labels/2` — returns ImageNet-1k labels with confidence scores. Default model is `facebook/convnext-tiny-224` (Apache 2.0, ~110 MB), powered by Bumblebee.
- Image embeddings via `Image.Classification.embed/2` — returns a 768-dim feature vector suitable for similarity search, clustering, or as input to a downstream classifier. Default model is `facebook/dinov2-base` (Apache 2.0, ~340 MB).
- Object detection via `Image.Detection.detect/2` — returns bounding boxes with class labels and scores across the 80 COCO classes. Default model is `onnx-community/rtdetr_r50vd` (Apache 2.0, ~175 MB), an NMS-free real-time transformer detector that beats YOLOv8 on COCO without YOLO's AGPL licensing constraints.
- Promptable segmentation via `Image.Segmentation.segment/2` — point, box, or multi-point prompts produce precise pixel masks via SAM 2. Default model is `SharpAI/sam2-hiera-tiny-onnx` (Apache 2.0, ~150 MB encoder + decoder).
- Panoptic segmentation via `Image.Segmentation.segment_panoptic/2` — every region in the image gets a class label across 133 COCO panoptic categories (things and stuff). Default model is `Xenova/detr-resnet-50-panoptic` (Apache 2.0, ~175 MB). Includes a baked-in canonical COCO panoptic id→label map so common stuff classes resolve correctly even on repos with incomplete `config.json` entries.
- Result composition helpers that return `t:Vix.Vips.Image.t/0` directly: `Image.Detection.draw_bbox_with_labels/3` (configurable opacity, stroke width, font size, palette), `Image.Segmentation.compose_overlay/3` (colour-coded overlay of all panoptic segments), and `Image.Segmentation.apply_mask/2` (mask as alpha channel for cutouts).
- Automatic model weight management via `ImageVision.ModelCache` — ONNX weights download from HuggingFace on first call and cache on disk. The cache directory is configurable via `config :image_vision, :cache_dir, ...`; it defaults to an XDG-compliant per-user cache. Bumblebee weights use Bumblebee's own HF cache.
- A `mix image_vision.download_models` task pre-fetches every default model so first-call latency is eliminated and the library can run offline. Pass `--classify`, `--detect`, or `--segment` to limit scope. Honours user overrides for the Bumblebee classifier and embedder.
- Optional ML dependencies — `:bumblebee`, `:nx`, and `:ortex` are all `optional: true` in `mix.exs`. The library compiles cleanly without them; each task module is compile-time gated on its underlying runtime, so you only pay for what you use.
- Strong, opinionated defaults chosen for permissive licensing (Apache 2.0 / MIT only — no AGPL/GPL, no non-commercial), reasonable size (<500 MB), broad applicability, and proven quality. Power users can override every default through options or app config.
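The highlights above compose into a short pipeline. A sketch using the function names and defaults listed here; the prompt keyword for `segment/2` and the option keys shown are assumptions for illustration, not documented API:

```elixir
image = Image.open!("photo.jpg")

# ImageNet-1k classification (ConvNeXt-Tiny default, via Bumblebee).
labels = Image.Classification.labels(image)

# 768-dim DINOv2 embedding for similarity search or clustering.
embedding = Image.Classification.embed(image)

# COCO object detection (RT-DETR default), then draw the results back
# onto the image as labelled bounding boxes.
detections = Image.Detection.detect(image)
annotated = Image.Detection.draw_bbox_with_labels(image, detections, opacity: 0.6)

# Promptable segmentation with SAM 2: a single point prompt, whose mask
# is then applied as an alpha channel to produce a cutout.
mask = Image.Segmentation.segment(image, points: [{450, 300}])
cutout = Image.Segmentation.apply_mask(image, mask)

# Panoptic segmentation: a colour-coded overlay of every labelled region.
panoptic = Image.Segmentation.segment_panoptic(image)
overlay = Image.Segmentation.compose_overlay(image, panoptic)

# The on-disk ONNX cache location can be overridden in config/config.exs:
#   config :image_vision, :cache_dir, "/var/cache/image_vision"
```

On first call each function fetches its default weights into the model cache; run `mix image_vision.download_models` beforehand to avoid that latency or to work offline.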
See the README for installation, prerequisites (toolchain, disk space, Livebook Desktop), and quick-start examples. The classification, detection, and segmentation guides cover each task in depth, including how to swap in alternative models.