This project demonstrates how to constrain vision model outputs into a fixed label space for more predictable and reproducible behavior.
A Python pipeline for batch image classification using OpenAI vision models with:
- strict JSON-schema output
- constrained label selection
- batch processing over a local image folder
- saved structured results
- token and cost tracking
This project shows how to move from ad hoc image prompting to a controlled, reproducible vision pipeline.
Note:
Fox_Image.png was intentionally introduced as a wildcard test case and was not included in the allowed label set.
As a result, the model mapped it to the closest available class (dog) with a lower confidence score (0.91).
This demonstrates an important behavior of constrained classification systems: when the correct label is unavailable, the model will still attempt to select the nearest valid option rather than abstaining.
The script:
- Loads .png images from a local folder
- Converts each image to a base64 data URL
- Sends each image to an OpenAI vision-capable model
- Forces the model to return strict JSON
- Restricts predictions to an allowed label set
- Saves results to a classification_results.json file
- Tracks token usage and estimated run cost
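The base64 conversion step can be sketched as follows. This is a minimal illustration, not the script's actual code; the function name `png_to_data_url` is hypothetical.

```python
import base64
from pathlib import Path


def png_to_data_url(image_path: Path) -> str:
    """Read a PNG file and return its contents as a base64 data URL."""
    # Encode the raw bytes; base64 output is plain ASCII.
    encoded = base64.b64encode(image_path.read_bytes()).decode("ascii")
    # The MIME type is fixed because the pipeline only accepts .png input.
    return f"data:image/png;base64,{encoded}"
```

The resulting string can be passed wherever the API expects an image URL, so no file hosting is needed for local images.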
The model is required to return structured JSON in a fixed format:
{
  "predicted_label": "string",
  "confidence": float
}
Instead of allowing free-form responses, the model is constrained to select only from the allowed labels. This constraint:
- prevents unstructured or inconsistent outputs
- enables reliable downstream automation
- improves reproducibility across runs
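One way to express this constraint is a JSON schema whose `predicted_label` field is an `enum` of the allowed labels, plus a local re-check of whatever the model returns. The sketch below is an assumption about how this could look, not the script's actual code: the names `build_response_format`, `parse_prediction`, and the schema name `image_classification` are hypothetical, the wrapper follows the `json_schema` response-format shape used by OpenAI structured outputs, and the 0 to 1 confidence range is inferred from the example values in this README.

```python
import json

# Labels copied from the allowed label set listed in this README.
ALLOWED_LABELS = ["cat", "dog", "bear", "chicken", "fish",
                  "iguana", "giraffe", "raccoon", "octopus", "owl"]


def build_response_format(labels):
    """Build a strict JSON-schema response format restricted to `labels`."""
    schema = {
        "type": "object",
        "properties": {
            # The enum is what limits the model to the fixed label space.
            "predicted_label": {"type": "string", "enum": list(labels)},
            "confidence": {"type": "number"},
        },
        "required": ["predicted_label", "confidence"],
        "additionalProperties": False,
    }
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "image_classification",  # hypothetical schema name
            "strict": True,
            "schema": schema,
        },
    }


def parse_prediction(raw_text, labels):
    """Parse the model's JSON text and re-validate it locally."""
    data = json.loads(raw_text)
    if data["predicted_label"] not in labels:
        raise ValueError(f"Label outside allowed set: {data['predicted_label']!r}")
    # Assumption: confidence is reported on a 0 to 1 scale.
    if not 0.0 <= float(data["confidence"]) <= 1.0:
        raise ValueError(f"Confidence out of range: {data['confidence']!r}")
    return data
```

Re-validating locally is cheap and keeps downstream code safe even if the schema enforcement is changed or disabled later.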
If the correct label is not present, the model will still choose the closest available option.
This behavior is demonstrated in the Fox_Image.png example.
The classifier operates over a fixed, predefined label set:
allowed_labels = ["cat", "dog", "bear", "chicken", "fish", "iguana", "giraffe", "raccoon", "octopus", "owl"]

Most image API examples stop at "send one image and print a result."
This repo shows how to build a structured pipeline:
- reproducible batch processing
- machine-readable outputs
- schema enforcement
- cost observability
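The batch loop behind these points might look like the sketch below. It is an illustration under assumptions, not the script's actual code: `run_batch` is a hypothetical name, and `classify` stands in for whatever function sends one image to the model and returns a dict with `predicted_label` and `confidence`.

```python
import json
from pathlib import Path
from typing import Callable


def run_batch(image_dir: Path, classify: Callable[[Path], dict], out_path: Path) -> list:
    """Classify every .png in image_dir and save the results as JSON."""
    results = []
    # Sorting makes the processing order the same on every run.
    for image_path in sorted(image_dir.glob("*.png")):
        prediction = classify(image_path)
        # Attach the filename so each record can be traced to its image.
        results.append({**prediction, "filename": image_path.name})
    out_path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    return results
```

Passing the classifier in as a function keeps the loop testable without network access: a stub that returns a fixed prediction is enough to exercise file discovery and result saving.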
structured-image-classifier/
├── README.md
├── LICENSE
├── .gitignore
├── Agent_Image_Processor_1.4.py
├── assets/
│   ├── run_example.png
│   └── allowed_labels.png
└── labeled_images/
Quick start:
If you already have the required Python dependencies installed (openai, python-dotenv), you can skip environment setup and run the script directly.
Otherwise, follow the full setup below.
git clone https://github.com/mjtiv/structured-image-classifier.git
cd structured-image-classifier
python -m venv .venv
Activate:
Windows
.venv\Scripts\activate
Mac/Linux
source .venv/bin/activate
Install dependencies:
pip install openai python-dotenv
Create .env file:
OPENAI_API_KEY=your_api_key_here
Run the script:
python Agent_Image_Processor_1.4.py
Important (update image path):
The script uses a hardcoded local path for images:
location_images = Path(r"D:\Coding_Agents\Image_Agentic_Processing\labeled_images")
This will not work on other machines.
Update it to point to the repository folder:
location_images = Path("labeled_images")
Place your images inside the labeled_images/ directory before running.
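A relative path such as Path("labeled_images") is resolved against the current working directory, so it only works when the script is launched from the repository root. A possible alternative, sketched here with hypothetical helper names (`resolve_image_dir`, `list_png_images`), anchors the folder to the script's own location and fails early if the folder is missing.

```python
from pathlib import Path


def resolve_image_dir(script_file: str, folder_name: str = "labeled_images") -> Path:
    """Return the image folder located next to the given script file."""
    return Path(script_file).resolve().parent / folder_name


def list_png_images(image_dir: Path) -> list:
    """Return the .png files in image_dir, sorted by name."""
    if not image_dir.is_dir():
        raise FileNotFoundError(f"Image folder not found: {image_dir}")
    return sorted(image_dir.glob("*.png"))


# Inside the script this would be called as:
# location_images = resolve_image_dir(__file__)
```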
Bear_Image.png -> bear (0.99)
Cat_Image.png -> cat (0.99)
Dog_Image.png -> dog (0.99)
Giraffe_Image.png -> giraffe (0.99)
Fox_Image.png -> dog (0.91)
Note:
Fox_Image.png was intentionally introduced as a wildcard test case
and was not included in the allowed label set.
Because the classifier is constrained to a fixed set of valid labels,
the model selected the closest available category (dog) with a
slightly lower confidence score.
This demonstrates an important property of constrained classification systems:
- the model will always return a valid label
- even when the correct label is unavailable
- resulting in a forced approximation rather than abstention
[
  {
    "predicted_label": "cat",
    "confidence": 0.99,
    "filename": "Cat_Image.png"
  }
]

[
  {
    "predicted_label": "dog",
    "confidence": 0.91,
    "filename": "Fox_Image.png"
  }
]

When using a constrained label set, the model will:
- always return a valid label
- even if the correct label is unavailable
- approximate to the closest available category
This is a key tradeoff in controlled AI systems and should be handled in production (e.g., with an "unknown" class or confidence thresholds).
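A confidence threshold could be applied as a post-processing step, as in the sketch below. This is not part of the current script: the function name, the `original_label` field, and the 0.95 cutoff are all assumptions chosen for illustration (0.95 happens to separate the 0.91 fox result from the 0.99 results in this README). Model-reported confidence may not be well calibrated, so a real threshold should be tuned on labeled data.

```python
def apply_confidence_threshold(result: dict, threshold: float = 0.95,
                               fallback_label: str = "unknown") -> dict:
    """Replace low-confidence predictions with a fallback label."""
    flagged = dict(result)  # copy so the caller's record is not mutated
    if flagged["confidence"] < threshold:
        # Keep the model's original guess for later review.
        flagged["original_label"] = flagged["predicted_label"]
        flagged["predicted_label"] = fallback_label
    return flagged
```

Applied to the Fox_Image.png record, this would relabel the prediction as "unknown" while keeping "dog" in `original_label`; the Cat_Image.png record would pass through unchanged.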
Strict JSON output ensures reliable parsing and downstream automation.
The constrained label set forces classification into a controlled set, exposing edge cases.
Token usage is tracked to estimate run cost.
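Cost estimation from token counts reduces to simple arithmetic, sketched below. The class and function names are hypothetical, and the per-million-token prices are parameters because actual prices depend on the model and change over time. Token counts would come from the usage data the API returns with each response.

```python
from dataclasses import dataclass


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate cost in dollars from token counts and per-million prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


@dataclass
class UsageTracker:
    """Accumulate token usage across all requests in a batch run."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def cost(self, input_price_per_million: float,
             output_price_per_million: float) -> float:
        return estimate_cost(self.input_tokens, self.output_tokens,
                             input_price_per_million, output_price_per_million)
```

Calling `add` once per image and `cost` once at the end gives a single estimate for the whole run.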
Current limitations:
- PNG-only input
- Hardcoded label list
- No retry logic
- No evaluation metrics
Possible improvements:
- CLI arguments for paths and labels
- Support for JPG / WEBP
- CSV export
- Retry + logging
- Confusion matrix / evaluation
License: MIT

