# FlexServ API Prompt: YOLO Evaluation Script Generator

### Task Summary:
To test the capabilities of the FlexServ inference server, we can provide a complex prompt to the Responses API. The prompt asks the model to generate a complete Python evaluation script that performs animal detection on images from the LILA BC Small Animal dataset. This is a large camera-trap image dataset used for wildlife monitoring and ecological research. It contains millions of images captured by automated cameras, including small mammals and many blank triggers, along with annotations describing the detected species. For training object detection models such as YOLO, the dataset can be downloaded in YOLO format, where each image has a corresponding `.txt` label file containing normalized bounding-box coordinates in the form `<class_id> <x_center> <y_center> <width> <height>`.
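
To make the label format concrete, a single line of a YOLO label file can be parsed as below (a minimal sketch; the example line is illustrative, not real data):

```python
# One line of a YOLO-format label file:
#   "<class_id> <x_center> <y_center> <width> <height>"
# All four coordinates are normalized to [0, 1] relative to the image size.
line = "0 0.512 0.430 0.210 0.185"  # illustrative example line

fields = line.split()
class_id = int(fields[0])                       # 0 = animal in this dataset
x_center, y_center, width, height = map(float, fields[1:])
print(class_id, x_center, y_center, width, height)
```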

---

### On FlexServ UI

1. Copy and paste the following text into the `Input (Markdown)` field of the `Responses API` section in the FlexServ UI.

<div style="max-height:400px; overflow:auto; border:1px solid #ddd; padding:10px;">
<pre>

| 16 | +> "Write Python code that reads all images from a dataset root directory stored in the variable DATASET_ROOT. |
| 17 | +> |
| 18 | +> **TASK DESCRIPTION:** |
| 19 | +> - This is an IMAGE-LEVEL BINARY CLASSIFICATION task implemented using an object detection model. |
| 20 | +> - The goal is to determine whether an image contains an animal or not. |
| 21 | +> |
| 22 | +> **DATASET STRUCTURE:** |
| 23 | +> - DATASET_ROOT contains three subdirectories: `train`, `test`, and `val`. |
| 24 | +> - Each directory contains two subdirectories: |
| 25 | +> * images/ → contains image files (.jpg, .jpeg, .png) |
| 26 | +> * labels/ → contains YOLO format .txt files |
| 27 | +> - GROUND-TRUTH LOGIC: An image is considered an `animal` if a corresponding .txt file exists and is not empty in the `labels/` folder. |
| 28 | +> |
| 29 | +> **MODEL REQUIREMENTS:** |
| 30 | +> - Use ONLY a pretrained Ultralytics YOLO detection model (e.g., yolov8n.pt). |
| 31 | +> - Load the model using the Ultralytics YOLO API. |
| 32 | +> - Assume YOLO detects animals using class ID `animal` at index 0. |
| 33 | +> |
| 34 | +> **DETECTION LOGIC (IMPORTANT):** |
| 35 | +> - Run object detection on each image. |
| 36 | +> - If the model produces AT LEAST ONE detection of an animal class with confidence >= 0.5: |
| 37 | +> → The image-level prediction is `animal`. |
| 38 | +> |
| 39 | +> **EVALUATION METRICS:** |
| 40 | +> - Iterate through the images in the `test` split. |
| 41 | +> - Compare the image-level prediction with the ground truth (existence of label file). |
| 42 | +> - Count: True Positives, True Negatives, False Positives, and False Negatives. |
| 43 | +> |
| 44 | +> **ACCURACY DEFINITION:** |
| 45 | +> - Overall accuracy = (True Positives + True Negatives) / Total Images |
| 46 | +> |
| 47 | +> **OUTPUT REQUIREMENTS:** |
| 48 | +> - Print for each image: filename, ground-truth status, and prediction. |
| 49 | +> - At the end, print a summary report including total images, counts for each metric, and overall detection accuracy. |
| 50 | +> |
| 51 | +> **CODING REQUIREMENTS:** |
| 52 | +> - Store the main path in DATASET_ROOT. |
| 53 | +> - Use `pathlib` or `os` for robust file path matching. |
| 54 | +> - Read only .jpg, .jpeg, and .png files. |
| 55 | +> - Include clear comments explaining each step. |
| 56 | +> |
| 57 | +> After the code, briefly explain how the program works in plain English." |
| 58 | +</pre> |
| 59 | +</div> |
| 60 | + |
2. Change the temperature to 0.0 for deterministic output.
3. Select the model to run:
   - Qwen/Qwen2.5-Coder-32B-Instruct (61.0 GB, Text Generation)
4. Make sure the `Streams` option is checked.
5. Uncheck `Multi-turn conversation`.
6. Click Run. You should see the generated code stream into the blue box in the Responses API; wait for it to complete. A sketch of the kind of script to expect is shown below.
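
For reference, here is a minimal sketch of the kind of script the prompt above should produce, assuming the `ultralytics` package is installed. The checkpoint name and exact print format are illustrative; the logic (non-empty label file as ground truth, at least one class-0 detection with confidence >= 0.5 as prediction) follows the prompt.

```python
from pathlib import Path
from ultralytics import YOLO

DATASET_ROOT = Path("/path/to/dataset")  # updated in the notebook step below
CONF_THRESHOLD = 0.5
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

# Load a pretrained Ultralytics YOLO detection model
model = YOLO("yolov8n.pt")

tp = tn = fp = fn = 0
images_dir = DATASET_ROOT / "test" / "images"
labels_dir = DATASET_ROOT / "test" / "labels"

for img_path in sorted(images_dir.iterdir()):
    if img_path.suffix.lower() not in IMAGE_EXTS:
        continue

    # Ground truth: 'animal' if a non-empty label file exists for this image
    label_path = labels_dir / (img_path.stem + ".txt")
    gt_animal = label_path.exists() and label_path.stat().st_size > 0

    # Prediction: 'animal' if at least one class-0 detection has conf >= 0.5
    result = model(img_path)[0]
    pred_animal = any(
        int(cls) == 0 and conf >= CONF_THRESHOLD
        for cls, conf in zip(result.boxes.cls.tolist(), result.boxes.conf.tolist())
    )

    # Update the confusion counts
    if gt_animal and pred_animal:
        tp += 1
    elif not gt_animal and not pred_animal:
        tn += 1
    elif pred_animal:
        fp += 1
    else:
        fn += 1

    gt = "animal" if gt_animal else "no_animal"
    pred = "animal" if pred_animal else "no_animal"
    print(f"Filename: {img_path.name}, Ground Truth: {gt}, Prediction: {pred}")

# Summary report
total = tp + tn + fp + fn
print("Evaluation Metrics:")
print(f"Total images processed: {total}")
print(f"Total animal images (based on label files): {tp + fn}")
print(f"True Positives: {tp}")
print(f"True Negatives: {tn}")
print(f"False Positives: {fp}")
print(f"False Negatives: {fn}")
print(f"Overall detection accuracy: {(tp + tn) / total:.2f}")
```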

Now, let's test its performance on the test dataset using the Jupyter Notebook.

### On Jupyter

Open the notebook `Code-Detection`.

Copy the generated code from the FlexServ UI into a new cell in the notebook.

Update the variable `DATASET_ROOT` to the path `/home/jovyan/work/vista/ai-tutorial-2026/datasets/AnimalEcology.v4i.yolov11`.

Update the model path to `/home/jovyan/work/vista/ai-tutorial-2026/models/yolov9t_ep200_bs32_lr0.005_baa22147.pt`.
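
After these two edits, the relevant lines of the pasted code should look roughly like this (only `DATASET_ROOT` is fixed by the prompt; the other names depend on the generated code and are illustrative):

```python
from ultralytics import YOLO

# Dataset root for the tutorial environment
DATASET_ROOT = "/home/jovyan/work/vista/ai-tutorial-2026/datasets/AnimalEcology.v4i.yolov11"

# Replace the pretrained checkpoint (e.g., yolov8n.pt) with the tutorial model
model = YOLO("/home/jovyan/work/vista/ai-tutorial-2026/models/yolov9t_ep200_bs32_lr0.005_baa22147.pt")
```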

Now run the cell. On a successful run, you should see output similar to the following:

<div style="max-height:400px; overflow:auto; border:1px solid #ddd; padding:10px;">
<pre>

image 1/1 /home/jovyan/work/vista/ai-tutorial-2026/datasets/AnimalEcology.v4i.yolov11/test/images/KPC2__2019-09-19__15-47-42-1-_JPG.rf.608031a2809f0f6714f175d3e5eb7f06.jpg: 640x640 1 animal, 96.6ms
Filename: KPC2__2019-09-19__15-47-42-1-_JPG.rf.608031a2809f0f6714f175d3e5eb7f06.jpg, Ground Truth: no_animal, Prediction: animal

Speed: 2.5ms preprocess, 96.6ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640)
image 1/1 /home/jovyan/work/vista/ai-tutorial-2026/datasets/AnimalEcology.v4i.yolov11/test/images/NOR3__2019-07-19__11-40-00-1-_JPG.rf.b85ee30f99a803b09f8c5a7da7f9a508.jpg: 640x640 (no detections), 104.2ms
Speed: 1.9ms preprocess, 104.2ms inference, 0.7ms postprocess per image at shape (1, 3, 640, 640)
Filename: NOR3__2019-07-19__11-40-00-1-_JPG.rf.b85ee30f99a803b09f8c5a7da7f9a508.jpg, Ground Truth: animal, Prediction: no_animal
....
...
Evaluation Metrics:
Total images processed: 100
Total animal images (based on label files): 71
True Positives: 47
True Negatives: 6
False Positives: 23
False Negatives: 24
Overall detection accuracy: 0.53
| 105 | +</pre> |
| 106 | +</div> |
| 107 | + |
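As a quick sanity check, the summary counts above are internally consistent; a few lines of arithmetic confirm the reported accuracy:

```python
tp, tn, fp, fn = 47, 6, 23, 24   # counts from the summary report above
total = tp + tn + fp + fn        # 100 images processed
animal_images = tp + fn          # 71 images with a non-empty label file
accuracy = (tp + tn) / total     # (47 + 6) / 100
print(total, animal_images, round(accuracy, 2))  # -> 100 71 0.53
```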