This guide provides a step-by-step workflow for training YOLOv3 to detect a custom object, in this case, a specific cow. The workflow combines both local and cloud-based computational resources for optimal performance.
The full dataset, including images, YOLO labels, and Darknet configuration files, is available on Kaggle.
- Introduction
- Hardware and Software Environment
- Preprocessing Input Data
- Labeling Input Images
- Configuring YOLOv3
- Training YOLOv3 on Google Colab
- References
YOLO (You Only Look Once) is a state-of-the-art object detection model capable of identifying and localizing objects in images or video frames. Pre-trained versions are available for common objects (COCO dataset, 80 classes). In this project, we train YOLOv3 to detect a specific cow, but the workflow can be adapted for any object.
We use:
- Darknet: Open-source neural network framework for YOLO.
- Google Colab: Linux-based GPU environment for training.
- Local machine: Windows-based environment for preprocessing, manual labeling, and tracking using OpenCV, LabelImg, and dlib.
- Image and Video Preprocessing
  - Done locally using OpenCV and FFmpeg.
  - Includes resizing, frame extraction, and format conversion.
- Manual or Semi-Automated Labeling
  - Use LabelImg for manual labeling.
  - Semi-automated labeling: track objects in video with dlib and save the bounding boxes as YOLO labels.
- Training and Testing YOLOv3
  - Conducted on Google Colab with GPU support.
  - Custom anchor sizes are calculated and YOLOv3 is configured for your dataset.
- Interoperability
  - Data is exchanged between the local machine and Google Drive to manage preprocessing, training, and post-processing.
YOLO requires both images and corresponding `.txt` label files. Each line in a label file contains:

```
<class_id> <center_x> <center_y> <width> <height>
```

- `<class_id>`: object class (0 for a single class)
- `<center_x>`, `<center_y>`: center of the bounding box, relative to the image dimensions
- `<width>`, `<height>`: width and height of the box, relative to the image dimensions
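To make the normalization concrete, the conversion from a pixel-space box to a YOLO label line can be sketched in Python; the function name and the corner-coordinate box convention are illustrative choices, not part of the original workflow:

```python
def to_yolo_line(x_min, y_min, x_max, y_max, img_w, img_h, class_id=0):
    """Convert a pixel-space bounding box (corner coordinates) to a
    YOLO label line with coordinates normalized to [0, 1]."""
    center_x = (x_min + x_max) / 2.0 / img_w
    center_y = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"

# A 200x100 px box with corners (100, 100) and (300, 200) in a 400x400 image:
print(to_yolo_line(100, 100, 300, 200, 400, 400))
# → 0 0.500000 0.375000 0.500000 0.250000
```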
Steps:
- Take snapshots of the target cow from video using FFmpeg:
  ```
  ffmpeg -i input_video.mp4 -vf "fps=0.5,scale=1920:-1" -q:v 1 snapshot%d.png
  ```
- Label the snapshots in LabelImg:
  ```
  # Navigate to the LabelImg folder
  cd C:\your_path
  # Run LabelImg
  python labelImg.py
  ```
- Draw bounding boxes and assign the class.
- Save the annotations as `.txt` files for YOLO.
- If LabelImg assigns a different class ID (e.g., 15), adjust it to 0 manually or with the provided script.
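The provided script is not reproduced here, but a minimal stand-in that rewrites the class ID in every label file of a folder could look like this (the directory layout is an assumption):

```python
from pathlib import Path

def fix_class_ids(label_dir, new_id=0):
    """Rewrite the leading class ID on every line of every YOLO .txt
    label file in label_dir (e.g., LabelImg's 15 -> 0)."""
    for txt in Path(label_dir).glob("*.txt"):
        fixed = []
        for line in txt.read_text().splitlines():
            parts = line.split()
            if parts:                      # skip blank lines
                parts[0] = str(new_id)
                fixed.append(" ".join(parts))
        txt.write_text("\n".join(fixed) + "\n")
```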
- Select a clear frame of the target object and manually annotate the bounding box.
- Use dlib correlation tracker to track the object in video.
- Save frames and bounding boxes at intervals, converting coordinates from dlib format to YOLO format.
- Save snapshots and `.txt` labels with consistent filenames.
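The tracking loop described above can be sketched as follows. The file names, initial box, and sampling interval are placeholders; the geometry helper mirrors the YOLO label format, and OpenCV/dlib are imported inside the loop function so the helper stays dependency-free:

```python
def rect_to_yolo_line(left, top, right, bottom, img_w, img_h, class_id=0):
    """Convert a tracker rectangle (pixel corners) to a YOLO label line."""
    cx = (left + right) / 2.0 / img_w
    cy = (top + bottom) / 2.0 / img_h
    return (f"{class_id} {cx:.6f} {cy:.6f} "
            f"{(right - left) / img_w:.6f} {(bottom - top) / img_h:.6f}")

def track_and_label(video_path, init_box, out_stem, every_n=10):
    """Track one object through a video with dlib's correlation tracker,
    saving a frame and a matching YOLO label every `every_n` frames."""
    import cv2   # imported here so the helper above has no hard dependency
    import dlib
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    left, top, right, bottom = init_box      # manually annotated first box
    tracker = dlib.correlation_tracker()
    tracker.start_track(frame, dlib.rectangle(left, top, right, bottom))
    count = 0
    while ok:
        tracker.update(frame)
        if count % every_n == 0:
            pos = tracker.get_position()     # dlib drectangle (float coords)
            h, w = frame.shape[:2]
            name = f"{out_stem}{count}"      # consistent image/label names
            cv2.imwrite(name + ".png", frame)
            with open(name + ".txt", "w") as f:
                f.write(rect_to_yolo_line(pos.left(), pos.top(),
                                          pos.right(), pos.bottom(), w, h))
        ok, frame = cap.read()
        count += 1
    cap.release()
```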
Download Darknet YOLOv3 from AlexeyAB/darknet.
- Edit the `.cfg` file:

  ```
  batch = 64
  subdivisions = 64
  max_batches = 2000    # for 1 class
  classes = 1
  filters = 18          # (classes + 5) * 3
  ```

- Update the `[yolo]` layers with the calculated anchors.
- Custom anchors (from k-means clustering):

  ```
  20 27, 16 23, 16 31, 18 27, 15 21, 19 26, 15 27, 15 25, 13 23
  ```
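For reference, the clustering step can be approximated with plain Euclidean k-means over the labels' (width, height) pairs. Note that AlexeyAB's Darknet has its own anchor-calculation routine using an IoU-based distance, so treat this as an illustrative simplification, not the exact procedure behind the anchors above:

```python
import random

def kmeans_anchors(boxes, k=9, iters=50, seed=0):
    """Cluster (width, height) pairs with plain Euclidean k-means.
    Returns k anchor pairs sorted ascending."""
    rnd = random.Random(seed)
    centers = rnd.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            j = min(range(k),
                    key=lambda i: (w - centers[i][0]) ** 2 + (h - centers[i][1]) ** 2)
            clusters[j].append((w, h))
        for i, c in enumerate(clusters):
            if c:   # keep the old center if a cluster ends up empty
                centers[i] = (sum(w for w, _ in c) / len(c),
                              sum(h for _, h in c) / len(c))
    return sorted(centers)
```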
`obj.data`:

```
classes = 1
train = /content/darknet/train.txt
valid = /content/darknet/valid.txt
names = /content/darknet/obj.names
backup = /content/darknet/backup/
```
`obj.names`:

```
cow13
```
`train.txt` / `valid.txt`:

```
/content/darknet/data/train/image1.jpg
/content/darknet/data/train/image2.jpg
...
```
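The two list files can be produced with a short script; the split fraction, seed, and path strings below are placeholders:

```python
import random
from pathlib import Path

def write_splits(image_paths, train_file, valid_file, valid_frac=0.1, seed=42):
    """Shuffle image paths deterministically and write Darknet-style
    train/valid list files (one absolute path per line)."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)
    n_valid = max(1, int(len(paths) * valid_frac))
    valid, train = paths[:n_valid], paths[n_valid:]
    Path(train_file).write_text("\n".join(train) + "\n")
    Path(valid_file).write_text("\n".join(valid) + "\n")
    return train, valid
```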
Upload the following files to Google Drive:
- `backup` folder
- `obj.data`
- `obj.names`
- `train.txt`
- `valid.txt`
Run in Colab:
```
# Add execute permission
!chmod +x /content/drive/MyDrive/darknet/darknet
# Change directory
%cd /content/drive/MyDrive/darknet
# Start training
!./darknet detector train /content/drive/MyDrive/darknet/data/obj.data /content/drive/MyDrive/darknet/cfg/yolov3.cfg -dont_show
```

- Trained weights are saved to the `backup` folder.
- Test detection on a video:
  ```
  !./darknet detector demo /content/drive/MyDrive/darknet/data/obj.data \
      /content/drive/MyDrive/darknet/cfg/yolov3.cfg \
      /content/drive/MyDrive/darknet/backup/yolov3_best.weights \
      -dont_show -out /content/drive/MyDrive/outputVideo.mp4 \
      /content/drive/MyDrive/cutVideo.mp4
  ```
- Detect on a single image:
  ```
  !./darknet detector test /content/drive/MyDrive/darknet/data/obj.data \
      /content/drive/MyDrive/darknet/cfg/yolov3.cfg \
      /content/drive/MyDrive/darknet/backup/yolov3_best.weights \
      /content/drive/MyDrive/cow3.png -dont_show -out /content/drive/MyDrive/outputImage.jpg
  ```

- Redmon, J., Divvala, S., Girshick, R., Farhadi, A., You Only Look Once: Unified, Real-Time Object Detection, 2016.
- Lin, T.Y., et al., Microsoft COCO: Common Objects in Context, 2014.
- FFmpeg
- LabelImg
- AlexeyAB Darknet
- Kaggle dataset for YOLOv3 Cow Detection: https://www.kaggle.com/datasets/gayebartos/yolov3-cow-detection