This project demonstrates the capabilities of the YOLO model for real-time object detection. Specifically, the YOLOv8m architecture is trained on a custom dataset to evaluate its performance in accurately detecting objects under application-specific conditions.
The project documentation outlines the complete workflow adopted in this study, including:
- generation of bounding box annotations for ground truth labels
- preprocessing of the image and corresponding label datasets
- training of the YOLO-based object detection model
- quantitative and qualitative evaluation of the trained model
The dataset preparation workflow in this project closely follows the procedure described in the related repository developed for YOLO 2D Segmentation.
The primary distinction in this implementation lies in the conversion of the JSON annotation files to the YOLO-compatible text format. In this case, the `--output_format=polygon` argument is not required, as the `labelme2yolo` tool generates annotation files suitable for object detection tasks by default.
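For reference, each line of a YOLO detection label file stores `class x_center y_center width height`, with all coordinates normalized by the image dimensions. The sketch below illustrates this conversion; the function name and inputs are illustrative and not part of `labelme2yolo` itself:

```python
def bbox_to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a corner-format pixel bounding box to a normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 100x200-pixel box centered at (320, 240) in a 640x480 image:
print(bbox_to_yolo(0, 270, 140, 370, 340, 640, 480))
# → 0 0.500000 0.500000 0.156250 0.416667
```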
The label and ground truth coordinate files are generated using the following command:
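A typical invocation looks like the following (the JSON directory path is illustrative and should point to the folder containing the LabelMe annotation files; the validation split fraction shown is an assumption):

```shell
labelme2yolo --json_dir ./labelme_annotations --val_size 0.25
```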
This concludes the dataset preparation phase. The YOLO model can now be downloaded and trained using the custom-prepared dataset.
The YOLOv8m architecture is employed in this project to detect the presence of butterflies and generate a bounding box for each detected object.
The file paths for the training and validation image datasets are specified and managed through a YAML configuration file, ensuring a structured and reproducible training setup.
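A minimal configuration of this kind might look as follows (the dataset paths and class name are illustrative):

```yaml
path: ./datasets/butterfly   # dataset root directory
train: images/train          # training images, relative to path
val: images/val              # validation images, relative to path
names:
  0: butterfly
```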
The table below summarizes the key software libraries installed and used for training the model.
| Library | Description |
|---|---|
| Ultralytics | Provides the `YOLO` class used to download the model and the training API that configures and initiates training |
| PyTorch | Provides the underlying deep learning framework on which YOLO training runs |
Although the Ultralytics framework automatically installs the required PyTorch dependencies, the default installation does not include CUDA support for GPU acceleration. As a result, the model may be restricted to CPU-based training unless additional configuration is performed.
To enable GPU-accelerated training, it is therefore recommended to install PyTorch directly from the official PyTorch website, selecting a build with CUDA support. This allows the model to leverage available GPU resources and significantly reduces training time.
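For example, a CUDA-enabled build can be installed with pip from the wheel index published on the PyTorch website; the CUDA version tag below is illustrative, so select the one matching your driver from the official install selector:

```shell
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```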
The table below summarizes the training parameters:
| Parameter | Value |
|---|---|
| task | detect |
| mode | train |
| epochs | 100 |
| batch | 8 |
| imgsz | 640 |
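With the Ultralytics CLI, these parameters correspond to an invocation such as the following (the dataset YAML file name is illustrative):

```shell
yolo task=detect mode=train model=yolov8m.pt data=data.yaml epochs=100 batch=8 imgsz=640
```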
The table below presents the number of samples used for training and validation, providing insight into the dataset distribution.
| Mode | Number of samples |
|---|---|
| Training | 300 |
| Validation | 100 |
Upon successful completion of the training process, the model checkpoints `best.pt` and `last.pt` are generated in the `runs` directory, along with additional training logs and performance metrics. Model evaluation is conducted using the `best.pt` checkpoint, which corresponds to the model state that achieved the highest validation performance.
This section presents the inference results produced by the model trained on the custom dataset. The trained YOLOv8 model is evaluated using both still images and video sequences to assess its detection performance under different input modalities.
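In code, inference with the trained weights follows the standard Ultralytics API. The sketch below assumes the default `runs` directory layout and an illustrative input file name:

```python
from ultralytics import YOLO

# Load the best checkpoint produced during training (path assumes the default layout).
model = YOLO("runs/detect/train/weights/best.pt")

# Run inference on a still image; each result holds boxes and confidence scores.
results = model("butterfly.jpg")
for box in results[0].boxes:
    print(box.xyxy, box.conf)
```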
The image shown below is provided as input to the YOLO model during the inference phase.
The corresponding prediction generated by the network is illustrated below. The model successfully identifies the object of interest by producing a bounding box along with the predicted confidence score.
In addition to image-based evaluation, the detector is assessed using a video sequence. The output video generated by the YOLO model is shown below.
butterfly_video_output.mp4
The model demonstrates the ability to detect multiple butterflies within a single frame. The resulting video output highlights the efficiency of the detector in handling multiple instances simultaneously.
butterfly_video2_output.mp4
The table below presents a summary of the model’s inference time observed during the testing phase:
| File type | Inference time (ms) |
|---|---|
| Image | 113.3 |
| Video 1 | 15 to 20 |
| Video 2 | 16 to 18 |
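Per-frame times in this range translate into comfortably real-time throughput; a quick back-of-the-envelope conversion:

```python
def fps(ms_per_frame):
    """Convert per-frame inference time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame

# 15 to 20 ms per frame corresponds to roughly 50 to 67 FPS.
print(f"{fps(20):.0f} to {fps(15):.0f} FPS")
# → 50 to 67 FPS
```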


