PURA UAV Team | Publication Date: April 2025
Object recognition plays a crucial role in modern computer vision, particularly in areas such as autonomous vehicles, security systems, unmanned aerial vehicles (UAVs), and robotics. For a system to perceive its environment and respond appropriately, it must recognize objects both accurately and quickly. In this context, object recognition is not just a feature but a fundamental requirement for the safety, efficiency, and stability of the system. For real-time decision making, it is critical that the algorithms used deliver both high accuracy and low latency. In this article, we describe in detail the preprocessing techniques, model selection, and dataset creation processes we implemented to improve real-time object recognition performance.
Feeding raw images directly to the model does not always yield ideal results. We tested various preprocessing steps to enhance image detail and help the model recognize patterns more reliably. After trying many different methods, we selected the Unsharp Masking (USM) technique, which offered the best trade-off between accuracy gains and processing speed.
Unsharp Masking works by first blurring the image with a Gaussian filter and then adding a weighted difference between the original and the blurred version back to the original image. This makes edges and fine details more distinct. It significantly improves the model's object recognition success, especially in long-distance shots or low-clarity images.
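The operation can be expressed in a few lines with OpenCV. The snippet below is a minimal sketch rather than our exact pipeline; the kernel size, sigma, and amount values are illustrative and should be tuned per camera and scene.

```python
import cv2

def unsharp_mask(image, kernel_size=(5, 5), sigma=1.0, amount=1.5):
    """Sharpen by adding back a weighted (original - blurred) difference.

    Equivalent to: sharpened = original + amount * (original - blurred).
    """
    blurred = cv2.GaussianBlur(image, kernel_size, sigma)
    # addWeighted computes image*(1 + amount) + blurred*(-amount) + 0
    return cv2.addWeighted(image, 1.0 + amount, blurred, -amount, 0)

# Illustrative usage on a single frame
frame = cv2.imread("frame.jpg")
sharpened = unsharp_mask(frame)
cv2.imwrite("frame_usm.jpg", sharpened)
```

Because the filter involves only a Gaussian blur and a weighted sum, it is cheap enough to run on every frame of a live stream.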
Example of clarity enhancement with the USM method.
After optimizing the preprocessing steps, it was time for model selection. In real-time object recognition applications, accuracy is as important as speed. For this reason, we chose the YOLO (You Only Look Once) family for its widespread use, extensive community support, and high performance.
In evaluations across different YOLOv11 variants, the YOLOv11s model stood out with high accuracy (mAP) and low latency, thanks in particular to an architecture fast enough to keep pace with the camera's frame rate. As a result, our system could both keep up with the real-time data stream and maintain high accuracy.
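For reference, loading the model and running a detection pass looks roughly like the sketch below, assuming the Ultralytics Python API and its published checkpoint name for YOLOv11s (yolo11s.pt); the confidence threshold and image size are example values, not our tuned settings.

```python
from ultralytics import YOLO

# Load the small YOLOv11 variant (Ultralytics publishes it as "yolo11s.pt")
model = YOLO("yolo11s.pt")

# Single-frame inference; for a live camera you would pass a video source instead
results = model.predict("frame_usm.jpg", conf=0.25, imgsz=640)

for result in results:
    for box in result.boxes:
        label = model.names[int(box.cls)]
        print(label, float(box.conf), box.xyxy.squeeze().tolist())
```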
To train the model, we used not only ready-made datasets but also images that we collected and labeled ourselves. We compiled a large training dataset from the following sources:
We performed various augmentation processes on these datasets using the Roboflow platform. The applied processes included: Flip, Rotate, Shear, Scale, Brightness, Contrast, Saturation, Hue, Grayscale, Blur, Noise, Auto-Orient, Resize.
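Roboflow applies these augmentations through its web pipeline. As an illustration of what a comparable offline pipeline might look like, the sketch below uses the albumentations library; the probabilities and parameter ranges are assumed values, not Roboflow's defaults.

```python
import albumentations as A

# Offline counterpart of the augmentation list above (illustrative parameters).
# Auto-Orient has no direct equivalent here; it normalizes EXIF rotation on import.
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),                             # Flip
        A.Rotate(limit=15, p=0.5),                           # Rotate
        A.Affine(shear=(-10, 10), scale=(0.9, 1.1), p=0.5),  # Shear, Scale
        A.RandomBrightnessContrast(p=0.5),                   # Brightness, Contrast
        A.HueSaturationValue(p=0.5),                         # Saturation, Hue
        A.ToGray(p=0.1),                                     # Grayscale
        A.Blur(blur_limit=3, p=0.2),                         # Blur
        A.GaussNoise(p=0.2),                                 # Noise
        A.Resize(640, 640),                                  # Resize
    ],
    # Keep YOLO-format bounding boxes consistent with the transformed image
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)
```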
After preparing the dataset, we proceeded to train the YOLOv11s model. During training, we reduced the risk of overfitting and improved the model's ability to generalize across different scenarios by paying attention to the diversity and balance of the data. Hyperparameters such as the learning rate, batch size, and number of epochs were determined meticulously. The model was evaluated on the validation set after each epoch to monitor performance, and early stopping was applied when necessary. Upon completion of training, the model performed object recognition tasks successfully with high mAP values and low loss. It was then optimized for real-time use and deployed in the field.
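In code, a training run of this kind can be sketched with the Ultralytics API as below. The hyperparameter names (epochs, batch, lr0, patience) are the API's own, but the values and the data.yaml path are illustrative, not our exact settings.

```python
from ultralytics import YOLO

model = YOLO("yolo11s.pt")  # start from pretrained YOLOv11s weights

# Illustrative hyperparameters; the real values were tuned experimentally
model.train(
    data="dataset/data.yaml",  # hypothetical path to the dataset config
    epochs=100,
    batch=16,
    imgsz=640,
    lr0=0.01,      # initial learning rate
    patience=20,   # early stopping after 20 epochs without improvement
)

# Validation after training reports mAP and loss metrics
metrics = model.val()
print(metrics.box.map)  # mAP50-95 on the validation set
```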
Accuracy (mAP) of a YOLO model versus the number of training epochs.
We drew several important lessons from this project: