PURA UAV Team | Publication Date: April 2025
Object recognition plays a crucial role in modern computer vision, particularly in areas such as autonomous vehicles, security systems, unmanned aerial vehicles (UAVs), and robotics. For a system to perceive its environment and respond appropriately, it must recognize objects both accurately and quickly. In this context, object recognition is not just a feature but a fundamental requirement for the safety, efficiency, and stability of the system. For real-time decision making, it is critical that the algorithms used deliver both high accuracy and low latency. In this article, we describe in detail the preprocessing techniques, model selection, and dataset creation processes we implemented to improve real-time object recognition performance.
Feeding raw images directly to the model does not always yield ideal results. We tested various preprocessing steps to enhance image detail and help the model recognize patterns more reliably. After trying many different methods, we selected the Unsharp Masking (USM) technique, which offered the best trade-off between accuracy gains and processing speed.
Unsharp Masking works by first blurring the image with a Gaussian filter and then adding a weighted difference between the original and the blurred version back to the original image. This makes edges and fine details more distinct. It significantly improves the model's object recognition success, especially in long-distance shots or low-clarity images.
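The operation can be expressed in a few lines with OpenCV. The snippet below is a minimal sketch rather than our exact pipeline; the kernel size, sigma, and amount values are illustrative and should be tuned per camera and scene.

```python
import cv2

def unsharp_mask(image, kernel_size=(5, 5), sigma=1.0, amount=1.5):
    """Sharpen by adding back a weighted (original - blurred) difference.

    Equivalent to: sharpened = original + amount * (original - blurred).
    """
    blurred = cv2.GaussianBlur(image, kernel_size, sigma)
    # addWeighted computes image*(1 + amount) + blurred*(-amount) + 0
    return cv2.addWeighted(image, 1.0 + amount, blurred, -amount, 0)

# Illustrative usage on a single frame
frame = cv2.imread("frame.jpg")
sharpened = unsharp_mask(frame)
cv2.imwrite("frame_usm.jpg", sharpened)
```

Because the filter involves only a Gaussian blur and a weighted sum, it is cheap enough to run on every frame of a live stream.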
Example of clarity enhancement with the USM method.
After optimizing the preprocessing steps, it was time for model selection. In real-time object recognition applications, accuracy is as important as speed. For this reason, we chose the YOLO (You Only Look Once) family for its widespread use, extensive community support, and high performance.
In evaluations across different YOLOv11 variants, the YOLOv11s model stood out with high accuracy (mAP) and low latency, thanks in particular to an architecture fast enough to keep pace with the camera's frame rate. As a result, our system could both keep up with the real-time data stream and maintain high accuracy.
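For reference, loading the model and running a detection pass looks roughly like the sketch below, assuming the Ultralytics Python API and its published checkpoint name for YOLOv11s (yolo11s.pt); the confidence threshold and image size are example values, not our tuned settings.

```python
from ultralytics import YOLO

# Load the small YOLOv11 variant (Ultralytics publishes it as "yolo11s.pt")
model = YOLO("yolo11s.pt")

# Single-frame inference; for a live camera you would pass a video source instead
results = model.predict("frame_usm.jpg", conf=0.25, imgsz=640)

for result in results:
    for box in result.boxes:
        label = model.names[int(box.cls)]
        print(label, float(box.conf), box.xyxy.squeeze().tolist())
```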
To train the model, we used not only ready-made datasets but also images that we collected and labeled ourselves. We compiled a large training dataset from the following sources:
We performed various augmentation processes on these datasets using the Roboflow platform. The applied processes included: Flip, Rotate, Shear, Scale, Brightness, Contrast, Saturation, Hue, Grayscale, Blur, Noise, Auto-Orient, Resize.
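Roboflow applies these augmentations through its web pipeline. As an illustration of what a comparable offline pipeline might look like, the sketch below uses the albumentations library; the probabilities and parameter ranges are assumed values, not Roboflow's defaults.

```python
import albumentations as A

# Offline counterpart of the augmentation list above (illustrative parameters).
# Auto-Orient has no direct equivalent here; it normalizes EXIF rotation on import.
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),                             # Flip
        A.Rotate(limit=15, p=0.5),                           # Rotate
        A.Affine(shear=(-10, 10), scale=(0.9, 1.1), p=0.5),  # Shear, Scale
        A.RandomBrightnessContrast(p=0.5),                   # Brightness, Contrast
        A.HueSaturationValue(p=0.5),                         # Saturation, Hue
        A.ToGray(p=0.1),                                     # Grayscale
        A.Blur(blur_limit=3, p=0.2),                         # Blur
        A.GaussNoise(p=0.2),                                 # Noise
        A.Resize(640, 640),                                  # Resize
    ],
    # Keep YOLO-format bounding boxes consistent with the transformed image
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)
```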
After preparing the dataset, we proceeded to train the YOLOv11s model. During training, we reduced the risk of overfitting and improved the model's ability to generalize across different scenarios by paying attention to the diversity and balance of the data. Hyperparameters such as the learning rate, batch size, and number of epochs were determined meticulously. The model was evaluated on the validation set after each epoch to monitor performance, and early stopping was applied when necessary. Upon completion of training, the model performed object recognition tasks successfully with high mAP values and low loss. It was then optimized for real-time use and deployed in the field.
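In code, a training run of this kind can be sketched with the Ultralytics API as below. The hyperparameter names (epochs, batch, lr0, patience) are the API's own, but the values and the data.yaml path are illustrative, not our exact settings.

```python
from ultralytics import YOLO

model = YOLO("yolo11s.pt")  # start from pretrained YOLOv11s weights

# Illustrative hyperparameters; the real values were tuned experimentally
model.train(
    data="dataset/data.yaml",  # hypothetical path to the dataset config
    epochs=100,
    batch=16,
    imgsz=640,
    lr0=0.01,      # initial learning rate
    patience=20,   # early stopping after 20 epochs without improvement
)

# Validation after training reports mAP and loss metrics
metrics = model.val()
print(metrics.box.map)  # mAP50-95 on the validation set
```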
Accuracy (mAP) of a YOLO model versus the number of training epochs.
We drew several important lessons from this project: