Blog — Advancing Analytics

How to YOLO like a Pro

Written by Luke Menzies | Nov 25, 2024 3:49:58 PM

Introduction

When it comes to Computer Vision (CV), discussions often veer towards the latest advancements in GPT-4 Vision or other cutting-edge Gen AI approaches for image classification and object detection. Additionally, the rise of more user-friendly tools has made object detection accessible to people who aren’t codeaholics. These tools, while user-friendly, can sometimes create the impression that object detection has become 'easy'. With platforms like Azure AI Vision or GPT-4 Vision, you can indeed whip up a model at the click of a button. But are they truly fit for purpose?

While these tools are great for quick demos, producing enterprise-grade solutions with razor-sharp accuracy, or detecting unique and bespoke objects (like a newly discovered species of fish with limited imagery), often requires more sophisticated, lower-level solutions. Enter YOLO!

What is YOLO?

YOLO (You Only Look Once) is an innovative deep-learning model for real-time object detection. Whilst it was not originally designed for image classification, its functionality has now expanded to enable impressive image classification capabilities, as well as object detection. Unlike traditional methods that use a sliding window to locate objects, YOLO processes the entire image in one go, making it incredibly fast and efficient.

Why Should I Continue Reading?

While this blog is geared towards Python coders, many readers might be familiar with YOLO but unsure how to fully utilise it. Its relative complexity often deters CV enthusiasts, who might default to more user-friendly methods. Fear not! This blog will unmask YOLO's capabilities, from basic usage to advanced techniques that can outshine one-size-fits-all models.

Setting Up YOLO

There are many ways to set up YOLO. It is recommended to use a virtual environment so that you can avoid dependency clashes. To set up a virtual environment using Anaconda, say, use the following command:
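For example (the environment name and Python version here are placeholders; pick whatever suits your setup):

```shell
# Create and activate a fresh Conda environment for YOLO work
conda create -n yolo-env python=3.10 -y
conda activate yolo-env
```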

Once you have a virtual environment, install the required libraries using either PIP or Anaconda:
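With pip (the ultralytics package provides the YOLO interface used throughout this post):

```shell
pip install ultralytics
```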

or 
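with Anaconda (the conda-forge channel is an assumption; check the package's own documentation if it is unavailable):

```shell
conda install -c conda-forge ultralytics
```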

Setting Up YOLO with a GPU

If an Nvidia GPU is available on the machine being used, the user can significantly reduce the time it takes to train models by utilising it within the created virtual environment. To achieve this, the steps for installing YOLO are a bit more involved. It is recommended to install the correct GPU-supported version of PyTorch before attempting to install YOLO. For more details, see A Step-by-Step Guide to Installing CUDA with PyTorch in Conda on Windows — Verifying via Console and PyCharm | by Harun Ijaz | Medium.
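As a rough sketch (the CUDA version, cu118, is an assumption; match the index URL to your driver and CUDA toolkit as per the guide above):

```shell
# Install a CUDA-enabled PyTorch build first
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# Then install YOLO on top of it
pip install ultralytics
# Verify PyTorch can see the GPU (should print True)
python -c "import torch; print(torch.cuda.is_available())"
```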

Creating a Custom Object Detection Model

Although YOLO does come with pretrained object detection models, the real power comes from creating a custom model trained on a bespoke dataset. To do this, you will require some additional tools. The first is software for creating the bounding boxes used within the YOLO training step. There are many options, but a recommended one is ‘LabelMe’. This can easily be installed into your local environment using the following command:
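Installed with pip, for instance:

```shell
pip install labelme
```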

Once you’ve installed this, typing labelme within the Anaconda command prompt (in your activated virtual environment) will bring the software up on the screen. Note: you may be greeted with the error message ‘No Qt bindings could be found’. If this is the case, installing pyqt5 should resolve the issue.

Preparing Your Dataset

Once you've got LabelMe installed, it's time to start thinking about creating an object detection model. Naturally, this requires a selection of images containing the object you wish to detect. Take your time gathering images that contain what you want. Ideally, the more images you can gather, the better. It's often worth exploring websites such as Kaggle, which offer many collections of images for computer vision. It's recommended to have at least 100 images.

Once you have all your images centralised in a directory, navigate to it using the LabelMe software. LabelMe allows you to take an image and draw what's known as bounding boxes around the object of interest in the image. It's recommended to focus on rectangular bounding boxes, which can be selected from the edit tab at the top. This tool allows you to run through a collection of images within a directory, defining bounding boxes ready for training.

Converting and Formatting for YOLO

Once you've created all the required bounding boxes, the next step is to convert the labelled files into a format YOLO can understand. LabelMe generates JSON files within the same directory as the images. These files contain more information than needed and are not compatible with YOLO models. Additionally, YOLO requires the following structure:
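That is, images and labels split into training and validation folders, with a yaml file describing the dataset. As a sketch (the folder names follow the convention produced by the labelme2yolo library introduced below):

```text
YOLODataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── dataset.yaml
```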

The path for the main yaml file is referenced as an argument within the model. The yaml file itself requires the following within it:
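A minimal example, where the paths and the class name are placeholders to adapt to your own dataset:

```yaml
path: /path/to/YOLODataset   # dataset root (example path)
train: images/train          # training images, relative to path
val: images/val              # validation images, relative to path
names:
  0: fish                    # class index to class name (example class)
```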

This requires reorganising and reformatting the contents of the current directory, including the images and label files. This might seem like a time-consuming task, but fear not! The Python library labelme2yolo creates this structure and formatting for you at the click of a button. To install this library, simply use:
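For example:

```shell
pip install labelme2yolo
```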


Once installed, open an Anaconda command prompt and navigate to the directory just above the one the images and labels are stored. Type the following command to create the appropriate structure for YOLO:
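For example, if your images and their LabelMe JSON files live in a folder called my_images (a placeholder name):

```shell
labelme2yolo --json_dir ./my_images
```

The tool also accepts a flag controlling the train/validation split (a --val_size option); check its documentation for the defaults.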

This conveniently sets up the directory to work with a YOLO object detection model. Once this structure is in place, you're ready to create a YOLO model. This can be achieved with a few lines of code in Python:
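A minimal training script might look like the following, assuming the ultralytics package; the checkpoint name, yaml path, and hyperparameters are examples to adapt to your own setup:

```python
from ultralytics import YOLO

# Start from a small checkpoint (yolov8n.pt is an example; any variant
# supported by ultralytics will work)
model = YOLO("yolov8n.pt")

# Train on the dataset created by labelme2yolo; adjust the yaml path,
# number of epochs, and image size to suit your data
model.train(data="YOLODataset/dataset.yaml", epochs=100, imgsz=640)
```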

Once completed, you have your first YOLO model. To use it, simply pass the path of an image as an argument to the newly trained model:
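For example (the image path is a placeholder):

```python
# Run inference on a single image; returns a list of result objects
results = model("path/to/test_image.jpg")
```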

This will return the results of the model inference. To display the results, simply use the following:
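For instance, using the ultralytics results API:

```python
# Display the first result with the predicted boxes drawn on the image
results[0].show()
```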

Well done! You have grasped the basics. Now it’s time to explore how you can elevate your YOLO models into the elite category through various tips and techniques.

Utilising pre-trained models

YOLO is known for providing weights for pre-trained models. However, you can take it a step further and use these pre-trained weights to enhance custom models. Many of the initial layers of the neural networks are responsible for extracting the contours of an image. By freezing these first few layers and training on top of them, the model can achieve faster training and better performance.

To use these, you need to download the pre-trained weights. There are various types of pre-trained weights to choose from.

  • yolov10n.pt
    Description: YOLOv10 Nano (YOLOv10n) is the smallest and fastest model in the YOLOv10 family. It is designed for edge devices with limited computational resources. It provides a good balance between speed and accuracy, making it suitable for applications where real-time performance is critical, but the available hardware is constrained.
    URL: yolov10n.pt
  • yolov10s.pt
    Description: YOLOv10 Small (YOLOv10s) offers a slightly larger model than YOLOv10 Nano, providing improved accuracy while maintaining a balance with computational efficiency. It is suitable for applications requiring real-time performance on devices with moderate resources.
    URL: yolov10s.pt
  • yolov10m.pt
    Description: YOLOv10 Medium (YOLOv10m) is a mid-sized model designed to provide a compromise between speed and accuracy. It is well-suited for applications that require more accurate object detection but have access to moderate computational resources.
    URL: yolov10m.pt
  • yolov10b.pt
    Description: YOLOv10 Base (YOLOv10b) is the base model in the YOLOv10 family, providing a standard balance between accuracy and computational load. It is typically used for general-purpose object detection tasks where a balanced performance is desired.
    URL: yolov10b.pt
  • yolov10x.pt
    Description: YOLOv10 Extra Large (YOLOv10x) is the largest and most accurate model in the YOLOv10 family. It is designed for applications where the highest accuracy is required, and there are sufficient computational resources available to handle its increased complexity.
    URL: yolov10x.pt
  • yolov10l.pt
    Description: YOLOv10 Large (YOLOv10l) provides a high level of accuracy, balancing between the medium and extra-large models. It is suitable for applications that require high accuracy but can operate with a relatively high computational load.
    URL: yolov10l.pt

These weights are available from the links provided above. Once downloaded, simply point to the path of the .pt file you wish to use and use the following command:
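A sketch of this, assuming the ultralytics API's freeze argument; the number of frozen layers and the file paths are examples:

```python
from ultralytics import YOLO

# Load the downloaded pre-trained weights (path is an example)
model = YOLO("yolov10n.pt")

# Freeze the first 10 layers so the pretrained feature extractors are kept,
# then fine-tune the remaining layers on the custom dataset
model.train(data="YOLODataset/dataset.yaml", epochs=100, freeze=10)
```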

This will vastly improve the training process for the model you’re building.

Using AutoBatch

Choosing the correct batch size is crucial for getting the best out of your model. Generally, for smaller samples, smaller batch sizes and higher epochs are favourable. A lot of work could be put into tuning this parameter. YOLO includes a useful tool known as AutoBatch, which automatically chooses the optimal batch size prior to training. To activate AutoBatch, simply set the batch argument to ‘-1’ to enable automatic batch determination:
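For example (note that AutoBatch estimates memory on a CUDA device, so a GPU is assumed here):

```python
from ultralytics import YOLO

model = YOLO("yolov10n.pt")  # example checkpoint

# batch=-1 asks AutoBatch to pick the largest batch size that fits in GPU memory
model.train(data="YOLODataset/dataset.yaml", epochs=100, batch=-1)
```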

Semi-Automated Bounding Box Generation

YOLO is known for its top performance when provided with large numbers of images where the user generates accompanying bounding boxes to train on. One drawback of achieving elevated accuracy and customisation is that the process of obtaining these is extremely time-consuming. The laborious task of manually creating bounding boxes (labelled data) for all the images is one of the reasons people may seek alternatives such as off-the-shelf models or multimodal models (e.g., GPT-4 Vision). However, there is a method for streamlining this process.

Users can actually use the model itself to create bounding boxes and iterate through the creation process, adjusting any poorly produced boxes as they go along to prevent a divergent model. Firstly, train an initial model on a handful of images with bounding boxes as a starting point. This model can then be used as a basis for producing bounding boxes. The second step is to generate the bounding boxes for the next iteration. It is advisable not to create bounding boxes for all the images at once, but rather for a batch of them, making adjustments where necessary.

Due to the limited exposure to the object of interest in the images, the confidence level for each bounding box prediction will initially be very low. Therefore, you must manually set the confidence threshold low in order for the model to pick up any boxes. As time goes on, the confidence level will rise. Initially, the threshold should be set between 0.01 and 0.2. If the level is too low, however, there is a risk that the model may pick out too many bounding boxes. The confidence threshold can be set at a prediction level using the following:
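For example (the weights path shown is the default ultralytics output location, used here as an example):

```python
from ultralytics import YOLO

# Load the weights from the latest training iteration
model = YOLO("runs/detect/train/weights/best.pt")

# A deliberately low confidence threshold for the early iterations
results = model("path/to/image.jpg", conf=0.05)
```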

This can be adjusted through the iterations. Once you’ve obtained the bounding box, it needs to be converted into a format that can be visualised within LabelMe. This way, you can easily iterate through any images where the bounding box is not as it should be and adjust it as you go. To do this, you can extract the bounding box with the following function:
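Below is a sketch of such a function. It assumes the standard ultralytics result attributes (res.boxes.xyxy, res.orig_shape, res.path) and LabelMe's JSON annotation fields; the function name, version string, and default label are placeholders:

```python
import os


def result_to_labelme(res, label="object"):
    """Convert a single ultralytics result into a LabelMe-style dictionary.

    `res` is one element of the list returned by model(path, conf=...).
    """
    shapes = []
    for box in res.boxes.xyxy.tolist():  # each box is [x1, y1, x2, y2]
        x1, y1, x2, y2 = [float(v) for v in box]
        shapes.append({
            "label": label,
            "points": [[x1, y1], [x2, y2]],  # opposite corners of the rectangle
            "group_id": None,
            "shape_type": "rectangle",
            "flags": {},
        })

    height, width = res.orig_shape
    return {
        "version": "5.4.1",          # LabelMe version string (assumption)
        "flags": {},
        "shapes": shapes,
        "imagePath": os.path.basename(res.path),
        "imageData": None,           # filled in later with the base64-encoded image
        "imageHeight": height,
        "imageWidth": width,
    }
```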

This creates a Python dictionary that can be saved as a JSON file in the original directory where the initial bounding boxes were created. The res argument is the result obtained by using result = model(path, conf=*confidence_threshold*). One final thing to note is that the JSON file for LabelMe requires converting the image to base64 and storing it as a string as part of the JSON file. To do this, you can use the following routine:
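A minimal routine for this, using only the standard library (the function name is a placeholder):

```python
import base64


def image_to_base64(image_path):
    """Read an image file and return its base64-encoded contents as a string,
    ready to be stored under the "imageData" key of a LabelMe JSON file."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```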

This function can be used when converting the format. Finally, you have everything you need to start iterating through the batches, tweaking the bounding boxes as you go.

The batch size can be chosen by the user. There isn’t a well-defined way of choosing this. Just be aware that you want to pick a value that doesn’t require too many batches due to the repetitive nature of the process. Similarly, you wouldn’t want to make it too large, in case it requires too many bounding box adjustments as you go. One of the best solutions is to have the first batch relatively small and then increase the batch size over time.

Conclusion

So there you have it!

YOLO, the incredible computer vision library for creating impeccable CV models, continues to evolve and make its presence known with its impressive capabilities. While the landscape of computer vision is ever-changing, with user-friendly tools and pre-trained models becoming more prevalent, mastering YOLO offers enormous advantages for those seeking precision and customisation in object detection.

Although using YOLO is often seen as a more involved process, leveraging pre-trained weights and employing techniques like AutoBatch and semi-automated bounding box generation can not only elevate your models to new heights but also accelerate the development process.

Got some AI challenges or looking to take your object detection to the next level? Our team at Advancing Analytics is all about making complex AI solutions understandable and impactful.  Drop us a line, and let’s chat about how we can help bring your ideas to life with AI.