Almost every day we hear that we can use the superpowers of artificial intelligence (AI) to make work easier: to automate routine tasks, speed up workflows, increase accuracy and make more money. But what if you are not a machine learning expert? This quick guide outlines the basics of how AI-driven object detection works, and should give you a good understanding of how to employ it.
In the drone industry, and the geospatial sector more broadly, there is a lot of talk about how AI will help extract actionable information from unstructured image data at a scale and speed never previously seen.
The good news is that you do not need to be a machine learning expert, nor do you need to hire one, to harness the power of AI. Picterra, for example, has created an online platform with an easy-to-use graphical user interface to make AI-powered object detection accessible to everyone. The signature tool of the platform is the custom detector, which allows users to train their own AI detection model without writing a single line of code.
Training AI models with “where” and “what”
AI models can be good students, but they are not human. They lack human intuition and they see things differently. You need to teach them to see the world through your eyes.
To train an AI model to detect objects in an image you need to tell the algorithm where it will find the relevant information and show it examples of what it should (and should not) learn to find.
The first step is to understand how you “see” objects. Think about how you would define what the object you are looking for looks like. How do you identify a single unit of this type of object? What are the key visual features you are looking for? Is it the shape, the colour, the size or the texture? Is it a specific part of the object, or rather a combination of all of these under certain circumstances?
Once you have identified the key visual features that define the object of interest, you can teach the AI model to find it.
For demonstration purposes, this article will consider a challenging sheep detection project using the custom detector tool on the Picterra platform.
Where to look: defining areas
Fig 1. shows the image before adding any training information. On the left is what you can see; on the right is what the AI model can see before you tell it where to look. As you can see, the algorithm sees nothing. You need to tell the AI model to open its eyes and provide it with information it can “see”.
Fig. 1: The image before any training areas are added: What humans see (left) vs what the AI model sees (right).
Analyse your image and find spots that contain examples of your object of interest, and spots that do not. These spots are called “training areas”; the algorithm will look at them in order to learn. Draw training areas over both what you are interested in and what you are not interested in. Keep in mind that the AI model will not learn from the sections of your image that you did not highlight.
Highlight example areas
These are sections of your image that you highlight to tell the algorithm, “look at this region, here are the examples of what I need you to find”.
Fig. 2: Defining example areas – the areas that contain examples of what the AI model should look for. At this stage, only the human knows what is in the selected spots.
Each training area should contain multiple examples of your object of interest. It is important to draw a series of training areas that highlight your objects of interest in different contexts (Fig. 2). You want to identify sections of your image where your objects of interest appear on different backgrounds, in different distribution configurations, or in different lighting conditions.
Define counter example areas
Defining areas where you know there are no examples of the object of interest helps the algorithm to understand what you are not looking for (Fig. 3).
Fig. 3: Defining counter example areas – areas that will be used to teach the algorithm that bushes, grass, and dogs are not sheep.
The AI model will use these sections of the image as counterexamples. It is particularly helpful to draw the attention of the algorithm to areas with objects that look similar to the object of interest, but which are not what you are looking for. It usually also helps to include spots that are pure background.
Once the training areas have been defined, the AI model knows where to look for information. It will learn what sheep look like by looking at the training data which contains both examples and counterexamples.
Fig. 4: What humans see (left) vs what the AI model can see (right) once the training areas have been added.
What to look for: Drawing annotations
Now that the algorithm knows where to look, it is time to tell it what it should look for.
Start by identifying the visual features that define the object of interest. To do so, think about what helps you recognise an object as such. The next step is outlining, i.e. annotating, these objects. This is the way you communicate to the algorithm what you need it to learn to find.
Learning how to draw your annotations is an intuitive and experimental process. How do you define a “unit” of this type of object? What is the key visual factor you “see”? Is it the full object? Or is it a specific and distinctive part of it? In this case, we went for full-body outlines.
Fig. 5: Make sure to annotate all the relevant objects contained in the training area and the ones crossing its boundary.
Make sure to annotate all the relevant objects contained in the training area and the ones crossing its boundary. Keep in mind that anything contained in a training area that is not highlighted as an example will be considered a counterexample.
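This rule – everything inside a training area that is not annotated counts as a counterexample – is worth making concrete. The sketch below is a hypothetical simplification of how training areas and annotations could be turned into per-pixel labels; it is not Picterra’s actual implementation, and the rectangle-based geometry is an assumption made for illustration.

```python
# Hypothetical sketch: turning training areas and annotations into
# per-pixel training labels. Rectangles are (x0, y0, x1, y1).

def build_label_mask(width, height, training_areas, annotations):
    """Label each pixel: 1 = annotated object, 0 = implicit counterexample,
    None = outside every training area (ignored during training)."""
    def inside(x, y, box):
        x0, y0, x1, y1 = box
        return x0 <= x < x1 and y0 <= y < y1

    mask = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if any(inside(x, y, a) for a in training_areas):
                # Inside a training area: annotated pixels are positives,
                # everything else implicitly becomes a counterexample.
                mask[y][x] = 1 if any(inside(x, y, a) for a in annotations) else 0
    return mask

# One 4x4 training area containing a single 2x2 annotated object.
mask = build_label_mask(6, 6,
                        training_areas=[(0, 0, 4, 4)],
                        annotations=[(1, 1, 3, 3)])
```

Note how a pixel such as `mask[0][0]` becomes a counterexample simply because it lies inside the training area without being annotated – which is exactly why forgetting to annotate a relevant object inside a training area actively teaches the model the wrong thing.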
Running the algorithm
Now that the algorithm knows where to look and, thanks to the examples and counterexamples, what to look for, it is ready to train the model and detect objects in the rest of the image.
The AI model has learned what sheep look like and detected all the sheep – and only the sheep (Fig. 6). However, upon closer inspection, sheep standing very close to each other were not detected as individual objects.
Fig. 6: All sheep were detected, but due to the proximity of their bodies, some of the detections are merged.
But what if you want to go further than detecting the sheep and actually count them? Because the sheep are standing so close to each other, counting them individually is a challenging task.
You already know that the way you annotate an object influences the output. This example explores a few variations in the way the annotations are drawn to see how they affect the outputs. For reference, the known sheep count in the original image is 433.
Tweaking the algorithm using different annotations
With the original annotations, the detection output contains a number of merged detections and gives an object count of 71 sheep. In other words, the AI model detected 16.4% of the sheep as individual objects.
Using a different drawing method (insetting the contour of the full body), the detection output has fewer merged detections and an object count of 396 sheep. This method allowed the model to detect 91.5% of the sheep as individual objects.
Fig. 7: Drawing the contour of the full body in the annotation improves the detection output and has fewer merged detections.
Using circles to annotate the heads of the sheep, the output has even fewer merged detections, although a few sheep were still not detected. With an overall headcount of 416, the model detected 96% of the sheep as individual objects.
Fig. 8: Using circles to annotate the heads of the sheep renders even fewer merged detections than the previous two annotation methods.
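As a sanity check, the detection rates quoted above follow directly from the known count of 433 sheep:

```python
# Detection rate per annotation method, given the known total of 433 sheep.
known_total = 433
results = {"original full-body outlines": 71,
           "inset full-body contours": 396,
           "head circles": 416}

# Percentage of sheep detected as individual objects, one decimal place.
rates = {method: round(100 * detected / known_total, 1)
         for method, detected in results.items()}
```

This yields 16.4%, 91.5% and 96.1% respectively (the last figure is rounded to 96% in the text), making the effect of the annotation style on individual-object detection easy to compare.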
Best practices and pitfalls
Training and customising an AI detection model is an iterative process: you will need to explore and test what works best for each type of object you want to detect.
There are, however, certain common mistakes that you can learn to avoid.
Build your own AI detector
Experiment to discover what type of annotations works best for the type of object you need to track and the context it is in. You might be trying to detect a type of object with a totally different shape, pattern and colour. These objects might appear distributed throughout an image or might be grouped in a different pattern.
There are many possible variables, but the good news is the custom detector tool allows you to experiment, tweak and fine-tune the model to your needs. As you build and refine your detector you will gain experience and intuition and learn how to best take advantage of the power of AI.
For a closer look at the annotations and the outputs generated in the above example, visit the project site. You can also find step-by-step instructions on how to build your own Picterra AI detector on the Picterra website.
Contact Veronica Alonso, Picterra, veronica.alonso@picterra.ch