A crash course in AI-driven object detection

April 2nd, 2019

Almost every day we hear that we can use the superpowers of artificial intelligence (AI) to make work easier, automate routine tasks, speed up workflows, increase accuracy and make more money. But what if you are not a machine learning expert? This quick guide outlines the basics of how AI-driven object detection works, and should give you a good understanding of how to employ it.

In the drone industry, and the geospatial sector more broadly, there is a lot of talk about how AI will help extract actionable information from unstructured image data at a scale and speed never previously seen.

The good news is you do not need to be a machine learning expert nor do you need to hire one to harness the power of AI. Picterra, for example, has created an online platform with an easy-to-use graphic user interface to make AI-powered object detection accessible to everyone. The signature tool of the platform is the custom detector, which allows users to train their own AI detection model without writing a single line of code.

Training AI models with “where” and “what”

AI models can be good students, but they are not human. They lack human intuition and they see things differently. You need to teach them to see the world through your eyes.

To train an AI model to detect objects in an image you need to tell the algorithm where it will find the relevant information and show it examples of what it should (and should not) learn to find.

The first step is to understand how you “see” objects. Think about how you would describe what the object you are looking for looks like. How do you identify a single unit of this type of object? What are the key visual features you are looking for? Is it the shape, the colour, the size or the texture? Is it a specific part of the object, or rather a combination of all of these features under certain circumstances?

Once you have identified the key visual features that define the object of interest, you can teach the AI model to find it.
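Even though the Picterra platform hides all of this behind its interface, it can help to picture the training input as data. The sketch below is purely illustrative: the class names and structure are assumptions made for this article, not Picterra's data model. It shows one way to represent, in Python, the two things the model needs: training areas (the "where") and the annotations inside them (the "what").

    from dataclasses import dataclass, field

    # Illustrative structures only; they do not mirror Picterra's platform or API.
    # A polygon is a list of (x, y) pixel coordinates.
    Polygon = list[tuple[float, float]]

    @dataclass
    class TrainingArea:
        """A region of the image the model is allowed to learn from."""
        outline: Polygon
        is_counter_example: bool = False                           # True if the area contains no objects of interest
        annotations: list[Polygon] = field(default_factory=list)   # outlines of the objects inside the area

    # One area with two annotated sheep, plus one pure-background counterexample area
    training_data = [
        TrainingArea(outline=[(0, 0), (400, 0), (400, 300), (0, 300)],
                     annotations=[[(40, 50), (70, 50), (70, 90), (40, 90)],
                                  [(120, 60), (150, 60), (150, 100), (120, 100)]]),
        TrainingArea(outline=[(500, 0), (900, 0), (900, 300), (500, 300)],
                     is_counter_example=True),
    ]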

For demonstration purposes, this article will consider a challenging sheep detection project using the custom detector tool on the Picterra platform.

Where to look: defining areas

Fig. 1 shows the image before adding any training information. On the left is what you can see; on the right is what the AI model can see before you tell it where to look. As you can see, the algorithm sees nothing. You need to tell the AI model to open its eyes and provide it with information it can “see”.

Fig. 1: The image before any training areas are added: What humans see (left) vs what the AI model sees (right).

Analyse your image and find spots that contain examples of your object of interest and spots that do not. These spots are called “training areas”, and the algorithm will look at them in order to learn. Select training areas that show the algorithm both what you are interested in and what you are not interested in. Keep in mind that the AI model will not learn from the sections of your image that you did not highlight.
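To make the idea of "opening the model's eyes" concrete, the short Python sketch below (an illustration only, not part of the platform) builds a mask marking which pixels fall inside the drawn training areas; everything outside the mask is simply invisible to the learning process.

    import numpy as np
    from PIL import Image, ImageDraw

    # Illustrative only: mark the pixels the model is allowed to learn from.
    # Everything outside the drawn training areas stays 0 and is ignored.
    def visibility_mask(image_size, training_areas):
        """image_size is (width, height); each training area is a list of (x, y) points."""
        mask = Image.new("L", image_size, 0)
        draw = ImageDraw.Draw(mask)
        for polygon in training_areas:
            draw.polygon(polygon, fill=1)
        return np.array(mask, dtype=bool)

    mask = visibility_mask((1000, 800), [[(100, 100), (400, 100), (400, 350), (100, 350)]])
    print(mask.sum(), "of", mask.size, "pixels are visible to the model")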

Highlight example areas

These are sections of your image that you highlight to tell the algorithm, “look at this region, here are the examples of what I need you to find”.

Fig. 2: Defining example areas, the areas which contain examples of what the AI model should look for. At this stage, only the human knows what is in the selected spots.

Each training area should contain multiple examples of your object of interest. It is important to draw a series of training areas that highlight your objects of interest in different contexts (Fig. 2). You want to identify sections of your image where your objects of interest appear on different backgrounds, in different distribution configurations, or in different lighting conditions.

Define counterexample areas

Defining areas where you know there are no examples of the object of interest helps the algorithm understand what you are not looking for (Fig. 3).

Fig. 3: Defining counterexample areas – areas that will be used to teach the algorithm that bushes, grass, and dogs are not sheep.

The AI model will use these sections of the image as counterexamples. It is particularly helpful to draw the attention of the algorithm to areas with objects that look similar to the object of interest, but which are not what you are looking for. It usually also helps to include spots that are pure background.

Once the training areas have been defined, the AI model knows where to look for information. It will learn what sheep look like by looking at the training data which contains both examples and counterexamples.
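Under the hood, training of this kind usually boils down to showing a network the image and penalising it only on the pixels you told it to look at. The sketch below is a heavily simplified, generic example in PyTorch; it is not Picterra's implementation, and the tiny network, image size and tensors are placeholders.

    import torch
    import torch.nn as nn

    # Generic sketch, not Picterra's implementation: train a tiny segmentation
    # network, but compute the loss only on pixels inside the training areas.
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss(reduction="none")

    image = torch.rand(1, 3, 256, 256)        # stand-in for the drone image
    labels = torch.zeros(1, 1, 256, 256)      # 1 where an object is annotated (example), 0 elsewhere
    area_mask = torch.zeros(1, 1, 256, 256)   # 1 inside training areas, 0 outside
    labels[0, 0, 40:60, 40:60] = 1            # a made-up annotated sheep
    area_mask[0, 0, 0:128, 0:128] = 1         # a made-up training area

    for step in range(10):
        optimizer.zero_grad()
        pixel_loss = loss_fn(model(image), labels)
        # Pixels outside the training areas contribute nothing to the gradient.
        loss = (pixel_loss * area_mask).sum() / area_mask.sum()
        loss.backward()
        optimizer.step()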

Fig. 4: What humans see (left) vs what the AI model can see (right) once the training areas have been added.

What to look for: drawing annotations

Now that the algorithm knows where to look, it is time to tell it what it should look for.

Start by identifying the visual features that define the object of interest. To do so, think about what helps you recognise an object as such. The next step is outlining, i.e. annotating, these objects. This is the way you communicate to the algorithm what you need it to learn to find.

Learning how to draw your annotations is an intuitive and experimental process. How do you define a “unit” of this type of object? What is the key visual factor you “see”? Is it the full object? Or is it a specific and distinctive part of it? In this case, we went for full-body outlines.

Fig. 5: Make sure to annotate all the relevant objects contained in the training area and the ones crossing its boundary.

Make sure to annotate all the relevant objects contained in the training area and the ones crossing its boundary. Keep in mind that anything contained in a training area that is not highlighted as an example will be considered a counterexample.
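In other words, within each training area the annotated pixels act as positive examples and everything else acts as a counterexample. The small geometric sketch below (illustrative only, using the open-source shapely library; the coordinates are made up) makes that split explicit.

    from shapely.geometry import Polygon
    from shapely.ops import unary_union

    # Illustrative only: inside a training area, whatever is not annotated
    # is treated as a counterexample.
    training_area = Polygon([(0, 0), (400, 0), (400, 300), (0, 300)])
    annotations = [Polygon([(50, 50), (90, 50), (90, 90), (50, 90)]),
                   Polygon([(120, 60), (160, 60), (160, 100), (120, 100)])]

    positives = unary_union(annotations).intersection(training_area)
    counterexamples = training_area.difference(positives)

    print(f"positive area: {positives.area:.0f} px^2")
    print(f"counterexample area: {counterexamples.area:.0f} px^2")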

Running the algorithm

With the training areas, examples and counterexamples in place, the algorithm knows where to look and what to look for, and you are ready to train the model and detect objects in the rest of the image.

The AI model has learned what sheep look like and detected all the sheep – and only the sheep (Fig. 6). Upon closer inspection, however, the sheep standing very close to each other were not detected as individual objects.

Fig. 6: All sheep were detected, but due to the proximity of their bodies, some of the detections are merged.

But what if you want to go further than detecting the sheep and, say, count them? The sheep are standing very close to each other, which makes counting them individually a challenging task.
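Why do merged detections hurt the count? If the detections come back as one blob per group of touching sheep, counting blobs undercounts the flock. The toy example below (illustrative only, using scipy's connected-component labelling) shows three sheep yielding only two countable objects because two of them touch.

    import numpy as np
    from scipy import ndimage

    # Illustrative only: when detections form a binary mask, touching sheep
    # merge into one connected component and the count comes up short.
    detections = np.zeros((10, 10), dtype=int)
    detections[2:4, 1:3] = 1      # one isolated sheep
    detections[6:8, 4:6] = 1      # two sheep standing so close together that
    detections[6:8, 6:8] = 1      # their detections fuse into a single blob

    labelled, count = ndimage.label(detections)
    print("objects counted:", count)   # prints 2, although 3 sheep are present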

However, you already know that the way you annotate an object influences the output. This example therefore explores a few variations in how the annotations are drawn, to check how they affect the output. For reference, the known sheep count in the original image is 433.

Tweaking the algorithm using different annotations

With the original annotations (full-body outlines), the detection output has a number of merged detections and gives an object count of 71 sheep. In other words, the AI model detected 16,4% of the sheep as individual objects.

Using a different drawing method (insetting the contour of the full body), the detection output has fewer merged detections and an object count of 396 sheep. This method allowed the model to detect 91,5% of the sheep as individual objects.

Fig. 7: Insetting the contour of the full body in the annotation improves the detection output, with fewer merged detections.

Using circles to annotate the heads of the sheep, the output has even fewer merged detections, but a few sheep were still not detected, with an overall headcount of 416, i.e. the model detected 96% of the sheep as individual objects.
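The detection rates quoted above follow directly from the counts and the known total of 433 sheep:

    # Detection rate for each annotation method against the known count of 433 sheep.
    known_count = 433
    for method, detected in [("full-body outline", 71),
                             ("inset contour", 396),
                             ("head circle", 416)]:
        print(f"{method}: {detected / known_count:.1%} detected as individual objects")
    # prints roughly 16.4%, 91.5% and 96.1%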

Fig. 8: Using circles to annotate the heads of the sheep renders even fewer merged detections than the previous two annotation methods.
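To picture the three annotation styles side by side, the sketch below uses the open-source shapely library (illustrative geometry only; the coordinates are made up): a full-body outline, the same outline inset by a negative buffer, and a small circle around the head. The smaller and better separated the annotations, the less likely neighbouring detections are to merge.

    from shapely.geometry import Point, Polygon

    # Illustrative geometry only: three ways the same sheep could be annotated.
    full_body = Polygon([(0, 0), (10, 0), (10, 6), (0, 6)])   # rough full-body outline

    # Variation 1: inset the contour so neighbouring outlines no longer touch.
    inset_body = full_body.buffer(-1.0)

    # Variation 2: annotate only the head with a small circle.
    head_circle = Point(9, 3).buffer(1.5)

    for name, geom in [("full body", full_body),
                       ("inset contour", inset_body),
                       ("head circle", head_circle)]:
        print(f"{name}: area = {geom.area:.1f}")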

Best practices and pitfalls

Training and customising an AI detection model is an iterative process: you will need to explore and test what works best for each type of object you want to detect.

There are, however, certain mistakes you can avoid:

  • Not defining training areas: Even if you annotated objects, without training areas the model will see nothing.
  • Defining large training areas containing a few small annotations: When you define training areas that contain examples, as a rule of thumb 20% to 40% of the space inside the training area should be covered by annotations. If you want to add a counterexample area, add it as a separate selection.
  • Defining very small training areas or wrapping a training area around a single object: the training area should contain the objects of interest and a fair amount of background. This helps the model understand the context in which it will find the object of interest. Try to balance the area covered by annotated and non-annotated elements.
  • Defining too many areas containing examples and very few containing counterexamples, or vice versa: it’s all about balance and making sure you include a variety of both examples and counterexamples.
  • Annotations containing very few pixels are likely to give bad results: consider the size of the object and the image resolution (the sketch after this list gives a rough sense of the numbers). Fewer pixels provide less information for the AI model to understand how the pixels you selected differ from other sections of the image.
  • Not annotating all of the examples contained in the training area, or not annotating examples that cross its boundary: you want all of the examples to be considered. If you do not annotate them, they will be considered counterexamples.
  • Overlapping annotations when the end goal is counting individual objects.
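As a rough illustration of the pixel-count point above, the back-of-the-envelope calculation below estimates how many pixels a single annotation covers at different image resolutions. The sheep dimensions and ground sampling distances are assumptions chosen for the example.

    # Rough pixel-count check for an annotation (illustrative; the figures are assumptions).
    def pixels_covered(object_size_m, gsd_m_per_px):
        """object_size_m is (length, width) in metres; gsd is metres per pixel."""
        length_m, width_m = object_size_m
        return round(length_m / gsd_m_per_px) * round(width_m / gsd_m_per_px)

    # A sheep of roughly 1.2 m x 0.6 m seen at different ground sampling distances.
    for gsd in (0.10, 0.05, 0.03):
        print(f"GSD {gsd * 100:.0f} cm/px -> ~{pixels_covered((1.2, 0.6), gsd)} pixels per sheep")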

Build your own AI detector

Experiment to discover what type of annotations work best for the type of object you need to track and for the context they are in. You might be trying to detect a type of object that has a totally different shape, pattern and colour. These objects might appear distributed throughout an image or might be grouped in a different pattern.

There are many possible variables, but the good news is the custom detector tool allows you to experiment, tweak and fine-tune the model to your needs. As you build and refine your detector you will gain experience and intuition and learn how to best take advantage of the power of AI.

For a closer look at the annotations and the outputs generated in the above example, visit the project site here. You can find step-by-step instructions on how to build your own Picterra AI detector here.

Contact Veronica Alonso, Picterra, veronica.alonso@picterra.ch
