Machine learning for road width capture from aerial photography

August 1st, 2019. Published in PositionIT.

This proof of concept demonstrates the use of convolutional neural networks for feature extraction: the networks classify aerial photography pixels as either road or non-road, producing the basis for a polygon feature that represents the road surface (and therefore the road width).

Aurecon received a brief from a client to build a new GIS dataset that accurately represented the client's road network, to assist in managing this asset. The intended accuracy could be achieved in various ways; one approach was to combine existing data sources with machine learning to support data extraction.

Aurecon’s team found that convolutional neural networks (ConvNets) yielded the best results. The ConvNets translate the input data into classified pixels, which form the basis of a raster output in which each pixel is either road or non-road. Converting this output raster to a polygon feature based on the “road” class generates the road surface, and thus the road width, for the associated segment.

Transport agencies are responsible for assets of high capital value. To manage these assets, the agencies require accurate and up-to-date information, which forms the basis for asset management decision-making and financial planning. If this data is of low accuracy, the decisions based on it carry a low level of confidence.

Fig. 1: Workflow overview.

Various methods are used to store road network data and complete asset verification. However, one convention is consistent across agencies and systems: the spatial representation of the network is typically a polyline, not a polygon. A polyline records a road's alignment and length but not its width, so the road surface area cannot be derived from it directly. This proof of concept demonstrates how ConvNets can classify aerial photography pixels as road or non-road and so produce the basis for a polygon feature representing the road surface (Fig. 1).

Fig. 2: Abstracting patterns from aerial images.

This project uses ConvNets to augment human processes in asset management and geospatial applications, in this case to quantify the road surface area (pavement) of a defined road network. The process takes aerial images as input and translates them into successive layers of abstracted patterns, which are detected by small learned filters called kernels. These patterns (see Fig. 2) become the building blocks that enable the machine learning system to classify every pixel of an image as either road or non-road.
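To make the idea of a kernel more concrete, the following minimal pure-Python sketch applies a single 3x3 kernel to a small greyscale patch and returns the resulting feature map. The patch values and the vertical-edge kernel are hypothetical and hand-picked for illustration; the article does not describe its network architecture, and in a trained ConvNet the kernel weights are learned from the labelled data rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` ("valid" mode, no padding) and return the feature map.

    As in most ConvNet libraries, this is cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch currently under the kernel
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x5 patch: dark verge (0) on the left, bright road surface (9) on the right
patch = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

# Hand-written vertical-edge kernel; a trained ConvNet learns such weights itself
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

fmap = convolve2d(patch, edge_kernel)
print(fmap)  # [[27, 27, 0], [27, 27, 0]]
```

The feature map responds strongly where the verge meets the road surface and is zero over the uniform surface, which is the kind of low-level pattern that deeper layers combine into a road/non-road decision.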

Training the model

The ConvNets learn patterns that capture the similarities and differences within a known set of image data labelled as road or non-road. These patterns are developed in the convolution layers of the neural network, which apply learned filters to recognise features within the input imagery.

The end goal is for the process to learn to classify the aerial imagery as either road or non-road. The machine learning process uses the patterns (kernels) learned during training to filter, sort and classify the input imagery. For each category (road and non-road), the classification is returned as a number between 0 and 1 that indicates the certainty of the result: a value approaching 1 indicates high certainty, whereas a value such as 0,45 indicates much less confidence in what has been detected. Pixel classification is illustrated in Fig. 3.

Fig. 3: Pixel classification.
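One common way to obtain per-class values between 0 and 1 is a softmax over the network's raw output scores (logits). The sketch below assumes the network ends in a two-class softmax; the article only states that results lie between 0 and 1, so this is an illustrative choice, and the raw scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


CLASSES = ("road", "non-road")

# Hypothetical raw network outputs for two pixels
confident_pixel = softmax([4.0, -1.0])
ambiguous_pixel = softmax([-0.1, 0.1])

for name, probs in (("confident", confident_pixel), ("ambiguous", ambiguous_pixel)):
    print(name, {c: round(p, 2) for c, p in zip(CLASSES, probs)})
# confident {'road': 0.99, 'non-road': 0.01}
# ambiguous {'road': 0.45, 'non-road': 0.55}
```

A single-output network with a sigmoid activation would give an equivalent road probability, with the non-road probability being one minus that value.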

Learning through iteration

To learn to classify, the system needs training data: imagery with known labels (classifications). The software engineer works iteratively to prepare this data, define parameters in the machine learning system and review classification results. The data used to train the ConvNets model is critical to the final classification accuracy: the higher the quality of the labelled data, the better the model can perform.

In the absence of existing labelled data, we used an alternative approach to generate training data, as illustrated in Fig. 4. This dataset was then divided into two parts: 75% for training and 25% for validation. The training data is used to optimise the ConvNets model (kernels, weights and biases), and the validation data is used to evaluate training performance and detect overfitting. We want the model to “learn” from the training images rather than memorise all the images and their labels. When the model’s performance (measured by prediction accuracy) meets expectations, training is halted. The model can then be tested on new aerial images (also labelled) to see how well it works.

Fig. 4: Generating training data.
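The 75/25 split and the decision to halt training can be sketched as follows. This is a minimal illustration, not the project's code: the seed, the target accuracy of 0.95 and the patience of three epochs are hypothetical parameters, and the per-epoch accuracies are illustrative values rather than results from the project.

```python
import random


def train_val_split(samples, train_fraction=0.75, seed=42):
    """Shuffle labelled samples and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def should_stop(val_accuracies, target=0.95, patience=3):
    """Early-stopping rule with hypothetical thresholds.

    Stop when validation accuracy reaches `target`, or when it has not improved
    for `patience` consecutive epochs, a common sign that the model has started
    to memorise the training images (overfit) instead of generalising.
    """
    if not val_accuracies:
        return False
    if val_accuracies[-1] >= target:
        return True
    best_epoch = val_accuracies.index(max(val_accuracies))
    return len(val_accuracies) - 1 - best_epoch >= patience


# 1,000 hypothetical labelled image patches: (patch_id, label)
labelled = [(i, "road" if i % 3 == 0 else "non-road") for i in range(1000)]
train, val = train_val_split(labelled)
print(len(train), len(val))  # 750 250

# Illustrative validation accuracy per epoch: no improvement after the fourth epoch
history = [0.71, 0.80, 0.86, 0.88, 0.87, 0.88, 0.86]
print(should_stop(history[:4]), should_stop(history))  # False True
```

Keeping the validation set separate from the data used to update the weights is what lets it reveal overfitting: training accuracy can keep rising while validation accuracy stalls.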

In the training scenario, an expert supported by software provides input to classify the pixels within the image. The result of the training is visible in the colour-coded pixels (green = road, purple = non-road and orange = low certainty).

Fig. 5: The application of the trained machine learning process on a sample area and the resulting output.

Using the well-trained model, the samples are classified as road or non-road. If a classification is ambiguous, it is returned as having low certainty; such cases occur most frequently near the edge of a road surface.
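Turning a road probability into the three colour-coded outcomes can be sketched with two cut-offs. The thresholds of 0.7 and 0.3 are hypothetical, since the article does not state the values used, and the row of probabilities is an invented example of a transect across a road edge.

```python
# Hypothetical cut-offs: the article does not state the thresholds it used
ROAD_MIN = 0.7      # P(road) at or above this -> road
NON_ROAD_MAX = 0.3  # P(road) at or below this -> non-road

COLOURS = {"road": "green", "non-road": "purple", "low certainty": "orange"}


def label_pixel(p_road):
    """Map a pixel's road probability to road / non-road / low certainty."""
    if p_road >= ROAD_MIN:
        return "road"
    if p_road <= NON_ROAD_MAX:
        return "non-road"
    return "low certainty"


# Hypothetical row of pixels crossing a road edge: probabilities fall off towards the verge
row = [0.98, 0.95, 0.62, 0.41, 0.12, 0.03]
labels = [label_pixel(p) for p in row]
print([COLOURS[lab] for lab in labels])
# ['green', 'green', 'orange', 'orange', 'purple', 'purple']
```

Widening the gap between the two thresholds flags more edge pixels as low certainty, trading coverage for confidence in the pixels that are labelled.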


Applying the model to every pixel of the aerial photography produces a pixel-wise probability map of the landscape, showing how certain the model is that each pixel is road or non-road (Fig. 6).

Fig. 6: Pixel-wise probability map.
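The article does not say whether the model classifies a window around each pixel or processes whole tiles at once. The sketch below assumes the simpler sliding-window approach: a classifier is applied to the neighbourhood centred on every pixel, and the results are assembled into a probability map. The `toy_classifier` function is a hypothetical stand-in for the trained ConvNet.

```python
def probability_map(image, classify_patch, half=1):
    """Apply a patch classifier centred on every pixel to build a probability map.

    `classify_patch` stands in for the trained model: it receives the
    (2*half+1) x (2*half+1) neighbourhood of a pixel and returns P(road).
    Image borders are handled by repeating the edge pixels.
    """
    h, w = len(image), len(image[0])
    prob = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            patch = [
                [image[min(max(r + i, 0), h - 1)][min(max(c + j, 0), w - 1)]
                 for j in range(-half, half + 1)]
                for i in range(-half, half + 1)
            ]
            prob[r][c] = classify_patch(patch)
    return prob


def toy_classifier(patch):
    """Hypothetical stand-in model: brighter neighbourhoods are more road-like."""
    values = [v for row in patch for v in row]
    return sum(values) / (len(values) * 255)


img = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
pmap = probability_map(img, toy_classifier)
for row in pmap:
    print([round(p, 2) for p in row])
# [0.0, 0.33, 0.67, 1.0]
# [0.0, 0.33, 0.67, 1.0]
```

The intermediate values at the boundary mirror the article's observation that uncertainty concentrates at road edges. A fully convolutional network would typically produce the whole map in one pass, which is much faster than evaluating one window per pixel.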

After the model has been applied to every pixel of the input aerial images, further image processing techniques, such as noise removal and image sharpening, can be applied to the output images to remove isolated points and groupings (e.g. elements not associated with the road network).
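One common noise-removal technique for a binary road mask is to discard connected groups of road pixels below a minimum size. The sketch below uses a breadth-first flood fill with 4-connectivity; the article does not specify which noise-removal method it used, and the `min_size` value and the small mask are hypothetical.

```python
from collections import deque


def remove_small_regions(mask, min_size):
    """Drop 4-connected groups of road pixels smaller than `min_size`.

    `mask` is a binary raster (1 = road, 0 = non-road). Small isolated groups
    are unlikely to belong to the connected road network.
    """
    h, w = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]  # leave the input mask untouched
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_size:
                for y, x in region:
                    cleaned[y][x] = 0
    return cleaned


mask = [
    [1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 1],  # the lone pixel on the right is noise
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = remove_small_regions(mask, min_size=5)
print(cleaned[1])  # [1, 1, 1, 1, 1, 0, 0]
```

The choice of `min_size` depends on image resolution: it should be comfortably smaller than the smallest genuine road fragment expected in a tile.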

Using machine learning techniques to build a road surface classifier (the ConvNets model) significantly reduces the time needed to calculate road surface area compared with manual approaches. Moreover, the model can achieve a higher level of accuracy, robustness and confidence when it is trained with a high-quality dataset. The final conversion of the machine learning output to geospatial layers (i.e. a raster-to-feature conversion) produces a polygon feature based on the existing road centreline data (see Fig. 7).

Fig. 7: The machine learning output to surface area.
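Once a road surface polygon is associated with a centreline segment, one simple way to express road width, assumed here rather than taken from the article, is the mean width: surface area divided by centreline length. The sketch computes the polygon area with the shoelace formula. The segment geometry is hypothetical, and coordinates must be in a projected system in metres, not latitude and longitude.

```python
import math


def polygon_area(vertices):
    """Planar polygon area via the shoelace formula (coordinates in metres)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polyline_length(points):
    """Length of a centreline polyline (coordinates in metres)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def mean_road_width(surface_polygon, centreline):
    """Mean width of a road segment: surface area / centreline length."""
    return polygon_area(surface_polygon) / polyline_length(centreline)


# Hypothetical 100 m segment whose surface widens from 6 m to 8 m
surface = [(0, -3), (100, -4), (100, 4), (0, 3)]
centreline = [(0, 0), (50, 0), (100, 0)]

print(polygon_area(surface))                 # 700.0
print(polyline_length(centreline))           # 100.0
print(mean_road_width(surface, centreline))  # 7.0
```

This gives a single representative width per segment; where width varies sharply (for example at intersections), segments may need to be split before the ratio is meaningful.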

Outcomes included:

  • A repeatable, cost-efficient workflow for road surface estimation
  • Machine learning tools to support expert survey requirements
  • Image processing techniques and a machine learning process for building the ConvNets model
  • Additional services and outputs, built on this automated and data-driven approach, to help clients understand their asset portfolio.


The author wishes to express his appreciation for the client’s support and for the opportunity to develop this project, as well as to Caihao Cui (Chris), Greg More, Chris von Holdt, Peter Wilson, Cheryl Beuster, Julian Hendricks, Hosna Tashakkori, Mohammad Ghasab and Steven Haslemore. Read more at

Contact Kevin Johnson, Aurecon,
