Cloud-based monitoring of SA’s water resources

February 21st, 2018, Published in Articles: PositionIT, Featured: PositionIT

Water is a critical and scarce resource in southern Africa and is likely to become even more so as the regional impacts of global climate change become more evident. Being able to accurately and repeatedly monitor available water resources across the entire landscape is a key information requirement for successful water resource management.

GeoTerraImage has recently launched a web-based platform, Mzansi Amanzi, which provides water resource managers with highly detailed and accurate information on the current status of all surface water features across South Africa, on a monthly basis (see Fig. 1).

The objective of the monthly national water monitoring service is to provide a web-accessible knowledge service that delivers quantitative information on the location, distribution and area-defined status of all surface water features across South Africa. This information is generated from high-resolution Sentinel-2 satellite imagery using automated image data modelling techniques, and does not require any field data or other external data inputs. The web-based platform can be accessed at

The provision of regular, high cadence, high detail water resource maps will support accurate and timeous monitoring of the status of local, regional and national water resources. Water surface extents are automatically generated from imagery recorded by the European Space Agency’s (ESA) Sentinel satellites, which have been operational since June 2015. The Sentinel-2 constellation currently consists of Sentinel-2A and 2B, which collectively provide multiple imaging overpasses per month, based on a regular five-day sun-synchronous repeat cycle. When summarised over a month, this high frequency of repeat image coverage minimises potential information losses due to cloud cover.

The 20 m spatial resolution of the imagery supports a minimum detectable water feature size of approximately 0,5 ha, depending on the characteristics of the water body and its surrounding landscape.
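
As a rough check on this figure, one 20 x 20 m pixel covers 400 m², or 0,04 ha, so a 0,5 ha water body spans only about 12 to 13 pixels. The short Python sketch below does the arithmetic; the function names are illustrative and not part of the platform.

```python
PIXEL_SIZE_M = 20.0   # Sentinel-2 band resolution used by the service
M2_PER_HA = 10_000.0  # square metres in one hectare

def pixel_area_ha(pixel_size_m: float = PIXEL_SIZE_M) -> float:
    """Area of one square pixel in hectares."""
    return pixel_size_m ** 2 / M2_PER_HA

def pixels_in_area(area_ha: float, pixel_size_m: float = PIXEL_SIZE_M) -> float:
    """How many pixels an area of the given size spans (ignoring edge effects)."""
    return area_ha / pixel_area_ha(pixel_size_m)

print(pixel_area_ha())      # 0.04
print(pixels_in_area(0.5))
```

In practice a small water body rarely aligns with the pixel grid, so edge pixels are mixed water and land; this is one reason the detection limit is stated as approximate.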

Fig. 1: The Mzansi Amanzi monthly water monitoring service website.

The web platform currently provides visual and statistical information on the latest monthly surface water coverage, including the total monthly surface water area, as well as a comparison with previous months’ surface water to identify changes in surface water occurrence. For example, Fig. 2 shows the changes in surface water area in Theewaterskloof Dam in Cape Town over a six-month period.

Planned enhancements to the service include extending the monthly water comparisons to the last three months, as well as a long-term (maximum water area) comparison. Furthermore, as part of planned collaboration with Ekosource (formerly DHI South Africa), volumetric water measurements will be generated from the surface water area data and supplied through the web platform.

The ultimate goal is to make the service open-access and free to all, including downloads of spatially enabled, GIS-compatible digital map coverages and tabulated spreadsheets. This, however, depends on securing sufficient funds to cover both current and future development and operation costs. Until then, the website will remain open and accessible as a view-only public service. (Interested parties who wish to evaluate sample test data may contact the company to arrange this.)

Platform functions

Each month, a new total surface water area map of South Africa, Swaziland and Lesotho is generated from all the Sentinel-2 imagery acquired during that month. A six-week period is used for each month to ensure sufficient image data input and water representation, and to allow cloud-obscured areas to be excluded. For example, November 2017 would be represented by all satellite acquisitions between 1 and 30 November, plus those from the last two weeks of October.
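
The six-week compositing window can be derived from the calendar month. The sketch below assumes that "the last two weeks" of the preceding month means exactly 14 days before the first of the month; the function name `monthly_window` is hypothetical, and the operational service may define the window boundaries differently.

```python
import calendar
from datetime import date, timedelta

# Assumption: "the last two weeks" of the previous month = 14 days.
LOOKBACK_DAYS = 14

def monthly_window(year: int, month: int) -> tuple[date, date]:
    """Return the inclusive (start, end) acquisition dates for one monthly map."""
    first = date(year, month, 1)
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return first - timedelta(days=LOOKBACK_DAYS), date(year, month, last_day)

start, end = monthly_window(2017, 11)
print(start, end)              # 2017-10-18 2017-11-30
print((end - start).days + 1)  # 44 days, i.e. roughly six weeks
```

Because the window reaches back into the previous month, consecutive monthly maps share about two weeks of imagery.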

The water detection algorithms have been developed by GeoTerraImage and detect all surface water areas not obscured by floating objects or vegetation (such as water hyacinth islands). The algorithms are based on multiple combinations of spectral indices and associated threshold-based decision rules.
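
To illustrate what spectral indices combined with threshold-based decision rules look like, the sketch below computes three widely used normalised-difference indices from Sentinel-2 top-of-atmosphere reflectances (B3 green, B4 red, B8A narrow near-infrared, B11 shortwave infrared) and applies a two-step rule. The choice of indices and the thresholds are assumptions for illustration only; GeoTerraImage's actual indices and rules are not given in this article.

```python
def normalised_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when both inputs are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total

def water_indices(px: dict[str, float]) -> dict[str, float]:
    """Common indices from Sentinel-2 ToA reflectances on a 0-1 scale."""
    return {
        "ndwi": normalised_difference(px["B3"], px["B8A"]),   # green vs NIR (McFeeters)
        "mndwi": normalised_difference(px["B3"], px["B11"]),  # green vs SWIR1 (Xu)
        "ndvi": normalised_difference(px["B8A"], px["B4"]),   # NIR vs red
    }

# HYPOTHETICAL thresholds, for illustration only.
MNDWI_MIN = 0.2
NDVI_MAX = 0.1

def is_water(px: dict[str, float]) -> bool:
    idx = water_indices(px)
    # Rule 1: water reflects much more green than shortwave infrared.
    if idx["mndwi"] <= MNDWI_MIN:
        return False
    # Rule 2: reject vegetated pixels and require green > NIR.
    return idx["ndvi"] < NDVI_MAX and idx["ndwi"] > 0.0

open_water = {"B3": 0.08, "B4": 0.05, "B8A": 0.03, "B11": 0.01}
vegetation = {"B3": 0.08, "B4": 0.05, "B8A": 0.35, "B11": 0.20}
print(is_water(open_water), is_water(vegetation))  # True False
```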

The rules are designed to detect all surface water features while excluding false positives from landscape features with similar spectral characteristics, such as dark terrain, cloud shadow, wildfire burn scars and some bare earth surfaces. The mapped total surface water area represents the combined extent of both natural and man-made water features, including short-term appearances of water within pans and wetlands that are sufficiently observable and detectable in that month’s imagery.

Temporary flooding of a shallow pan may not be evident in the final monthly surface water extent if it is visible in only a single image overpass in that month. This is because the water classification result is derived from a median value image composite, not from an individual image acquisition.
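
A small numerical example shows why a single flooded overpass can vanish from the composite. For simplicity, the median is taken here over a hypothetical water index value for one pixel, whereas the service takes the median of each spectral band before modelling water; the values and the 0.2 threshold are invented for illustration.

```python
from statistics import median

THRESHOLD = 0.2  # illustrative water threshold, not an operational value

def water_on_any_date(series: list[float]) -> bool:
    """True if any single acquisition exceeds the water threshold."""
    return max(series) > THRESHOLD

def water_in_median_composite(series: list[float]) -> bool:
    """True if the median over all acquisitions exceeds the water threshold."""
    return median(series) > THRESHOLD

# Hypothetical index values for one pan pixel on five cloud-free dates;
# only the third overpass caught the pan flooded.
pan_pixel = [-0.30, -0.28, 0.65, -0.31, -0.29]

print(water_on_any_date(pan_pixel))          # True
print(water_in_median_composite(pan_pixel))  # False
```

The same robustness to one-off outliers that suppresses short-lived flooding is what suppresses isolated cloud and cloud-shadow values.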

The monthly surface water coverage is reported and presented on a 20 x 20 m raster cell framework, equivalent to the resolution of the source Sentinel-2 MSI Level-1C ortho-imagery, which is supplied in the UTM/WGS84 map projection with quantified spatial location accuracies.

Because the water surface is modelled and presented at a 20 x 20 m cell resolution, horizontal changes in the water edge greater than 20 m should be detected. However, determining the quantity or volume of change in a water body requires knowledge of vertical change. The ability to detect a vertical change in water surface level depends entirely on how that change is expressed as horizontal change, which in turn depends on the local terrain profile around a particular water body.
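
The link between vertical and horizontal change can be made concrete with simple geometry. Assuming a planar bank of constant slope (a simplification of real terrain), a drop in water level of Δh moves the shoreline horizontally by Δh / tan(slope). On a gentle 2° bank, a 1 m drop shifts the edge by about 29 m, more than one pixel; on a 15° bank, the same drop shifts it by less than 4 m, well within a single 20 m cell. The function names below are illustrative.

```python
import math

PIXEL_SIZE_M = 20.0  # horizontal change above this should be detectable (per the text)

def shoreline_shift_m(vertical_drop_m: float, bank_slope_deg: float) -> float:
    """Horizontal movement of the water edge for a given change in water level,
    assuming a planar bank with a constant slope (a simplification)."""
    return vertical_drop_m / math.tan(math.radians(bank_slope_deg))

def likely_detectable(vertical_drop_m: float, bank_slope_deg: float) -> bool:
    """True if the implied horizontal shift exceeds one 20 m cell."""
    return shoreline_shift_m(vertical_drop_m, bank_slope_deg) > PIXEL_SIZE_M

for slope in (2.0, 5.0, 15.0):
    shift = shoreline_shift_m(1.0, slope)
    print(f"1 m drop on a {slope:.0f} deg bank: {shift:.1f} m horizontal, "
          f"detectable={likely_detectable(1.0, slope)}")
```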

The spatial characteristics of each month’s total surface water are summarised and presented at quaternary catchment level, in km². The quaternary catchment boundaries have been sourced from the Department of Water and Sanitation’s (DWS) “S.A. Quaternary Catchments Database 4 (hca-4)”. Because these catchment boundaries are not precisely matched to the image-observable coastline, the reported monthly surface water area in some localised coastal areas may include coastal and estuarine water. This reporting issue will be addressed in future platform modifications to ensure that all coastal water is excluded.
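
Summarising at quaternary catchment level amounts to counting water cells per catchment and converting to km², with each 20 x 20 m cell contributing 0,0004 km² (so 2 500 water cells correspond to 1 km²). In the sketch below, each water pixel is represented simply by the code of the catchment it falls in; the catchment codes are placeholders, and a production system would more likely perform this as a zonal statistics operation on the raster.

```python
from collections import Counter
from collections.abc import Iterable

PIXEL_AREA_KM2 = 20 * 20 / 1_000_000  # one 20 x 20 m cell in km²

def water_area_by_catchment(water_pixel_catchments: Iterable[str]) -> dict[str, float]:
    """Count water cells per catchment code and convert the counts to km²."""
    counts = Counter(water_pixel_catchments)
    return {code: n * PIXEL_AREA_KM2 for code, n in sorted(counts.items())}

# Placeholder catchment codes; one entry per water-classified pixel.
pixels = ["CATCH_A"] * 2500 + ["CATCH_B"] * 100
print(water_area_by_catchment(pixels))
```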

Modelling principles

As a result of the image data modelling approach used to minimise cloud-obscured data losses, the current month’s total surface water represents the median (typical) water surface extent for the month under assessment, rather than the maximum extent that occurred within that month.

Based on the original ten-day revisit schedule of Sentinel-2A, a six-week period represents approximately five potential image acquisitions over the same location. The revisit interval shortened to five days in late 2017, when Sentinel-2B, Sentinel-2A’s tandem partner in the dual constellation, became operational. Up to approximately ten image acquisitions over the same area are therefore now possible in any six-week period, cloud cover permitting.

The median value for each image pixel (per spectral band), from all image acquisition dates within the six-week period, is then used as the final value on which the presence or absence of water is modelled for that month. This is based on the assumption that even if cloud cover has obscured an image pixel on one acquisition date, it is highly unlikely that clouds will have obscured the same pixel on all dates. Hence, using the median (rather than the average) pixel value removes or minimises the influence of cloud- or cloud-shadow-affected values in the water modelling calculations. If a pixel is cloud-affected on so many acquisition dates that no usable value can be extracted for that month, it is classified in the final output as a “cloud-loss” pixel, which is accounted for in the monthly surface water area calculations and reporting.
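
The per-pixel, per-band median with cloud masking and a cloud-loss flag can be sketched with NumPy. The array layout (dates, bands, rows, columns), the boolean cloud mask and the rule that a pixel becomes cloud-loss only when it has no valid observation at all are assumptions for illustration; the service's own criterion for declaring cloud loss is not stated.

```python
import warnings

import numpy as np

def median_composite(stack: np.ndarray, cloud_mask: np.ndarray):
    """Per-pixel, per-band median over time, ignoring cloud-masked dates.

    stack:      (dates, bands, rows, cols) reflectance values
    cloud_mask: (dates, rows, cols) boolean, True where cloud or shadow is flagged
    Returns (composite, cloud_loss): composite is (bands, rows, cols) with NaN
    where no valid observation exists; cloud_loss is (rows, cols) boolean.
    """
    # Broadcast the per-date mask across all bands; masked values become NaN.
    masked = np.where(cloud_mask[:, None, :, :], np.nan, stack.astype(float))
    # Assumption: a pixel is "cloud-loss" only if it was masked on every date.
    cloud_loss = cloud_mask.all(axis=0)
    with warnings.catch_warnings():
        # nanmedian warns on all-NaN slices; those pixels are flagged above.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        composite = np.nanmedian(masked, axis=0)
    return composite, cloud_loss

# Three dates, one band, a 1 x 2 pixel image (hypothetical values).
stack = np.array([[[[0.10, 0.30]]], [[[0.90, 0.40]]], [[[0.20, 0.50]]]])
clouds = np.array([[[False, True]], [[True, True]], [[False, True]]])
composite, cloud_loss = median_composite(stack, clouds)
print(composite[0, 0, 0], cloud_loss)
```

Here the first pixel's cloudy 0.90 value is dropped and the median of the two clear dates is used, while the second pixel, cloudy on every date, is flagged as cloud loss.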

Cloud effects are further minimised by applying a cloud-top and cloud-shadow exclusion mask, generated for each image date, which masks approximately 95% of cloud-affected areas. Because the cloud masking process does not guarantee complete exclusion of cloud and cloud shadow areas, it is used in combination with the median pixel value approach. The advantage of this combined approach is that false-positive water areas (resulting from cloud shadow) are excluded from the final water surface area output. The disadvantage is that any given month’s water surface area representation is in reality the median surface extent for the month, and not necessarily the maximum, especially if most of the rainfall occurred in, for example, the last quarter of the month after several dry weeks. In that case, the true maximum surface water extent only becomes evident in the following month’s water modelling update, which includes two weeks of image data from the preceding month.

Fig. 2: Changes in surface water extents between August 2017 and January 2018 in Theewaterskloof Dam, Cape Town.

Longer-term surface water areas, such as over a six- or twelve-month window, can also be generated by combining the individual monthly water area outputs. In such instances, these longer-term surface water representations are highly unlikely to contain any cloud-top or cloud-shadow data losses, owing to the high number of surface observations making up the long-term picture.
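
The article does not specify how monthly outputs are combined into a longer-term picture. One plausible approach, sketched below under that assumption, takes the maximum water extent as the union of all monthly water pixels and also computes a water occurrence frequency over the months in which each pixel was actually observed. The class codes are hypothetical.

```python
import numpy as np

DRY, WATER, CLOUD_LOSS = 0, 1, 255  # hypothetical monthly class codes

def long_term_summary(monthly: np.ndarray):
    """Combine monthly class maps of shape (months, rows, cols).

    Returns (max_extent, occurrence):
      max_extent: True where a pixel was water in at least one month
      occurrence: fraction of observed (non-cloud-loss) months classified as
                  water; NaN where the pixel was never observed
    """
    observed = monthly != CLOUD_LOSS
    water = monthly == WATER
    max_extent = water.any(axis=0)
    n_obs = observed.sum(axis=0)
    occurrence = np.divide(
        water.sum(axis=0), n_obs,
        out=np.full(n_obs.shape, np.nan), where=n_obs > 0,
    )
    return max_extent, occurrence

# Four months for a 1 x 3 strip of pixels.
monthly = np.array([[[1, 255, 0]], [[1, 255, 0]], [[0, 255, 0]], [[255, 255, 0]]],
                   dtype=np.uint8)
max_extent, occurrence = long_term_summary(monthly)
print(max_extent, occurrence)
```

A pixel is only left without a long-term value if it was cloud loss in every month, which becomes increasingly unlikely as the window grows.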

Data modelling and automation

The core procedural objective has been the full automation of image data access and the subsequent water surface area detection procedures. This has been achieved by using cloud-based global image data archives and their associated big-data processing capabilities, which removes the need to download, prepare, model and analyse large volumes of image data using office-based proprietary software. The result is a significant gain in procedural efficiency that requires minimal office-based support infrastructure.

Sentinel-2 imagery

The surface water extents are all modelled from ESA’s Sentinel-2 imagery, sourced as 20 x 20 m resolution MSI Level-1C data from the Google Earth Engine cloud-based data platform. Level-1C imagery is orthorectified, precisely co-registered and provided as standardised top-of-atmosphere (ToA) reflectance values, in the UTM/WGS84 map projection.

Detection algorithms using decision tree modelling

Decision tree classifiers are predictive modelling algorithms that can be used to generate explicit classification rules, and are ideally suited to developing generic modelling routines for standardised and repeatable classifications of satellite imagery. Typically, a set of training data (i.e. reference samples) is used to generate the ruleset, which can then be applied to larger data populations for repeatable and consistent classification outputs. Because decision tree classifiers derive standardised, threshold-based classification rules, those rules can be applied repeatedly over time and/or space with the same output content and accuracy.

The water surface modelling procedure is based on a set of decision-tree-generated rules derived from a comprehensive set of water and non-water reference points distributed across South Africa. Each reference point is associated with a single 20 x 20 m image pixel, and together the points represent a wide range of seasonal and geographic water and non-water surface characteristics across the country, all of which can be identified visually on Sentinel-2 imagery.

The sample points represent the geographical positions at which spectral image characteristics are extracted from the cloud-based image archives in order to characterise and describe seasonally-defined spectral signatures for all land surface conditions. The specific rulesets for spectral water detection, including potential non-water confusion features, are generated using water and non-water spectral reference characteristics as inputs into the decision tree algorithm. The final water-only identification ruleset represents a comprehensive set of spectral threshold-based rules which can be applied to multi-seasonal Sentinel-2 imagery to determine the presence or absence of water in any given image pixel.

Training data

Approximately 60 000 sample points across the South African landscape were identified and used to represent both water and non-water landscapes (the latter with spectral characteristics similar to water). The points covered a wide range of landscape types and associated seasonal conditions to ensure full representation of the spectral characteristics likely to be encountered during image-based water modelling, such as differences in water colour resulting from depth and/or turbidity. All sample points were visually identified and defined on Sentinel-2 imagery (circa 2016-17) using manual, desktop mapping techniques.

At each sample point, a range of spectral values was extracted, based on a predefined set of potentially useful spectral indices and individual spectral band values. In some instances, values were extracted for a specific image acquisition date to ensure the correct representation of a seasonally dependent feature’s characteristics; in others, values were extracted across the full seasonal range of the feature’s characteristics.

Both water and non-water sample points were used to ensure that the water identification ruleset generated by the decision tree algorithm was able to accurately extract water features, and exclude non-water features that had similar spectral characteristics to water, such as dark terrain or cloud shadow areas, dark non-vegetated surfaces from both natural and man-made environments, and temporary burn scars from wildfires.

Spectral indices

The list of suitable spectral indices for water and other landscape feature modelling was sourced from various publications. The final selection was based on proven usefulness with Sentinel-2 imagery and, to some degree, on the ability to replicate the same processing on Landsat 8 imagery if ever required, given the similar spectral bands of the two sensors.
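
Because Sentinel-2 MSI and Landsat 8 OLI have closely matching bands, an index can be written once against generic band roles and evaluated on either sensor. The sketch below shows this for the modified normalised difference water index (MNDWI); the band pairings are the usual approximate equivalents, the dictionary and function names are illustrative, and in practice differences in spectral response and processing level may still require thresholds to be recalibrated.

```python
# Approximate band equivalents between Sentinel-2 MSI and Landsat 8 OLI,
# keyed by generic band role.
BANDS = {
    "sentinel2": {"blue": "B2", "green": "B3", "red": "B4",
                  "nir": "B8A", "swir1": "B11", "swir2": "B12"},
    "landsat8": {"blue": "B2", "green": "B3", "red": "B4",
                 "nir": "B5", "swir1": "B6", "swir2": "B7"},
}

def band(pixel: dict[str, float], sensor: str, role: str) -> float:
    """Look up a reflectance value by generic band role for the given sensor."""
    return pixel[BANDS[sensor][role]]

def mndwi(pixel: dict[str, float], sensor: str) -> float:
    """Modified NDWI (green vs SWIR1), written once for both sensors."""
    g = band(pixel, sensor, "green")
    s = band(pixel, sensor, "swir1")
    return 0.0 if g + s == 0 else (g - s) / (g + s)

s2_pixel = {"B3": 0.08, "B11": 0.02}
l8_pixel = {"B3": 0.08, "B6": 0.02}
print(mndwi(s2_pixel, "sentinel2"), mndwi(l8_pixel, "landsat8"))
```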

The final selection of the most suitable combination of spectral indices for surface water area detection was determined solely from interim outputs generated during the decision tree rule modelling process. The decision tree classifier software used to identify both the optimal spectral input data and generate the final water detection rulesets was the Waikato Environment for Knowledge Analysis (WEKA) suite of open source machine learning software.

Machine learning and optimal rule generation

The WEKA software includes the open source Java J48 version of the C4.5 algorithm. This algorithm, which is considered one of the top performing data mining algorithms, is used to generate decision trees which are ideally suited to spectral image classification applications.
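
For a continuous attribute such as a spectral index, C4.5 searches for the threshold that best separates the classes, using entropy-based information gain and the gain ratio (gain divided by split information). The sketch below is a simplified, single-attribute version of that search written from first principles: it picks the threshold with the highest information gain and reports its gain ratio. It is not WEKA's J48 code, which adds refinements such as pruning and missing-value handling, and the sample values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Simplified C4.5-style binary split search on one continuous attribute.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values. Returns (threshold, information_gain, gain_ratio) for the split
    with the highest gain, or None if no split is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot split between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        conditional = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = base - conditional
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))
        if best is None or gain > best[1]:
            best = ((lo + hi) / 2, gain, gain / split_info)
    return best

values = [0.6, 0.5, 0.4, -0.1, -0.2, -0.3]  # hypothetical MNDWI samples
labels = ["water"] * 3 + ["dry"] * 3
print(best_threshold(values, labels))
```

With perfectly separable samples, the chosen threshold falls midway between the wettest dry sample and the driest water sample, and both the gain and the gain ratio reach their maximum of 1 bit.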

Fig. 3: Decision tree hierarchical rule structure.

A decision tree consists of sets of hierarchically linked branches, each eventually ending in a leaf, which is the end of a particular ruleset. The size of a decision tree is defined by the number of hierarchically linked branches that collectively represent a single ruleset defining a classification decision and final outcome. For example, the first five branches of the decision tree in Fig. 3 collectively represent the ruleset for classifying one instance of water, with branch five representing the end-point, i.e. the leaf of the ruleset for this water classification decision. A decision tree contains many leaves that collectively describe all the rulesets required to classify, for example, all occurrences of water.
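
The relationship between branches, leaves and rulesets can be shown with a toy tree. Each root-to-leaf path is one ruleset, and collecting the paths that end in a "water" leaf gives the full set of water rules. The tree structure, feature names and thresholds below are hypothetical and much smaller than a real tree.

```python
# A HYPOTHETICAL tree: internal nodes test "feature <= threshold";
# leaves are class labels. Thresholds are illustrative only.
TREE = {
    "feature": "mndwi", "threshold": 0.2,
    "le": "non-water",
    "gt": {
        "feature": "ndvi", "threshold": 0.1,
        "le": {"feature": "swir2", "threshold": 0.05,
               "le": "water", "gt": "non-water"},
        "gt": "non-water",
    },
}

def rulesets(node, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path (one ruleset each)."""
    if isinstance(node, str):  # reached a leaf
        yield list(conditions), node
        return
    f, t = node["feature"], node["threshold"]
    yield from rulesets(node["le"], conditions + (f"{f} <= {t}",))
    yield from rulesets(node["gt"], conditions + (f"{f} > {t}",))

def classify(node, px):
    """Follow the branches for one pixel until a leaf label is reached."""
    while not isinstance(node, str):
        node = node["le"] if px[node["feature"]] <= node["threshold"] else node["gt"]
    return node

for conds, label in rulesets(TREE):
    if label == "water":
        print(" AND ".join(conds), "->", label)
```

Running the loop prints the single water ruleset in this toy tree: `mndwi > 0.2 AND ndvi <= 0.1 AND swir2 <= 0.05 -> water`.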

Cloud-based information generation

Mzansi Amanzi, the monthly water monitoring service, is operational and fully automated using cloud-based computing and data archive technologies. The cloud-based processing makes use of Google’s Earth Engine infrastructure, which provides access to global image archives, scalable computing power and flexible, large-volume data storage options. This is the most efficient way to support the water monitoring web-based platform and ensure the provision of monthly national coverage information. Image archives on the Google servers include full global records of a range of imagery from the European Copernicus programme, including ESA’s Sentinel missions.

Within the Google Earth Engine workflow, the GeoTerraImage-developed water detection models are uploaded to the cloud-based system and applied to the relevant imagery in the cloud-based image archives. This approach has many advantages in comparison to conventional desktop procedures and workflows using proprietary GIS and image processing software: A cloud-based approach significantly improves data processing speeds, efficiency and levels of operational automation (using open-source programming languages). Most importantly, it removes the need to download and pre-process imagery. For example, a full year’s database of Sentinel-2 imagery across South Africa, assuming all five-day overpasses generate cloud-free usable data, would be roughly 7500 GB in size, which would impose significant data access, storage and processing challenges to any desktop-based water monitoring process.

At the start of every month, the workflow procedures are activated. The cloud-based procedure loads all the Sentinel-2 satellite imagery over South Africa acquired during the previous six weeks. The automated process then runs in two steps: first, a specific set of rules identifies and masks any cloud-obscured imagery; second, another set of rules is applied to the unobscured data to identify and classify all water areas at individual pixel level. The derived water datasets are stored in GeoTerraImage’s allocated Google storage and then synced into the web-based application, at which point they become publicly viewable.

Anyone can visit and review the website. Over the second quarter of the year, the company will be implementing further changes and improvements to the website, and user feedback and suggestions will be considered in these future development plans.

For any further information on Mzansi Amanzi or to gain access to sample data, contact the company’s Elsie Zwennis.

Contact Elsie Zwennis, GeoTerraImage,