Supporting decision making with innovative data products

July 14th, 2017, Published in Articles: EE Publishers, Articles: PositionIT


To adequately plan the future development of South Africa, we need a comprehensive suite of data that provides insight into past developments, the present demographic, socio-economic and land use activities, and a forecast of development potential. This data needs to be packaged and presented in a way that helps decision makers and policy makers understand an area holistically, whilst offering the necessary insight into the actual drivers and factors influencing development. As the data comes from diverse sources and has to be presented in various ways, a standardised reporting frame was envisaged to make the information easy to use and analyse. To provide this frame, the Statistics South Africa (StatsSA) Enumeration Areas (EAs) were used, and all data has been either aggregated or disaggregated to this level for ease of use.

South Africa is a developing country where the pressure to develop residential, commercial, mining and industrial areas conflicts with environmental and sustainability goals. These diverse factors need to be brought together to ensure a country that can prosper in the future, whilst providing the necessary infrastructure for people to live and thrive in. The Spatial Planning and Land Use Management Act (SPLUMA) and the National Development Plan (NDP) seek to provide guidance in creating such a future; however, they require detailed information to support the myriad of developments.

Table 1: Diverse datasets integrated into the RSA EA Summary dataset.
| Category | Source | Content |
|---|---|---|
| Measure of activity | GTI (Land cover and land use) | Agriculture; Residential (villages, small holdings, informal, formal); Percentage built-up |
| | DEA | Conservation |
| Demographics | StatsSA | Night-time population |
| | GTI | Daytime population; Daily population change |
| | StatsSA | Race; Household income; Individual income |
| | GTI | Neighbourhood segmentation |
| Historical change | GTI (Land cover and land use) | Change in land use (agriculture, forestry, conservation, mining, residential, commercial, industrial) |
| Growth potential | GTI | Probability of land use change (high, medium and low) |

Over and above trying to understand demographic, socio-economic and activity indicators (land uses) in an area, we need to understand both the historical perspective and the future scenarios to be able to fully understand the status quo and the development pressures. Friedrich Nietzsche stated that “the future influences the present as much as the past”, and we need to ensure that we provide the people tasked with decision making and policy formulation with the necessary data and tools to be able to plan accordingly.

Fig. 1: Infographic of land cover change in South Africa (1990 to 2013/14).

Being able to analyse and model trends, patterns and relationships in a given area relies on integrating diverse sources of data such as land use, land cover, demographics and cadastre. Given the software and analytical skills required, this can be a daunting task. In developing these datasets, GeoTerraImage (GTI) has focused on the application of the data (why it has been developed) and strived to take the complexity out of its use (the what and how), making the data easier to work with. The development focused on merging diverse raster and vector datasets and integrating them into a standardised reporting frame, which allows the analyst to evaluate a wide range of attributes or characteristics and make meaningful decisions about an area.


South Africa is rich in geospatial data covering a wide range of thematic and application areas. However, it is sometimes difficult to integrate this data into a coherent decision-making tool for use by application specialists, who are not necessarily skilled in the use of raster processing or spatial modelling tools.

Fig. 2: Land use activity near Steve Tshwete Local Municipality (2013/14). The dark brown circles represent agricultural centre pivots, the red areas are open cast mines and light brown areas are rain-fed agriculture.

To provide a standardised reporting frame that could be used across geospatial applications (ArcGIS, GeoMedia, MapInfo, QGIS) as well as non-spatial tools such as Microsoft Excel, we needed a frame fine-grained enough to be informative, yet aggregated enough to keep the dataset workable. We did not want to develop our own reporting frame, so our initial ideas of using street blocks or developing our own independent grids or hexagons were quickly discounted: people would then have yet another set of disparate boundaries to work with and integrate into their existing structures.

It was decided to focus on existing administrative boundaries such as cadastre, enumeration areas, voting districts, wards, small areas, sub-places and main-places, as these are well known and already used in the industry. We initially considered using suburbs; however, suburbs are not an official administrative boundary layer and do not link to any of the abovementioned boundary datasets.

After researching the various options, we decided to use the StatsSA EA layer, as it groups cadastral parcels into areas of fairly homogeneous land use and can be aggregated to ward, sub-place or main-place level should this be required. As the data would be used by both the private and public sectors, we took the opportunity to speak to several role players and found that most were familiar and comfortable with EAs and the level of resolution they provide.
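Because EAs can be aggregated to ward, sub-place or main-place level, additive EA attributes can be rolled up with a simple lookup table. The sketch below assumes a hypothetical `ea_to_parent` mapping from EA codes to ward codes; the IDs and values are invented for illustration and are not StatsSA codes.

```python
from collections import defaultdict


def aggregate(ea_values, ea_to_parent):
    """Sum an additive EA-level attribute up to a parent boundary (e.g. a ward).

    ea_values: {ea_id: value}; ea_to_parent: {ea_id: parent_id} (hypothetical lookup).
    """
    totals = defaultdict(float)
    for ea_id, value in ea_values.items():
        totals[ea_to_parent[ea_id]] += value
    return dict(totals)


# Hypothetical IDs and household counts, for illustration only.
households = {"EA1": 120, "EA2": 80, "EA3": 45}
ward_of = {"EA1": "W01", "EA2": "W01", "EA3": "W02"}
print(aggregate(households, ward_of))  # {'W01': 200.0, 'W02': 45.0}
```

Only additive quantities such as counts and areas should be summed this way; a ratio such as percentage built-up has to be recomputed from the summed areas rather than summed or averaged.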

In South Africa, 103 576 enumeration areas were demarcated for the 2011 Census, and these were used as the frame for the national RSA EA Summary dataset. The EAs were used to integrate information on landscape (land cover), activity indicators (land use), demographics, socio-economics, and historical and potential growth. Refer to Table 1 for details of the information content of the RSA EA Summary dataset. As the table shows, the source data come in a variety of formats and were captured at a variety of scales, so different spatial modelling and analysis techniques were required to report the data per EA. This processing, however, forms part of the dataset development: users receive only a standardised and consistent reporting environment for immediate use.

Fig. 3: Land use activity near Steve Tshwete Local Municipality (1990). When compared to Fig. 2, one can see significant changes in agricultural centre pivots and open cast mining in the area, which indicate a shift in the area's socio-economic drivers.

As the analysis brings diverse and complementary data together within a single reporting frame, it becomes possible to derive new information such as daytime populations, which indicate the number of people transacting or working in an area. It also allows one to develop a classification or segmentation of neighbourhoods that indicates the spending power within an area and whether the area supports a daytime or a night-time population. This is very valuable for telecommunications demand forecasting or electrical load forecasting, both of which rely heavily on household size, spending power, the typical appliances used and the lifestyle characteristics of an area. This type of information is also crucial when analysing retail or commercial requirements in an area, and can be used to support transport studies, for example.

An important consideration was that the dataset was developed to assist in analysis, modelling and spatial data enrichment, and not primarily for mapping outputs. This may seem to go against most people's idea of a geographic information system (GIS); however, the majority of spatial analysis applications do not necessarily need a map output and may instead require, for example, a graph illustrating change over time or a report highlighting the different land uses in a province. Refer to Fig. 1 for an infographic of the changing land cover and land uses in South Africa (1990 to 2013/14). The RSA EA Summary dataset does not replace the original data, which remain available at their highest resolution.


As decision makers require meaningful information to assist in planning and analysis, attributes were summarised into appropriate statistics per enumeration area, for example the area of a given land use or a classification of the primary, secondary and tertiary land use. Representing the information content in this way makes it easy to characterise a given area and to work with the data. The intent was to create a summary dataset that gives an overview of the country's demographics, population, land use activity, and historical and potential growth, and that can be used to enrich existing data or to understand spatial patterns, trends or relationships across the country.

There is no one-size-fits-all approach to integrating data or providing a mash-up of all available data sources. The following sections describe how each type of information was integrated and reported within the context of the EAs.

Fig. 4: Small Areas are indicated by black polygons and EAs are indicated by red polygons. The proportion and position of residential land use activities in each EA/Small Area are shown by the brown polygon.

Land cover and land use

Using the 2013/14 RSA National Land Cover (NLC 2013/14) dataset, it was possible to provide an indication of the landscape and land use activity within each EA. The NLC 2013/14 dataset combines land cover (landscape) and land use information, providing seamless and consistent coverage of South Africa, which makes it an ideal source of data for the RSA EA Summary dataset. Although the data was captured at a scale of 1:90 000, it can be summarised per EA to indicate the area of agriculture, forestry, mining, residential, commercial and built-up land in each EA.

Using traditional zonal statistics tools, the NLC data was summarised per EA as the area of each land use activity present in the EA. This provides an accurate description of the nature of the EA and the socio-economic factors influencing it. It takes advantage of existing data to provide a detailed classification of the EA and illustrates factors such as residential versus non-residential land use, or the economic potential (mining, agriculture, etc.) of the area. This is especially important in rural EAs, where the difference between forestry, agriculture and mining is significant with respect to economic potential and its influence on human settlements and economic opportunities. Refer to Fig. 2, which illustrates the various land use activities within EAs in Steve Tshwete Local Municipality (previously Middelburg Local Municipality), Mpumalanga.
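A zonal tabulation of this kind counts, for every EA, the pixels of each land use class and converts the count to an area. The minimal pure-Python sketch below assumes the EA polygons have already been rasterised onto the same grid as the land cover (a hypothetical `zones` grid of EA IDs); in practice a GIS zonal statistics tool does this step, and the toy grid and 100 m pixel size are invented for illustration.

```python
from collections import defaultdict


def tabulate_area(zones, classes, pixel_area_m2):
    """Area (ha) of each land use class within each zone.

    zones and classes are equally sized 2-D grids (lists of rows) on the same
    raster grid: zones holds an EA id per pixel (None = outside every EA),
    classes holds a land use label per pixel.
    """
    result = defaultdict(lambda: defaultdict(float))
    for zone_row, class_row in zip(zones, classes):
        for ea_id, land_use in zip(zone_row, class_row):
            if ea_id is None:
                continue  # pixel not covered by any EA
            result[ea_id][land_use] += pixel_area_m2 / 10_000  # m2 -> ha
    return {ea: dict(areas) for ea, areas in result.items()}


# Toy 3 x 3 grid with hypothetical 100 m pixels (10 000 m2 = 1 ha each).
zones = [["A", "A", "B"], ["A", "B", "B"], [None, "B", "B"]]
classes = [["mining", "agriculture", "agriculture"],
           ["mining", "agriculture", "residential"],
           ["agriculture", "residential", "residential"]]
print(tabulate_area(zones, classes, 10_000))
# {'A': {'mining': 2.0, 'agriculture': 1.0}, 'B': {'agriculture': 2.0, 'residential': 3.0}}
```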

In addition to providing the land use areas per individual EA, a methodology was developed to categorise the primary, secondary and tertiary land use activity and to provide an indication of the built-up nature of the EA. This assists in categorising the nature of the EAs into built-up, rural, urban, and so on.
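The article does not spell out GTI's categorisation rules. One plausible sketch, assuming "primary" simply means the land use with the largest area in the EA and that a hypothetical set of classes (`BUILT_UP`) counts as built-up, is:

```python
# Hypothetical choice of classes that count as built-up; not GTI's definition.
BUILT_UP = frozenset({"residential", "commercial", "industrial"})


def rank_land_uses(areas, top_n=3):
    """Primary, secondary, tertiary... land uses by area, largest first.

    Ties keep their input order because sorted() is stable.
    """
    ranked = sorted(areas.items(), key=lambda item: item[1], reverse=True)
    return [land_use for land_use, _ in ranked[:top_n]]


def percent_built_up(areas, built_up=BUILT_UP):
    """Share of the EA's classified area that falls in built-up classes."""
    total = sum(areas.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(a for use, a in areas.items() if use in built_up) / total


# Hypothetical per-EA areas in hectares.
ea = {"agriculture": 40.0, "residential": 30.0, "commercial": 20.0, "mining": 10.0}
print(rank_land_uses(ea))    # ['agriculture', 'residential', 'commercial']
print(percent_built_up(ea))  # 50.0
```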

Historical change

As we have access to both the RSA National Land Cover 1990 (NLC 1990) dataset and the GTI area-based land use dataset (2000) for South Africa, we were able to summarise these data and indicate change over time. Pixel-based change detection between two datasets is sometimes impractical, as false change (e.g. seasonal effects) can be identified and cause confusion. Using a summarised reporting unit such as an EA suppresses small, insignificant changes and reports the significant changes in the nature of an area. Refer to Fig. 3 for a map of the same area of Steve Tshwete Local Municipality, which, compared with Fig. 2, shows the changes over a 23-year period.

The change is reported and flagged per land use activity, for example as a change in mining or agriculture, indicating the evolving nature of the area. The land use activities for which change is reported are detailed in Table 1.
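Comparing per-EA summaries rather than pixels can be sketched as below. This assumes both dates have already been mapped to a common land use legend and summarised to hectares per EA; the 5 ha noise threshold and the example areas are hypothetical.

```python
def flag_changes(before, after, min_change_ha=5.0):
    """Flag significant per-EA land use change between two dates.

    before/after: {land_use: hectares} for one EA at each date.
    Differences smaller than min_change_ha (a hypothetical threshold) are
    treated as noise, e.g. seasonal or classification differences.
    Returns {land_use: change_in_ha} for significant changes only.
    """
    flags = {}
    for land_use in sorted(set(before) | set(after)):
        delta = after.get(land_use, 0.0) - before.get(land_use, 0.0)
        if abs(delta) >= min_change_ha:
            flags[land_use] = delta
    return flags


# Hypothetical EA where a mine has replaced grassland.
ea_1990 = {"agriculture": 120.0, "mining": 0.0, "grassland": 80.0}
ea_2014 = {"agriculture": 118.0, "mining": 45.0, "grassland": 37.0}
print(flag_changes(ea_1990, ea_2014))  # {'grassland': -43.0, 'mining': 45.0}
```

The 2 ha drop in agriculture falls below the threshold and is not flagged, which is the kind of small, insignificant change the EA summary is meant to suppress.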

Fig. 5: Growth potential for an area in southern Midrand illustrating the open areas (green), small holdings (yellow) and vacant stands (red).


As StatsSA reported the Census 2011 demographic data per Small Area Layer (SAL), it was necessary to disaggregate this data to EA level using the area of each residential land use within a given Small Area. The SAL was originally developed to ensure confidentiality when reporting on specific variables, and also accommodates EAs with very low populations, such as nature reserves or sparsely populated areas. A total of 18 669 EAs were integrated into neighbouring EAs to form the 84 907 Small Areas in South Africa (103 576 − 18 669 = 84 907), so data had to be modelled for these EAs.

The disaggregation uses the actual area of residential land use in each Small Area/EA: age, gender, race, number of households and household income are split among the EAs in proportion to the area and type of residential land use. Refer to Fig. 4 for a map of an area in the City of Tshwane (Highveld) showing the residential land use in the context of the EAs and Small Areas.
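This is a form of areal weighting (dasymetric disaggregation). The sketch below splits one Small Area total across its EAs in proportion to residential area, optionally weighted by residential type. The article gives no weights, so `density_weights` and all values here are hypothetical.

```python
def disaggregate(sal_total, residential_areas, density_weights):
    """Split a Small Area total (e.g. households) across its EAs.

    residential_areas: {ea_id: {residential_type: hectares}}
    density_weights: {residential_type: relative weight} (hypothetical values)
    The EA shares always sum back to the Small Area total.
    """
    weighted = {
        ea: sum(ha * density_weights[rtype] for rtype, ha in by_type.items())
        for ea, by_type in residential_areas.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("Small Area has no residential land use to weight by")
    return {ea: sal_total * w / total for ea, w in weighted.items()}


# Hypothetical Small Area with 300 households and two EAs.
areas = {"EA1": {"formal": 10.0}, "EA2": {"formal": 5.0, "informal": 5.0}}
weights = {"formal": 1.0, "informal": 3.0}  # assumed relative densities
print(disaggregate(300, areas, weights))  # {'EA1': 100.0, 'EA2': 200.0}
```

Setting every weight to 1.0 reduces this to a pure area-proportional split.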

Derived information

As one starts integrating the diverse yet complementary datasets, it becomes possible to model new information based on the spatial relationships, patterns and trends in the data. Most of these new datasets are self-explanatory; however, one needs to merge the datasets to be able to quantify, and to a degree qualify, the information content. Two of the derived datasets rely on integrating the abovementioned data to provide insight into areas of potential change and daytime population estimates.

Potential for change

By using the existing landscape and land use activity (land use and land cover) data, one is able to identify the probability of land use change in a given area based on certain assumptions and logical inferences. A simple example is the presence of water bodies or wetlands which will limit the ability to develop in an area and which would be flagged as having a low chance of land use change in the future.

Areas close to conservation areas or on steep slopes also have very little chance of change, due to the protected nature of the area or topography that limits the ability to build and develop.

As we know where land use activities take place and what they are, we are able to identify vacant stands (cadastre) or vacant land close to existing built-up areas and classify it according to its proximity. This allows us to identify EAs with a high probability of land use change in the future. Refer to Fig. 5 for a map of Midrand (Highveld) showing growth potential as classified from existing land use.
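The rules described above (water, wetlands, conservation areas and steep slopes limit change; vacant land near built-up areas favours it) can be sketched as a simple rule-based classifier. The `LandUnit` attributes, the slope and distance thresholds, and the treatment of already-developed land as "low" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LandUnit:
    """Hypothetical attributes of a parcel or land unit within an EA."""
    is_vacant: bool
    water_or_wetland: bool
    near_conservation: bool
    slope_deg: float
    dist_to_built_up_m: float


def growth_potential(unit, max_slope_deg=12.0, near_m=500.0, far_m=2000.0):
    """Classify potential for land use change as 'high', 'medium' or 'low'."""
    # Physical and protective constraints override proximity.
    if unit.water_or_wetland or unit.near_conservation or unit.slope_deg > max_slope_deg:
        return "low"
    if not unit.is_vacant:
        return "low"  # assumption: already-developed land rarely changes use
    if unit.dist_to_built_up_m <= near_m:
        return "high"
    if unit.dist_to_built_up_m <= far_m:
        return "medium"
    return "low"


print(growth_potential(LandUnit(True, False, False, 3.0, 200.0)))   # high
print(growth_potential(LandUnit(True, True, False, 3.0, 200.0)))    # low
print(growth_potential(LandUnit(True, False, False, 3.0, 1500.0)))  # medium
```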

Fig. 6: Summary of a specific neighbourhood segmentation illustrating the age, gender, education and diversity indicators for a given grouping.

Daytime population

As discussed previously, we modelled the demographic data down to EA level using the residential land use activity in each area. As non-residential land use activity data is also available, it is possible to apportion the working population (age class 18–65, as well as people reported as employed in the Census) to the various industrial, commercial, mining, agricultural and other non-residential land uses, and so estimate a working population for each EA.

This is a modelled approach based on the number of working people in a given municipality relative to the distribution of economic opportunities. It does not take daily migration patterns into account, as there is insufficient data in this regard. However, we are investing resources in finding datasets that can indicate travel patterns and the number of people who move outside their own municipalities on a daily basis. This is particularly relevant in a province like Gauteng, where the close proximity of three metropolitan areas makes it easy for people to commute relatively long distances daily.
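A minimal sketch of the apportionment, assuming (as the text describes) that all workers stay within their municipality and that every hectare of non-residential land attracts workers equally; a refinement could weight land use types by employment density. The daytime population formula (residents who work leave, allocated workers arrive) is an illustrative assumption, not GTI's published model, and all values are hypothetical.

```python
def allocate_workers(municipal_workers, non_res_area_by_ea):
    """Spread a municipality's workers over its EAs in proportion to non-residential area."""
    total = sum(non_res_area_by_ea.values())
    if total == 0:
        raise ValueError("municipality has no non-residential land use")
    return {ea: municipal_workers * area / total
            for ea, area in non_res_area_by_ea.items()}


def daytime_population(night_pop, resident_workers, allocated_workers):
    """Assumed model: resident workers leave during the day, allocated workers arrive."""
    return night_pop - resident_workers + allocated_workers


# Hypothetical municipality with 1 000 workers and three EAs.
jobs = allocate_workers(1000, {"A": 30.0, "B": 10.0, "C": 0.0})
print(jobs)  # {'A': 750.0, 'B': 250.0, 'C': 0.0}

# EA C is purely residential: 2 000 residents, 600 of whom work elsewhere.
day_c = daytime_population(2000, 600, jobs["C"])
print(day_c, day_c - 2000)  # 1400.0 -600.0  (daily population change)
```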

Future enhancements and innovative developments

GTI has embarked on a project to capture an up-to-date national structure database based on aerial photography, which will allow the company to provide higher-resolution land use activity based on the position and classification of individual structures. This will bring the national coverage in line with datasets developed for the City of Cape Town Metropolitan Municipality, eThekwini Municipality and Gauteng Province, where the GTI Building Based Land Use (BBLU) data is used to disaggregate demographic and population data based on residential land uses, and the non-residential structure count and size are used to estimate the daytime population at the highest resolution possible.

Having a national dataset underpinning the EA Summaries will provide detailed information that can be used to further quantify land use activities in an area and, when combined with demographic and socio-economic indicators, can provide valuable insight into an area. Over and above this new dataset, various other spatial datasets are being developed that will integrate with the RSA EA Summaries to provide a comprehensive set of data. These include Neighbourhood Segmentation datasets and the identification of new developments as they occur in the country.

Neighbourhood segmentation

When trying to understand an area from the perspective of buying power or infrastructure requirements such as telephony or electricity, it is necessary to categorise or classify areas into groups based on income, demographics and lifestyle. Whilst the Census provides data on household size, age class distribution, income and more, one needs to look at this data in the context of living conditions such as housing type (estates, freehold cadastre, townships, informal settlements, backyard structures) and housing density to model socio-economic factors such as disposable income and spending patterns. In this context, the term neighbourhood refers to the grouping of EAs into logical groups that share a common set of characteristics.
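The article does not say how EAs are grouped into neighbourhoods. One common, generic technique for this kind of segmentation is k-means clustering on standardised indicators; the sketch below uses it purely as an illustration, not as GTI's method, with hypothetical, already-scaled features (say, income and housing density per EA) and fixed starting centroids so the result is reproducible.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means on feature vectors, starting from the given centroids.

    Features should be on comparable scales (e.g. z-scores) beforehand,
    otherwise the largest-valued indicator dominates the distances.
    """
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    labels = [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
              for p in points]
    return labels, centroids


# Hypothetical scaled (income, housing density) values for four EAs.
eas = [(0.1, 0.2), (0.15, 0.1), (0.9, 0.8), (0.85, 0.95)]
labels, _ = kmeans(eas, [(0.0, 0.0), (1.0, 1.0)])
print(labels)  # [0, 0, 1, 1]
```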

Neighbourhoods are also never purely residential, and one has to take the complete land use within an area into account when defining the various categories. This is especially relevant because it influences daytime and night-time population and land use patterns: people may frequent a store during the day that has a limited audience in the evening.

Using a combination of demographic, socio-economic and land use activity data, we can also understand the diversity within an area. For example, high-income households within an estate would have significantly different spending patterns from a similar income group in a typical suburb with a few sectional title developments. To make the data meaningful and easier to understand, the various groupings will be compared with a provincial average to show how they are positioned in the context of the greater whole.
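The text states that groupings will be compared with a provincial average but does not give the formula. A common convention, assumed here, is an index where 100 equals the provincial average, so values above 100 mean the grouping exceeds the average. Indicator names and values are hypothetical.

```python
def index_vs_province(group, province):
    """Express each indicator as an index where 100 equals the provincial average."""
    return {name: 100.0 * value / province[name] for name, value in group.items()}


# Hypothetical indicators for one neighbourhood grouping and its province.
group = {"avg_household_size": 3.0, "pct_over_50": 30.0}
province = {"avg_household_size": 4.0, "pct_over_50": 20.0}
print(index_vs_province(group, province))
# {'avg_household_size': 75.0, 'pct_over_50': 150.0}
```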

This is valuable information when deciding where to position a new shopping centre or cell phone mast, as one is trying to understand buying patterns and area-specific requirements. When evaluating demographics, an area with a predominantly older age distribution (over 50) would represent a different audience from an area with a younger age distribution and school-going children, where school fees may account for a large portion of monthly expenses. Refer to Fig. 6 for a summary of the neighbourhood segmentation for a specific grouping.

Fig. 7: Map of southern Midrand illustrating new developments (black) in context of the potential growth data.

New developments

As mentioned earlier, we are able to indicate growth potential by integrating data and estimating areas of low, medium and high potential for change. This is, however, an indication of where change can, and most likely will, happen in the future; it does not represent actual change. To indicate where change is happening, we have been investigating the use of high- and medium-resolution satellite imagery to highlight developments as they happen by identifying significant changes in an area. This process will be repeated regularly (every three to four months) over major built-up areas to identify developments in their initial stages and to provide insight into the changing nature of an area as it happens. A large development typically takes one to two years to complete, so knowing when the change starts provides the foresight needed to plan infrastructure or amenities in the area.

This dataset can be integrated with the growth potential data to highlight where growth and development are happening versus where the potential for growth exists. Refer to Fig. 7 for a map showing the growth potential data (Fig. 5) in the context of the new developments data (black), indicating where development is happening and is likely to happen in the near future.
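Once both datasets are reported per EA, comparing "where development is happening" with "where potential exists" can be as simple as a cross-tabulation. The sketch below assumes a hypothetical mapping of EA IDs to growth potential classes and a set of EAs in which new development was detected; EAs with development despite a low potential class could be worth reviewing.

```python
from collections import Counter


def cross_tabulate(potential_by_ea, developing_eas):
    """Count EAs by (growth potential class, new development detected?)."""
    return Counter((potential, ea in developing_eas)
                   for ea, potential in potential_by_ea.items())


def unexpected_development(potential_by_ea, developing_eas):
    """EAs where development was detected despite a low growth potential class."""
    return sorted(ea for ea, potential in potential_by_ea.items()
                  if potential == "low" and ea in developing_eas)


# Hypothetical EA IDs and classes.
potential = {"EA1": "high", "EA2": "high", "EA3": "medium", "EA4": "low"}
developing = {"EA1", "EA4"}
print(cross_tabulate(potential, developing)[("high", True)])  # 1
print(unexpected_development(potential, developing))          # ['EA4']
```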


Having access to a variety of data sources integrated into a common framework can help a geospatial analyst, town planner, retail planner or transport engineer to fully understand an area and its unique characteristics. The ability to integrate land use activity (land use and land cover), population, demographics, socio-economics, historical change and growth potential into coherent decision support tools helps people see the "bigger picture" and allows analysis in both geospatial and non-geospatial tools.

This overview of historical, present and future scenarios also helps people plan effectively, as one sees the holistic picture of an area. Being able to quantify and qualify certain characteristics is very valuable for understanding the implications of policy decisions.

Using this consistent reporting frame also allows one to add additional information such as the neighbourhood segmentation and new development data seamlessly and provides for a continual enhancement and improvement of the decision-making tools without a significant change in the data structure and its use.

The techniques and methodologies used to develop this data can be applied to any other administrative or user-defined boundary, which makes the approach very flexible and usable.


This paper was presented at Geomatics Indaba 2016 and is republished here with permission.

Contact Stuart Martin, GeoTerraImage, Tel 012 807-9480,