Census geography: What constitutes the ideal output area for South Africa?

July 30th, 2019, Published in Articles: PositionIT

A specifically designed output (spatial) layer for disseminating census data aims to provide governmental departments, municipalities, planners and researchers with the smallest geographic areas that can be linked to census data, to facilitate better and more detailed planning and decision-making. Because these output areas are not built from the census operational layer, i.e. the Enumeration Areas (EAs), they do not inherit some of the characteristics of the EAs.

Statistical agencies are responsible for creating census output areas for the dissemination of population statistics in such a way that the confidentiality of individuals is maintained while small-area data is still provided to users with diverse needs. This inevitably results in competing criteria that need to be prioritised to satisfy most users.

Conceptual considerations in building block design

The aspects that influence the design criteria for building blocks differ between users: compact shapes and recognisable boundaries are mostly needed for mapping and fieldwork purposes, while statistical analysts prefer, for example, homogeneity of population size. According to Cockings et al., practical considerations often dictate building block design rather than conceptual ones [1]. Another aspect that could influence the design is the intention to use the building blocks for the publication of other datasets as well as population data.

Decisions need to be made on whether to include zero-population blocks in the aggregation process that forms output areas, whether there should be a physical size limit to the output areas as well as a population threshold in low-density areas, and whether the output layer should be contiguous or be allowed to have empty spaces. According to Cockings et al., countries with large areas of unpopulated land tend to allow their building blocks to be unpopulated, as this provides a more appropriate base for mapping and analysis [1].

Keep in mind that aggregation to large areas causes a loss of spatial and attribute detail, as averages tend to hide significant geographic variation between different area types. Detail is lost in the generalisation, especially if the areas are not homogeneous. Different statistical results can be generated from the same data when the information is grouped at different levels of spatial resolution, such as the various levels of census geography; this is the so-called modifiable areal unit problem (MAUP). The effects of the MAUP are referred to as the scale effect and the zoning effect. These problems can be reduced if person or household data is used for analysis instead of already aggregated data such as EAs.
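The zoning effect can be illustrated with a small sketch. The eight block values below are invented for illustration and represent a 2 x 4 arrangement of building blocks (top row low, bottom row high); the function name `zone_means` is likewise hypothetical. The same data is averaged under two different zonings of equal size.

```python
from statistics import mean

# Hypothetical values for eight building blocks laid out as a 2 x 4 grid
# (indices 0-3 form the top row, 4-7 the bottom row). Illustrative only.
block_values = [10, 12, 11, 13, 40, 42, 41, 43]


def zone_means(values, zones):
    """Average the block values within each zone (a zone is a list of block indices)."""
    return [mean(values[i] for i in zone) for zone in zones]


# Zoning A: top row versus bottom row (follows the underlying low/high split).
zoning_a = [[0, 1, 2, 3], [4, 5, 6, 7]]
# Zoning B: left half versus right half (each zone mixes low and high blocks).
zoning_b = [[0, 1, 4, 5], [2, 3, 6, 7]]

means_a = zone_means(block_values, zoning_a)
means_b = zone_means(block_values, zoning_b)
print(means_a, means_b)
```

Under zoning A the two zones look very different, while under zoning B they look almost identical, although the underlying block data has not changed.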

Most countries follow some nested hierarchy that coincides with the different levels of administrative boundaries. It becomes challenging if these boundaries change regularly and the building blocks or output areas need frequent realignment.

The availability of appropriate spatial data in the public domain will also play a role in which data sources can be used to generate the building blocks and in whether the product can be distributed without restriction. The availability of sufficient spatial data will also determine to what extent an automated method can be used.

International examples of output geographies

Canada and the UK (England and Wales) were chosen as examples of purposefully designed dissemination geographies, as it seems possible to replicate some aspects of them in South Africa.

Canada

Until 1996, Canada used the EAs as primary collection areas as well as basic dissemination areas (DAs). In 2001, separate collection and output geographies were created by using the result of the “block programme” as building blocks. The block programme geo-referenced all dwellings to specific blocks, which are polygons formed by the intersection of streets. The design criteria for these DAs aimed to increase temporal stability, reduce area suppression and achieve more uniformity, as well as to use intuitive boundaries to achieve compactness and homogeneity. Not all of these criteria could always be adhered to simultaneously, and some trade-off conditions were implemented: for example, that the DA will respect census subdivision and census tract boundaries. Resulting areas with a population count of fewer than 40 persons had their characteristic data removed: so-called area suppression. A minimum of 500 persons has been stipulated for a DA. The lowest level with characteristic data is the DA. Population and dwelling counts are released by block, but with no characteristic data.
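As a minimal sketch of how area suppression works in principle, the snippet below removes the characteristic data from any area with fewer than 40 persons while keeping the population count. The record layout, field names and sample values are assumptions made for illustration and do not reflect Statistics Canada's actual data structures.

```python
SUPPRESSION_THRESHOLD = 40  # persons, the threshold described in the text above


def suppress(areas, threshold=SUPPRESSION_THRESHOLD):
    """Return copies of the area records, with characteristics removed
    wherever the population count is below the threshold."""
    released = []
    for area in areas:
        record = dict(area)  # shallow copy so the input is left unchanged
        if record["population"] < threshold:
            record["characteristics"] = None  # counts stay, detail is withheld
        released.append(record)
    return released


# Hypothetical dissemination areas (illustrative values only).
areas = [
    {"id": "DA-1", "population": 35, "characteristics": {"median_age": 41}},
    {"id": "DA-2", "population": 620, "characteristics": {"median_age": 33}},
]
released = suppress(areas)
print(released)
```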

United Kingdom (England and Wales)

The Census Offices of the United Kingdom and other researchers carried out extensive research and user consultations to introduce a number of innovations for the 2001 census, which were refined for 2011. Enumeration districts (EDs) were designed for census fieldwork purposes. Leventhal concluded that their variation in population size as well as composition makes them less than ideal as a base for analysing the data [2]. The new small output areas, or dissemination geography, for the UK are known as Output Areas (OAs). Postcode areas were used as the primary building blocks. OAs are designed such that each contains around 125 households, populations are as homogeneous as possible in terms of tenure and dwelling type, and areas have regular shapes and follow “natural” boundaries where possible. The exception is Scotland, where the size is around 50 households. The OA boundaries are nested within the administrative area hierarchy, i.e. civil parishes/communities, wards and local authority districts.

The design procedures used for the UK output geography are described in detail in [1, 2, 3]. The use of a geographical information system (GIS) and an automated zone-design technique was central to the process. The AZTool was developed for this purpose and made available to other countries to use in the development of their building blocks.

As indicated in Table 1, New Zealand and Australia also implemented the concept of split geographies, using what they call “mesh blocks” to aggregate to bigger areas. The USA uses the census block as the building block for dissemination at higher levels of geography, while Denmark, Finland and Northern Ireland use grid squares of various sizes as building blocks.

Table 1: Characteristics of building blocks commonly employed internationally [1].

Building block type | Country | Country-specific name | Scale/size (year) | Method of creation | Design characteristics | Relationship to key output zones
Postcode | England and Wales | Unit postcode | Ave 17 delivery points (2001) | Automated | Synthetic postcode polygons. Aggregations of address polygons. Aligned with topographical features where possible. | Nested within administrative boundaries (electoral wards and civil parishes, where they exist). Aggregate to census output zones (output areas, super output areas).
Postcode | Northern Ireland | Unit postcode | Ave 17 delivery points (2001) | Automated | Synthetic postcode polygons. Aligned with topographical features where possible. | Nested within administrative boundaries (electoral wards). Aggregate to census output zones (output areas, super output areas).
Postcode | Scotland | Unit postcode | Ave 15 delivery points (2001) | Manual | Digitised postcode polygons. | Aggregate to census output zones (output areas, super output areas).
Street block | Australia | Mesh block | Ave 30 – 60 dwellings (2011) | Hybrid | Hierarchical design criteria: initial urban/rural split then uniformity of dwelling estimates and land-use key drivers. Based on cadastral boundaries. | Aligned to 2011 statistical local areas but this will not be maintained over time. Aggregate to output zones in Australian Statistical Geography Standard.
Street block | New Zealand | Mesh block | Ave 97 people (2006) | Manual | Boundaries follow cadastral boundaries, centre line of roads, rivers and other physical features. | Aggregate to output zones in New Zealand Standard Areas Classification.
Street block | USA | Census block | Ave 28 people (2010) | Hybrid (mostly automated) | Boundaries of higher level geographic areas (e.g., counties, places, voting districts, census tracts, block groups, etc.) must form block boundaries; visible features (streets, roads, streams, and railroad tracks) usually incorporated, depending on predetermined ranking system based on block size and boundary composition. | Always aggregate to higher level output zones due to method of creation.
Grid squares | Denmark | National square grid (100 m) | Ave 6 households (2003) | Automated | 100 m grid squares covering the whole country. | Aggregate to larger standard grids or groups of cells meeting Statistics Denmark’s disclosure requirements.
Grid squares | Finland | Grid cells (250 m) | Mean 16 people (2010) | Automated | 250 m grid squares covering the whole country. | Aggregate to 1 km grid but not to other output zones (postal codes, municipal subareas, municipalities).
Grid squares | Northern Ireland | 100 m grid | Min 25 persons, 8 households (2001) | Automated | Since 2001, 100 m grid squares for the whole country; previously 100 m for urban areas, 1 km elsewhere. | 100 m grids aggregate to 1 km grid; 1 km grid consistent since 1971. Neither is consistent with other census output zones (output areas; see above).

The current model of statistical geography in South Africa

The standard geography for census data dissemination is currently a nested hierarchy starting with the Small Areas (SAs), Sub-places (SPs), Main Places (MPs), Local Municipalities, District Municipalities etc. (see Fig 1) – all collated as aggregated data from the EAs that were designed to manage census operations.

Fig. 1: South African census geography hierarchy.

The implication is therefore that the SAs inherit the characteristics of the EAs by default. This includes the classification in terms of land management and land use, i.e. the geography types: urban, farms and traditional. Within this broad classification, the EA types form a sub-classification: formal residential, informal residential, traditional residential, farms, parks and recreation, collective living quarters, industrial, small (and agricultural) holdings, vacant and commercial.

The issues with the current output geography of South Africa were highlighted by Avenell et al., who analysed the SAL of 2003 in the process of conducting research on deprivation in South Africa [4]. About half of the EAs are identical to the SAs, and the remaining EAs were merged in various combinations to comply with the population requirement of 500. Problems such as fragmented EAs, and therefore fragmented SAs, were detected: the non-contiguous geographic structure created a problem when merging different EAs, for example, isolated small villages comprising only one EA surrounded by another (mostly vacant) EA. These so-called “island EAs” created problems with merging, especially if more than one island occurred within the same larger surrounding EA.

To eliminate some of these issues, vacant EAs and EAs with fewer than 10 people were omitted in the creation of the SAs of 2011. The resulting layer was not contiguous, which created another set of issues for spatial analysis and visualisation, so a second spatial layer was released that contains EA boundaries, but no data, to fill the gaps. An effort was made to join EAs of the same EA type and to mix types only where the population threshold was an issue. The SAs had different thresholds, as outlined in Table 2, as a direct consequence of the demarcation specifications for the different EA types. The SP boundaries were also adhered to, as data is released at the SP level and upwards in the hierarchy.

Table 2: Small area population ranges per EA type used in 2011 for South Africa.
EA type code | EA type | Minimum pop (EA average - EA standard deviation) | Maximum pop (EA average + EA standard deviation)
1 | Formal | 300 | 900
2 | Informal | 250 | 921
3 | Traditional | 300 | 785
4 | Farms | 150 | 996
5 | Parks and recreation | 50 | 192
6 | Collective Living Quarters | 300 | 1017
7 | Industrial | 100 | 303
8 | Small-holding | 50 | 681
10 | Commercial | 200 | 637
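The ranges in Table 2 can be applied as a simple check on a candidate small area. The sketch below copies the values from Table 2; the function name `check_small_area` and the choice to treat both bounds as inclusive are assumptions, since the text does not state how boundary values were handled.

```python
# Population ranges per EA type, copied from Table 2 (2011).
SA_POP_RANGES = {
    "Formal": (300, 900),
    "Informal": (250, 921),
    "Traditional": (300, 785),
    "Farms": (150, 996),
    "Parks and recreation": (50, 192),
    "Collective Living Quarters": (300, 1017),
    "Industrial": (100, 303),
    "Small-holding": (50, 681),
    "Commercial": (200, 637),
}


def check_small_area(ea_type, population, ranges=SA_POP_RANGES):
    """Classify a candidate small area as 'below', 'within' or 'above'
    the range for its EA type (bounds assumed to be inclusive)."""
    minimum, maximum = ranges[ea_type]
    if population < minimum:
        return "below"
    if population > maximum:
        return "above"
    return "within"


print(check_small_area("Formal", 250))      # a formal area that needs merging
print(check_small_area("Industrial", 304))  # an industrial area that is too large
```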

Proposed process for consideration for South Africa

South Africa’s postcode areas are vast, bigger than the SP areas, and can therefore not be considered as building blocks, as is the case in the UK. Stats SA could, however, use the automated process used by the UK, and use the road system and other infrastructure to create blocks as Canada did. A spatial data audit was conducted, and some experimentation with polygon creation nationally proved that it is possible to create blocks using road centrelines, railways and major rivers. More than adequate data exists in urban areas. Although enough roads and similar features are spatially available in farm areas, safety issues and difficulties of access (assuming that these blocks will also be used to build sample units) mean that it is proposed to use farm boundaries for easier contact. In traditional areas, it is proposed to use the roads and the outer boundaries of place names, due to the lack of cadastre indicating the extent of the settlement.

An important task will be to update the structure frame (DF) for the position and type of each structure, as well as the number of units, before the next census in 2021. This is a critical component of building block creation. If the census records are not geocoded to the structure position or address point, the lowest level of geo-referencing will be the EA, and no disaggregation to a lower level will be possible.

Input from the wider user community on the required specifications of these output areas is needed. What should the population threshold and area size be? Which other characteristics, such as housing type, should be considered? Do the output areas have to fit into the statistical geographies, such as the administrative hierarchy and the place name areas?

The intent is to use the proven, automated AZTool to create these zones with, hopefully, minimal manual editing or demarcation. Mokhele et al. used the 2001 EAs to construct output areas and concluded, based on their findings from different spatial settings and different geographical levels, that the AZTool software could be used to effectively and objectively create optimised output areas in South Africa [5].
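To give a feel for what automated aggregation of building blocks involves, the sketch below merges under-populated zones into a neighbouring zone until a population threshold is met. This is a deliberately simplistic greedy stand-in and is not the algorithm implemented in AZTool; the function name, the adjacency representation and the sample populations are all assumptions made for illustration.

```python
def merge_to_threshold(populations, adjacency, threshold):
    """Greedily merge zones below the threshold into their least-populated neighbour.

    populations: dict mapping block id -> population
    adjacency:   dict mapping block id -> set of neighbouring block ids
    Returns a dict mapping zone id -> sorted list of member blocks.
    """
    zone_of = {b: b for b in populations}    # block -> zone it currently belongs to
    members = {b: {b} for b in populations}  # zone -> set of member blocks
    zone_pop = dict(populations)             # zone -> total population

    def neighbours(zone):
        # Zones that share a boundary with any block of this zone.
        return {zone_of[n] for b in members[zone] for n in adjacency[b]} - {zone}

    while True:
        # Zones that are too small and still have a neighbour to merge with.
        small = [z for z in members if zone_pop[z] < threshold and neighbours(z)]
        if not small:
            break
        zone = min(small, key=lambda z: (zone_pop[z], z))
        target = min(neighbours(zone), key=lambda z: (zone_pop[z], z))
        for block in members[zone]:
            zone_of[block] = target
        members[target] |= members.pop(zone)
        zone_pop[target] += zone_pop.pop(zone)
    return {z: sorted(blocks) for z, blocks in members.items()}


# Hypothetical chain of five blocks: A - B - C - D - E.
populations = {"A": 200, "B": 350, "C": 600, "D": 100, "E": 450}
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
zones = merge_to_threshold(populations, adjacency, threshold=500)
print(zones)
```

Note that an isolated block with no neighbours stays below the threshold in this sketch, which mirrors the “island EA” problem described earlier. A real zone-design tool would also weigh shape, homogeneity and boundary constraints rather than population alone.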

Conclusion

There is a need to put serious thought and research into the design and creation of purposeful output areas that will have the widest possible use. The following aspects should be taken into consideration, with thresholds set for South Africa’s environment:

  • Population and household thresholds for the different geographic levels.
  • Target population (number of households) for output areas, and higher geographies.
  • Physical area threshold to accommodate very sparsely populated areas.
  • Homogeneity, using intra-area correlation scores for accommodation type and tenure.
  • Shape: compactness, calculated as perimeter²/area, with the minimum boundary length set at 10% of the total perimeter of the shared boundaries.
  • Regional constraint: the requirement that lower level output geographies must respect higher-level boundaries (nested).
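The perimeter²/area measure in the list above can be computed directly from polygon vertices. The sketch below assumes simple (non-self-intersecting) polygons in projected coordinates, given as ordered vertex lists; the function names and the two sample shapes are illustrative. Lower values indicate more compact shapes, with a circle giving the theoretical minimum of 4π (about 12.57).

```python
import math


def perimeter_and_area(vertices):
    """Perimeter and area of a simple polygon.

    vertices: ordered list of (x, y) tuples, without repeating the first point.
    The area uses the shoelace formula.
    """
    perimeter = 0.0
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_area += x1 * y2 - x2 * y1
    return perimeter, abs(twice_area) / 2.0


def compactness(vertices):
    """Shape measure perimeter squared divided by area (lower is more compact)."""
    perimeter, area = perimeter_and_area(vertices)
    return perimeter ** 2 / area


# Two hypothetical zones with the same area (10 000 m²) but different shapes.
square = [(0, 0), (100, 0), (100, 100), (0, 100)]
strip = [(0, 0), (400, 0), (400, 25), (0, 25)]
print(compactness(square), compactness(strip))
```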

Priorities for relaxation should be set, as it is unlikely that all zones will be able to comply with all criteria simultaneously. In other words, which requirement should be relaxed first, and which must be enforced absolutely?

The alternative of a grid system should also be investigated. A technical team from Stats SA, the South African National Biodiversity Institute (SANBI), and the National Geo-spatial Information (NGI) component of the Department of Rural Development and Land Reform (DRDLR) developed the Basic Spatial Unit (BSU) frame as part of the System of Environmental-Economic Accounting (SEEA). It is a 100 x 100 m grid that will be used to harmonise data from a range of different sources, so that different types of data can be compared with one another, and to provide a standardised area for comparisons over time and between countries.
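As a rough sketch of how point data could be harmonised onto such a grid, the snippet below assigns coordinates to 100 m cells and counts the points per cell. It assumes coordinates in a projected system measured in metres with the grid origin at (0, 0); the actual origin, projection and cell identifiers of the BSU frame are not given here, so these choices and the sample points are hypothetical.

```python
import math

CELL_SIZE_M = 100  # grid resolution described in the text above


def cell_id(x, y, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point.

    x, y are projected coordinates in metres; the grid origin is assumed at (0, 0).
    """
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def count_by_cell(points, cell_size=CELL_SIZE_M):
    """Count how many points fall into each grid cell."""
    counts = {}
    for x, y in points:
        key = cell_id(x, y, cell_size)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical structure positions (illustrative only).
points = [(12.0, 34.0), (99.9, 0.0), (100.0, 250.0), (-5.0, 20.0)]
cell_counts = count_by_cell(points)
print(cell_counts)
```

Because every dataset is reduced to the same cell index, counts from different sources or different years can be compared cell by cell without any boundary realignment.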

References

[1] S Cockings, A Harfoot, D Martin and D Hornby, 2013. Getting the foundations right: spatial building blocks for official population statistics. Environment and Planning A, volume 45, 1403-1420. https://doi.org/10.1068/a45276
[2] B Leventhal, 2003. Developments in outputs from the 2001 Census. International Journal of Market Research, 45.1 (Spring 2003): p3.
[3] DJ Martin, 1998a. Optimising census geography: the separation of collection and output geographies. International Journal of Geographical Information Science 12, pp. 673-685.
[4] D Avenell, M Noble and G Wright, 2009. South African datazones: A technical report about the development of a new statistical geography for the analysis of deprivation in South Africa at a small area level, CASASP. Working Paper No. 8, Oxford: Centre for the Analysis of South African Social Policy, University of Oxford.
[5] T Mokhele, O Mutanga and F Ahmed, 2016. Development of census output areas with AZTool in South Africa. S Afr J Sci., 112 (7/8), Art. #2015-0010. http://dx.doi.org/10.17159/sajs.2016/20150010
[6] DJ Martin, 1998b. 2001 Census output zones: from concept to prototype. Population Trends, 94, pp. 19-24.
[7] CD Lloyd, 2016. Spatial scale and small area population statistics for England and Wales. International Journal of Geographical Information Science, 30:6, 1187-1206. https://doi.org/10.1080/13658816.2015.111377
[8] L Oliver, 2001. Shifting boundaries, shifting results: The modifiable areal unit problem. http://www.geog.ubc.ca/courses/geog570/talks_2001/scale_maup.html
[9] AS Rao (sine anno). What do you mean by GIS Aggregation? http://www.publishyourarticles.net/knowledge-hub/geography/what-do-you-mean-by-gis-aggregation.html
[10] Statistics Canada, 2011. Data Quality and Confidentiality Standards and Guidelines (Public). 2011 Census Dissemination. www.statcan.gc.ca
[11] H Verhoef and A van Eeden, 2015. Identifying the challenges of creating an optimal dissemination geography for census. South African Journal of Geomatics, Vol4, No 1, 50-64. http://www.sajg.org.za/index.php/sajg/article/view/215/136

Contact Helene Verhoef, Tel 012 310-8952, helenev@statssa.gov.za
