ASSESSMENT OF THE HOMOGENEITY OF VOLUNTEERED GEOGRAPHIC
INFORMATION IN SOUTH AFRICA
L. Siebritz a, G. Sithole b, S. Zlatanova c
a Chief Directorate: National Geospatial Information, van der Sterr Building, Rhodes Avenue, Mowbray, 7705, South Africa -
lasiebritz@ruraldevelopment.gov.za
b Geomatics Division, School of Architecture, Planning and Geomatics, University of Cape Town, Private Bag X3, Rondebosch,
7701, South Africa - george.sithole@uct.ac.za
c GISt, OTB, Delft University of Technology, Jaffalaan 9, 2628 BX Delft, The Netherlands - s.zlatanova@tudelft.nl
KEYWORDS: Mapping, Updating, GIS, Comparison, Open Systems
ABSTRACT:
The potential for volunteer groups to contribute geographic data to National Mapping Agencies has been widely recognised. Several
investigations have been done to determine the geometric accuracy of this data for the purposes of national mapping. Beyond
accuracy, from a production perspective National Mapping Agencies will also be interested in the sufficiency and uniformity of the
data. This paper investigates whether the geographic data currently generated by volunteers is uniform across a country
and whether the rate of production of data is consistent. For the purpose of the test, changes in data of South Africa from
OpenStreetMap are analysed for the period 2006 to 2011. Here only point and line data are considered. The results generally show
that the rate at which data is generated varies in space and time. The results also confirm that volunteers emphasise the capture of
certain information and that the capture does not average out as might be expected. The results further show that social events, such as
a World Cup, spur the generation of volunteer geographic data. The implication of these results for
National Mapping Agencies is that they cannot treat volunteer geographic information as being of a uniform standard. How National
Mapping Agencies respond to this will have to be the subject of other investigations.
1. INTRODUCTION
1.1 Background
The growth of social networking on the internet has led to the
creation of collaborative geographic information systems. This
democratization of spatial information has appealed to a
community that is driven by an open knowledge philosophy and
committed to the free sharing of knowledge. The distinct
advantage of these communities is that because of their size
they are able to generate vast quantities of current vector data.
Goodchild (2007) and Goodchild & Glennon (2010) highlight
that each individual might act as a sensor and the crowd as
a whole can be seen as a sensor network. Citizens can greatly
support the process of data collection but the question arises:
how trustworthy is the information they provide? Flanagin &
Metzger (2008) suggest that volunteer efforts can be trusted,
relying on the ability of the crowd to detect and edit incorrect
information. Heipke (2010) notes that mechanisms like those
used in Wikipedia can be employed to encourage this process. At
the same time, the author warns that information provided by
locals tends to be of better quality than that gathered by
volunteers unfamiliar with the environment.
National Mapping Agencies are the official custodians of
geographic information and they have typically operated as
closed systems. Because of the high costs of vector data
extraction, the increasing demand for ever more current vector
data and the emergence of collaborative GISs, National
Mapping Agencies have been motivated to consider volunteer
geographic data as a source of spatial information for map
updating. Various studies have been done to determine the
quantitative and qualitative properties of spatial data generated by
a community of volunteers. Studies done so far have
examined volunteer geographic information against national
mapping standards. However, mapping is also influenced by
personal and cultural traits. For example, volunteers may be
motivated to capture only those features that are socially
important to them such as schools and churches, and ignore
other equally important landmarks like museums and
restaurants. Unlike other studies that have sought to determine
the geometric accuracies of volunteer geographic data, this
paper sets out to answer a more fundamental question: “How
differently do volunteers capture data?”
1.3 Previous Work
Most VGI testing that has been done involves comparing VGI
with official or survey data. This provides a good indication of
the geometric accuracy of VGI within the test data. One of the
VGI initiatives that has been tested by numerous researchers
is OpenStreetMap (OSM). The OSM repository has seen a rapid
increase in volunteer contributions over the years. The data is
also freely and easily available for testing. Contributions
consist mainly of GPS data collected by the
public and vector data digitised from aerial and satellite imagery
(Geofabrik 2011).
Haklay & Ellul (2010) did a comparative study between OSM
and Ordnance Survey (OS) (the National Mapping Agency of
Great Britain) in England for the period 2008-2009. The study
measured completeness in terms of the total OSM line length
compared to the OS data. The authors found that affluent areas
see more contributions than socially excluded areas and there is
an even bigger gap between areas of varying affluence when the
comparison uses only attributed road features.
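
The length-based completeness measure described above can be illustrated with a minimal sketch. It assumes that both data sets are already projected to a metric coordinate system and clipped to the same area (for example one grid cell). The function names and the sample coordinates are hypothetical, and the sketch is not the procedure actually used by Haklay & Ellul (2010), whose processing steps are not detailed here.

```python
import math

def polyline_length(coords):
    """Sum of straight-line segment lengths for a list of (x, y) points in metres."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))

def completeness_ratio(vgi_lines, reference_lines):
    """Total VGI line length divided by total reference line length."""
    vgi_total = sum(polyline_length(line) for line in vgi_lines)
    ref_total = sum(polyline_length(line) for line in reference_lines)
    if ref_total == 0:
        return None  # ratio is undefined where the reference has no lines
    return vgi_total / ref_total

# Hypothetical line data for a single area, coordinates in metres.
osm_cell = [[(0, 0), (300, 400)], [(0, 0), (0, 100)]]  # 500 m + 100 m
os_cell = [[(0, 0), (600, 800)]]                        # 1000 m

ratio = completeness_ratio(osm_cell, os_cell)
print(ratio)
```

A ratio below 1 suggests that the volunteered data set holds less line length than the reference for that area. A ratio above 1 can also occur, for instance where volunteers capture paths that the reference data omits.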
A study by Al-Bakri & Fairbairn (2011) compared a subset of
OSM data to OS data, with field survey data used as the
reference data set, to find (i) the geometric accuracy of the OSM
data set and (ii) the semantic similarity between the OSM and
OS data sets. The findings of the study were that the OSM data
set had (i) a poor geometric accuracy and (ii) dissimilar
semantics for the area of study.
The method by Haklay & Ellul (2010) was extended to France
by Girres & Touya (2010), comparing OSM data to BD
TOPO® data from Institut national de l'information
géographique et forestière (IGN), but included several other
assessments (e.g. temporal accuracy, logical consistency,
lineage, etc.). Results showed that although the OSM data is a