International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B7, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
AN ACCURACY ASSESSMENT OF GEOREFERENCED POINT CLOUDS PRODUCED
VIA MULTI-VIEW STEREO TECHNIQUES APPLIED TO IMAGERY ACQUIRED VIA
UNMANNED AERIAL VEHICLE
Steve Harwin and Arko Lucieer
School of Geography and Environmental Studies
University of Tasmania
Private Bag 76, Hobart, Australia 7001
Stephen.Harwin@utas.edu.au
KEY WORDS: Point Cloud, Accuracy, Reference Data, Surface, Georeferencing, Bundle, Reconstruction, Photogrammetry
ABSTRACT:
Low-cost Unmanned Aerial Vehicles (UAVs) are becoming viable environmental remote sensing tools. Advances in sensor and battery technology are expanding data capture opportunities. As a close range remote sensing platform, the UAV can capture high resolution photography on demand. This imagery can be used to produce dense point clouds using multi-view stereopsis (MVS) techniques that combine computer vision and photogrammetry. This study examines point clouds produced by applying MVS techniques to UAV and terrestrial photography. A multi-rotor micro UAV acquired aerial imagery from an altitude of approximately 30-40 m. The point clouds produced are extremely dense (1-3 cm point spacing) and provide a detailed record of the surface in the study area, a 70 m section of sheltered coastline in southeast Tasmania. Areas with little surface texture were not well captured; similarly, areas with complex geometry, such as grass tussocks and woody scrub, were not well mapped. The process fails to penetrate vegetation, but extracts very detailed terrain in unvegetated areas. The point clouds are initially in an arbitrary coordinate system and need to be georeferenced. A Helmert transformation is applied based on matching ground control points (GCPs) identified in the point clouds to GCPs surveyed with differential GPS. These point clouds can be used, alongside laser scanning and more traditional techniques, to provide very detailed and precise representations of a range of landscapes at key moments. There are many potential applications for the UAV-MVS technique, including coastal erosion and accretion monitoring, mine surveying and other environmental monitoring applications. For the generated point clouds to be used in spatial applications, they need to be converted to surface models that reduce dataset size without losing too much detail. Triangulated meshes are one option; another is Poisson Surface Reconstruction. The latter makes use of point normal data and produces a surface representation at greater detail than previously obtainable. This study will visualise and compare the two surface representations using point clouds created from terrestrial MVS (T-MVS) and UAV-MVS.
1 INTRODUCTION
Terrain and Earth surface representations were traditionally derived from imagery using analogue photogrammetric techniques that produced contours and topographic maps from stereo pairs. Digital photogrammetry has sought ways to automate the process and improve efficiency. Modern mesh- or grid-based representations provide relatively efficient storage of terrain data at a wide range of resolutions. The quality of these representations depends on the techniques used for data capture and processing. The representation improves with resolution, and the data capture technique must be able to accurately determine height points at sufficient density to portray the shape of the surface. The difficulty is that storage and visualisation become increasingly demanding as resolution increases. The surface must therefore be represented by an approximation that resembles reality as closely as possible.
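To make the storage trade-off concrete, the minimal Python sketch below counts the cells in a regular elevation grid at several resolutions. The 100 m x 100 m extent and the 4 bytes per cell (one 32-bit float) are hypothetical values chosen for illustration, not figures from this study, and file headers and compression are ignored; `grid_storage` is a hypothetical helper name.

```python
# Sketch: raw storage of a regular elevation grid at several resolutions.
# Assumptions (hypothetical, not from the study): a 100 m x 100 m area and one
# 32-bit float (4 bytes) per cell; headers and compression are ignored.


def grid_storage(width_m: int, length_m: int, cell_cm: int,
                 bytes_per_cell: int = 4) -> tuple[int, int]:
    """Return (number of cells, raw size in bytes) for a square-celled grid."""
    # Work in integer centimetres so the cell counts are exact.
    cols = -(-(width_m * 100) // cell_cm)  # ceiling division
    rows = -(-(length_m * 100) // cell_cm)
    cells = rows * cols
    return cells, cells * bytes_per_cell


if __name__ == "__main__":
    for cell_cm in (100, 10, 5, 1):
        cells, size = grid_storage(100, 100, cell_cm)
        print(f"{cell_cm:>3} cm cells: {cells:>11,} cells, {size / 1e6:8.1f} MB")
```

Because the cell count scales with the inverse square of the cell size, halving the cell size quadruples the storage; a 1 cm grid over this hypothetical area already needs 100 million cells (400 MB) before compression.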
In recent decades, photogrammetric techniques have sought to improve surface representation through automated feature extraction and matching. Computer vision uses Structure from Motion (SfM) to achieve similar outputs. SfM incorporates multi-view stereopsis (MVS) techniques that match features in multiple views of a scene and derive 3D model coordinates as well as camera position and orientation. The Scale Invariant Feature Transform (SIFT) operator (Lowe, 2004) provides a robust description of features in a scene, allowing features distinguished in other views to be compared and matched. A bundle adjustment can then be used to derive a set of 3D coordinates for the matched features.
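The matching step can be sketched as brute-force nearest-neighbour search combined with the distance-ratio test described by Lowe (2004), which rejects a match when the closest descriptor is not clearly closer than the second closest. This minimal Python illustration is not the implementation used by Bundler or by this study: the toy four-dimensional descriptors stand in for 128-dimensional SIFT descriptors, and `ratio_test_matches` is a hypothetical helper name.

```python
# Sketch: nearest-neighbour descriptor matching with the distance-ratio test
# of Lowe (2004). Assumptions: brute-force search, Euclidean distance, toy
# 4-D descriptors standing in for 128-D SIFT descriptors, threshold 0.8.
import math
from typing import Sequence

Descriptor = Sequence[float]


def ratio_test_matches(query: list[Descriptor], train: list[Descriptor],
                       ratio: float = 0.8) -> list[tuple[int, int]]:
    """Return (query_index, train_index) pairs that pass the ratio test."""
    if len(train) < 2:
        return []  # the test needs both a nearest and a second-nearest neighbour
    matches = []
    for qi, q in enumerate(query):
        # Distances from this descriptor to every candidate, closest first.
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        (d1, best), (d2, _) = dists[0], dists[1]
        # Accept only if the best candidate is clearly closer than the runner-up;
        # ambiguous matches (e.g. repetitive texture) are discarded.
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


if __name__ == "__main__":
    view_a = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
    view_b = [(0.9, 0.1, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 0.95, 0.05, 0.0)]
    print(ratio_test_matches(view_a, view_b))
```

The ratio test is one reason untextured or repetitive surfaces yield few points: when many candidates look alike, the nearest and second-nearest distances are similar and the match is rejected.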
The point density is proportional to the number of matched features; untextured surfaces, occlusions, illumination changes
and acquisition geometry can all result in fewer matches (Remondino and El-Hakim, 2006). The Bundler software¹ is an open source tool for performing least squares bundle adjustment (Snavely et al., 2006). To reduce computational overhead, the imagery is often downsampled. The next stage is typically to densify the point cloud using MVS techniques, such as the patch-based multi-view stereo software PMVS2². Each point in the resulting cloud has an associated normal. The point clouds produced from UAV imagery (referred to as UAV-MVS) acquired at 30-50 m flying height above ground level (AGL) have a density of 1-3 points per cm². A cloud can contain in excess of 7 million points (a file size of 500 MB).
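A least squares bundle adjustment refines camera parameters and 3D point positions together by minimising the squared distances between observed image features and the projections of the corresponding 3D points. The Python sketch below computes only that cost, for an assumed ideal pinhole camera with no lens distortion; Bundler's own camera model (which also estimates radial distortion) and its optimiser are not reproduced, and `project` and `reprojection_cost` are hypothetical helper names.

```python
# Sketch: the cost minimised by a least squares bundle adjustment, i.e. the sum
# of squared reprojection errors. Assumptions (not Bundler's exact model): an
# ideal pinhole camera with focal length f in pixels, principal point at the
# image origin, no lens distortion, and a pose (R, t) mapping world points
# into the camera frame as X_c = R X + t.
from typing import Iterable, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]
Camera = tuple[Mat3, Vec3, float]  # (R, t, f)


def project(point: Vec3, R: Mat3, t: Vec3, f: float) -> tuple[float, float]:
    """Project a world point to image coordinates (pixels)."""
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    return f * xc[0] / xc[2], f * xc[1] / xc[2]


def reprojection_cost(observations: Iterable[tuple[int, int, float, float]],
                      points: Sequence[Vec3],
                      cameras: Sequence[Camera]) -> float:
    """Sum of squared pixel residuals over (camera_idx, point_idx, u, v) observations."""
    cost = 0.0
    for ci, pi, u, v in observations:
        R, t, f = cameras[ci]
        u_hat, v_hat = project(points[pi], R, t, f)
        cost += (u_hat - u) ** 2 + (v_hat - v) ** 2
    return cost


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cams = [(identity, (0.0, 0.0, 0.0), 1000.0)]
    pts = [(1.0, 2.0, 10.0)]  # projects to (100, 200) in this camera
    obs = [(0, 0, 103.0, 196.0)]  # feature observed 3 px and 4 px away
    print(reprojection_cost(obs, pts, cams))  # 25.0
```

An iterative non-linear least squares solver, such as Levenberg-Marquardt, would repeatedly adjust `cameras` and `points` to reduce this cost across all images at once.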
The point cloud generated can be georeferenced by matching control points in the cloud to surveyed ground control points (GCPs). The resulting accuracy depends on the accuracy of the GCP survey or reference datasets; in this case it is approximately 25-40 mm (Harwin and Lucieer, 2012). The accuracy can be improved by coregistration to a more accurate base dataset.
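A 3D Helmert (similarity) transformation has seven parameters: three translations, three rotations and one scale factor. The Python sketch below estimates them from corresponding GCPs using the closed-form least squares method of Umeyama (1991). This is an assumed choice for illustration, since the text does not state how the parameters were estimated; it requires NumPy, and `fit_helmert`, `apply_helmert` and `rmse` are hypothetical helper names.

```python
# Sketch: estimate a 7-parameter 3D Helmert (similarity) transformation from
# GCPs located in the arbitrary MVS coordinate system (src) and the same GCPs
# surveyed with differential GPS (dst). Assumption: Umeyama's (1991)
# closed-form least squares solution is used for illustration only.
import numpy as np


def fit_helmert(src, dst):
    """Return (s, R, t) such that dst ~= s * R @ p + t for each src point p.

    src, dst: (n, 3) arrays of corresponding points, n >= 3, not collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - mu_src, dst - mu_dst
    # Cross-covariance of the centred point sets and its SVD.
    U, D, Vt = np.linalg.svd(b.T @ a / len(src))
    # Force a proper rotation (det = +1) rather than a reflection.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / ((a ** 2).sum() / len(src))
    t = mu_dst - s * R @ mu_src
    return s, R, t


def apply_helmert(points, s, R, t):
    """Transform (n, 3) points with scale s, rotation R and translation t."""
    return s * (np.asarray(points, dtype=float) @ R.T) + t


def rmse(a, b) -> float:
    """Root mean square 3D distance between corresponding points."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((d ** 2).sum(axis=1).mean()))
```

Holding some GCPs back as independent check points and computing `rmse` on them gives a fairer accuracy estimate than the residuals of the points used in the fit.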
To allow these large datasets to be used, it is usually necessary to convert them into a more storage-efficient data structure, so that the data can be used in conventional GIS and 3D visualisation software that relies on a surface, rather than a point cloud, for texturing. Grid-based (raster) and triangular mesh-based data models, such as Digital Surface Models (DSMs) and Triangular Irregular Networks (TINs), are commonly used. After processing and classification, a Digital Elevation Model (DEM) or a Digital Terrain Model (DTM) representation of the earth's surface, without any vegetation or man-made structures, can be de-
¹ http://phototour.cs.washington.edu/bundler/
² http://grail.cs.washington.edu/software/pmvs/