Segmentation of laser scans poses a slightly different problem,
as the data usually defines the geometric characterization of the
scanned objects. The interest is therefore usually in primitive
extraction, mostly of planar elements, e.g., Dold and Brenner
(2006) for terrestrial scans and Vosselman and Dijkman (2001)
for aerial scans. For terrestrial scans, Gorte (2007) presented a
method for extracting planar faces using a panoramic
representation of the range data. Segmentation into a more
general class of well-defined primitives, e.g., planes, cylinders,
or spheres, is presented in Rabbani (2006). While useful for
reverse-engineering practices, it cannot be easily extended to
general scenes.
Since most scenes are cluttered and contain entities of various
shapes and forms, some structured and others not, approaching
the segmentation problem by seeking consistency along a single
cue is likely to provide only partial results. Additionally, while
some entities may be characterized by geometric properties,
others are more distinguishable by their color content. These
observations suggest that segmenting the data using multiple
cues and integrating the data sources has the potential to provide
richer descriptive information and offers better prospects for
subsequent interpretation of the data. In this paper we present a
multi-cue segmentation model for terrestrial laser scanning data
that includes both range and image data. We study how segments
are defined when these sources are merged, how the sources
should be integrated in a meaningful way, and ultimately how the
added value of combining the individual sources can be brought
into an integrated segmentation. Results of the proposed model
show that it achieves better segmentations than those obtained
from the individual cues.
2. METHODOLOGY
The integration of different information sources requires
securing their co-alignment and their association. The first aspect
refers to establishing the relative transformation between the
two sensors. The second implies that, in order to combine the
interpretation of the two data sources, both have to refer to the
same information unit. Considering that images are a 2D
projection of 3D space, whereas laser data is three-dimensional,
their mode of integration is not immediate.
2.1 Camera Scanner Co-alignment

The camera mounted on top of the scanner can be linked to the
scanner body by finding the transformation between the two
frames shown in Figure 1. This relation involves three offset
parameters and three angular parameters, and can also be
formulated via the projection matrix P, a 3x4 matrix that relates
a 3D world point (X) to a 2D image point (x) in homogeneous
coordinates. Compared to the six standard boresighting pose
parameters, the added parameters (five in all) account for the
intrinsic camera parameters. The projection matrix can be
formulated as follows:

$x = KR[I \mid -t]\,X = PX$   (1)

with

$K = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$

where f_x and f_y are the focal lengths in the x and y directions
respectively, s is the skew value, and x_0 and y_0 are the offsets
with respect to the two image axes. R is the rotation matrix
between the scanner and the camera frames (the red and the blue
coordinate systems in the figure, respectively) and t is the
translation vector (Hartley and Zisserman, 2003).
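As a concrete illustration of Eq. (1), the sketch below assembles K and the projection matrix P. All numerical values (focal lengths, offsets, rotation, translation) are placeholders rather than calibration results; in practice they come from the boresight and intrinsic calibration of the mounted camera.

```python
import numpy as np

# Illustrative calibration values only; the actual parameters are obtained
# from the camera/scanner boresight and intrinsic calibration.
f_x, f_y = 1200.0, 1200.0      # focal lengths [pixels]
s = 0.0                        # skew
x_0, y_0 = 640.0, 480.0        # principal-point offsets [pixels]

K = np.array([[f_x,  s,  x_0],
              [0.0, f_y, y_0],
              [0.0, 0.0, 1.0]])

R = np.eye(3)                          # rotation: scanner frame -> camera frame
t = np.array([[0.0], [0.0], [0.2]])    # translation between the two frames [m]

# Eq. (1): P = K R [I | -t], the 3x4 projection matrix
P = K @ R @ np.hstack([np.eye(3), -t])
```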
Figure 1. Reference frames of the scanning system with a
mounted camera.
The projection matrix defines the image-to-scanner
transformation and so allows linking the color content to the 3D
laser points. While this transformation results in a loss of image
content due to changes in resolution, it allows processing both
information sources in a single reference frame and is therefore
advantageous.
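The linking of color content to the 3D laser points via P can be sketched as follows; the nearest-neighbour pixel sampling and the function name colorize_points are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def colorize_points(points_xyz, P, image):
    """Attach RGB values from `image` to scanner points via the projection matrix P.

    points_xyz: (N, 3) array in the scanner frame; image: (H, W, 3) array.
    Nearest-neighbour sampling is used here for simplicity.
    """
    n = points_xyz.shape[0]
    X = np.hstack([points_xyz, np.ones((n, 1))])   # homogeneous 3D coordinates
    x = (P @ X.T).T                                # Eq. (1): x = P X
    uv = x[:, :2] / x[:, 2:3]                      # dehomogenize to pixel coordinates

    h, w = image.shape[:2]
    u = np.rint(uv[:, 0]).astype(int)
    v = np.rint(uv[:, 1]).astype(int)
    in_view = (x[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    colors = np.zeros((n, 3), dtype=image.dtype)
    colors[in_view] = image[v[in_view], u[in_view]]
    return colors, in_view
```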
2.2 Data Representation
3D point clouds are difficult to process due to varying scale
within the data, which leads to an uneven distribution of points
in 3D space. To alleviate this problem we transform the data
into a panoramic data representation. As the angular spacing in
the ranging is fixed (defined by system specifications),
regularity can be established when the data is transformed into a
polar representation (Eq. (2))
$(x, y, z)^T = (\rho\cos\theta\cos\varphi,\ \rho\cos\theta\sin\varphi,\ \rho\sin\theta)^T$   (2)
with x, y and z the Euclidean coordinates of a point, θ and φ the
latitudinal and longitudinal coordinates of the firing direction
respectively, and ρ the measured range. When transformed, the
scan forms a panoramic range image in which ranges are the
"intensity" measures. Figure 2a shows range data in the form of
an image where the x axis represents the φ value, φ ∈ (0, 2π],
and the y axis represents the θ value, θ ∈ (−π/4, π/4]. The range
image offers a compact, lossless representation, but more
importantly, makes data manipulations (e.g., derivative
computation and convolution-like operations) simpler and easier
to perform. Due to the convenience in data processing that this
representation offers, all input channels are transformed into it.
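A minimal sketch of this transformation into the panoramic representation (the inverse mapping of Eq. (2)) is given below; the nearest-cell binning and the grid-size handling are simplifying assumptions of the illustration.

```python
import numpy as np

def to_range_panorama(points_xyz, d_phi, d_theta,
                      theta_min=-np.pi / 4, theta_max=np.pi / 4):
    """Bin Cartesian scanner points into a panoramic range image (inverse of Eq. (2)).

    d_phi, d_theta: the scanner's fixed angular spacings; the theta limits follow
    the field of view described in the text. Cells without a return stay NaN.
    Points at the scanner origin (rho = 0) are assumed absent.
    """
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    rho = np.sqrt(x**2 + y**2 + z**2)              # measured range
    theta = np.arcsin(z / rho)                     # latitudinal firing angle
    phi = np.mod(np.arctan2(y, x), 2 * np.pi)      # longitudinal firing angle

    cols = int(np.ceil(2 * np.pi / d_phi))
    rows = int(np.ceil((theta_max - theta_min) / d_theta))
    panorama = np.full((rows, cols), np.nan)

    c = np.clip((phi / d_phi).astype(int), 0, cols - 1)
    r = np.clip(((theta - theta_min) / d_theta).astype(int), 0, rows - 1)
    panorama[r, c] = rho                           # ranges become the "intensity"
    return panorama
```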
2.3 Channel selection
As noted, different cues can be used to segment the data. These
should feature attributes that characterize the different elements
of interest or supplement the information derived from other
cues. For the segmentation, three cues are introduced. The first
is the range content, namely the "intensity" value in the range
panorama; the second is the surface normals; and the third