coordinates and intensity. This assumption seems to apply to all kinds of circumstances. In this paper, we attempt to use self-organizing maps (SOM) (Kohonen, 2001), an unsupervised clustering technique, to classify the points within a point cloud. We adopt the SOM training algorithm to group all points into different categories according to their xyz coordinates and intensity. Through the trained SOM, a two-dimensional grid of neurons, the similarity of points can be interactively explored and visualized. Thus we are able to distinguish points belonging to different objects.
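As a rough sketch of this preprocessing step (our own illustration, not part of the original method description), the following Python snippet assembles xyz coordinates and intensity into 4-dimensional SOM input vectors and rescales each attribute; the function name, the min-max normalization, and the toy data are assumptions.

import numpy as np

def build_input_vectors(points, intensity):
    """Stack xyz coordinates and intensity into 4-dimensional input vectors.

    points    : (n, 3) array of x, y, z coordinates (assumed layout)
    intensity : (n,) array of return intensities (assumed layout)
    """
    X = np.column_stack([points, intensity]).astype(float)
    # Min-max rescale each attribute so that no single one dominates
    # the distance computations during SOM training.
    X -= X.min(axis=0)
    span = X.max(axis=0)
    span[span == 0] = 1.0  # guard against constant attributes
    return X / span

# Hypothetical toy cloud: five points with random coordinates and intensity.
rng = np.random.default_rng(0)
X = build_input_vectors(rng.random((5, 3)) * 100.0, rng.random(5) * 255.0)
print(X.shape)  # (5, 4): n input vectors of dimension d = 4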
As a well-developed technique, SOM has found many
applications in various fields such as data classification, pattern
recognition, image analysis, and exploratory data analysis (for
an overview, see Oja and Kaski 1999). In the domain of GIS,
Openshaw and his colleagues have used the approach in spatial data analysis to carry out the classification of census data (Openshaw 1994, Openshaw et al. 1995). It has been applied to cartographic generalization for building typification (e.g. Hojholt 1995), street selection (Jiang and Harrie 2004), and line simplification (Jiang and Nakos 2003). All these studies rely on the SOM's ability in data clustering and pattern recognition. This paper will look at how it can be used for filtering laser-scanning data in deriving spatial object models from laser-scanning datasets. The remainder of this paper is structured as follows. Section 2 introduces the basic principle and algorithm of SOM. Section 3 presents a SOM-based approach for deriving different clusters within a laser-scanned point cloud. Section 4 presents a case study for validation of the approach. Finally, section 5 concludes the paper and points out future work.

2. SELF-ORGANIZING MAP
SOM is a well-developed neural network technique for data clustering and visualization. It can be used to project a large, high-dimensional data set into a low-dimensional space (usually one or two dimensions) while retaining the initial pattern of the dataset. That is, data samples that are close to each other in the input space are also close to each other in the low-dimensional space. In this sense, SOM resembles a geographic map concerning the distribution of phenomena, in particular with reference to the first law of geography: everything is related to everything else, but near things are more related than distant things (Tobler 1970). Herewith we provide a brief introduction to the SOM; readers are encouraged to refer to more complete descriptions in the literature (e.g. Kohonen 2001).
2.1 Basic principle
Let's represent a d-dimensional dataset as a set of input vectors of d dimensions, i.e. \(X = \{x_1, x_2, \ldots, x_n\}\), where n is the size of the dataset or, equally, the number of input vectors. The SOM training algorithm involves essentially two processes, namely vector quantization and vector projection (Vesanto 1999). Vector quantization is to create a representative set of vectors, so-called output vectors, from the input vectors. Let's denote the output vectors as \(M = \{m_1, m_2, \ldots, m_k\}\), with the same dimension as the
input vectors. In general, vector quantization reduces the
number of vectors, and this can be considered as a clustering
process. The other process, vector projection, aims at projecting
the k output vectors (in d-dimensional space) onto a regular tessellation (i.e., a SOM) of a lower dimension, where the regular tessellation consists of k neurons. In the vector projection, each output vector is mapped to a neuron, and the projection is performed such that "close" output vectors in d-dimensional space are projected onto neighbouring neurons in the SOM. This ensures that the initial pattern of the input data is preserved in the neurons.
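The following sketch makes the two processes concrete under assumed sizes (n, d, and an 8 x 6 map are our own choices, not the paper's): vector quantization operates on k codebook (output) vectors in the d-dimensional attribute space, while vector projection ties each of them to a fixed position on the regular two-dimensional tessellation.

import numpy as np

n, d = 1000, 4        # n input vectors of dimension d (assumed sizes)
rows, cols = 8, 6     # assumed map size, giving k = 48 neurons
k = rows * cols

rng = np.random.default_rng(1)
inputs = rng.random((n, d))

# Vector quantization operates on k codebook (output) vectors in R^d ...
codebook = rng.random((k, d))

# ... while vector projection ties each output vector to a fixed position
# on the regular two-dimensional tessellation, i.e. the SOM itself.
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

print(inputs.shape, codebook.shape, grid.shape)  # (1000, 4) (48, 4) (48, 2)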
The two tasks are illustrated in figure 1, where both the input and output vectors are represented in a table format, with columns as dimensions and rows as vector IDs. Usually the number of input vectors is greater than that of output vectors, i.e. n > k, and the size of the SOM is, without exception, the same as the number of output vectors. In the figure, the SOM is represented by a transitional colour scale, which implies that similar neurons lie close together. It should be emphasized that, for an intuitive explanation of the algorithm, we separate it into two tasks; in the SOM they are actually combined rather than performed one after another.
Figure 1: Illustration of the SOM principle: input vectors (d-dimension) are quantized into output vectors (d-dimension), which are projected onto the SOM (2-dimension).
2.2 The algorithm
The above two steps, vector quantization and vector projection,
constitute the basis of the SOM algorithm. Vector quantization
is performed as follows. First, the output vectors are initialized randomly or linearly with some values for their variables. Then, in
the following training step, one sample vector x from the input
vectors is randomly chosen and the distance between it and all
the output vectors is calculated. The output vector that is closest
to the input vector x is called the Best-Matching Unit (BMU),
denoted by \(m_c\):

\[ \| x - m_c \| = \min_i \{ \| x - m_i \| \} \]   [1]

where \(\|\cdot\|\) is the distance measure. Second, the BMU or winning neuron and the other output vectors in its neighbourhood are updated to be closer to x in the input vector space. The update rule for output vector i is:

\[ m_i(t+1) = m_i(t) + \alpha(t) h_{ci}(t) [x(t) - m_i(t)] \quad \text{for } i \in N_c(t) \]
\[ m_i(t+1) = m_i(t) \quad \text{for } i \notin N_c(t) \]   [2]

where \(x(t)\) is a sample vector randomly taken from the input vectors, \(m_i(t)\) is the output vector for any neuron i within the neighbourhood \(N_c(t)\), and \(\alpha(t)\) and \(h_{ci}(t)\) are the learning rate function and the neighbourhood kernel function respectively.
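A minimal numpy sketch of equations [1] and [2] for a single training step is given below; for simplicity the kernel \(h_{ci}(t)\) is taken as 1 inside the neighbourhood \(N_c(t)\) (a "bubble" kernel), and all sizes and parameter values are illustrative assumptions rather than settings from the paper.

import numpy as np

def bmu(x, codebook):
    """Equation [1]: index c of the best-matching unit for sample x."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def update(x, codebook, grid, c, alpha, radius):
    """Equation [2]: move the BMU and its map neighbours towards x.

    Neurons whose grid position lies within `radius` of the BMU form
    N_c(t) and are updated; all other output vectors stay unchanged
    (the kernel h_ci(t) is 1 inside the neighbourhood, 0 outside).
    """
    grid_dist = np.linalg.norm(grid - grid[c], axis=1)
    inside = grid_dist <= radius
    codebook[inside] += alpha * (x - codebook[inside])
    return codebook

# One illustrative training step with assumed sizes and parameters.
rng = np.random.default_rng(2)
grid = np.array([(r, q) for r in range(8) for q in range(6)], dtype=float)
codebook = rng.random((48, 4))
x = rng.random(4)
codebook = update(x, codebook, grid, bmu(x, codebook), alpha=0.5, radius=2.0)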
The algorithm can be described in a step-by-step fashion as
follows.
Step 1: Define the input vectors, in particular the multiple variables that determine an attribute space.
The input vectors are likely to be in a table format as shown in
Figure 1, where d variables determine a d-dimensional attribute
space.

Step 2: Define the size of the SOM, i.e. the number of neurons to be used, and whether a 1- or 2-dimensional map is adopted.

Step 3: Initialize the output vectors, either randomly or linearly; this provides the starting point of the training process.

Step 4: Train the map, involving the learning rate function \(\alpha(t)\) and the neighbourhood kernel function \(h_{ci}(t)\). The neighbourhood kernel is usually a gaussian function centred on the winning neuron, e.g. \(h_{ci}(t) = \exp(-d_{ci}^2 / 2\sigma^2(t))\), where \(d_{ci}\) is the distance between neurons c and i on the map and \(\sigma(t)\) is the neighbourhood radius; both the learning rate and the radius decrease in the course of training.
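Putting steps 1 to 4 together, the sketch below runs a small training loop with a gaussian neighbourhood kernel and a learning rate and radius that decay over time; the map size, decay schedules, and iteration count are illustrative assumptions and not the values used in this study.

import numpy as np

def train_som(X, rows, cols, iters=2000, alpha0=0.5, seed=0):
    """Train a 2-D SOM on input vectors X (n x d): a minimal sketch."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    codebook = rng.random((rows * cols, X.shape[1]))   # Step 3: random initialization
    sigma0 = max(rows, cols) / 2.0                     # initial neighbourhood radius

    for t in range(iters):                             # Step 4: iterative training
        frac = t / iters
        alpha = alpha0 * (1.0 - frac)                  # decaying learning rate alpha(t)
        sigma = sigma0 * (1.0 - frac) + 1e-2           # shrinking radius sigma(t)
        x = X[rng.integers(len(X))]                    # random sample x(t)
        c = np.argmin(np.linalg.norm(codebook - x, axis=1))  # BMU, equation [1]
        d2 = np.sum((grid - grid[c]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))           # gaussian kernel h_ci(t)
        codebook += alpha * h[:, None] * (x - codebook)      # update, equation [2]
    return codebook, grid

# Toy usage with assumed sizes; in practice X would hold the xyz/intensity vectors.
rng = np.random.default_rng(3)
codebook, grid = train_som(rng.random((500, 4)), rows=8, cols=6, iters=500)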
3. SOM-BASED APPROACH

3.1 Principle

Laser scanning ...