4. CASE STUDY
4.1. Data
The dataset we have used to test the algorithms is the lidar point
cloud of Bloomington, IN, U.S.A., obtained from the Indiana
Spatial Data Portal. It is provided by Monroe County from
topographic survey flights by MJ Harden on April 11-12, 2010
with an Optech Gemini system. The maximum planned post
spacing on the ground is 1.4 m for unobscured areas. Reported
vertical accuracy based on the accuracy assessment carried out
is 0.347 ft (10.58 cm). The point cloud is provided in LAS 1.2
format. The point coordinates are in NAD 1983 HARN
horizontal datum and NAVD 88 vertical datum projected on the
Indiana West State Plane Coordinate System. The average point
density that we have calculated from the subset of the data is
1.4 pts/m². Figure 3 shows the study area.
Figure 3. Orthophoto (left) and height colored lidar point cloud
(right) of the study area.
4.2. Point features
There are numerous feature definitions in the literature that are
proposed to represent various local properties of points. Similar
to surface variation, several other features are defined as a
function of the eigenvalues of the covariance matrix of the
point neighborhood. From these, we select to use the structure
tensor planarity (S.T.P) and structure tensor anisotropy (S.T.A)
features (West et al., 2004) to identify "surface" and "scatter"
points. They are defined as
$\mathrm{S.T.P.} = \dfrac{\lambda_2 - \lambda_3}{\lambda_1}, \qquad \mathrm{S.T.A.} = \dfrac{\lambda_1 - \lambda_3}{\lambda_1}$  (5)

where $\lambda_3 \le \lambda_2 \le \lambda_1$ are the eigenvalues of the covariance matrix of the point neighborhood.
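For concreteness, the sketch below (ours, not from the paper) computes these eigenvalue-based features with NumPy; the use of a k-nearest-neighbor search and the neighborhood size k = 20 are assumptions, since the section does not fix the neighborhood definition here.

```python
# Sketch of the Section 4.2 point features; the k-NN neighbourhood (k=20) is an assumption.
import numpy as np
from scipy.spatial import cKDTree

def structure_tensor_features(points, k=20):
    """Return per-point (S.T.P., S.T.A.) as defined in Eq. (5)."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)                 # k nearest neighbours of every point
    feats = np.empty((len(points), 2))
    for i, nbrs in enumerate(idx):
        cov = np.cov(points[nbrs], rowvar=False)     # 3x3 covariance of the neighbourhood
        l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]  # lambda1 >= lambda2 >= lambda3
        feats[i, 0] = (l2 - l3) / (l1 + 1e-12)       # structure tensor planarity
        feats[i, 1] = (l1 - l3) / (l1 + 1e-12)       # structure tensor anisotropy
    return feats
```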
4.3. Graph construction
Adapting Boykov et al.'s (2001) min-cut optimization
algorithm, we establish our graph model as follows. Each point
in the point cloud is considered as a node in the graph. The
edges between the nodes are defined such that each node is connected to its 3-D Voronoi neighbors with an edge. We apply a threshold on the edge length, calculated as the Euclidean 3-D distance between the two points of the edge, in order to discard very long edges and reduce the number of unnecessary edges in the graph. All points are also connected to the nodes representing
labels. For each point, costs for assigning each label to that
point are calculated and set as the weights for the edges that
connect the points to the label nodes. These weights correspond
to the data cost term of the energy function and will be summed
at each candidate labeling configuration. The edges that
connect points with each other are also assigned weights
representing the smoothness term of the energy function. We
use two different data cost functions for two labeling tasks. The
first one is for the labeling of points based on their proximity to
the clusters in the feature space. It is calculated for point p as
$D_p(l) = \lVert \mu_l - x_p \rVert$  (6)

where $\mu_l$ is the label mean and $x_p$ is the feature vector of point $p$.
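As an illustration of the construction above (not the authors' code), the following sketch derives the Voronoi-neighbor edges from the 3-D Delaunay tetrahedralization, prunes them with an edge-length threshold, and evaluates the Eq. (6) data cost; the threshold value and function names are hypothetical.

```python
# Hedged sketch of the graph edges and the Eq. (6) data cost; max_edge is an assumed value.
import numpy as np
from itertools import combinations
from scipy.spatial import Delaunay

def voronoi_neighbor_edges(points, max_edge=5.0):
    """Edges between 3-D Voronoi neighbours (Delaunay edges), pruned by Euclidean length."""
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:                    # every tetrahedron contributes 6 edges
        for i, j in combinations(simplex, 2):
            if np.linalg.norm(points[i] - points[j]) <= max_edge:
                edges.add((min(i, j), max(i, j)))
    return np.array(sorted(edges))

def data_costs(features, label_means):
    """D_p(l) = ||mu_l - x_p|| for every point (row) and label (column), Eq. (6)."""
    return np.linalg.norm(features[:, None, :] - label_means[None, :, :], axis=2)
```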
In order to calculate the data costs this way, one needs to know
how the data are distributed in the feature space. In a similar
way to Dai and Zhang’s (2006) clustering approach for image
segmentation, we perform watershed segmentation in the
feature domain to empirically determine how the data are
clustered. For a feature vector with N dimensions, we generate
a grid space in the feature domain and calculate a histogram
with the counts of data points falling within each bin. Then we
perform watershed segmentation on the complement of the
histogram to obtain the clusters in the feature space. We take
the clusters obtained by the watershed segmentation and
calculate label means from these clusters. Due to the nature of
the watershed segmentation, data points that fall on the
boundaries of the watersheds are left unlabeled. These data
points are not considered while calculating the label means.
Once the label means are determined, the data costs are
calculated and the points in the point cloud are optimally
assigned labels with the graph cut optimization algorithm.
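One way this clustering step could look in code, sketched under our own assumptions: a low-dimensional feature space such as S.T.P./S.T.A., a fixed bin count, and scikit-image's watershed with watershed lines marking the unlabeled boundary bins.

```python
# Sketch of watershed clustering in the feature domain; bins=64 is an assumed value.
import numpy as np
from skimage.segmentation import watershed

def label_means_by_watershed(features, bins=64):
    """Cluster the feature-space histogram with watershed and return the label means."""
    hist, edges = np.histogramdd(features, bins=bins)
    # Watershed on the complement of the histogram: basins grow around density peaks.
    ws = watershed(hist.max() - hist, watershed_line=True)   # 0 marks watershed boundaries
    # Bin index of every data point along each feature dimension.
    bin_idx = np.stack(
        [np.clip(np.digitize(features[:, d], edges[d]) - 1, 0, bins - 1)
         for d in range(features.shape[1])], axis=1)
    cluster = ws[tuple(bin_idx.T)]                   # cluster id per point, 0 = unlabeled
    means = np.array([features[cluster == c].mean(axis=0)
                      for c in np.unique(cluster) if c != 0])
    return means, cluster
```

Points whose bins fall on a watershed line keep the id 0 and, as stated above, are excluded when the label means are computed.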
The second cost function is used for labeling points with
respect to their likelihood of being from a given distribution.
After segmenting the points, we use the histograms of the
feature values of the segments representative of their classes
(i.e. “scatter” and “surface”) to calculate the data costs of
assigning a point to each label as negative log likelihoods. Our
feature vector consists of S.T.P. and S.T.A point features for
this purpose. Data costs are calculated as
$D_p(\text{scatter}) = -\ln \Pr(x_p \mid \text{scatter})$  (7)

$D_p(\text{surface}) = -\ln \Pr(x_p \mid \text{surface})$  (8)
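A hedged sketch of how Eqs. (7)-(8) could be evaluated, reading the class-conditional likelihoods from normalized histograms of the segment features; the bin count and the small constant guarding against log(0) are our choices.

```python
# Sketch of the negative log-likelihood data costs of Eqs. (7)-(8).
import numpy as np

def neg_log_likelihood_costs(features, class_features, bins=32, eps=1e-6):
    """D_p(class) = -ln Pr(x_p | class); one column per class (e.g. scatter, surface)."""
    costs = np.empty((len(features), len(class_features)))
    for c, train in enumerate(class_features):       # e.g. [scatter_feats, surface_feats]
        hist, edges = np.histogramdd(train, bins=bins)
        prob = hist / hist.sum()                     # empirical class-conditional pdf
        idx = np.stack(
            [np.clip(np.digitize(features[:, d], edges[d]) - 1, 0, bins - 1)
             for d in range(features.shape[1])], axis=1)
        costs[:, c] = -np.log(prob[tuple(idx.T)] + eps)   # eps avoids log(0)
    return costs
```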
Smoothness energy is calculated as
$V_{p,q}(l_p, l_q) = \exp\!\left( -\dfrac{\lVert x_p - x_q \rVert^2}{2\sigma^2} \right) \dfrac{1}{d(p,q)}$  (9)
where d(p, q) is the Euclidean distance between point p and q
in the spatial domain.
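The following sketch evaluates the Eq. (9) weights on the neighbor edges built earlier; the value of σ is an assumption and would have to be tuned.

```python
# Sketch of the Eq. (9) smoothness weights; sigma=0.1 is an assumed value.
import numpy as np

def smoothness_weights(points, features, edges, sigma=0.1):
    """V_{p,q} = exp(-||x_p - x_q||^2 / (2 sigma^2)) / d(p, q) for each edge (p, q)."""
    p, q = edges[:, 0], edges[:, 1]
    feat_dist2 = np.sum((features[p] - features[q]) ** 2, axis=1)   # feature-space distance^2
    spat_dist = np.linalg.norm(points[p] - points[q], axis=1)       # spatial distance d(p, q)
    return np.exp(-feat_dist2 / (2.0 * sigma ** 2)) / spat_dist
```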
Smoothness cost is to be interpreted as a penalty for
discontinuity between two neighboring sites. If the labeling of
two neighbors is different (discontinuity), this type of
smoothness cost assigns a large penalty/cost to the edge if the
features of the two nodes are close to each other (determined by the parameter $\sigma$). This means that the labeling of these two nodes will
tend to be the same since it will cost more to label these nodes
differently which will increase the total energy. On the other
hand, if the two nodes are far from each other in the feature
space, then the smoothness function will assign a small penalty
to this edge, allowing the two nodes to be labeled differently from
each other since such labeling will contribute to a lower energy.
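For illustration only (with an assumed $\sigma = 0.1$ and $d(p,q) = 1$ m, values not given in this section): two neighbors with a feature distance of 0.05 receive a weight of roughly $e^{-0.125} \approx 0.88$, so labeling them differently is strongly penalized, whereas a feature distance of 0.3 yields about $e^{-4.5} \approx 0.01$, making a label discontinuity across that edge almost free.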
Once we construct the graph with all data and smoothness
costs, we use the software library implementation provided by
Veksler and Delong (http://vision.csd.uwo.ca/code/) based on
the algorithms of Boykov et al. (2001) and Kolmogorov and Zabih (2004) to compute the graph cuts.