vegetation from man-made objects. Many proposed solutions
use image edges as matching features. The problem is that
these edges are usually detected without reference to the
objects themselves. In addition, satellite images often contain complex natural scenes at a resolution of at most a few meters. For these reasons, edges do not provide the
discrimination performance needed. Our approach, therefore, is
to use the NDVI for feature extraction. NDVI allows us to identify
the most effective matching features. The index can reliably
detect man-made objects in satellite images.
Our experiments used satellite images acquired by a
commercial satellite. With the rapid development of remote sensing, such images have become easy to obtain. Current satellite imaging systems can acquire multi-spectral data over wide areas at a time with a resolution of a few meters. However, the positional accuracy of the original georeferencing is on the order of a 1/25,000 scale map, which typically corresponds to about 10 meters.
Reference and feature pixels are used for matching. The former
are calculated by projecting map objects onto satellite images
with their coordinates. The latter are calculated from satellite
images as follows. The NDVI of an N x M satellite image is given by the discrete function I(x, y), x ∈ {1, 2, 3, ..., N}, y ∈ {1, 2, 3, ..., M}, where I(x, y) is the index value. I(x, y) is calculated using the following arithmetic operation.
I(x, y) = (IR(x, y) - R(x, y)) / (IR(x, y) + R(x, y)) (1)
IR(x, y) and R(x, y) are the reflectance values in the near infrared channel and the visible red channel, respectively. I(x, y) is normally used to identify vegetation; therefore, binarizing I(x, y) yields an image containing only vegetation. We use I(x, y) in the reverse sense to identify man-made objects. The effective threshold for binarization needs to vary within the same image, because the index distribution is uneven. The N x M satellite image is divided into windows of size n x m, so the image contains (N / n) x (M / m) windows. The window size (n x m) depends on the characteristics of the image. I(x, y) is binarized using a separate threshold for each window. Each threshold is calculated by averaging I(x0, y0) over all pixels (x0, y0) contained in the window region. A pixel (x, y) is identified as a matching feature if I(x, y) is below the adaptive threshold. This binarization is performed for all windows. In addition, isolated pixels are eliminated from the features to suppress the obvious errors caused by shadow effects and spectral deviations. The remaining pixels are identified as feature pixels.
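As an illustration, a minimal sketch of this feature extraction step is given below, assuming numpy-style reflectance arrays; the default window size and the 8-neighbour isolation test are illustrative assumptions rather than values prescribed by the method.

```python
import numpy as np

def extract_feature_pixels(ir, r, n=20, m=20):
    """Extract candidate man-made (feature) pixels from the NDVI.

    ir, r : 2-D arrays of near-infrared and visible-red reflectance
    n, m  : window size for adaptive thresholding (illustrative default)
    """
    # Eq. 1: NDVI, with a small guard against division by zero.
    ir = ir.astype(float)
    r = r.astype(float)
    denom = ir + r
    denom[denom == 0] = 1e-9
    ndvi = (ir - r) / denom

    features = np.zeros(ndvi.shape, dtype=bool)
    rows, cols = ndvi.shape
    # Binarize each n x m window against its own mean (adaptive threshold);
    # pixels below the threshold are taken as man-made candidates.
    for y0 in range(0, rows, m):
        for x0 in range(0, cols, n):
            win = ndvi[y0:y0 + m, x0:x0 + n]
            features[y0:y0 + m, x0:x0 + n] = win < win.mean()

    # Eliminate isolated pixels (no 8-connected feature neighbour),
    # suppressing errors from shadows and spectral deviations.
    padded = np.pad(features, 1, constant_values=False)
    neighbours = sum(
        np.roll(np.roll(padded, dy, axis=0), dx, axis=1)[1:-1, 1:-1]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    return features & (neighbours > 0)
```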
3. MISMATCH DETERMINATION
This section describes our approach to determine the mismatch
between satellite images and vector maps. The general idea is
to determine the most reliable correspondence between
projected map objects and real objects shown in satellite
images by voting. Voting is based on the Generalized Hough
Transform (GHT), which is often used to estimate geometric transformation parameters. GHT can be used when a known figure
(i.e. template) exists in an arbitrary background and can account
for unknown parallel displacement, rotation, and expansion
(Duda & Hart, 1972). Here, it is assumed that only position shift
(which means parallel displacement conversion) need be
considered, because it is necessary in GIS to use satellite
images and vector maps as they are. The parallel
displacements in the x-direction and y-direction are estimated
separately by one-dimensional GHT.
The displacements are determined as follows. Reference and
feature pixels are extracted as mentioned above. The scan area
is assumed to be larger than the area occupied by the map object projected onto the satellite image. The differences between the positions of all reference and feature pixels on each scan line are calculated, and each calculated difference is taken as one vote. The peak corresponds to the difference that obtains the most votes in the scan area. Displacement candidates are identified as the values with a high voting frequency over all scan lines, where a high voting frequency means that the voting score exceeds a threshold.
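A minimal sketch of this one-dimensional voting step is shown below; the per-scan-line coordinate lists, the scan range, and the vote threshold are assumed inputs rather than values specified above.

```python
from collections import Counter

def displacement_candidates(feature_xs, reference_xs, scan_range, vote_threshold):
    """One-dimensional GHT-style voting along one axis.

    feature_xs, reference_xs : per-scan-line lists of pixel coordinates,
        e.g. feature_xs[i] holds the x-coordinates of feature pixels on
        scan line i (assumed input format)
    scan_range     : maximum absolute displacement considered (assumed)
    vote_threshold : minimum vote count for a candidate (assumed)
    """
    votes = Counter()
    for feats, refs in zip(feature_xs, reference_xs):
        # Every feature/reference pair on a scan line casts one vote
        # for their positional difference.
        for f in feats:
            for ref in refs:
                d = f - ref
                if abs(d) <= scan_range:
                    votes[d] += 1
    # Keep the displacements whose accumulated votes exceed the threshold.
    return [d for d, count in votes.items() if count >= vote_threshold]
```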
The final displacement is selected from among the candidates. The selection considers the consistency of pixel matching after displacement: each candidate is evaluated using the mean square error of all differences between corresponding pairs of feature pixels and displaced reference pixels. Here, it is assumed that a displacement is correct if the feature pixels are sufficiently consistent with the reference pixels. The mean square error is calculated for each candidate, and the displacement with the lowest mean square error is selected. In this way, the displacements in the x- and y-directions are estimated separately.
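The selection step can be sketched as follows; pairing each feature pixel with its nearest displaced reference pixel on the same scan line is an assumption made for illustration, since the exact correspondence rule is not detailed here.

```python
import numpy as np

def select_displacement(candidates, feature_xs, reference_xs):
    """Choose the candidate with the lowest mean square error between
    feature pixels and displaced reference pixels (per scan line)."""
    best, best_mse = None, float("inf")
    for d in candidates:
        errors = []
        for feats, refs in zip(feature_xs, reference_xs):
            if len(feats) == 0 or len(refs) == 0:
                continue
            shifted = np.asarray(refs, dtype=float) + d  # displaced reference pixels
            for f in feats:
                # Pair each feature pixel with its nearest displaced
                # reference pixel on the same scan line (assumed rule).
                nearest = shifted[np.argmin(np.abs(shifted - f))]
                errors.append((f - nearest) ** 2)
        if errors:
            mse = float(np.mean(errors))
            if mse < best_mse:
                best, best_mse = mse, d
    return best
```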
4. EXPERIMENTAL RESULTS
We performed experiments to verify our approach; actual
satellite images and corresponding vector maps were used.
Fig. 1 shows one part of a typical satellite image (RGB format)
as acquired by IKONOS (Space Imaging). These images have
multi-spectral data (red, green, blue, and near infrared channels) with 11-bit quantization and 4-meter (per pixel) resolution.
Figure 1. An example of a satellite image (RGB color)
Commercial 1/2,500 scale maps of the same test area were
used. The maps include topographical data identifying several
types of man-made objects, such as buildings. An example is
shown in Fig. 2. This figure contains the layers for buildings, houses, and roads (i.e., man-made objects), which were used to extract the reference pixels.
Following the procedures described in section 2, feature pixels
regarded as man-made objects were extracted from the satellite
image. Fig. 3 shows the red channel of the satellite image in
Fig. 1. Fig. 4 shows the near infrared image. The NDVI I(x, y) was calculated by Eq. 1 using the spectral reflectance values. The result is shown in Fig. 5, where the I(x, y) values have been converted into gray scale values. The binarization and elimination of I(x, y) were performed using a 20 pixel x 20 pixel
window. The resulting extracted features are shown in Fig. 6.
The results in Fig. 6 demonstrate that the feature pixels were
extracted comparatively well. The extracted pixels were also quite accurate; the main errors were excessive extraction and missed extractions. The results show that the NDVI approach is stable and reliable enough to make classification of man-