The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B7. Beijing 2008
[Figure inputs: Image A, Image B]
Figure 1. Generic framework of region-based image fusion
(Piella, 2003)
First, the source images are decomposed by a wavelet transform
into approximation and detail sub-images, and these sub-images
are segmented to obtain the regions at each level. The regions
then guide the fusion process: the activity level and the match
degree of the wavelet coefficients of the source images are
computed within each region, and the maximum value rule and the
weighted average rule are used to combine the coefficients of
the detail sub-images and of the approximation sub-images,
respectively. Finally, the combined coefficients are inverse
wavelet transformed to obtain the fused image.
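As a minimal illustration of this pipeline, the following Python sketch fuses two 1-D signals. It is a simplification under stated assumptions, not the framework of Figure 1: it uses a single-level Haar wavelet, it applies the rules per coefficient rather than per segmented region, and the function names and the `weight_a` parameter are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar normalisation factor


def haar_decompose(signal):
    """Single-level 1-D Haar transform of an even-length signal.

    Returns (approximation, detail) coefficient lists.
    """
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def fuse(signal_a, signal_b, weight_a=0.5):
    """Fuse two signals of equal, even length.

    Approximation coefficients: weighted average rule.
    Detail coefficients: maximum (absolute) value rule.
    """
    approx_a, detail_a = haar_decompose(signal_a)
    approx_b, detail_b = haar_decompose(signal_b)
    fused_approx = [weight_a * a + (1.0 - weight_a) * b
                    for a, b in zip(approx_a, approx_b)]
    fused_detail = [da if abs(da) >= abs(db) else db
                    for da, db in zip(detail_a, detail_b)]
    return haar_reconstruct(fused_approx, fused_detail)
```

In a region-based variant, the choice between the two detail coefficients would be made once per region from a regional activity measure instead of independently at each position.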
The choice of segmentation method is vitally important because
it directly influences the fusion decision. An appropriate
segmentation provides useful information for image fusion,
whereas an inappropriate one misguides the fusion process.
Currently, the popular image segmentation methods used in the
region-based image fusion framework (Zhang, 1997; Piella, 2003;
Wang, 2005; Lewis, 2005) are c-means clustering, watershed
algorithms, and the Canny edge detection method. These methods
can be substituted by others, so the selection of an appropriate
segmentation method is the first issue to be considered.
Moreover, in the traditional region-based fusion framework,
segmentation errors are more serious for sub-images than for the
original images, because sub-images contain less information as
the number of decomposition levels increases. The segmented
regions may be inaccurate at each level no matter which
segmentation method is used, and this inaccuracy grows level by
level during the inverse wavelet transform. Reducing this
inaccuracy is the second issue to be considered.
Formalization of appropriate rules to guide the fusion process is
the third issue to be considered.
To develop a more robust technique for the fusion of
high-resolution images, the following strategy is adopted in
this study:
• use of mean shift segmentation in place of Canny
segmentation;
• use of the original input images to obtain a binary image of
the shared regions, which is then mapped to each level by
down-sampling to ensure consistent segmentation across
levels; and
• use of the Structural Similarity Index Metric (SSIM) proposed
by Wang (2002, 2004) to guide the fusion process instead of
the region match measure, because SSIM has a clearer physical
meaning.
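To make the third point concrete, the sketch below computes the SSIM index between the pixel values of one region taken from two images. It assumes the widely used single-window form of the index with the stabilising constants K1 = 0.01 and K2 = 0.03 and an 8-bit dynamic range. How the index is mapped to fusion weights in this study is not shown, and the function name is hypothetical.

```python
def ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM index between two equally sized lists of pixel values.

    x and y are assumed to hold the pixels of the same region in two
    source images; at least two pixels are required.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unbiased sample variances and covariance.
    var_x = sum((v - mean_x) ** 2 for v in x) / (n - 1)
    var_y = sum((v - mean_y) ** 2 for v in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Small constants that stabilise the ratio when the means or
    # variances are close to zero.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2.0 * mean_x * mean_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

The index equals 1 for identical regions and decreases as the luminance, contrast, or structure of the two regions diverge, which is what gives it a direct interpretation as a match measure.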
3. MEAN SHIFT SEGMENTATION FOR EXTRACTION
OF FEATURES FROM HIGH-RESOLUTION IMAGES
Mean shift analysis is a newly developed nonparametric
clustering technique based on density estimation for the analysis
of complex feature spaces. It has found many successful
applications such as image segmentation and tracking
(Comaniciu, 1999; Luo, 2003).
The mean shift procedure is an adaptive local steepest gradient
ascent method. The mean shift vector is computed by the
following formula:
$$ m_{h,G}(x) = \frac{1}{2} h^{2} c \, \frac{\hat{\nabla} f_{h,K}(x)}{\hat{f}_{h,G}(x)} \qquad (1) $$
where K and G are kernels whose profiles satisfy
$g(x) = -k'(x)$; $\hat{\nabla} f_{h,K}$ is the density gradient
estimator obtained with kernel K; $\hat{f}_{h,G}$ is the
probability density estimate obtained with the new kernel G;
h is the bandwidth; c is a constant; and x is the centre of the
kernel (window).
Equation (1) indicates that, at location x, the mean shift
vector computed with kernel G is proportional to the normalized
density gradient estimate obtained with kernel K. Therefore, to
get the direction of $\hat{\nabla} f_{h,K}(x)$, only the vector
$m_{h,G}(x)$ needs to be calculated. The mean shift vector thus
always points toward the direction of maximum increase in the
density.
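For reference, in the standard formulation of mean shift the vector has an explicit form that requires no gradient computation. The samples $x_i$, $i = 1, \dots, n$, are notation assumed here and are not defined in this section; g is the profile of kernel G:

```latex
m_{h,G}(x) =
  \frac{\sum_{i=1}^{n} x_i \, g\!\left( \left\| \frac{x - x_i}{h} \right\|^{2} \right)}
       {\sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^{2} \right)}
  - x
```

That is, the mean shift vector is the difference between the kernel-weighted mean of the samples inside the window and the current window centre.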
The mean shift procedure is achieved by a 2-step iteration:
1) Compute the mean shift vector $m_{h,G}(x)$,
2) Translate the kernel (window) G(x) by $m_{h,G}(x)$ until
convergence.
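The two steps can be sketched in a few lines of Python. This is an illustrative 1-D version under stated assumptions, not the implementation used in this study: G is taken to be a Gaussian kernel, and the function names, the tolerance `tol`, and the iteration cap `max_iter` are hypothetical.

```python
import math


def mean_shift_vector(x, samples, bandwidth):
    """Step 1: kernel-weighted mean of the samples minus the centre x.

    Gaussian weights are assumed; x must be close enough to some
    sample that the weights do not all underflow to zero.
    """
    weights = [math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples]
    weighted_mean = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
    return weighted_mean - x


def mean_shift_mode(x, samples, bandwidth, tol=1e-6, max_iter=500):
    """Step 2: translate the window by the mean shift vector until the
    shift becomes smaller than tol (or max_iter is reached)."""
    for _ in range(max_iter):
        shift = mean_shift_vector(x, samples, bandwidth)
        x += shift
        if abs(shift) < tol:
            break
    return x


samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
low_mode = mean_shift_mode(0.8, samples, bandwidth=0.5)
high_mode = mean_shift_mode(5.3, samples, bandwidth=0.5)
```

Starting points near different clusters converge to different density modes. For segmentation, pixels whose windows converge to the same mode are grouped into one region.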
Because the control parameter has a clear physical meaning,
gray-level and color images can be processed by the same
algorithm. An image is typically represented as a 2-D lattice of
p-dimensional vectors (pixels): p = 1 denotes a gray-level
image, p = 3 a color image, and p > 3 a multi-spectral image.
The space of the lattice is known as the