The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B7. Beijing 2008
According to Alparone et al. (2007), if there is no quality
difference between two images, the values of Mean Bias (MB),
Variance Difference (VD), Standard Deviation Difference
(SDD), Spectral Angle Mapper (SAM), and Relative
Dimensionless Global Error (ERGAS) should be zero, and the
values of Correlation Coefficient (CC) and Q4 Quality Index
(Q4) should be one. Larger values of MB, VD, SDD, SAM, and
ERGAS indicate a larger quality difference between the two
images. For CC and Q4, however, the worst value is zero.
Applying the evaluation criteria of Alparone et al. (2007) to
the values in Table 2, we find that:
• Four of the seven indexes (MB, SAM, ERGAS and Q4)
indicate that the three images Ik-Shift, Ik-Str and
Ik-Str-Shift differ in quality from Ik-Orig.
• Two others (VD and SDD) indicate that Ik-Shift has the
same quality as Ik-Orig, whereas Ik-Str and Ik-Str-Shift
differ in quality from Ik-Orig.
• Only one of the seven indexes (CC) indicates that all
four images Ik-Orig, Ik-Shift, Ik-Str, and Ik-Str-Shift
have the same image quality.
With such a significant disagreement between the seven
indexes, can they still measure the quality difference or
similarity of two images? If yes, which index should we rely on
and how can we explain the disagreement?
On the other hand, if the seven indexes could tell the quality
difference between two images, i.e. a fused image and the
original MS image, one should be able to easily improve the
values of the measurements by just systematically shifting the
means of the fused images to the desired means of the original
MS images, and/or by systematically stretching the histograms
of the fused images to match the desired standard deviation of
the original MS images. Do these systematic adjustments and
the improvements of the measurement values actually improve
the quality of the image fusion results? Definitely not.
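The effect of such adjustments on the global statistics can be illustrated with a minimal sketch in Python (NumPy). The definitions of MB and SDD used here (plain differences of the band means and of the band standard deviations) and the synthetic test data are assumptions made for illustration only; they are not the formulas or the data of Alparone et al. (2007), and `match_mean_and_std` is a hypothetical helper.

```python
import numpy as np

def mean_bias(ref, img):
    # Assumed definition: difference of the band means (reference minus test).
    return float(ref.mean() - img.mean())

def std_difference(ref, img):
    # Assumed definition: difference of the band standard deviations.
    return float(ref.std() - img.std())

def correlation(ref, img):
    # Pearson correlation coefficient between the two bands.
    return float(np.corrcoef(ref.ravel(), img.ravel())[0, 1])

def match_mean_and_std(img, ref):
    """Hypothetical helper: linearly shift and stretch img so that its
    mean and standard deviation equal those of ref."""
    gain = ref.std() / img.std()
    return (img - img.mean()) * gain + ref.mean()

rng = np.random.default_rng(0)
# Synthetic stand-in for one band of the original MS image.
ref = rng.normal(100.0, 20.0, size=(64, 64))
# Synthetic stand-in for a fused band: contrast change, offset and noise.
fused = 0.8 * ref + 15.0 + rng.normal(0.0, 8.0, size=ref.shape)

adjusted = match_mean_and_std(fused, ref)

for name, img in (("fused", fused), ("adjusted", adjusted)):
    print(name,
          "MB =", round(mean_bias(ref, img), 3),
          "SDD =", round(std_difference(ref, img), 3),
          "CC =", round(correlation(ref, img), 3))
```

In this sketch the linear adjustment drives MB and SDD to zero, while CC is unchanged and the pixel-level noise added to the fused band is still present. This is consistent with the argument above that better index values obtained in this way need not correspond to a better fusion result.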
4. DISCREPANCY OF SAM, ERGAS, Q4 AND CC
EVALUATION
Alparone et al. (2004) introduced a global quality measurement,
the Q4 Quality Index (Q4), for image fusion quality evaluation,
because the ERGAS method failed in measuring spectral
distortion.
In the evaluation of Alparone et al. (2004), QuickBird MS and
Pan images were first degraded from 2.8 m and 0.7 m to 11.2 m
and 2.8 m, respectively. The degraded MS and Pan images were
then fused to obtain pan-sharpened 2.8 m MS images. The
original 2.8 m MS image was used as a reference image (or
ground truth) to compare with the pan-sharpened MS images
for quantitative measurement of the fusion quality. The image
fusion methods evaluated were the HPF (High Pass Filter), IHS,
GLP-SDM (Alparone et al., 2003) and GLP-CBD (Alparone et
al., 2003) methods. In addition, the degraded 11.2 m MS image
(denoted as EXP) and a modified 2.8 m MS image (denoted as
SYN) were also compared with the original 2.8 m MS image for
quantitative measurements of the image quality. The modified
2.8 m MS image (SYN) was generated by multiplying the four
spectral bands of the original 2.8 m MS image by a constant
factor of 1.1. The quantitative measurements are cited in Table 3.
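How a SYN-type image behaves under scale-insensitive indexes can be checked with a short sketch in Python (NumPy). A random four-band array stands in for the QuickBird reference image, SAM is taken as the mean per-pixel spectral angle, and CC as the average per-band Pearson correlation; these forms and all function names are assumptions for illustration, not the implementation of Alparone et al. (2004). The per-band RMSE is included only as an example of a magnitude-sensitive error.

```python
import numpy as np

def sam_degrees(ref, img):
    """Mean spectral angle in degrees; arrays are (rows, cols, bands)."""
    dot = np.sum(ref * img, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(img, axis=-1)
    cos = np.clip(dot / norms, -1.0, 1.0)  # guard against rounding above 1
    return float(np.degrees(np.arccos(cos)).mean())

def mean_band_cc(ref, img):
    """Average of the per-band Pearson correlation coefficients."""
    ccs = [np.corrcoef(ref[..., k].ravel(), img[..., k].ravel())[0, 1]
           for k in range(ref.shape[-1])]
    return float(np.mean(ccs))

def band_rmse(ref, img):
    """Root mean square error of each band."""
    return np.sqrt(np.mean((ref - img) ** 2, axis=(0, 1)))

rng = np.random.default_rng(0)
# Synthetic stand-in for the 4-band reference MS image (strictly positive).
ref = rng.uniform(50.0, 200.0, size=(32, 32, 4))
# SYN-type image: every band multiplied by the constant 1.1.
syn = 1.1 * ref

sam = sam_degrees(ref, syn)
cc = mean_band_cc(ref, syn)
rmse = band_rmse(ref, syn)
print("SAM (deg):", sam, " CC:", cc, " RMSE per band:", rmse)
```

Because multiplying every band by the same constant changes neither the direction of a pixel's spectral vector nor the linear relationship between the bands, SAM stays at zero and CC at one up to rounding, whereas the RMSE of every band is clearly non-zero.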
According to the measurement values in Table 3, the SYN
results should be the best (better than the GLP-SDM and
GLP-CBD results), because:
• SYN has the highest CC value, 1;
• SYN has the highest Q4 value, 0.991 (closest to 1);
• SYN has the smallest SAM value, 0°, i.e. no
spectral distortion was introduced; and
• although SYN has a higher ERGAS value than
GLP-SDM and GLP-CBD do, this value should not be of
much concern, because ERGAS failed in measuring
spectral distortion according to Alparone et al. (2004).
When readers compare the SYN, GLP-SDM and GLP-CBD
images with the reference image (original 2.8 m MS image)
displayed in Alparone et al. (2004), they can also see that
the SYN results have the best quality, because the SYN image
is closest to the original true 2.8 m MS image in terms of
spectral and spatial information, whereas the GLP-SDM image
contains significant colour distortion and the GLP-CBD image
is blurred.
However, Alparone et al. (2004) stated in the final ranking that
the results of SYN were confusing if ERGAS was compared to
Q4, and that both the GLP-SDM and the more sophisticated
GLP-CBD results were the best according to the Q4 index and
correlation measurements. How can readers understand this
ranking? Was this ranking a result of the quantitative
measurements, the visual comparison, or personal preference?
            EXP     SYN     HPF     IHS     GLP-SDM   GLP-CBD
CC Ave *    0.845   1       0.814   0.717   0.823     0.912
Q4          0.756   0.991   0.876   0.864   0.885     0.909
SAM (°)     2.17    0.00    2.54    2.97    2.17      1.64
ERGAS       1.793   2.292   1.943   2.540   1.579     1.180
* CC Ave = average CC of the four spectral bands (calculated according to Table III of Alparone et al. (2004))
Table 3. Quality measurements of the pan-sharpened images (HPF, IHS, GLP-SDM, and GLP-CBD), low resolution MS image (EXP)
and modified MS image (SYN) with the original MS image as reference (data source: Alparone et al. (2004))
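The disagreement between the indexes can be made explicit by ranking the six images separately under each index. The following sketch in Python uses only the values transcribed from Table 3; the dictionary layout and the `rank` helper are illustrative assumptions, and ties (EXP and GLP-SDM under SAM) simply keep their table order.

```python
# Values transcribed from Table 3 (data source: Alparone et al., 2004).
table3 = {
    "CC Ave": {"EXP": 0.845, "SYN": 1.0, "HPF": 0.814, "IHS": 0.717,
               "GLP-SDM": 0.823, "GLP-CBD": 0.912},
    "Q4":     {"EXP": 0.756, "SYN": 0.991, "HPF": 0.876, "IHS": 0.864,
               "GLP-SDM": 0.885, "GLP-CBD": 0.909},
    "SAM":    {"EXP": 2.17, "SYN": 0.00, "HPF": 2.54, "IHS": 2.97,
               "GLP-SDM": 2.17, "GLP-CBD": 1.64},
    "ERGAS":  {"EXP": 1.793, "SYN": 2.292, "HPF": 1.943, "IHS": 2.540,
               "GLP-SDM": 1.579, "GLP-CBD": 1.180},
}

# CC and Q4 are best at one; SAM and ERGAS are best at zero.
higher_is_better = {"CC Ave": True, "Q4": True, "SAM": False, "ERGAS": False}

def rank(metric):
    """Hypothetical helper: image names ordered from best to worst."""
    scores = table3[metric]
    return sorted(scores, key=scores.get, reverse=higher_is_better[metric])

for metric in table3:
    print(f"{metric:7s}", " > ".join(rank(metric)))
```

Under CC, Q4 and SAM the SYN image is ranked first, whereas under ERGAS it is ranked fifth of six, ahead of IHS only.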