The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B7. Beijing 2008
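$$R^2=\frac{\sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2}{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}=1-\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}=\frac{\left[\sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})(Y_i-\bar{Y})\right]^2}{\sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2\sum_{i=1}^{n}(Y_i-\bar{Y})^2}\qquad(5)$$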
However, the value of the validation coefficient R² increases (or at least never decreases) as more independent variables are added to the model, and it is also affected by the sample size. Therefore, to reflect the goodness of fit of the model as accurately as possible and to eliminate the effects of the number of independent variables and the sample size on the validation coefficient, the adjusted validation coefficient (Adjusted R Square) is introduced. Its formula is:
$$\text{Adjusted }R^2=1-\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2/(n-k-1)}{\sum_{i=1}^{n}(Y_i-\bar{Y})^2/(n-1)}\qquad(6)$$
In the formula, k is the number of independent variables (number of selected wavebands) and n is the number of observations (number of samples). Whenever the model contains at least one independent variable (and R² < 1), adjusted R² is smaller than R². As the formulas show, the larger k is relative to n, the greater the difference between R² and adjusted R².
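Substituting the 1 - SSE/SST form of (5) into (6) gives a compact rearrangement (our algebra, not a formula stated by the authors) that makes the roles of n and k explicit:

$$\text{Adjusted }R^2=1-(1-R^2)\frac{n-1}{n-k-1},\qquad R^2-\text{Adjusted }R^2=(1-R^2)\frac{k}{n-k-1}$$

The gap therefore grows with k and shrinks as n increases. For example, with n = 174 (the sample count reported in Section 3.1) and a hypothetical model with k = 5 and R² = 0.80, the gap is 0.20 × 5/168 ≈ 0.006, so adjusted R² ≈ 0.794.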
The accuracy of the predictive equation is evaluated by the total root-mean-square error (RMSE) (formula 7):
$$RMSE=\sqrt{\frac{1}{n-k-1}\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}\qquad(7)$$
In the formula, Y_i and Ŷ_i are the measured and predicted values, n is the number of soil samples, and k is the number of selected wavebands.
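Formulas (5) to (7) can be computed together from the measured and predicted values. The following is a minimal pure-Python sketch; the function name `fit_statistics` is hypothetical. It uses the 1 - SSE/SST form of R². That form equals the other two forms in (5) only when the predictions come from a least-squares fit with an intercept.

```python
import math


def fit_statistics(y_obs, y_pred, k):
    """Return (R2, adjusted R2, RMSE) following formulas (5)-(7).

    y_obs:  measured values Y_i
    y_pred: predicted values Y^_i
    k:      number of selected wavebands (independent variables)
    """
    n = len(y_obs)
    if n != len(y_pred) or n - k - 1 <= 0:
        raise ValueError("need equal-length inputs and n > k + 1")
    y_mean = sum(y_obs) / n
    # Residual sum of squares, sum (Y_i - Y^_i)^2
    sse = sum((y - yh) ** 2 for y, yh in zip(y_obs, y_pred))
    # Total sum of squares, sum (Y_i - Ybar)^2
    sst = sum((y - y_mean) ** 2 for y in y_obs)
    r2 = 1 - sse / sst                                   # formula (5)
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))   # formula (6)
    rmse = math.sqrt(sse / (n - k - 1))                  # formula (7)
    return r2, adj_r2, rmse
```

For example, `fit_statistics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], k=1)` gives SSE = 0.10 and SST = 10. It returns R² = 0.99, adjusted R² ≈ 0.9867 and RMSE ≈ 0.1826.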
After the equation was established, analysis of variance was also used to test the regression equation. The null hypothesis was that all regression coefficients are 0, and the alternative was that they are not all 0. This is the significance test for the regression equation as a whole.
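The whole-equation test can be sketched with the usual ANOVA F statistic, F = (SSR/k) / (SSE/(n-k-1)). This sketch assumes an ordinary least-squares fit with an intercept, so that SST = SSR + SSE. The function name is hypothetical, and the paper does not state the significance level it used.

```python
def regression_f_statistic(y_obs, y_pred, k):
    """ANOVA F statistic for H0: all k slope coefficients are 0.

    Assumes y_pred comes from an OLS fit with an intercept, so the
    regression and residual sums of squares add up to the total.
    """
    n = len(y_obs)
    y_mean = sum(y_obs) / n
    # Regression sum of squares, sum (Y^_i - Ybar)^2, with k degrees of freedom
    ssr = sum((yh - y_mean) ** 2 for yh in y_pred)
    # Residual sum of squares, sum (Y_i - Y^_i)^2, with n - k - 1 degrees of freedom
    sse = sum((y - yh) ** 2 for y, yh in zip(y_obs, y_pred))
    return (ssr / k) / (sse / (n - k - 1))
```

For example, fitting y = [2, 4, 5, 4, 5] on x = 1..5 by least squares gives the fitted values [2.8, 3.4, 4.0, 4.6, 5.2]. Here SSR = 3.6, SSE = 2.4 and F = 4.5. This is below the tabulated critical value F0.05(1, 3) ≈ 10.13, so H0 would not be rejected at the 5% level.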
3 ANALYSIS AND RESULTS
3.1 Correlation analytic results
The SOM contents of the 174 soil samples were measured by volumetric assay. The minimum is 0.12%, the maximum is 4.86%, and the mean is 1.18%. The mean square deviation was 1.12. The correlation coefficient between the measured SOM content and the smoothed spectral reflectance over the 350-2500 nm range was calculated according to formula 4.
The results indicated that every transform except the logarithmic reciprocal of reflectance increased the correlation with soil organic matter content to some extent. The largest improvement came from the first-order derivative of the logarithm of reflectance. The maximum correlation coefficient between the original, untransformed reflectance and SOM content was 0.72 (at 2137 nm). By contrast, the correlation coefficient between the first-order derivative of log reflectance and SOM content at 2187 nm was 0.89, the maximum of all correlation coefficients (Figure 3). This also indicates that the derivative transformation amplified and revealed subtle information that was obscured in the original spectral data.
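The band-by-band correlation scan can be sketched in NumPy. This sketch assumes that formula 4 is the Pearson product-moment correlation and that the spectra have already been smoothed; the smoothing step is not shown. The function names are hypothetical. `np.gradient` is a stand-in for whatever finite-difference scheme the authors used for the first-order differential of lg R.

```python
import numpy as np


def first_derivative_log_reflectance(reflectance, wavelengths):
    """(lg R)': first-order derivative of log10 reflectance along wavelength.

    reflectance: array (n_samples, n_bands) with positive values
    wavelengths: array (n_bands,) in nm
    np.gradient uses central differences inside the range and
    one-sided differences at the two ends.
    """
    return np.gradient(np.log10(reflectance), wavelengths, axis=1)


def correlation_by_band(spectra, som):
    """Pearson r between SOM content and each band (assumed form of formula 4)."""
    x = spectra - spectra.mean(axis=0)   # center each band
    y = som - som.mean()                 # center SOM content
    num = (x * y[:, None]).sum(axis=0)
    den = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    return num / den
```

A typical call is `r = correlation_by_band(first_derivative_log_reflectance(R, wl), som)`. The wavelength of the strongest correlation is then `wl[np.argmax(np.abs(r))]`. Taking the absolute value matters because, as noted below, some transforms correlate negatively with SOM.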
Figure 3. Correlation coefficients between (lg R)' and SOM content (x-axis: wavelength, nm)
The analysis also showed that SOM content was negatively correlated with spectral reflectance but positively correlated with the reciprocal of reflectance. The absolute values of the two sets of correlation coefficients followed basically the same trend. The correlation coefficients between the derivative transforms (both first and second order) and SOM showed no regular pattern, unlike the smooth changes in the correlation coefficients of the logarithmic and reciprocal transforms. Their values oscillate between -1 and 1.
3.2 Stepwise regression results
Stepwise regression is commonly used to identify the wavebands sensitive to a certain chemical constituent. It is also used to demonstrate that these wavebands correlate well with the concentration of that constituent. The identified wavelength positions (band values) can then be used to estimate the constituent's concentration. However, the method has two deficiencies. First, the regression model is prone to overfitting. This mainly occurs when the sample size is smaller than the number of wavebands. In that case, the spectral reflectance at a band may not be correlated with the chemical constituent while its noise pattern happens to be. This risk grows as the number of wavebands increases.
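Stepwise band selection can be sketched as forward selection with a partial F-to-enter test. The paper does not specify its software or entry and removal criteria. The threshold `f_in=4.0` and the function names here are therefore hypothetical, and the removal step of full stepwise regression is omitted for brevity.

```python
import numpy as np


def _sse(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, f_in=4.0, max_bands=None):
    """Select band indices by forward selection with a partial F-to-enter test.

    X: (n_samples, n_bands) transformed spectra; y: (n_samples,) SOM content.
    f_in is a hypothetical entry threshold, not taken from the paper.
    """
    n, p = X.shape
    if max_bands is None:
        max_bands = min(p, n - 2)   # keeps n - k - 1 >= 1
    selected = []
    sse_current = float(((y - y.mean()) ** 2).sum())   # intercept-only model
    while len(selected) < max_bands:
        best = None
        df_resid = n - (len(selected) + 1) - 1          # n - k - 1 for the enlarged model
        for j in range(p):
            if j in selected:
                continue
            sse_new = _sse(X[:, selected + [j]], y)
            # Partial F: drop in SSE per added band over the residual mean square
            f = (sse_current - sse_new) / max(sse_new / df_resid, 1e-12)
            if best is None or f > best[0]:
                best = (f, j, sse_new)
        if best is None or best[0] < f_in:
            break                                        # no band passes the entry test
        selected.append(best[1])
        sse_current = best[2]
    return selected
```

Each step tests every remaining band. A band whose noise happens to track SOM can pass the entry test like a genuine one, and the more candidate bands there are, the more such chances occur.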
Second, the wavebands are highly correlated with one another. An important assumption of stepwise regression is that some of the input variables in the multiple regression have no significant impact on the output. If this assumption holds, the model is easy to simplify by retaining only the statistically significant terms. But, in fact, multiple interactions exist