de Bie, Kees
6. Multiple linear regression to predict Ln(Yield)
Multiple regression of the 29 sites that had positive yields resulted in a yield model
with an adjusted R² of 88.3%. The model reads (all coefficients with P < 3%):
Ln(Yield, '000 Baht/ha) = -2.76 + 0.44*SLO - 0.021*SLO² + 0.78*TXT - 1.21*TER + 0.37*pH
+ 3.10*CAN + 0.92*TRA + 2.05*MOT + 0.80*PRU
Where:
SLO = Slope (%) within the orchard
TXT = 1 if top-soil texture is SCL (not LS, SL, C, or SC)
TER = 1 if terrain is footslope (not terrace or hill)
pH = pH of the topsoil
CAN = 1 if canals were present in the direct proximity of the orchard
MOT = 1 if pest control is carried out by motor sprayer
PRU = 1 if pruning of trees is done
TRA = 1 if weeding with a tractor (not manual)
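
As a minimal sketch, the equation can be evaluated for a single site as follows. The coefficients are copied from the equation above; the function names, the dictionary layout, and the example site values are illustrative assumptions and do not come from the survey data.

```python
import math

# Coefficients copied from the regression equation in the text.
# The dictionary layout and key names are illustrative assumptions.
COEF = {
    "intercept": -2.76,
    "SLO": 0.44,     # slope (%)
    "SLO2": -0.021,  # slope squared
    "TXT": 0.78,     # 1 if topsoil texture is SCL
    "TER": -1.21,    # 1 if terrain is footslope
    "pH": 0.37,      # pH of the topsoil
    "CAN": 3.10,     # 1 if canals are nearby
    "TRA": 0.92,     # 1 if weeding with a tractor
    "MOT": 2.05,     # 1 if pest control by motor sprayer
    "PRU": 0.80,     # 1 if trees are pruned
}


def predict_ln_yield(slo, txt, ter, ph, can, tra, mot, pru):
    """Return the predicted Ln(Yield) in the equation's yield units."""
    return (
        COEF["intercept"]
        + COEF["SLO"] * slo
        + COEF["SLO2"] * slo ** 2
        + COEF["TXT"] * txt
        + COEF["TER"] * ter
        + COEF["pH"] * ph
        + COEF["CAN"] * can
        + COEF["TRA"] * tra
        + COEF["MOT"] * mot
        + COEF["PRU"] * pru
    )


def predict_yield(**site):
    """Back-transform the log prediction to the original yield scale."""
    return math.exp(predict_ln_yield(**site))


# Hypothetical site, invented for illustration only.
site = dict(slo=5.0, txt=1, ter=0, ph=6.0, can=0, tra=1, mot=0, pru=1)
ln_y = predict_ln_yield(**site)
print(f"Ln(Yield) = {ln_y:.3f}, Yield = {predict_yield(**site):.1f}")
```

The plain exponential back-transform is the simplest choice; it ignores any retransformation bias correction, which the text does not discuss.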
The equation suggests that yields improve if:
• The slope in the orchard and the pH of the topsoil are relatively high;
• The orchard is situated on Sandy Clay Loam but not on footslopes;
• Canals are present in its direct proximity;
• Management includes weeding by tractor, pest control through use of a motor
sprayer and pruning.
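
Because the model is fitted on Ln(Yield), each 0/1 variable acts as a multiplier on the yield itself: switching a dummy from 0 to 1 multiplies the predicted yield by exp(coefficient), all else being equal. The sketch below illustrates this reading using the coefficients from the equation; the variable and function names are assumptions made for the example.

```python
import math

# Dummy-variable coefficients copied from the regression equation.
DUMMY_COEF = {
    "TXT": 0.78,
    "TER": -1.21,
    "CAN": 3.10,
    "TRA": 0.92,
    "MOT": 2.05,
    "PRU": 0.80,
}


def yield_multiplier(coef):
    """Factor applied to predicted yield when a 0/1 variable switches to 1."""
    return math.exp(coef)


multipliers = {name: yield_multiplier(c) for name, c in DUMMY_COEF.items()}

# List the factors from the largest to the smallest.
for name, factor in sorted(multipliers.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: yield x {factor:.2f}")
```

Under this reading, the presence of canals (CAN) corresponds to a predicted yield roughly 22 times higher, while a footslope position (TER) corresponds to a multiplier below 1, i.e. a lower predicted yield.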
The equation was used to estimate yields for the 16 sites that had "0" yields
(Figure 11). Estimated yields of both yield categories were similarly distributed,
and the two drawn normal distributions are not significantly different (P of 66% that
they are identical*). The Ln(Yield) estimates range from -2 to 6, indicating that the
model predicts very low actual yields for several mango orchards. This supports the view
that orchard yields follow a lognormal distribution and that observed "0" yields
represent very low actual yields that are not commercially relevant. The results
also suggest that additional parameters are needed to separate the two categories.
Joint use with the logistic model will result in error propagation, i.e. the joint
predictive power will be as low as 54% (62% × 88%). This low predictive power
makes it attractive to attempt to fit a multiple linear regression model through all
yield data without prior stratification (see next section).
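
The two pieces of arithmetic in this paragraph can be checked with a short sketch. It back-transforms the reported Ln(Yield) range to the original yield scale and multiplies the two predictive powers as the text does. The variable names are illustrative, and treating the two percentages as simply multiplicative follows the text's own simplification.

```python
import math

# Reported range of the Ln(Yield) estimates.
ln_low, ln_high = -2.0, 6.0

# Back-transform to the original yield units of the equation.
yield_low = math.exp(ln_low)    # about 0.135
yield_high = math.exp(ln_high)  # about 403.4

# Predictive powers quoted in the text for the two models.
logistic_power = 0.62
regression_power = 0.88

# Joint predictive power when the models are applied in sequence.
joint_power = logistic_power * regression_power

print(f"Yield range: {yield_low:.3f} to {yield_high:.1f}")
print(f"Joint predictive power: {joint_power:.4f}")
```

The product is 0.5456, which the text reports as 54%. The lower end of the back-transformed range, about 0.135 in the equation's yield units, illustrates how small the predicted yields are for some orchards.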
Logistic and multiple regression models share the independent parameter
“slope”. In both cases steeper slopes increase the probability of obtaining higher
yields; in the first model the impact of slope is greatest on slopes of 10% or
steeper, whereas in the latter the effects are greatest if slopes are from 0-5% (Figure
11). Joint use of the models will likely nullify these effects.
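
The slope behaviour of the multiple regression model can be illustrated from its two slope coefficients alone. The sketch below computes the marginal effect of slope on Ln(Yield), i.e. the derivative of 0.44*SLO - 0.021*SLO² with respect to SLO, and the slope at which the quadratic term peaks. The function names and the sampled slope values are assumptions made for the example.

```python
# Slope coefficients copied from the regression equation.
B_SLO = 0.44
B_SLO2 = -0.021


def slope_term(slo):
    """Contribution of slope (%) to the predicted Ln(Yield)."""
    return B_SLO * slo + B_SLO2 * slo ** 2


def marginal_effect(slo):
    """Change in Ln(Yield) per additional percent of slope."""
    return B_SLO + 2 * B_SLO2 * slo


# Slope at which the quadratic contribution reaches its maximum.
turning_point = -B_SLO / (2 * B_SLO2)

for slo in (0, 5, 10, 15):
    print(f"SLO={slo:>2}%: term={slope_term(slo):.3f}, "
          f"marginal={marginal_effect(slo):.3f}")
print(f"Turning point at about {turning_point:.1f}% slope")
```

The marginal effect falls from 0.44 at a slope of 0% to 0.23 at 5%, and reaches zero near 10.5%, which is consistent with the statement that the regression model's slope effect is greatest on gentle slopes. Whether the fitted curve is meaningful beyond the turning point depends on the slope range covered by the surveyed sites, which this section does not state.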
* Kolmogorov-Smirnov Two Sample Test.
International Archives of Photogrammetry and Remote Sensing. Vol. XXXIII, Part B7. Amsterdam 2000.