data-merged model construction and the second subset
containing the remaining (1/3) for model evaluation.
3.3 Database Building
The GIS database used in the study was constructed using the Layer Stack module of the ERDAS Imagine software to overlay the elevation, slope, aspect, terrain position, and vegetation index layers. The cycad-fern sample layer was overlaid on these five data layers, and the pixels of the five layers lying at the same positions as the cycad-fern pixels were extracted. To build the statistical models, sample data for both the target group (cycad-fern) and the non-target group (background) were drawn from the data layers by random sampling, to minimize spatial autocorrelation in the independent variables (Pereira and Itami, 1991). Because non-target sites (background) make up the vast majority of the study area, their environmental characteristics are expected to vary more widely than those of the target group. The number of non-target pixels (sites) should therefore be three times the number of target pixels, to increase the probability of acquiring a representative sample of the habitat characteristics at non-target sites (Pereira and Itami, 1991; Sperduto and Congalton, 1996).
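The sampling step can be sketched as follows. This is a minimal illustration, not the authors' ERDAS Imagine workflow: it assumes the stacked layers are held as plain 2-D grids, that "random sampling" means simple random sampling without replacement, and that the number of target pixels to draw is a user choice. The function name and the `n_target`, `ratio`, and `seed` parameters are hypothetical; only the 3:1 background-to-target ratio comes from the text.

```python
import random

def sample_training_pixels(layers, target_mask, n_target, ratio=3, seed=42):
    """Hypothetical helper: randomly sample target and background pixels.

    layers      -- list of 2-D grids (lists of rows), one per environmental layer
    target_mask -- 2-D grid of booleans, True where a cycad-fern pixel was mapped
    n_target    -- number of target pixels to draw (assumed parameter)
    ratio       -- background pixels drawn per target pixel (3, as in the text)
    Returns (features, labels): one feature vector per sampled pixel,
    label 1 for target (cycad-fern) and 0 for background.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(target_mask)) for c in range(len(target_mask[0]))]
    target = [(r, c) for r, c in cells if target_mask[r][c]]
    background = [(r, c) for r, c in cells if not target_mask[r][c]]

    # Simple random sampling without replacement for both groups (assumption).
    picked_target = rng.sample(target, min(n_target, len(target)))
    picked_background = rng.sample(background, min(ratio * len(picked_target), len(background)))

    features, labels = [], []
    for label, picked in ((1, picked_target), (0, picked_background)):
        for r, c in picked:
            # Read every layer at the same pixel position (one "column" of the layer stack).
            features.append([layer[r][c] for layer in layers])
            labels.append(label)
    return features, labels
```

For example, with five target pixels requested, the sketch returns five target records and fifteen background records, each carrying one value per layer.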
3.4 Model Development
The predictive models for selecting potential habitat of CFs were created using four statistical methods: (1) maximum entropy (MAXENT), (2) genetic algorithm for rule-set prediction (GARP), (3) generalized linear models (GLM), and (4) discriminant analysis (DA). Models were developed and validated using a split-sample validation approach. In this approach, the dataset is divided into two subsets: the first (training data) typically comprises one-half to two-thirds of all data, and the second (test data) comprises the remaining one-third to one-half. The first subset is used to build and test the model, whereas the second (an independent dataset) is used only to test the model, never to build it.
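A split-sample partition of this kind can be sketched in a few lines. The sketch assumes a simple random split with a two-thirds training fraction (the split used in this study); the function name and `seed` parameter are hypothetical.

```python
import random

def split_sample(records, train_fraction=2/3, seed=1):
    """Hypothetical helper: randomly split records into non-overlapping (training, test) subsets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    # The training subset builds the model; the held-out test subset is used only for evaluation.
    return shuffled[:n_train], shuffled[n_train:]
```

Splitting 30 records this way yields 20 training and 10 test records with no overlap. When the records carry presence/background labels, splitting each group separately would preserve the 3:1 ratio in both subsets; that refinement is a design option, not something the text specifies.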
In this study, MAXENT was implemented using the free MAXENT software (http://www.cs.princeton.edu/~schapire/maxent/); GARP, using the free software “DesktopGARP” (http://www.nhm.ku.edu/desktopgarp/Download.html); GLM, using free software available at http://gis.ucmerced.edu/ModEco/; and DA, using the SPSS software package.
3.4.1 Maximum Entropy
MAXENT is a general-purpose method for making predictions or inferences from incomplete information (Pearson et al., 2007). In estimating the unknown probability distribution that defines a species' distribution across a study area, MAXENT formalizes the principle that the estimated distribution must agree with everything that is known (or inferred from the environmental conditions at the occurrence localities) but should avoid placing any unfounded constraints. The approach is thus to find the probability distribution of maximum entropy (the one closest to uniform) subject to the constraints imposed by the available information on the observed distribution of the species and the environmental conditions across the study area. MAXENT requires species-presence data but not species-absence or pseudo-absence data per se; instead, it uses a probability distribution to distinguish species presences from random points drawn from a background area. MAXENT offers many advantages and a few drawbacks. Its advantages include the following: (1) it needs only presence data, together with environmental information for the study area; (2) it can use both continuous and categorical data and can incorporate interactions between different variables; (3) efficient deterministic algorithms have been developed that are guaranteed to converge to the optimal (maximum entropy) probability distribution; and (4) the MAXENT probability distribution has a clear mathematical definition and is therefore amenable to analysis (Phillips et al., 2006).
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B7, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
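The maximum-entropy idea can be illustrated with a deliberately simplified sketch. The distribution over background pixels takes the Gibbs form P(x) ∝ exp(λ·f(x)), and λ is adjusted by plain gradient ascent until the expected feature values under P match their means at the presence sites. This is equivalent to maximizing entropy subject to those constraints. It is not the MAXENT software's algorithm: the software also uses additional feature classes (quadratic, product, threshold, hinge) and regularization, both omitted here. The function names, learning rate, and iteration count are illustrative assumptions, and presence sites are assumed to be a subset of the background pixels.

```python
import math

def gibbs(lam, background):
    """Distribution over background pixels: P(x) proportional to exp(lambda . f(x))."""
    scores = [sum(l * f for l, f in zip(lam, x)) for x in background]
    top = max(scores)  # subtract the maximum score to avoid overflow in exp
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def fit_maxent(background, presence_idx, lr=1.0, iterations=5000):
    """Hypothetical sketch: fit lambda so feature means under P match those at presence sites.

    background   -- feature vectors, one per background pixel (features scaled to [0, 1])
    presence_idx -- indices of background pixels that hold a presence record
    """
    k = len(background[0])
    # Constraints: empirical mean of each feature over the presence sites.
    target = [sum(background[i][j] for i in presence_idx) / len(presence_idx) for j in range(k)]
    lam = [0.0] * k
    for _ in range(iterations):
        probs = gibbs(lam, background)
        expected = [sum(p * x[j] for p, x in zip(probs, background)) for j in range(k)]
        # Gradient of the concave presence log-likelihood (the dual of the max-entropy problem):
        # constraint value minus model expectation.
        lam = [l + lr * (t - e) for l, t, e in zip(lam, target, expected)]
    return lam, gibbs(lam, background)
```

With five background pixels whose single scaled feature is 0, 0.25, 0.5, 0.75, and 1, and presences at the last two, the fitted distribution has a feature mean of 0.875 and places increasing probability on higher-valued pixels. Because the objective is concave, this deterministic iteration converges, echoing advantage (3) above.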
3.4.2 Genetic Algorithm for Rule-set Prediction
GARP has come into extensive use only in recent studies. It seeks a collection of rules that together produce a binary prediction (Phillips et al., 2006). GARP uses a set of point records of species presence and a set of environmental layers that might limit the species' capacity to survive. The model uses a genetic algorithm to search heuristically for a good rule-set. Four rule types are currently available in the GARP software (DesktopGARP): atomic, logistic regression, bioclimatic envelope, and negated bioclimatic envelope rules. GARP uses these rules to search for correlations between species presence/absence and the environmental variables, in order to predict suitable conditions for each pixel (Stockwell and Noble, 1992). The calculation is repeated for the number of runs set by the user, and each run generates a predictive distribution map. The GARP algorithm starts with an initial set of rules generated by the initial program (Stockwell and Peters, 1999). The first step in the GARP iterative loop is to select a dataset by randomly sampling half of the available data; the next step is to evaluate the rules on the sampled data.
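The loop described above (random half-sample, rule evaluation, genetic search) can be sketched as a toy genetic algorithm. This is a heavily simplified stand-in, not DesktopGARP. It evolves single envelope-style rules (one [lo, hi] range per variable, loosely modeled on the bioclimatic envelope rule type) rather than full rule-sets mixing all four rule types. It scores rules by plain classification accuracy, and it assumes variables are scaled to [0, 1]. All function names, the population size, generation count, and mutation width are hypothetical choices.

```python
import random

def envelope_rule_predicts(rule, x):
    """Envelope-style rule: predict presence if every variable lies inside its [lo, hi] range."""
    return all(lo <= v <= hi for (lo, hi), v in zip(rule, x))

def accuracy(rule, data):
    """Fraction of (features, label) records the rule classifies correctly."""
    return sum(envelope_rule_predicts(rule, x) == bool(y) for x, y in data) / len(data)

def random_rule(n_vars, rng):
    return [tuple(sorted((rng.random(), rng.random()))) for _ in range(n_vars)]

def crossover(a, b, rng):
    # Each variable's range is inherited from one parent or the other.
    return [ra if rng.random() < 0.5 else rb for ra, rb in zip(a, b)]

def mutate(rule, rng, sigma=0.05):
    # Jitter both bounds, clamp them to [0, 1], and keep lo <= hi.
    new = []
    for lo, hi in rule:
        lo = min(max(lo + rng.gauss(0.0, sigma), 0.0), 1.0)
        hi = min(max(hi + rng.gauss(0.0, sigma), 0.0), 1.0)
        new.append((min(lo, hi), max(lo, hi)))
    return new

def garp_like_search(data, n_vars, pop_size=30, generations=60, seed=0):
    """Hypothetical GARP-style search over envelope rules (not the DesktopGARP algorithm)."""
    rng = random.Random(seed)
    population = [random_rule(n_vars, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # As in the GARP loop: draw a random half of the data, then evaluate the rules on it.
        sample = rng.sample(data, len(data) // 2)
        ranked = sorted(population, key=lambda r: accuracy(r, sample), reverse=True)
        parents = ranked[: pop_size // 2]  # the better half survives unchanged
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    # Report the rule that scores best on all the data.
    return max(population, key=lambda r: accuracy(r, data))
```

On a one-variable toy dataset where presences occupy the middle of the range, the search converges toward an envelope around that interval. Because each generation is scored on a different random half, repeated runs with different seeds give slightly different rules, much as each GARP run yields its own predictive map.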
3.4.3 Generalized Linear Models
GLMs are an extension of linear models that do not force data into unnatural scales and that allow for non-linearity and non-constant variance in the data. A GLM assumes a relationship between the mean of the response variable and a linear combination of the explanatory variables. GLMs are therefore more flexible and better suited to analyzing ecological relationships (Guisan et al., 2002). The assumptions that GLMs relax (a fixed scale, linearity, and constant variance) are implicit in ordinary least squares (OLS) regression. In GLMs, the predictor variables $X_j$ ($j = 1, \dots, p$) are combined to produce a linear predictor $LP$, which is related to the expected value $\mu = E(Y)$ of the response variable $Y$ through a link function $g(\cdot)$:

$$g\big(E(Y)\big) = LP = \alpha + X^{\top}\beta \tag{2}$$

where $\alpha$ is a constant called the intercept, $X = (X_1, \dots, X_p)$ is a vector of $p$ predictor variables, and $\beta = (\beta_1, \dots, \beta_p)$ is the vector of $p$ regression coefficients (one for each predictor). We have written the model for generic variables $X$ and $Y$; the corresponding expression for the $i$-th observation in the sample is:

$$g(\mu_i) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} \tag{3}$$
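Equation (3) can be fitted with a short sketch. The text does not state which error distribution and link function were used. A binomial GLM with a logit link, g(μ) = log(μ / (1 − μ)), is one common choice for presence/background data, and it is assumed here. The coefficients (α, β) are estimated by Newton-Raphson, which for this canonical link coincides with iteratively reweighted least squares. All function names are hypothetical, and the small Gaussian-elimination solver stands in for a linear-algebra library.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic_glm(X, y, iterations=25):
    """Hypothetical sketch: binomial GLM with logit link; returns [alpha, beta_1, ..., beta_p]."""
    rows = [[1.0] + list(xi) for xi in X]  # leading 1 multiplies the intercept alpha
    k = len(rows[0])
    coef = [0.0] * k
    for _ in range(iterations):
        eta = [sum(c * v for c, v in zip(coef, r)) for r in rows]  # linear predictor LP_i
        mu = [inv_logit(e) for e in eta]                             # mu_i = g^{-1}(LP_i)
        grad = [sum((yi - mi) * r[j] for yi, mi, r in zip(y, mu, rows)) for j in range(k)]
        info = [[sum(mi * (1 - mi) * r[j] * r[l] for mi, r in zip(mu, rows))
                 for l in range(k)] for j in range(k)]               # X^T W X
        step = solve(info, grad)                                     # Newton step
        coef = [c + s for c, s in zip(coef, step)]
    return coef
```

At convergence, the fitted probabilities satisfy the score equations: they sum to the number of observed presences, and the residuals are uncorrelated with each predictor. This gives a quick check that the fit is correct.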
3.4.4 Discriminant Analysis