XXII ISPRS Congress 2012: Technical Commission VII

    
    
data-merged model construction and the second subset 
containing the remaining (1/3) for model evaluation. 
3.3 Database Building 
The GIS database used in the study was built with the Layer 
Stack module of ERDAS Imagine, which overlaid the elevation, 
slope, aspect, terrain position, and vegetation index layers. 
The cycad-fern sample layer was then overlaid on these five 
data layers, and the pixels of the five layers at the same 
positions as the cycad-fern pixels were clipped out. To build 
the statistical models, sample data for both the target group 
(cycad-fern) and the non-target group (background) were drawn 
from the data layers by random sampling to minimize spatial 
autocorrelation in the independent variables (Pereira and Itami, 
1991). Because non-target sites (background) make up the vast 
majority of the study area, larger variation in environmental 
characteristics is expected for this group. The number of 
non-target pixels (sites) should therefore be three times that 
of target pixels, to increase the probability of acquiring a 
representative sample of the habitat characteristics at 
non-target sites (Pereira and Itami, 1991; Sperduto and 
Congalton, 1996). 
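The sampling scheme described above can be illustrated with a minimal sketch. It assumes that pixels are identified by (row, column) pairs and that the background sample is three times the size of the target sample. The function name, the pixel coordinates, and the fixed seed are hypothetical and are not taken from the study.

```python
import random

def sample_pixels(target_pixels, background_pixels, ratio=3, seed=0):
    """Randomly draw `ratio` background pixels for every target pixel."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n_background = ratio * len(target_pixels)
    if n_background > len(background_pixels):
        raise ValueError("not enough background pixels for the requested ratio")
    # Sample without replacement so that no pixel is drawn twice.
    return list(target_pixels), rng.sample(background_pixels, n_background)

# Hypothetical (row, col) pixel coordinates, for illustration only.
targets = [(r, r) for r in range(10)]
target_set = set(targets)
background = [(r, c) for r in range(50) for c in range(50)
              if (r, c) not in target_set]

target_sample, background_sample = sample_pixels(targets, background)
print(len(target_sample), len(background_sample))
```

With 10 target pixels the sketch draws 30 background pixels, none of which coincide with a target pixel.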
3.4 Model Development 
The predictive models for selecting potential habitat of CFs 
were created using four statistical methods: (1) maximum 
entropy (MAXENT), (2) genetic algorithm for rule-set 
prediction (GARP), (3) generalized linear models (GLM), and 
(4) discriminant analysis (DA). Model development and 
validation can be carried out with a split-sample validation 
approach, in which a dataset is divided into two subsets: the 
first (training data) typically comprises one-half to 
two-thirds of all data, and the second (test data) comprises 
the remaining one-third to one-half. The first subset is used 
to build and test a model. The second subset is an independent 
dataset that is used only to test the model, never to build it. 
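A split of this kind can be sketched as follows. The example assumes a two-thirds/one-third division, which lies within the range given above. The function name, the record list, and the seed are hypothetical.

```python
import random

def split_sample(records, train_fraction=2 / 3, seed=0):
    """Shuffle the records and split them into training and test subsets."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(records)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical sample records, represented here by integer identifiers.
records = list(range(90))
train, test = split_sample(records)
print(len(train), len(test))
```

Every record ends up in exactly one subset, so the test data stay independent of the data used to build the model.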
MAXENT was implemented in the study using the free MAXENT 
software (http://www.cs.princeton.edu/~schapire/maxent/). 
GARP was implemented using the free software DesktopGARP 
(http://www.nhm.ku.edu/desktopgarp/Download.html). GLM was 
implemented using free software 
(http://gis.ucmerced.edu/ModEco/), and DA was implemented 
using the SPSS software package. 
3.4.1 Maximum Entropy 
MAXENT is a general-purpose method for making predictions 
or inferences from incomplete information (Pearson et al., 
2007). In estimating the unknown probability distribution 
that defines a species' distribution across a study area, 
MAXENT formalizes the principle that the estimated 
distribution must agree with everything that is known (or 
inferred from the environmental conditions at the occurrence 
localities) but should avoid placing any unfounded 
constraints. The approach is thus to find the probability 
distribution of maximum entropy (that which is closest to 
uniform), subject to the constraints imposed by the available 
information on the observed distribution of the species and 
on environmental conditions across the study area. MAXENT 
needs species-presence data and does not need species absence 
or pseudo-absence data per se, but it distinguishes species 
presences from random points in a background area using a 
probability distribution. MAXENT offers many advantages 
and a few drawbacks; the advantages include the following: (1) 
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B7, 2012 
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia 
It needs only presence data, together with environmental 
information for the study area. (2) It can use both continuous 
and categorical data, and can incorporate interactions between 
different variables. (3) Efficient deterministic algorithms have 
been developed that are guaranteed to converge to the optimal 
(maximum entropy) probability distribution. (4) The MAXENT 
probability distribution has a clear mathematical definition and 
is therefore amenable to analysis (Phillips et al., 2006). 
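The idea that the maximum-entropy distribution is the one closest to uniform can be checked numerically with a small sketch. It only compares the Shannon entropy of two hypothetical distributions over four sites; it is not the MAXENT software's fitting procedure, which additionally imposes constraints derived from the environmental data.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (natural log) of a discrete probability distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Two hypothetical distributions over four sites.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.7, 0.1, 0.1, 0.1]

print(shannon_entropy(uniform) > shannon_entropy(peaked))
```

With no constraints, the uniform distribution attains the largest possible entropy, ln 4 for four sites. Constraints from the occurrence data move the solution away from uniform only as far as the data require.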
3.4.2 Genetic Algorithm for Rule-set Prediction 
GARP has seen extensive use in recent studies. It seeks a 
collection of rules that together produce a binary prediction 
(Phillips et al., 2006). GARP uses a set of point locality 
records of species presence and a set of environmental layers 
that might limit the species' ability to survive. The model 
uses a genetic algorithm to search heuristically for a good 
rule-set. Four types of rules are currently available in the 
GARP software (DesktopGARP): atomic, logistic regression, 
bioclimatic envelope, and negated bioclimatic envelope rules. 
GARP uses these rules to search for correlations between 
species presence and absence and the environmental variables, 
in order to predict suitable conditions for each pixel 
(Stockwell and Noble, 1992). The statistical calculation is 
repeated for the number of runs set by the user, and each run 
generates a predictive distribution map. The GARP algorithm 
starts by taking as input an initial set of rules generated by 
the initial program (Stockwell and Peters, 1999). The first 
step in the GARP iterative loop is to select a data set by 
randomly sampling half the available data. The next step is to 
evaluate the rules on the sampled data. 
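The first two steps of the loop can be sketched as follows, assuming a single bioclimatic envelope rule that predicts presence when every variable lies within a range. This is not DesktopGARP's implementation: the function names, variable names, bounds, and records are all hypothetical, and the genetic search that modifies the rules is omitted.

```python
import random

def envelope_rule(bounds):
    """Build a rule that predicts presence when every variable is in range."""
    def rule(site):
        return all(low <= site[name] <= high
                   for name, (low, high) in bounds.items())
    return rule

def rule_accuracy(rule, records):
    """Fraction of (site, observed_presence) records predicted correctly."""
    hits = sum(rule(site) == present for site, present in records)
    return hits / len(records)

# Hypothetical records: environmental values and observed presence/absence.
records = [
    ({"elevation": 300, "slope": 10}, True),
    ({"elevation": 350, "slope": 12}, True),
    ({"elevation": 900, "slope": 40}, False),
    ({"elevation": 1200, "slope": 5}, False),
]
rule = envelope_rule({"elevation": (200, 500), "slope": (0, 20)})

# Step 1: randomly sample half the available data.
rng = random.Random(0)
half = rng.sample(records, len(records) // 2)

# Step 2: evaluate the rule on the sampled data.
print(rule_accuracy(rule, half))
```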
3.4.3 Generalized Linear Models 
GLM is an extension of linear models that does not force 
data into unnatural scales and allows for non-linearity and 
non-constant variance in the data. GLM assumes a relationship 
between the mean of the response variable and a linear 
combination of the explanatory variables. GLM is more 
flexible and better suited to analyzing ecological 
relationships (Guisan et al., 2002). The assumptions above 
are implicit in OLS regression. In GLMs, the predictor 
variables $X_j$ ($j = 1, \ldots, p$) are combined to produce a 
linear predictor LP, which is related to the expected value 
$\mu = E(Y)$ of the response variable Y through a link 
function $g()$: 
$g(E(Y)) = LP = \alpha + X^{T}\beta$ (2) 
where $\alpha$ is a constant called the intercept, 
$X = (X_1, \ldots, X_p)$ is a vector of p predictor variables, 
$\beta = \{\beta_1, \ldots, \beta_p\}$ is the vector of p 
regression coefficients (one for each predictor). 
We have written the model for generic variables X and Y; the 
corresponding expression for the i-th observation in the 
sample is: 
$g(\mu_i) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}$ (3) 
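Equation (3) can be evaluated for a single observation with a short sketch. The text does not state which link function was used, so the logit link, a common choice for presence/absence responses, is assumed here. The coefficient and predictor values are hypothetical.

```python
import math

def linear_predictor(alpha, beta, x):
    """LP = alpha + beta_1 * x_1 + ... + beta_p * x_p for one observation."""
    return alpha + sum(b * xj for b, xj in zip(beta, x))

def logit(mu):
    """Assumed link function g: maps a mean in (0, 1) to the real line."""
    return math.log(mu / (1 - mu))

def inverse_logit(lp):
    """Inverse of the assumed link: recovers the mean from the predictor."""
    return 1 / (1 + math.exp(-lp))

# Hypothetical intercept, coefficients, and predictor values.
alpha = -1.0
beta = [0.5, 2.0]
x_i = [2.0, 0.0]

lp = linear_predictor(alpha, beta, x_i)  # -1.0 + 0.5 * 2.0 + 2.0 * 0.0
mu_i = inverse_logit(lp)
print(lp, mu_i)
```

Applying `logit` to `mu_i` returns the linear predictor, which is the relationship $g(\mu_i) = LP$ stated in equation (3).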
3.4.4 Discriminant Analysis