The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B7. Beijing 2008
fitness function that combines the goals of high classification
accuracy and low computation cost into one.
Hyperspectral Images
Testing Set
Training Set
Feature Gene
Genetic Operations
(Crossover, mutation)
—. Classification accuracy
'Are the termination tor testing subset
conditions satisfied
Fitness Evaluation
Optimized Core Parameters and Feature Subset
Figure.2. Flowchart of the GA-SVM wrapper approach
2.3.1 Design of chromosome
To optimize the kernel parameters and feature subset
simultaneously, the chromosome was designed to comprise
three parts, C, , and the features mask. The binary coding
system was used to represent the chromosome. Fig. 3 shows the
design of the binary chromosome.
c
Y
/
c,
...Ci...
Cnc
n
f
-A-
U
Because the number of features of hyperspectral data, i.e., the
number of bands, is so large, traditional binary coding method
will produce an enormously large solution space, which makes
it impossible to find the optimal feature subset. Therefore a
novel binary coding strategy was adopted, which set the length
of chromosome different to the number of features. Suppose
that the length of chromosome for the feature subset is nf, nf
features were randomly selected from all the feature subset and
sorted according to their identity number. A nf bits bit string
was then generated and used as a mask for the selected nf
features. Thus the number of solutions decreased from 2n to
2nf*Cnfn. When nf is much less than n, computation efficiency
will be improved greatly.
2.3.2 Design of fitness function
A fitness function is needed in the Genetic Algorithm to
evaluate whether an individual is “fit” to survive. In the GA-
SVM model, we used two criteria, namely classification
accuracy and the number of selected features, to design the
fitness function. The principle is that individuals with high
classification accuracy and small number of features has a high
fitness value, and thus high probability to be pass its genes to
the next generation. A single objective fitness function that
combines the two goals into one was designed to solve the
multiple criteria problem. The formula is as below.
fitness = w a x accuracy +
Figure.3. Coding of chromosome
In Figure. 3, Ci represents the ith bit’s value of bit string that
represents parameter C, and nc is the number of bits
representing parameter C; j represents the jth bit’s value of
bit string that represents , and n is the number of bits
representing parameter ; f k represents the mask value of kth
feature, and nf is the number of bits representing the selected
features, nc, n and nf can be modified according to the
calculation precision and/or efficiency required.
The bit strings representing the genotype of parameter C and
should be transformed into phenotype by Eq. (1). Note that the
precision of representing parameter depends on the length of the
bit string (nc and n ), and the minimum and maximum value
of the parameter is determined by the user. For chromosome
representing the feature mask, the bit with value ‘ 1 ’ represents
the feature is selected, and ‘0’ indicates feature is not selected.
T(R,l) = min Ä +
max ß — min ^
2' -1
x d
(1)
T (R, 1) represents phenotype of the chromosome (or part of it)
that has 1 bits, i.e. genes. minR and maxR represents the
minimum and maximum value of parameter R; d represents the
decimal value corresponding to the bit string; the number of bits
representing the parameter 1 can be modified according to the
calculation precision required.
Where wa f] represents the weight value for classification
accuracy, wf for the number of features; fi is the mask value of
the ith feature, ‘ 1 ’ represents that feature i is selected; ‘ 0 ’
represents that feature i is not selected; wa can be adjusted to
100% if accuracy is the most important. Generally, wa can be
set from 75 to 100% according to user’s requirements. In our
study, we set wa to 80% and wf to 20%. It can be inferred that
high fitness value is determined by high classification and small
feature number.
2.3.3 Basic Steps of the GA-SVM method
The basic steps of the GA-SVM method are as below:
1) Create a initial population of certain size, i.e. a group
individuals with different chromosomes. Each individual’s
chromosome consists of three parts, namely parameter C,
parameter and band subset of the hyperspectral data. The
chromosomes of the initial population were randomly created.
The size of the initial population should be determined properly
by user to include as many possible solutions as possible.
2) Calculate the fitness value of each individual in the initial
population using Eq. (2) and rank them according to their
fitness. To calculate the fitness value of an individual, or a
chromosome, the genotypes are firstly converted to phenotypes,
i.e. converting the binary codes to the parameter C and , and
the identities of the selected features; These values, together
with the training sets of the hyperspectral image, are then used
as input to the SVM classifier to perform classification; After
that, the classification accuracy is evaluated based on the
testing sets; finally, fitness value of the individual is calculated
using Eq. (2) based on the classification accuracy and number
of selected features.