SELF-ORGANIZING NEURAL NETWORKS IN FEATURE EXTRACTION
Mr. Markus Törmä
Institute of Photogrammetry and Remote Sensing
Helsinki University of Technology
Espoo, Finland
markus@mato.hut.fi
Commission II, Working group 3
KEY WORDS: Feature Extraction, Neural Networks, Classification
ABSTRACT
Due to the large data volumes involved when remote sensing or other kinds of images are used, methods are needed to
decrease the volume of data. Methods for decreasing the feature dimension, in other words the number of channels, are
called feature selection and feature extraction. In feature selection, important channels are selected using some
search technique and used for the problem at hand. In feature extraction, the original channels are
transformed into lower-dimensional channels, which are then used. A widely used feature extraction method
is the Karhunen-Loève transformation. In this study the Karhunen-Loève transformation is compared to the transformation made
by the Kohonen self-organizing feature map. Tests made using artificially generated datasets show that the differences
between the compared methods are small.
1. INTRODUCTION
Usually remote sensing instruments carry out
measurements in several areas of the
electromagnetic spectrum. As a result, an image provided
by a remote sensing instrument consists of several
spectral channels. The number of channels can be
seven, as in a LANDSAT TM image, but it can go as high
as several hundred when imaging spectrometers (e.g. AVIRIS,
224 channels) are used. An important step in data
processing before e.g. land use classification is to find
the channels relevant for the problem at hand, so that the
feature dimension decreases.
We can choose relevant channels using knowledge about
the spectral properties of the targets represented in the
image. For example, if we want to separate land areas
from water areas we can use LANDSAT TM channel 4,
because the reflectance of water is nearly zero in the
near-infrared part of the spectrum. But in more
complicated problems we usually do not have this kind of
a priori information, or it is quite time consuming to
utilize a priori information for channel selection. In this
case, we can perform mathematical feature selection.
The structure of this paper is as follows: chapter 2
presents different approaches to feature selection,
and in chapter 3 one of these methods, the Karhunen-Loève
transformation, also called principal component analysis,
is reviewed. In chapter 4 a self-organizing neural network
called the Kohonen self-organizing feature map (SOM) is
presented and its use in feature selection is
discussed. Chapter 5 presents experiments made to
compare the Karhunen-Loève transformation and the SOM,
and chapter 6 discusses the results. Finally, chapter
7 presents conclusions.
2. FEATURE SELECTION
The methods for feature selection are divided into two
groups: feature selection in the feature space and feature
selection in a transformed space. Feature selection in the
feature space is made by choosing those features which
contain useful information and deleting those features
which contain redundant or unnecessary information. In
other words, we have all features in a feature set Y and we
seek the best subset of Y, called X. The best subset of Y
is chosen by maximizing some criterion function. In the
ideal case this best subset maximizes the probability of
correct classification compared to the other possible
combinations. Usually feature selection in the feature
space is simply called feature selection. Feature selection
in a transformed space is made by transforming the
original measurement vector y to a lower-dimensional
feature vector x. In this case the decrease of redundant
and unnecessary information depends on the
transformation used. The transformation can be any kind of
vector function of y, but usually linear transformations
are used. A linear transformation can be written as
x = Ay, (1)
where A is the transformation matrix. The problem is how
to determine a good matrix A, so that useful information
is not destroyed. Feature selection in a transformed space
is also called feature extraction.
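As a minimal sketch of equation (1), the transformation matrix A can be built from the eigenvectors of the data covariance matrix with the largest eigenvalues, which is the Karhunen-Loève transformation reviewed in chapter 3. The dataset, dimensions and variable names below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic measurement vectors: 500 pixels with 7 "channels"
# (e.g. LANDSAT TM); channel 1 is made to correlate with channel 0
# so that the data contain redundant information.
Y = rng.normal(size=(500, 7))
Y[:, 1] = Y[:, 0] + 0.1 * rng.normal(size=500)

# Estimate the covariance matrix of the centered measurement vectors.
Yc = Y - Y.mean(axis=0)
cov = np.cov(Yc, rowvar=False)

# Eigenvectors with the largest eigenvalues become the rows of A.
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder to descending
k = 3                                    # target feature dimension
A = eigvecs[:, order[:k]].T              # k x 7 transformation matrix

# Linear feature extraction x = Ay for every pixel at once.
X = Yc @ A.T
print(X.shape)                           # (500, 3)
```

The extracted features are uncorrelated and ordered by decreasing variance, so the first k of them retain as much of the data variance as any linear transformation of rank k can.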
The best subset of all features in feature selection is
chosen using a criterion function and a search algorithm.
The criterion function J to be maximized can be based on the
probability of error, interclass distance, probabilistic
distance, probabilistic dependence or entropy. The idea
in all these criterion functions is to measure the
separability of the classes. The best subset could be found by
International Archives of Photogrammetry and Remote Sensing. Vol. XXXI, Part B2. Vienna 1996