In: Stilla U, Rottensteiner F, Paparoditis N (Eds) CMRT09. IAPRS, Vol. XXXVIII, Part 3/W4 — Paris, France, 3-4 September, 2009
4 FILTERING
Once the image is segmented, the system must be able to select which regions contain text (letters) and which do not. Some of these regions are obviously not text (regions that are too big, too small, too wide...). The aim of this step is to dismiss most of these obviously non-text regions without losing any good character. A small collection of fast filters (criteria openings) eliminates some regions using simple geometric criteria (based on area, width and height). These simple filters save time because they rapidly eliminate many regions, simplifying the rest of the process (which is slower).
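The filtering step amounts to a few cheap threshold tests on region geometry. The following is a minimal Python sketch. It assumes each segmented region is summarized by its pixel area and bounding-box width and height. The `Region` type and all threshold values are hypothetical, since the paper does not give them. For simplicity, the tests are applied to a list of region records rather than implemented as morphological criteria openings on the image.

```python
from dataclasses import dataclass


@dataclass
class Region:
    area: int    # number of pixels in the region
    width: int   # bounding-box width in pixels
    height: int  # bounding-box height in pixels


# Hypothetical thresholds: the actual values used in the paper are not given.
MIN_AREA, MAX_AREA = 30, 20_000
MIN_HEIGHT, MAX_HEIGHT = 8, 300
MAX_WIDTH = 400


def passes_geometric_filters(r: Region) -> bool:
    """Cheap tests that reject regions that are obviously not characters."""
    if not (MIN_AREA <= r.area <= MAX_AREA):
        return False  # too small or too big
    if not (MIN_HEIGHT <= r.height <= MAX_HEIGHT):
        return False  # implausible character height
    if r.width > MAX_WIDTH:
        return False  # too wide to be a single character
    return True


def filter_regions(regions: list[Region]) -> list[Region]:
    """Keep only the regions that survive every geometric test."""
    return [r for r in regions if passes_geometric_filters(r)]
```

In this example, `filter_regions([Region(500, 20, 30), Region(5, 2, 3), Region(50_000, 600, 200)])` keeps only the first region. The second is too small, and the third is too big.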
5 PATTERN CLASSIFICATION
Some segmented regions are dismissed by the previous filters, but many false positives remain. To go further, we use classifiers with suitable descriptors.
Due to the variability of the analysed regions, descriptors must (at least) be invariant to rotation and scale. The size and variability of the examples in the training database ensure invariance to perspective deformations. We tested many different shape descriptors (such as Hu moments and Fourier moments). Among them, we selected two families of moments: Fourier moments and pseudo-Zernike moments. We selected them empirically, as in our tests they achieved a better discrimination ratio than the others. We also chose to work with a third family of descriptors: the polar representation is known to be efficient (Szumilas, 2008), but the way it is commonly used does not match our needs.
We therefore define our own polar descriptors: the analysed region is expressed in a polar coordinate space centred on its centre of gravity (Figure 6). The feature is then mapped into a normalized rectangle, which makes the representation scale invariant. To achieve rotation invariance, many authors compute a horizontal histogram within this rectangle, but this discards too much information. When the representation itself is not rotation invariant, another option is to redefine the distance computed between samples (Szumilas, 2008), but this increases the complexity. A rotation of the region corresponds to a circular shift along the angle axis of the rectangle, and the magnitude of the Fourier transform is invariant to circular shifts. We therefore simply take the magnitude spectrum of the Fourier transform of each line of the normalized rectangle. These spectra carry much more information than simple histograms and are simpler than changing the distance used.
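The polar descriptor described above can be sketched in NumPy. This version rests on several assumptions:

- The region is a binary mask.
- The polar rectangle is sampled with nearest-neighbour interpolation.
- Radii are normalized by the largest centroid-to-pixel distance to obtain scale invariance.
- Each row of the rectangle holds one radius across all angles.

The function names and the grid sizes (`n_radii`, `n_angles`) are hypothetical.

```python
import numpy as np


def polar_rectangle(mask: np.ndarray, n_radii: int = 16, n_angles: int = 32) -> np.ndarray:
    """Resample a binary region into an (n_radii, n_angles) polar rectangle.

    Rows index the radius (normalized by the largest centroid-to-pixel
    distance, an assumed way to get scale invariance); columns index the angle.
    """
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()  # centre of gravity of the region
    r_max = np.hypot(ys - cy, xs - cx).max()
    r_max = r_max if r_max > 0 else 1.0
    radii = (np.arange(n_radii) + 0.5) / n_radii * r_max
    angles = 2.0 * np.pi * np.arange(n_angles) / n_angles
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    # Nearest-neighbour sampling of the mask at each (radius, angle) point.
    sy = np.rint(cy + rr * np.sin(aa)).astype(int)
    sx = np.rint(cx + rr * np.cos(aa)).astype(int)
    h, w = mask.shape
    valid = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)
    rect = np.zeros((n_radii, n_angles))
    rect[valid] = mask[sy[valid], sx[valid]]
    return rect


def polar_descriptor(mask: np.ndarray, n_radii: int = 16, n_angles: int = 32) -> np.ndarray:
    """Row-wise FFT magnitude of the polar rectangle, flattened into a vector."""
    rect = polar_rectangle(mask, n_radii, n_angles)
    # Rotating the region shifts each row circularly along the angle axis;
    # the magnitude of the FFT of each row is invariant to such shifts.
    return np.abs(np.fft.fft(rect, axis=1)).ravel()
```

Because a rotation of the region moves samples along each row, the row-wise FFT magnitude is unchanged by it, up to resampling effects on the pixel grid. Unlike a histogram, it keeps information about how the shape is arranged around the centre.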
Once the descriptors are chosen, we train an SVM classifier (Cortes and Vapnik, 1995) for each family of descriptors. To give a final decision, the outputs of these SVM classifiers are processed by a final SVM classifier (Figure 7). We tried adding more classifiers to the first stage of this configuration (with other kinds of descriptors), but this systematically decreased the overall accuracy.
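The two-stage configuration of Figure 7 can be sketched with scikit-learn's `SVC`. This is a hedged illustration, not the authors' implementation. The following are all assumptions:

- the RBF kernel and default hyperparameters;
- the `TwoStageSVM` class name;
- training the final SVM on out-of-fold decision values.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC


class TwoStageSVM:
    """First stage: one SVM per descriptor family. Second stage: an SVM on their scores."""

    def __init__(self, n_families: int, cv: int = 5):
        self.first_stage = [SVC(kernel="rbf") for _ in range(n_families)]
        self.final = SVC(kernel="rbf")
        self.cv = cv

    def _scores(self, families):
        # Signed distance to each first-stage decision boundary, one column per family.
        return np.column_stack(
            [clf.decision_function(X) for clf, X in zip(self.first_stage, families)]
        )

    def fit(self, families, y):
        # Out-of-fold scores: the final SVM is trained on first-stage outputs
        # for samples those SVMs did not see, avoiding over-optimistic inputs.
        oof = np.column_stack(
            [
                cross_val_predict(SVC(kernel="rbf"), X, y, cv=self.cv,
                                  method="decision_function")
                for X in families
            ]
        )
        for clf, X in zip(self.first_stage, families):
            clf.fit(X, y)  # refit each first-stage SVM on all training data
        self.final.fit(oof, y)
        return self

    def predict(self, families):
        return self.final.predict(self._scores(families))
```

Here `families` is a list of arrays, one `(n_samples, n_features)` array per descriptor family (for example pseudo-Zernike, Fourier and polar descriptors). scikit-learn's `StackingClassifier` applies the same idea: it also fits the final estimator on cross-validated predictions. However, it passes one shared feature matrix to every base estimator, whereas here each family keeps its own array.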
[Figure 6: polar representation of a region; axis: angle]
Figure 6: The region is expressed in a polar coordinate space; to obtain a rotation-invariant descriptor, we take the magnitude spectrum of the Fourier transform of every line.
[Figure 7: diagram; pseudo-Zernike (pzm), Fourier and polar descriptors each feed an SVM, whose outputs feed a final SVM that produces the final decision]
Figure 7: Our classifier is composed of three SVM classifiers, each using one family of descriptors, and a final SVM that takes the decision.
6 GROUPING
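The grouping step of this section works in two stages:

1. It links neighbouring characters when their distance is small relative to their height, following the conditions of (Retornaz and Marcotegui, 2007).
2. It dismisses isolated characters and pairs.

The sketch below illustrates that idea on bounding boxes using union-find. The exact conditions of the cited paper are not reproduced here, so the `Box` type, the horizontal-gap and vertical-offset tests, and all thresholds are hypothetical. Like the described approach, it only handles nearly horizontal text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


# Hypothetical thresholds, expressed relative to character height.
MAX_GAP_RATIO = 1.0       # horizontal gap <= 1.0 * smaller height
MAX_VERTICAL_RATIO = 0.5  # vertical offset of centres <= 0.5 * smaller height
MIN_GROUP_SIZE = 3        # single characters and pairs are dismissed


def can_link(a: Box, b: Box) -> bool:
    """Link two character boxes that are close relative to their height."""
    h = min(a.h, b.h)
    # Horizontal gap between the boxes (negative when they overlap in x).
    gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
    dy = abs((a.y + a.h / 2) - (b.y + b.h / 2))
    return gap <= MAX_GAP_RATIO * h and dy <= MAX_VERTICAL_RATIO * h


def group_characters(boxes: list[Box]) -> list[list[int]]:
    """Union-find over linked pairs; returns index groups of at least MIN_GROUP_SIZE."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if can_link(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= MIN_GROUP_SIZE]
```

For example, with three aligned boxes, a distant pair and one isolated box, only the three aligned boxes survive as a group. The pair and the single box are dropped, mirroring the dismissal of isolated text regions.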
At this point, we are able to analyse the main regions of the image and extract characters. Once these characters are selected, they are grouped with their neighbours to recover text regions. The conditions for linking two characters are those given in (Retornaz and Marcotegui, 2007); they are based on the distance between the two regions relative to their height. This step will soon be improved to handle text in every direction, as the current approach is restricted to nearly horizontal text. During this process, isolated text regions (single characters or pairs of letters) are dismissed. This aggregation is mandatory to generate the words and sentences that are fed to an OCR, but it also suppresses many false positive detections.
7 LETTER DETECTION EXPERIMENTS
In this section, we evaluate the segmentation and classification steps.
Segmentation. Evaluating segmentation is always difficult, as it is partly subjective. Most of the time, no ground truth is available that could be used with a representative measure. To evaluate segmentation as objectively as possible for our application, we built a test image database by randomly selecting a subset of the image database provided by the I.G.N. (Institut Géographique National, n.d.) to the project (iTowns ANR project, 2008). We segment all images from this database and count the properly segmented characters. We define "properly segmented" as precisely as possible: the character must be readable, and it must be neither split nor merged with other features around it. Its thickness may vary slightly, provided that its shape remains correct. We compare the result with 3 other segmentation methods: