In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A - Saint-Mandé, France, September 1-3, 2010
score calculated from colour detection for example. They propose
an edge-based score, inspired by (Jolly et al., 1996), which depends
on the distance from the template's edges to the image's edges,
weighted by the magnitude of the oriented gradient difference
$\varphi_p - \gamma_p$. $\varphi_p$ and $\gamma_p$ represent the orientation of the gradient of the
template and of the image at point $p$, respectively. However, we observed
that the bigger an object's representation in an image, the harder
it is for the template to converge. This is logical considering that
the accumulated distance can easily become large when we work
on bigger objects, even while the template is relatively finely positioned.
For this reason, we use a similar score but we integrate the scale
$s$ of the template to be more tolerant of large distances on bigger
templates. The score is calculated to lie in the interval [0, 1].
$$S \propto \exp\left(\ldots\,\left|\cos(\varphi_p - \gamma_p)\right|\right) \qquad (10)$$
Consequently, we deduce the edge-based error $e_e = 1 - S$, where $e_e = 0$
corresponds to perfect matching. This error is important because
we use this criterion to automatically adapt the distribution of forces
in primitives fusion.
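The effect of integrating the scale can be illustrated with a minimal sketch. It does not reproduce equation (10): it assumes, for illustration only, a score of the form exp(-mean weighted distance / s), where the per-point distances are taken to be already weighted by the orientation term. The names `edge_score` and `edge_error` are hypothetical.

```python
import math

def edge_score(weighted_distances, scale):
    """Score in (0, 1]; 1.0 means the template edges lie exactly on image edges.

    ASSUMPTION: score = exp(-mean(weighted distance) / scale). This is an
    illustrative form, not the paper's exact equation (10).
    """
    n = len(weighted_distances)
    if n == 0 or scale <= 0:
        raise ValueError("need at least one edge point and a positive scale")
    mean_distance = sum(weighted_distances) / n
    return math.exp(-mean_distance / scale)

def edge_error(weighted_distances, scale):
    """Edge-based error e_e = 1 - S; 0.0 corresponds to a perfect match."""
    return 1.0 - edge_score(weighted_distances, scale)

# A template twice as large, with distances twice as large, gets the same error.
small = edge_error([1.0, 2.0, 3.0], scale=1.0)
large = edge_error([2.0, 4.0, 6.0], scale=2.0)
```

Under this assumed form, dividing by the scale makes the error invariant to a uniform enlargement of the object, which is the tolerance the text asks for.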
Conversely, the influence of a template's edges will be higher when
the error is low. We will show the advantage of this function in
Section 5.
Figure 4: Function $\alpha_r$ used to change the influence of the primitives
over the iterations
4.3 Population & initialization
4.2 Primitives fusion
Now let us consider a population of individuals, each of which represents
the association of a deformable template and a spatial configuration
of 6 parameters $(\theta, s, t_u, t_v, e, d)^T$ in a given image. Every template
is initially associated with a configuration, and every individual
is able to converge so as to minimize its local error thanks to the
deterministic process described in 3.4. This ability depends on the error
which, itself, depends on a particular primitive. For example, the
error in relation to the edge extraction of an image $I$ would be
$E(I_e, P_e, a)$. In fact, we are able to write this error for every
primitive we wish. Each error is associated with an infinitesimal
shift $da$ which can be added to the configuration
vector $a$. In the default case, we could perform a simple linear
combination of the infinitesimal shifts:
$$a^{t+1} = a^t + \frac{1}{\sum_{k=e,r} \alpha_k} \left( \sum_{k=e,r} \alpha_k \, da_k \right) \qquad (11)$$
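As a minimal sketch of this default combination, the update can be written as a weighted average of the per-primitive shifts added to the current configuration. The function name `fuse_shifts`, the dictionary keys `"e"` (edge) and `"r"` (red) and the numeric values are illustrative assumptions.

```python
def fuse_shifts(config, shifts, weights):
    """Return config + sum_k(w_k * da_k) / sum_k(w_k), one value per parameter.

    config  : list of 6 parameters (theta, s, t_u, t_v, e, d)
    shifts  : dict primitive -> infinitesimal shift (same length as config)
    weights : dict primitive -> non-negative weight
    """
    total = sum(weights[k] for k in shifts)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        a + sum(weights[k] * shifts[k][i] for k in shifts) / total
        for i, a in enumerate(config)
    ]

config = [0.0, 1.0, 10.0, 20.0, 1.0, 0.0]
shifts = {
    "e": [0.1, 0.0, 1.0, -1.0, 0.0, 0.0],  # shift suggested by the edge primitive
    "r": [0.3, 0.2, 3.0, 1.0, 0.0, 0.0],   # shift suggested by the red primitive
}
fused = fuse_shifts(config, shifts, {"e": 1.0, "r": 1.0})
```

With equal weights, each parameter moves by the plain average of the two suggested shifts.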
However, if we look at the extraction results, we can easily understand
that colour extraction is not locally accurate, due to the Bayer
filter of the camera and local colour aberration. On the other
hand, colour extraction yields only a few candidate areas because
red regions are unusual in natural environments. Edge extraction
is locally considerably more precise than colour extraction
but has the disadvantage of being very noisy, simply because
many objects in an image scene possess edges.
We reasoned that we could find a dynamic relation
between the infinitesimal shifts to improve both the number of iterations
and the convergence precision. Consequently, we think that
an individual should first use the colour-based primitive to rapidly
converge around one area and then use the edge-based primitive
to converge with precision.
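The strategy described above (colour first, edges afterwards) can be sketched as an error-driven blend of the two shifts. The linear ramp used for the weight is an assumption made for illustration; the shape of the actual weighting function is not reproduced here. The names `colour_weight` and `blended_shift` are hypothetical.

```python
def colour_weight(error):
    """Weight of the colour primitive, in [0, 1].

    ASSUMPTION: a linear ramp clamped to [0, 1]; high error -> rely on colour.
    """
    return min(1.0, max(0.0, error))

def blended_shift(error, shift_colour, shift_edge):
    """Blend the two shifts: colour dominates at high error, edges at low error."""
    w = colour_weight(error)
    return [w * c + (1.0 - w) * e for c, e in zip(shift_colour, shift_edge)]

far_from_target = blended_shift(1.0, [2.0], [0.5])   # colour shift only
close_to_target = blended_shift(0.0, [2.0], [0.5])   # edge shift only
halfway = blended_shift(0.5, [2.0], [0.5])           # equal mix of both
```

Because the two weights sum to one, this is a special case of the normalized linear combination, with the balance driven by the individual's current error.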
Considering only the red road sign detection application, we obtain
the following relation:
da — ct-r(ce e re /)da t : -1- (1 O'r(ce Cre/))ria7- (12)
Where $\alpha_r$ is a function defined as:
$$\alpha_r : [0, 1] \to [0, 1], \quad x \mapsto \alpha_r(x) \qquad (13)$$
This function $\alpha_r$ makes it possible to automatically adapt the influence
of the primitives during an individual's lifetime. The individual will be more
influenced by colour extraction when its error is high.
As proposed in (Siarry, 2007), we choose to initialize
our population using the connected components of the colour
extraction. As stated previously, these connected components are not
perfect because colour extraction is not a noiseless process. We
therefore decided to use random variables to initialize the population
around the connected components. We take $N_p$ to denote
the number of templates in our population and $N_{cc}$ to denote
the number of connected components in our image, each associated
with a region of interest $((u, v)^T, w, h)$. We take 5 random variables
$K$, $\Theta$, $S$, $U$ and $V$ which follow a uniform distribution
on $\{1, \ldots, N_{cc}\}$ and normal distributions $\mathcal{N}(0, 0.1)$, $\mathcal{N}(1, 0.2)$
and $\mathcal{N}(0, 0.2)$ respectively. Our initialization process is:
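$$\forall k \in [1, N_p], \quad a_k : \begin{cases} \theta = \Theta \\ s = S \times \min(w_k, h_k) / R \\ (t_u, t_v)^T = (u, v)_k^T + (U \times w_k, V \times h_k)^T \\ e = 1 \\ d = 0 \end{cases} \qquad (14)$$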
Where R is the default current template size.
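The initialization can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: the second parameter of each normal distribution is read as a standard deviation, U and V are both drawn from N(0, 0.2), the rotation is set to the drawn angle, and the scale is the ratio of the component size to the default template size R. The function names and the example region of interest are hypothetical.

```python
import random

def init_individual(components, template_size, rng):
    """Draw one configuration around a randomly chosen connected component.

    components    : list of regions of interest (u, v, w, h)
    template_size : default template size R (assumed to divide min(w, h))
    """
    u, v, w, h = rng.choice(components)       # K: uniform over the components
    theta = rng.gauss(0.0, 0.1)                # Theta ~ N(0, 0.1), sigma assumed
    scale = rng.gauss(1.0, 0.2) * min(w, h) / template_size  # S ~ N(1, 0.2)
    t_u = u + rng.gauss(0.0, 0.2) * w          # U ~ N(0, 0.2)
    t_v = v + rng.gauss(0.0, 0.2) * h          # V ~ N(0, 0.2)
    return {"theta": theta, "s": scale, "t_u": t_u, "t_v": t_v, "e": 1.0, "d": 0.0}

def init_population(components, template_size, n_individuals, seed=0):
    """Build a population of independently drawn individuals."""
    rng = random.Random(seed)
    return [init_individual(components, template_size, rng)
            for _ in range(n_individuals)]

population = init_population([(100.0, 50.0, 40.0, 30.0)], 32.0, 20)
```

Drawing the translation in units of the component's width and height keeps the spread of the initial positions proportional to the size of the detected region.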
5 RESULTS
We have used the same image database as in (Siarry, 2007). We
used 3436 images in this database, acquired using a front-view
camera embedded in a vehicle. This database was used to evaluate
different pre-detection algorithms in (Foucher et al., 2009).
We found 18 red triangular road signs on 48 images. Figure 2
shows one of these road signs. Every primitive (edge and red
extraction images) is calculated from every original image, but we
do not try to detect road signs if there are no connected components
with an area of more than 100 pixels. The population
consists of 20 individuals per template. We note that the complexity
of this algorithm is linear with respect to the population size.
Recombination of individuals is not used because we observed that
it brought no improvement. We therefore just used natural
selection, removing the 8 worst individuals from the population
at each iteration and re-initializing 8 brand-new individuals.
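One iteration of this selection scheme can be sketched as below. The names `selection_step`, `error_of` and `new_individual` are hypothetical, and individuals are represented by their error value alone to keep the example short.

```python
def selection_step(population, error_of, new_individual, n_replace=8):
    """Drop the n_replace worst individuals and add as many fresh ones."""
    if n_replace > len(population):
        raise ValueError("cannot replace more individuals than the population holds")
    keep = len(population) - n_replace
    survivors = sorted(population, key=error_of)[:keep]   # lowest errors first
    return survivors + [new_individual() for _ in range(n_replace)]

# Toy population of 20 individuals, each reduced to its error in [0, 1).
toy_population = [i / 20 for i in range(20)]
next_population = selection_step(
    toy_population,
    error_of=lambda individual: individual,
    new_individual=lambda: 1.0,   # a new individual starts with maximal error
)
```

The population size stays constant, so the cost per iteration remains linear in the number of individuals.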
The first notable result is the very good rate of convergence,
due in part to the edge-based score calculation. Figure 5 shows the
"Receiver Operating Characteristic" (ROC) curve representing
the true positive detection rate with respect to the false positive
detection rate. Each point on this curve represents a different
decision threshold based on $e_e$. The oriented gradient-based score