Full text: Recording, documentation and cooperation for cultural heritage

  
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-5/W2, 2013 
XXIV International CIPA Symposium, 2-6 September 2013, Strasbourg, France 
to match an input image to sample images of every class using 
densely computed SIFT descriptors. Since legends and depicted 
local coin images are highly discriminative, techniques dedicated 
to their recognition were developed. The proposed legend
recognition method computes scores for a set of densely sampled
candidate character locations and combines them into meaningful
lexicon words using dynamic programming. Finally, a coin image
recognition method was investigated that provides a coarse-grained
classification based on the depicted images using Bags of Visual
Words (BoVW). Moreover, a fruitful combination of the two former
methods was implemented and showed an increased classification
accuracy. 
4.1 Global Image Matching 
Global image matching has the advantage that it does not rely
on machine learning techniques and thus requires no training
data. This is a considerable advantage when working with ancient
coins, because museums usually prefer to hold a few examples of
many different types in order to give their visitors a broader
overview. Consequently, only a few images per type are available,
which impedes the use of machine learning techniques. The lack of
training is compensated for by a flexible matching model for dense
correspondence that can handle the spatial variations of local
structures between coins of the same class. The matching algorithm
is reminiscent of the SIFT flow method (Liu et al., 2011). For every 
pixel in the input image, SIFT features are computed and form a 
dense field of features. This allows the computation of
pixel-to-pixel correspondences between two images. The Euclidean
distance between two SIFT features is taken as their matching
cost. The matching of the entire SIFT field can be described as
an energy term, which compares every SIFT descriptor of one
field with the descriptors in the respective pixel neighborhood
of the other field. Minimizing this term yields the final matching
score for the two images. However, minimizing this function for
large images is computationally expensive. In order to accelerate
the classification process, a coarse-to-fine approach was proposed
(Zambanini and Kampel, 2013a). That is, SIFT flow matching is
performed in multiple steps, with each subsequent step operating
on images at a higher resolution than the previous one. For each
step k, the acceptance rate, which defines how many images are
passed on to be inspected at a higher resolution in the
classification stage, can be set with a parameter A_k ranging
from 1% to 100%. 
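The step-wise pruning described here can be illustrated with a minimal Python sketch. It is not the implementation of Zambanini and Kampel (2013a): the function name `coarse_to_fine_select`, the callback `cost_at_level` (standing in for a matching cost at resolution level k) and the rounding rule are assumptions made for illustration only.

```python
from typing import Callable, Sequence


def coarse_to_fine_select(
    candidates: Sequence[str],
    cost_at_level: Callable[[str, int], float],
    acceptance_percent: Sequence[int],
) -> list[str]:
    """Keep the best-matching A_k percent of the candidates at each step k.

    cost_at_level(candidate, k) is a hypothetical callback returning the
    matching cost of a candidate at resolution level k (lower is better).
    """
    remaining = list(candidates)
    for k, percent in enumerate(acceptance_percent):
        # Rank the surviving candidates by their cost at the current resolution.
        ranked = sorted(remaining, key=lambda c: cost_at_level(c, k))
        # Round up and always keep at least one candidate (an assumption).
        keep = max(1, (percent * len(ranked) + 99) // 100)
        remaining = ranked[:keep]
    return remaining
```

With acceptance rates of 30%, 50%, 50% and 100%, for example, a set of 120 candidates shrinks to 36, 18, 9 and 9 images, so the most expensive high-resolution matching is only run on a small subset.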
The experiments were carried out on 180 images of the reverse 
side of coins belonging to 60 different classes. Each class com- 
prises 3 different images and the coarse-to-fine granularity was 
set to four steps. Moreover, two different test/training set config- 
urations were used; the first option uses only one reference image 
while the other configuration uses two reference images per class. 
The best classification rate of 83.3% is achieved when two
reference images per class are used and 70% of all images are
rejected in the first subselection step and 50% in each of the
second and third steps. The same classification rate can be
achieved when no subselection is performed during matching, but
the computation then takes approximately 8 times longer. 
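The neighborhood-based comparison of two dense descriptor fields described in this section can be sketched as follows. This is a deliberately simplified illustration, not the energy of Liu et al. (2011) or of Zambanini and Kampel (2013a): it keeps only a data term (the smallest Euclidean distance within a square neighborhood) and omits the displacement and smoothness terms that SIFT flow additionally uses. The function name, the `radius` parameter and the assumption that descriptors are already available as arrays of shape (H, W, D) are hypothetical.

```python
import numpy as np


def neighborhood_matching_cost(field_a, field_b, radius=1):
    """Sum over all pixels of the smallest Euclidean distance between a
    descriptor in field_a and the descriptors of field_b that lie within
    a (2 * radius + 1) x (2 * radius + 1) window around the same pixel."""
    h, w, _ = field_a.shape
    # Pad with +inf so that positions outside the image never win the minimum.
    padded = np.pad(
        field_b.astype(float),
        ((radius, radius), (radius, radius), (0, 0)),
        constant_values=np.inf,
    )
    best = np.full((h, w), np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # View of field_b displaced by (dy, dx) relative to field_a.
            shifted = padded[radius + dy : radius + dy + h,
                             radius + dx : radius + dx + w]
            dist = np.linalg.norm(field_a - shifted, axis=2)
            best = np.minimum(best, dist)
    return float(best.sum())
```

A lower value indicates a better match; allowing a larger `radius` tolerates larger spatial variations between coins of the same class at a higher computational cost.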
4.2 Legend Recognition 
The legend recognition method (Kavelar et al., 2012) uses object 
recognition techniques rather than standard OCR methods, since 
OCR relies on successful binarization for the separation of text 
and background. Coin legends have the same color as the rest of 
the coin, thus intensity changes result only from the coin surface 
relief structure. Therefore, effective binarization is not possible. 
                     n=5      n=10     n=20     n=30
Best, initially      31.7%    21.7%    17.8%    13.9%
Top 3, initially     62.2%    39.4%    21.7%    18.3%
Best, re-scored      53.3%    42.8%    34.4%    28.9%
Top 3, re-scored     81.1%    66.1%    57.8%    51.7%

Table 1: Legend recognition results. Top 3 indicates that the
correct word is among the three most probable words found.

Instead, the appearance of the individual letters occurring in the
coin legends is taught to a Support Vector Machine (SVM) using
a single SIFT descriptor (Lowe, 2004) (see Fig. 4 (a)). The
legend words considered comprise 18 different characters which the
SVM has to distinguish. The training process uses 50 manually
segmented images of 100 x 100 pixels per class. The recognition
process works as illustrated in Fig. 4 (b). The input image is 
first scaled down to a standardized size of 348 x 348 pixels to
ensure an approximately equal font size for all legends. This spares
the computation of SIFT features at various scales. Next, in the
keypoint extraction step, regions of interest are detected with an
entropy filter. From these regions, a list of candidate character
locations (CCLs) is densely sampled and passed to the character
recognition step, which computes a SIFT descriptor for each of
the CCLs and tests it against all 18 SVMs to obtain 18 scores
for every CCL. Hence, this step leads to a dense likelihood map
for every letter. In the final word recognition step, pictorial
structures (Felzenszwalb and Huttenlocher, 2005) are used to generate
word hypotheses for all legend words provided via a lexicon. To
increase the confidence in the hypotheses, they are re-scored by
computing SIFT descriptors for fixed orientations based on the
CCLs in the individual hypothesis. Finally, words that received
scores above a certain threshold are rejected. Thus, for an input
image and a lexicon containing the possible legend words, the
legend recognition algorithm outputs a list of legend words
ordered by their scores. The classification rate depends on the
lexicon size n used and on whether the word hypotheses are
re-scored or not. Table 1 gives an overview of the results achieved
for 180 coin images. Best indicates the cases where the word with
the lowest score (i.e., the best match) equals the ground truth,
while Top 3 indicates that the correct word is among the three best
matching words detected. 
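The word recognition step can be illustrated with a strongly simplified Python sketch. It is not the method of Kavelar et al. (2012): instead of pictorial structures over two-dimensional character locations, it assumes that the candidate locations have already been ordered along the legend into a one-dimensional sequence, that `char_cost[c][p]` holds the cost of character c at position p (lower is better, as for the scores above), and that consecutive letters are expected to lie `spacing` positions apart. The names `word_cost`, `rank_lexicon`, `spacing` and `deform_weight` are hypothetical.

```python
import numpy as np


def word_cost(word, char_cost, spacing, deform_weight=1.0):
    """Minimum total cost of placing the letters of `word` at strictly
    increasing positions, found by dynamic programming over a chain."""
    n_pos = next(iter(char_cost.values())).shape[0]
    positions = np.arange(n_pos)
    # gap[q, p] = p - q: step from the previous letter at q to the next at p.
    gap = positions[None, :] - positions[:, None]
    # Penalise deviations from the expected letter spacing.
    transition = deform_weight * np.abs(gap - spacing).astype(float)
    # Letters must appear in reading order.
    transition[gap <= 0] = np.inf
    best = char_cost[word[0]].astype(float)
    for ch in word[1:]:
        # Best predecessor for every position p, plus the cost of ch at p.
        best = (best[:, None] + transition).min(axis=0) + char_cost[ch]
    return float(best.min())


def rank_lexicon(lexicon, char_cost, spacing, deform_weight=1.0):
    """Return the lexicon words ordered from best (lowest cost) to worst."""
    scored = [(word_cost(w, char_cost, spacing, deform_weight), w) for w in lexicon]
    return [w for _, w in sorted(scored)]
```

The sketch sums raw costs, so longer words accumulate more cost; whether and how scores are normalized by word length is left open here.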
4.3 Coin Image Recognition 
Just like the legend, the coin image provides a highly discrim- 
inative feature. For Roman Republican coins, the coin image, 
such as a she-wolf or a dolphin, is depicted on the reverse side, 
while the obverse usually shows the head of a god. Since mul- 
tiple coin types share the same coin images, a fine-grained clas- 
sification cannot be performed based solely on the coin image. 
However, it can serve as a preselection step, which prunes the set 
of classes. The proposed algorithm (Anwar et al., 2013) is based
on the Bag of Visual Words (BoVW) approach. For an input image,
SIFT features are densely extracted at a constant pixel stride. 
The computed features are quantized using k-means clustering; 
the number of clusters k determines the size of the visual vocab- 
ulary. To describe a novel image, the computed SIFT features are 
mapped to their closest visual word in Euclidean space. Finally, 
a histogram based on the number of features assigned to the in- 
dividual words is constructed and describes the depicted symbol. 
This process, however, does not consider spatial relations of the 
individual parts of a symbol. Thus, various tiling patterns capable 
of capturing spatial relationships (rectangular, circular, log-polar) 
were evaluated. The best classification rate of 90.6% is achieved
with a vocabulary size of 100, a pixel stride of 5 and rectangular
tiling.
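The BoVW pipeline with rectangular tiling can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of Anwar et al. (2013): descriptors are assumed to be given as an (N, D) array together with their (x, y) sampling positions, the k-means routine is a plain Lloyd iteration with random initialization, and the function names and the `grid` parameter are hypothetical.

```python
import numpy as np


def assign_words(descriptors, centers):
    """Index of the closest visual word (Euclidean distance) per descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd iteration; k is the size of the visual vocabulary."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def tiled_bovw_histogram(descriptors, xy, centers, image_size, grid=(2, 2)):
    """Concatenate one visual-word histogram per rectangular tile."""
    words = assign_words(descriptors, centers)
    rows, cols = grid
    height, width = image_size
    # Tile index of every sampling position (x in column 0, y in column 1).
    tile_row = np.minimum((xy[:, 1] * rows // height).astype(int), rows - 1)
    tile_col = np.minimum((xy[:, 0] * cols // width).astype(int), cols - 1)
    tile = tile_row * cols + tile_col
    hist = np.zeros((rows * cols, len(centers)))
    # Count how many features of each tile fall on each visual word.
    np.add.at(hist, (tile, words), 1)
    return hist.ravel()
```

With a single tile, `grid=(1, 1)`, the result reduces to the plain histogram that ignores spatial relations; finer grids retain coarse information about where on the coin each visual word occurs.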
	        