
CMRT09

Access restriction

There is no access restriction for this record.

Copyright

CC BY: Attribution 4.0 International.

Bibliographic data

Monograph

Persistent identifier:
856955019
Author:
Stilla, Uwe
Title:
CMRT09
Sub title:
object extraction for 3D city models, road databases, and traffic monitoring ; concepts, algorithms and evaluation ; Paris, France, September 3 - 4, 2009 ; [joint conference of ISPRS working groups III/4 and III/5]
Scope:
X, 234 Seiten
Year of publication:
2009
Place of publication:
Lemmer
Publisher of the original:
GITC
Identifier (digital):
856955019
Illustration:
Illustrationen, Diagramme, Karten
Language:
English
Usage licence:
Attribution 4.0 International (CC BY 4.0)
Publisher of the digital copy:
Technische Informationsbibliothek Hannover
Place of publication of the digital copy:
Hannover
Year of publication of the digital copy:
2016
Document type:
Monograph
Collection:
Earth sciences

Chapter

Title:
TEXT EXTRACTION FROM STREET LEVEL IMAGES J. Fabrizio, M. Cord, B. Marcotegui
Document type:
Monograph
Structure type:
Chapter

Table of contents

  • CMRT09
  • Cover
  • ColorChart
  • Title page
  • Workshop Committees
  • Program Committee:
  • Preface
  • Contents
  • EFFICIENT ROAD MAPPING VIA INTERACTIVE IMAGE SEGMENTATION O. Barinova, R. Shapovalov, S. Sudakov, A. Velizhev, A. Konushin
  • SURFACE MODELLING FOR ROAD NETWORKS USING MULTI-SOURCE GEODATA Chao-Yuan Lo, Liang-Chien Chen, Chieh-Tsung Chen, and Jia-Xun Chen
  • AUTOMATIC EXTRACTION OF URBAN OBJECTS FROM MULTI-SOURCE AERIAL DATA Adriano Mancini, Emanuele Frontoni and Primo Zingaretti
  • ROAD ROUNDABOUT EXTRACTION FROM VERY HIGH RESOLUTION AERIAL IMAGERY M. Ravanbakhsh, C. S. Fraser
  • ASSESSING THE IMPACT OF DIGITAL SURFACE MODELS ON ROAD EXTRACTION IN SUBURBAN AREAS BY REGION-BASED ROAD SUBGRAPH EXTRACTION Anne Grote, Franz Rottensteiner
  • VEHICLE ACTIVITY INDICATION FROM AIRBORNE LIDAR DATA OF URBAN AREAS BY BINARY SHAPE CLASSIFICATION OF POINT SETS W. Yao, S. Hinz, U. Stilla
  • TRAJECTORY-BASED SCENE DESCRIPTION AND CLASSIFICATION BY ANALYTICAL FUNCTIONS D. Pfeiffer, R. Reulke
  • 3D BUILDING RECONSTRUCTION FROM LIDAR BASED ON A CELL DECOMPOSITION APPROACH Martin Kada, Laurence McKinley
  • A SEMI-AUTOMATIC APPROACH TO OBJECT EXTRACTION FROM A COMBINATION OF IMAGE AND LASER DATA S. A. Mumtaz, K. Mooney
  • COMPLEX SCENE ANALYSIS IN URBAN AREAS BASED ON AN ENSEMBLE CLUSTERING METHOD APPLIED ON LIDAR DATA P. Ramzi, F. Samadzadegan
  • EXTRACTING BUILDING FOOTPRINTS FROM 3D POINT CLOUDS USING TERRESTRIAL LASER SCANNING AT STREET LEVEL Karim Hammoudi, Fadi Dornaika and Nicolas Paparoditis
  • DETECTION OF BUILDINGS AT AIRPORT SITES USING IMAGES & LIDAR DATA AND A COMBINATION OF VARIOUS METHODS Demir, N., Poli, D., Baltsavias, E.
  • DENSE MATCHING IN HIGH RESOLUTION OBLIQUE AIRBORNE IMAGES M. Gerke
  • COMPARISON OF METHODS FOR AUTOMATED BUILDING EXTRACTION FROM HIGH RESOLUTION IMAGE DATA G. Vozikis
  • SEMI-AUTOMATIC CITY MODEL EXTRACTION FROM TRI-STEREOSCOPIC VHR SATELLITE IMAGERY F. Tack, R. Goossens, G. Buyuksalih
  • AUTOMATED SELECTION OF TERRESTRIAL IMAGES FROM SEQUENCES FOR THE TEXTURE MAPPING OF 3D CITY MODELS Sébastien Bénitez and Caroline Baillard
  • CLASSIFICATION SYSTEM OF GIS-OBJECTS USING MULTI-SENSORIAL IMAGERY FOR NEAR-REALTIME DISASTER MANAGEMENT Daniel Frey and Matthias Butenuth
  • AN APPROACH FOR NAVIGATION IN 3D MODELS ON MOBILE DEVICES Wen Jiang, Wu Yuguo, Wang Fan
  • GRAPH-BASED URBAN OBJECT MODEL PROCESSING Kerstin Falkowski and Jürgen Ebert
  • A PROOF OF CONCEPT OF ITERATIVE DSM IMPROVEMENT THROUGH SAR SCENE SIMULATION D. Derauw
  • COMPETING 3D PRIORS FOR OBJECT EXTRACTION IN REMOTE SENSING DATA Konstantinos Karantzalos and Nikos Paragios
  • OBJECT EXTRACTION FROM LIDAR DATA USING AN ARTIFICIAL SWARM BEE COLONY CLUSTERING ALGORITHM S. Saeedi, F. Samadzadegan, N. El-Sheimy
  • BUILDING FOOTPRINT DATABASE IMPROVEMENT FOR 3D RECONSTRUCTION: A DIRECTION AWARE SPLIT AND MERGE APPROACH Bruno Vallet and Marc Pierrot-Deseilligny and Didier Boldo
  • A TEST OF AUTOMATIC BUILDING CHANGE DETECTION APPROACHES Nicolas Champion, Franz Rottensteiner, Leena Matikainen, Xinlian Liang, Juha Hyyppä and Brian P. Olsen
  • CURVELET APPROACH FOR SAR IMAGE DENOISING, STRUCTURE ENHANCEMENT, AND CHANGE DETECTION Andreas Schmitt, Birgit Wessel, Achim Roth
  • RAY TRACING AND SAR-TOMOGRAPHY FOR 3D ANALYSIS OF MICROWAVE SCATTERING AT MAN-MADE OBJECTS S. Auer, X. Zhu, S. Hinz, R. Bamler
  • THEORETICAL ANALYSIS OF BUILDING HEIGHT ESTIMATION USING SPACEBORNE SAR-INTERFEROMETRY FOR RAPID MAPPING APPLICATIONS Stefan Hinz, Sarah Abelen
  • FUSION OF OPTICAL AND INSAR FEATURES FOR BUILDING RECOGNITION IN URBAN AREAS J. D. Wegner, A. Thiele, U. Soergel
  • FAST VEHICLE DETECTION AND TRACKING IN AERIAL IMAGE BURSTS Karsten Kozempel and Ralf Reulke
  • REFINING CORRECTNESS OF VEHICLE DETECTION AND TRACKING IN AERIAL IMAGE SEQUENCES BY MEANS OF VELOCITY AND TRAJECTORY EVALUATION D. Lenhart, S. Hinz
  • UTILIZATION OF 3D CITY MODELS AND AIRBORNE LASER SCANNING FOR TERRAIN-BASED NAVIGATION OF HELICOPTERS AND UAVs M. Hebel, M. Arens, U. Stilla
  • STUDY OF SIFT DESCRIPTORS FOR IMAGE MATCHING BASED LOCALIZATION IN URBAN STREET VIEW CONTEXT David Picard, Matthieu Cord and Eduardo Valle
  • TEXT EXTRACTION FROM STREET LEVEL IMAGES J. Fabrizio, M. Cord, B. Marcotegui
  • CIRCULAR ROAD SIGN EXTRACTION FROM STREET LEVEL IMAGES USING COLOUR, SHAPE AND TEXTURE DATABASE MAPS A. Arlicot, B. Soheilian and N. Paparoditis
  • IMPROVING IMAGE SEGMENTATION USING MULTIPLE VIEW ANALYSIS Martin Drauschke, Ribana Roscher, Thomas Läbe, Wolfgang Förstner
  • REFINING BUILDING FACADE MODELS WITH IMAGES Shi Pu and George Vosselman
  • AN UNSUPERVISED HIERARCHICAL SEGMENTATION OF A FAÇADE BUILDING IMAGE IN ELEMENTARY 2D - MODELS Jean-Pascal Burochin, Olivier Tournaire and Nicolas Paparoditis
  • GRAMMAR SUPPORTED FACADE RECONSTRUCTION FROM MOBILE LIDAR MAPPING Susanne Becker, Norbert Haala
  • Author Index
  • Cover

Full text

In: Stilla U, Rottensteiner F, Paparoditis N (Eds) CMRT09. IAPRS, Vol. XXXVIII, Part 3/W4, Paris, France, 3-4 September 2009
TEXT EXTRACTION FROM STREET LEVEL IMAGES
J. Fabrizio 1,2, M. Cord 1, B. Marcotegui 2
1 UPMC Univ Paris 06,
Laboratoire d'informatique de Paris 6, 75016 Paris, France
2 MINES ParisTech, CMM - Centre de Morphologie Mathématique, Mathématiques et Systèmes,
35 rue Saint Honoré, 77305 Fontainebleau cedex, France
KEY WORDS: Urban, Text, Extraction, Localization, Detection, Learning, Classification
ABSTRACT
In this article we present a method for text extraction in images of city scenes. This method is used in the French iTowns project (iTowns ANR project, 2008) to automatically enhance cartographic databases by extracting text from geolocalized pictures of town streets. The task is difficult because 1. text in this environment varies in shape, size, color and orientation; 2. pictures may be blurred, as they are taken from a moving vehicle, and text may suffer perspective deformations; 3. all pictures are taken outdoors, in unconstrained conditions (lighting in particular varies from one picture to the next) and with various objects that can lead to false positives. We therefore cannot make any assumption about the text we are looking for; the only supposition is that it is not handwritten. Our process is based on two main steps: a new segmentation method based on a morphological operator and a classification step based on a combination of multiple SVM classifiers. The process is described in this article, the efficiency of each step is measured, and the global scheme is illustrated on an example.
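The excerpt names the two main steps but does not detail them. As a rough illustration of what a segmentation "based on a morphological operator" can look like, the following sketch applies a toggle-mapping style contrast operator that assigns each pixel to the closer of its local erosion or dilation; the operator choice, window size and contrast threshold are assumptions for illustration, not the authors' algorithm.

# Hypothetical illustration of a morphological, toggle-mapping style
# segmentation; the operator, window size and contrast threshold are
# assumptions, not the algorithm used in the paper.
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion


def toggle_segmentation(gray: np.ndarray, size: int = 3,
                        min_contrast: int = 10) -> np.ndarray:
    """Assign each pixel to the closer of its local erosion or dilation.
    Returns 1 for candidate dark letters, 2 for candidate bright letters,
    0 where local contrast is too low to decide."""
    ero = grey_erosion(gray, size=(size, size)).astype(np.int16)
    dil = grey_dilation(gray, size=(size, size)).astype(np.int16)
    g = gray.astype(np.int16)
    labels = np.zeros(gray.shape, dtype=np.uint8)
    enough_contrast = (dil - ero) >= min_contrast
    closer_to_dark = (g - ero) <= (dil - g)
    labels[enough_contrast & closer_to_dark] = 1
    labels[enough_contrast & ~closer_to_dark] = 2
    return labels

Connected components of each label would then serve as the candidate letter regions passed to the filtering and classification stages described below.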
1 INTRODUCTION
Automatic text localization in images is a major task in computer vision. Its applications are varied (automatic image indexing, assistance for visually impaired people, optical character reading...). Our work deals with text localization and extraction from images in an urban environment and is part of the iTowns project (iTowns ANR project, 2008). This project has two main goals: 1. allowing a user to navigate freely within the image flow of a city; 2. extracting features automatically from this image flow to enhance cartographic databases and to allow the user to make high-level queries on them (go to a given address, generate relevant hybrid text-image navigation maps (itinerary), find the location of an orphan image, select the images that contain an object, etc.). To achieve this, geolocalized sets of pictures are taken every meter. All images are processed offline to extract as much semantic data as possible, and the cartographic databases are enhanced with these data. At the same time, each mosaic of pictures is assembled into a complete immersive panorama (Figure 1).
Many studies focus on text detection and localization in images. However, most of them are specific to a constrained context such as automatic localization of postal addresses on envelopes (Palumbo et al., 1992), license plate localization (Arth et al., 2007), text extraction in video sequences (Wolf et al., 2002), automatic form reading (Kavallieratou et al., 2001) and, more generally, documents (Wahl et al., 1982). In such contexts, strong hypotheses may be asserted (blocks of text, alignments, temporal redundancy for video sequences...). In our context (natural scenes in an urban environment), text comes from various sources (road signs, storefronts, advertisements...). Its extraction is difficult: no hypothesis can be made on the text (style, position, orientation, lighting, perspective deformations...) and the amount of data is huge. Today, we work on 1 TB for a part of a single district in Paris. Next year, more districts will be processed (more than 4 TB). Different approaches already exist for text localization in natural scenes. States of the art are found in (Mancas-Thillou, 2006, Retornaz and Marcotegui, 2007, Jung et al., 2004, Jian Liang et al., 2005). Even if preliminary works exist on natural scenes (Retornaz and Marcotegui, 2007, Chen and Yuille, 2004), no standard solution really emerges and they do not focus on the urban context.
Figure 2: General principle of our system (Segmentation, Fast filters, Classification, Grouping).
The paper presents our method and is organized as follows: the text localization process is presented, every step is detailed, and the main steps are evaluated. In the last part, results are presented, followed by the conclusion.
2 SEGMENTATION BASED STRATEGY
The goal of our system is to localize text. Once localization is performed, text recognition is carried out by an external OCR (but the system may improve the quality of the region, for example by correcting perspective deformations). Our system is a region-based approach: it starts by isolating letters, then groups them to restore words and text zones. Region-based approaches seem to be more efficient; such an approach was ranked first (Retornaz and Marcotegui, 2007) during the ImagEval campaign (ImagEval, 2006). Our process is composed of a cascade of filters (Figure 2). It segments the image, and each region is analysed to determine whether it corresponds to text or not. The first selection stages eliminate part of the non-text regions but try to keep as many text regions as possible (at the price of many false positives). At the end, detected regions that are close to other text regions are grouped together, and isolated text regions are discarded.
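To make the cascade of Figure 2 concrete, here is a minimal sketch of the four stages: the segmentation output is taken as a list of candidate regions, which are then passed through fast geometric filters, a combination of SVM classifiers (averaged by decision score), and a greedy grouping step that drops isolated detections. The Region fields, thresholds, feature choices and grouping rule are hypothetical and only illustrate the control flow described above.

# Hypothetical sketch of the cascade in Figure 2: segmentation output is taken
# as a list of candidate regions, then filtered, classified and grouped.
# Thresholds, features and the grouping rule are illustrative assumptions.
from dataclasses import dataclass
from typing import List

import numpy as np
from sklearn.svm import SVC


@dataclass
class Region:
    x: int                 # bounding box of a candidate letter
    y: int
    w: int
    h: int
    features: np.ndarray   # descriptor vector (geometry, texture, ...)


def fast_filters(regions: List[Region], img_h: int) -> List[Region]:
    """Cheap geometric rules: discard obvious non-letters while keeping
    as many text regions as possible (false positives are acceptable)."""
    kept = []
    for r in regions:
        aspect = r.w / max(r.h, 1)
        if 2 <= r.h <= 0.8 * img_h and 0.05 <= aspect <= 15:
            kept.append(r)
    return kept


def classify(regions: List[Region], svms: List[SVC]) -> List[Region]:
    """Combine several trained SVMs by averaging their decision scores;
    a positive mean score is taken as 'text'."""
    text = []
    for r in regions:
        score = np.mean([s.decision_function([r.features])[0] for s in svms])
        if score > 0:
            text.append(r)
    return text


def group(regions: List[Region], gap: int = 20) -> List[List[Region]]:
    """Greedily chain detections that are horizontally and vertically close,
    then cancel isolated regions (groups of size one)."""
    groups: List[List[Region]] = []
    for r in sorted(regions, key=lambda reg: (reg.y, reg.x)):
        for g in groups:
            last = g[-1]
            if abs(r.y - last.y) < gap and (r.x - (last.x + last.w)) < gap:
                g.append(r)
                break
        else:
            groups.append([r])
    return [g for g in groups if len(g) > 1]

Averaging decision scores is only one plausible way to combine several SVMs; the excerpt states that multiple SVM classifiers are combined, not how.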

Citation recommendation

Stilla, Uwe. CMRT09. GITC, 2009.