tion images. They use local directional histograms to segment
regions showing similar grid orientation. In (Price, 2000) mul-
tiple high resolution images and Digital Surface Models (DSM)
are combined to extract the urban road grid in complex, though
stereotypical, residential areas. After manual initialization of two
intersecting road segments defining the first mesh, the grid is it-
eratively expanded by hypothesizing new meshes and matching
them to image edges. During final verification, the system exploits the contextual knowledge that streets are elongated structures whose sides may be defined by high objects like buildings or trees. Thus, so-called extended streets (a few consecutive road segments) are
simultaneously adjusted by moving them to local minima of the
DSM while isolated and badly rated segments are removed. The
internal evaluation of a road segment mainly depends on the edge
support found during hypothesis matching. However, ratings of
single segments may be altered during verification of extended
streets, which seems justified since this verification is carried out
from a more global perspective on the object "road".
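To make this DSM-based adjustment concrete, the following sketch illustrates the underlying idea in a strongly simplified form (it is not Price's implementation; the function, the sampling density, and the search range are our own assumptions): a single road segment is shifted laterally to the position where the mean DSM height along its profile is lowest.

```python
import numpy as np

def adjust_segment_to_dsm_valley(dsm, p0, p1, max_shift=5):
    """Shift a road segment laterally so that it comes to lie in a local
    "valley" of the DSM (simplified illustration, not Price's implementation).
    dsm       -- 2D array of surface heights indexed by (row, col)
    p0, p1    -- segment end points as (row, col) tuples
    max_shift -- lateral search range in DSM cells"""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    direction = p1 - p0
    normal = np.array([-direction[1], direction[0]])
    normal /= np.linalg.norm(normal)
    samples = np.linspace(0.0, 1.0, 20)                  # positions along the segment
    best_shift, best_height = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        pts = p0 + np.outer(samples, direction) + s * normal
        rows = np.clip(np.round(pts[:, 0]).astype(int), 0, dsm.shape[0] - 1)
        cols = np.clip(np.round(pts[:, 1]).astype(int), 0, dsm.shape[1] - 1)
        height = dsm[rows, cols].mean()                  # mean height along the shifted profile
        if height < best_height:
            best_shift, best_height = s, height
    return p0 + best_shift * normal, p1 + best_shift * normal, best_height
```

In the actual approach, several consecutive segments of an extended street are adjusted simultaneously, and isolated or badly rated segments are removed.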
An interesting approach regarding the role of internal evalua-
tion is employed in the system of (Tupin et al., 1999) for find-
ing consistent interpretations of SAR scenes (Synthetic Aperture
RADAR). In a first step, different low-level operators with specific strengths are applied to extract image primitives, i.e., cues for roads, rivers, urban/industrial areas, relief characteristics, etc. Since a particular operator may vote for more than one object class (e.g., road and river), a so-called focal and a non-focal element are defined for each operator (usually a union of real-world object classes). The operator response is transformed into
a confidence value characterizing the match with its focal ele-
ment. Then, all confidence values are combined in an evidence-
theoretical framework to assign to each primitive a unique semantics with an associated probability. Finally, a feature adjacency graph is constructed in which global knowledge about objects (road segments form a network, industrial areas are close to cities, ...) is introduced in the form of object adjacency probabilities.
Based on the probabilities of objects and their relations the final
scene interpretation is formulated as a graph labelling problem
that is solved by energy minimization. In (Tönjes et al., 1999),
scene interpretation is based on a priori knowledge stored in a se-
mantic net and rules for controlling the extraction. Each instance
of an object, e.g., a road axis, is hypothesized top-down and inter-
nally evaluated by comparing the expected attribute values of the
object with the actual values measured in the image. Competing
alternative hypotheses are stored in a search tree as long as no
further hypotheses can be formed. Finally, the best interpretation
is selected from the tree by an optimum path search.
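The flavour of the final graph-labelling step in (Tupin et al., 1999) can be conveyed by the following toy sketch; it replaces their actual energy minimization with simple iterated conditional modes, and the variable names and probability arrays are purely illustrative.

```python
import numpy as np

def label_adjacency_graph(unary, edges, pairwise, n_iter=10):
    """Toy relaxation of the graph-labelling idea: each primitive receives the
    label minimizing -log P(label) plus -log P(label_i, label_j) summed over
    its neighbours in the feature adjacency graph.
    unary    -- (n_primitives, n_labels) array of per-primitive label probabilities
    edges    -- list of (i, j) index pairs of adjacent primitives
    pairwise -- (n_labels, n_labels) array of object adjacency probabilities"""
    eps = 1e-9
    u_cost = -np.log(unary + eps)
    p_cost = -np.log(pairwise + eps)
    labels = np.argmin(u_cost, axis=1)            # start from the unary optimum
    neighbours = {i: [] for i in range(len(unary))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    for _ in range(n_iter):                       # iterated conditional modes
        for i in range(len(unary)):
            cost = u_cost[i].copy()
            for j in neighbours[i]:
                cost += p_cost[:, labels[j]]      # interaction with the neighbour's label
            labels[i] = np.argmin(cost)
    return labels
```

Each primitive starts with the label favoured by its own confidence values and is then repeatedly re-labelled so that it also agrees with the labels of its neighbours in the adjacency graph.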
In summary, many approaches derive confidence values from low-level features such as lines or edges. In subsequent steps these values are propagated and aggregated, eventually providing the basis for the final decision about the presence of the desired object. This procedure may cause problems since the evaluation
is purely based on local features while global object properties
are neglected. Therefore, some approaches introduce additional
knowledge (e.g., roads forming a network or fitting to "valleys"
of a DSM) at a later stage when more evidence for an object has
been acquired. All the approaches mentioned have in common that they use a single predefined model for simultaneously extracting and
evaluating roads. Due to the complexity of urban areas, however,
it is appropriate to use a flexible model for extraction and evalu-
ation, which can easily adapt to specific situations occurring dur-
ing the extraction, e.g., lower intensities and weaker contrast in
shadow areas. Before describing our evaluation methodology in
more detail we give a brief summary of the extraction system.
3 SYSTEM OVERVIEW
Our system tries to accommodate aspects that have proved to be of great importance for road extraction: By integrating a flexible,
detailed road and context model one can capture the varying ap-
pearance of roads and the influence of background objects such
as trees, buildings, and cars in complex scenes. The fusion of dif-
ferent scales helps to eliminate isolated disturbances on the road
while the fundamental structures are emphasized (Mayer and Ste-
ger, 1998). This can be supported by considering the function of roads, namely to connect different sites, thereby forming a fairly dense and sometimes even regular network. Exploiting these network characteristics adds global information and thus makes the selection of the correct hypotheses easier. As basic data, our system expects high resolution aerial images (resolution ≈ 15 cm) and a reasonably accurate DSM with a ground resolu-
tion of about 1 m. In the following, we sketch our road model and
extraction strategy. For a comprehensive description we refer the
reader to (Hinz et al., 2001a, Hinz et al., 2001b).
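The effect of fusing different scales, as mentioned above, can be sketched as follows; the binary candidate masks and the tolerance parameter are hypothetical and merely illustrate the principle of confirming fine-scale detections at the coarse scale.

```python
import numpy as np
from scipy import ndimage

def fuse_scales(fine_candidates, coarse_candidates, tolerance=3):
    """Keep only those fine-scale road candidates that are confirmed by a
    detection at the coarse scale (hypothetical masks, illustrative only).
    fine_candidates, coarse_candidates -- boolean masks of road hypotheses
    tolerance -- allowed positional discrepancy in pixels between the scales"""
    # Tolerate small positional differences by dilating the coarse-scale mask.
    support = ndimage.binary_dilation(coarse_candidates, iterations=tolerance)
    labelled, n = ndimage.label(fine_candidates)
    keep = np.zeros(fine_candidates.shape, dtype=bool)
    for region in range(1, n + 1):
        mask = labelled == region
        if np.any(mask & support):      # candidate has coarse-scale support
            keep |= mask
    # Isolated fine-scale detections (cars, shadows, ...) are discarded,
    # while position and width are still taken from the fine scale.
    return keep
```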
3.1 Road and Context Model:
The road model illustrated in Fig. 1 a) compiles knowledge about
radiometric, geometric, and topological characteristics of urban
roads in form of a hierarchical semantic net. The model rep-
resents the standard case, i.e., the appearance of roads is not
affected by relations to other objects. It describes objects by
means of “concepts”, and is split into three levels defining dif-
ferent points of view. The real world level comprises the objects
to be extracted: The road network, its junctions and road links,
as well as their parts and specializations (road segments, lanes,
markings,...). These concepts are connected to the concepts of
the geometry and material level via concrete relations (Tönjes et
al., 1999). The geometry and material level is an intermediate
level which represents the 3D-shape of an object as well as its material, thereby describing objects independently of sensor characteristics and viewpoint (Clément et al., 1993). In contrast, the image level, which is subdivided into a coarse and a fine scale, comprises the
features to detect in the image: Lines, edges, homogeneous re-
gions, etc. Whereas the fine scale gives detailed information, the
coarse scale adds global information. Because of the abstraction at the coarse scale, additional correct road hypotheses can be found and sometimes false ones can be eliminated based on topological criteria, while details, like the exact width and position of the lanes and markings, are integrated from the fine scale. In this
way the extraction benefits from both scales.
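To give an impression of how such a hierarchical semantic net can be represented, the following fragment encodes a few concepts and their part-of and concrete relations; the concept names and the class layout are our own illustration, not the actual model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Concept:
    """One node of the semantic net (illustrative fragment only)."""
    name: str
    level: str                                   # 'real world', 'geometry/material' or 'image'
    parts: List['Concept'] = field(default_factory=list)     # part-of relations within a level
    concrete: List['Concept'] = field(default_factory=list)  # concrete relations to the lower level

# Image level, split into coarse and fine scale features.
bright_line    = Concept('bright line (coarse scale)', 'image')
parallel_edges = Concept('parallel edges (fine scale)', 'image')

# Geometry and material level: sensor- and viewpoint-independent description.
asphalt_stripe = Concept('elongated flat asphalt region', 'geometry/material',
                         concrete=[bright_line, parallel_edges])

# Real world level: the objects to be extracted.
road_segment = Concept('road segment', 'real world', concrete=[asphalt_stripe])
road_link    = Concept('road link', 'real world', parts=[road_segment])
road_network = Concept('road network', 'real world', parts=[road_link])
```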
The road model is extended by knowledge about context: So-
called context objects, i.e., background objects like buildings or
vehicles, may hinder road extraction if they are not modelled appropriately, but they substantially support the extraction if they
are part of the road model. We define global and local context:
Global context: The motivation for employing global context
stems from the observation that it is possible to find semantically
meaningful image regions — so-called context regions — where
roads show typical prominent features and where certain rela-
tions between roads and background objects have a similar im-
portance. Consequently, the relevance of different components
of the road model and the importance of different context rela-
tions (described below) must be adapted to the respective context
region. In urban areas, for instance, relations between vehicles
and roads are more important since traffic is usually much denser
inside settlements than in rural areas. Following (Baumgartner et al., 1999), we distinguish urban, forest, and rural context regions.
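A minimal sketch of such an adaptation is given below; the parameter names and values are purely hypothetical and only illustrate that each context region selects its own extraction parameters (the shadow adjustment anticipates the local context relation described next).

```python
# Hypothetical parameter profiles per global context region (values are placeholders).
CONTEXT_PROFILES = {
    'urban':  {'vehicle_relation_weight': 0.8, 'min_line_contrast': 15, 'expect_markings': True},
    'rural':  {'vehicle_relation_weight': 0.2, 'min_line_contrast': 25, 'expect_markings': False},
    'forest': {'vehicle_relation_weight': 0.1, 'min_line_contrast': 10, 'expect_markings': False},
}

def parameters_for(context_region: str, in_shadow: bool = False) -> dict:
    """Return extraction parameters adapted to the global context region;
    a local 'shadow' relation further relaxes the contrast requirement."""
    params = dict(CONTEXT_PROFILES[context_region])
    if in_shadow:
        params['min_line_contrast'] *= 0.5   # weaker contrast expected in shadow areas
    return params
```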
Local context: We model the local context with so-called con-
text relations, i.e., certain relations between a small number of
road and context objects. In dense settlements, for instance, the
footprints of buildings are almost parallel to roads and therefore give strong hints for road sides. Vice versa, buildings or
other high objects potentially occlude larger parts of a road or cast
shadows on it. A context relation "occlusion" triggers the selection of another image providing a better view of this particular part of the scene, whereas a context relation "shadow" can tell an extraction algorithm to choose modified parameter settings. Vehicles also occlude the pavement of a lane segment. Hence, vehicle
outlines as, e.g., detected by the algorithm of (Hinz and Baum-
gartner, 2001) can be directly treated as parts of a lane. In a very