Full text: Proceedings, XXth congress (Part 4)

ibul 2004 
th streets 
ase B as 
tabase B. 
he streets 
> 4 b) all 
veen the 
he object 
own with 
tabase A 
1 the first 
indirectly 
eets from 
this step 
s in the 
ere js an 
sets into 
ite which 
of town. 
ided into 
compare 
by direct 
tial input 
icterizing 
rtitioning 
ne) class 
of classes 
ed in the 
5. 
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B4. Istanbul 2004 
  
  
1 * 
ObjectClass AttributeType 
: 
Object DB A [Belangs to — — — 
dem 
DEN Belangs 10. — bjectType P PartitioningAttribute | 
4. 
1 1 
CompareSet 
DBB DB A 
T ClassMatching TT 
  
  
QL-- 
  
  
  
  
  
  
Figure 6: Schema for semantic integration 
3.2 Buffer growing 
An algorithm to compute possible matchings between line- 
objects is "buffer growing" (see Walter, 1997). It acts on the 
assumption that representations of the same real-world object 
have similar locations (after possibly necessary coordinate 
transformations). In the cited paper the algorithm is described in 
a purely iterative way prioritising one of the two datasets that 
should be matched. We have made it symmetric for use with 
respect 10 two datasets on a par, and we have adapted it for the 
more set oriented process in a database system. 
  
ao 81 az 
Oe eee essen) 
e- e- e 
© - bd 
a) bo b, 
Intersections: 
Do, a1 
b4, 81 
Matchings: 
bz bob «3 C = dod1 
b= bob, d= 23182 
bz bo bi — à = 8p818»? 
Intersections: 
  
  
  
Figure 7: Buffer Growing 
The algorithm executes as follows: A buffer is built around 
every object, and all objects of the other dataset which are 
totally inside this buffer as well as all geometrically possible 
aggregations of them are identified as possible matching 
partners as shown in figure 7 b) for the objects by and b,. If two 
175 
buffers are intersecting with the same object, like a, intersecting 
the buffers around b, and b, in our example, the corresponding 
objects of the buffers are aggregated (if possible) and the 
aggregated object is treated like a single object, that is, another 
buffer is built around this new object to find possible matching 
partners, for example in figure 7 c) the matchings containing 
object b. To avoid duplicate aggregated objects, new 
aggregations are only built, if they contain the intersecting 
object. Matchings, which are just extensions of another 
matching (e. g. bob; «> a, is an extension of by & ag in our 
example) are not stored as a possible matching. 
In this way the algorithm computes all possible matchings of 
cardinalities one-to-one, one-to-many and many-to-many. Of 
course, it depends on parameters like the buffer distance which 
have to be tuned for the datasets at hands. 
3.3 Selection 
The set of possible matchings does not only contain correct 
matchings, but some incorrect suggestions too. Therefore the 
elements of the set have to be subdivided into confirmed and 
discarded matchings as mentioned above. 
3.3.1 Conflicts 
If the same object is part of two or more aggregated objects 
involved in different possible matchings, no two of these 
matchings can be simultaneously correct. The matchings are 
said to be in conflict with one another. Therefore, if a possible 
matching is confirmed, all possible matchings, that are 
conflicting with it, have to be discarded. 
On the other hand, if a possible matching is "good enough", that 
is, it fulfills all quality criteria (see below), and is not 
conflicting with any other still possible matching, it can be 
confirmed. 
3.3.2 Automatic Selection 
Automatic selection of matchings, that is confirming or 
discarding of them, can be controled by rules. A special type of 
rules are rules for checking quality criteria. Quality criteria are 
measures for the resemblance of a single aspect of the matched 
objects, e. g. length of matched lines, similarity of names etc.. 
There are different approaches to use quality criteria for 
automatic selection of correct matchings. 
One can compute all the measures of each criterion for all 
possible matchings and then compute the "best combination" of 
matching, that is the combination with the highest sum of 
measures. This problem is equivalent to searching the best 
complete subgraph (clique) in a graph with weighted vertices. 
Another approach is to define a treshold value and discard all 
matchings, with quality measure below the treshold. After 
discarding, all remaining possible matchings, which are not in 
conflict with any other any more, can be confirmed. 
The latter approach can be refined to an iterative method, by 
starting the selection with a high treshold value and then 
decrease it stepwise until a minimum treshold value is reached. 
In every step all matchings with quality measure below the 
treshold are "temporarily" discarded and non-conflicting 
matchings are confirmed. When confirming a matching, all 
 
	        
Waiting...

Note to user

Dear user,

In response to current developments in the web technology used by the Goobi viewer, the software no longer supports your browser.

Please use one of the following browsers to display this page correctly.

Thank you.