ibul 2004
th streets
ase B as
tabase B.
he streets
> 4 b) all
veen the
he object
own with
tabase A
1 the first
indirectly
eets from
this step
s in the
ere js an
sets into
ite which
of town.
ided into
compare
by direct
tial input
icterizing
rtitioning
ne) class
of classes
ed in the
5.
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B4. Istanbul 2004
1 *
ObjectClass AttributeType
:
Object DB A [Belangs to — — —
dem
DEN Belangs 10. — bjectType P PartitioningAttribute |
4.
1 1
CompareSet
DBB DB A
T ClassMatching TT
QL--
Figure 6: Schema for semantic integration
3.2 Buffer growing
An algorithm to compute possible matchings between line-
objects is "buffer growing" (see Walter, 1997). It acts on the
assumption that representations of the same real-world object
have similar locations (after possibly necessary coordinate
transformations). In the cited paper the algorithm is described in
a purely iterative way prioritising one of the two datasets that
should be matched. We have made it symmetric for use with
respect 10 two datasets on a par, and we have adapted it for the
more set oriented process in a database system.
ao 81 az
Oe eee essen)
e- e- e
© - bd
a) bo b,
Intersections:
Do, a1
b4, 81
Matchings:
bz bob «3 C = dod1
b= bob, d= 23182
bz bo bi — à = 8p818»?
Intersections:
Figure 7: Buffer Growing
The algorithm executes as follows: A buffer is built around
every object, and all objects of the other dataset which are
totally inside this buffer as well as all geometrically possible
aggregations of them are identified as possible matching
partners as shown in figure 7 b) for the objects by and b,. If two
175
buffers are intersecting with the same object, like a, intersecting
the buffers around b, and b, in our example, the corresponding
objects of the buffers are aggregated (if possible) and the
aggregated object is treated like a single object, that is, another
buffer is built around this new object to find possible matching
partners, for example in figure 7 c) the matchings containing
object b. To avoid duplicate aggregated objects, new
aggregations are only built, if they contain the intersecting
object. Matchings, which are just extensions of another
matching (e. g. bob; «> a, is an extension of by & ag in our
example) are not stored as a possible matching.
In this way the algorithm computes all possible matchings of
cardinalities one-to-one, one-to-many and many-to-many. Of
course, it depends on parameters like the buffer distance which
have to be tuned for the datasets at hands.
3.3 Selection
The set of possible matchings does not only contain correct
matchings, but some incorrect suggestions too. Therefore the
elements of the set have to be subdivided into confirmed and
discarded matchings as mentioned above.
3.3.1 Conflicts
If the same object is part of two or more aggregated objects
involved in different possible matchings, no two of these
matchings can be simultaneously correct. The matchings are
said to be in conflict with one another. Therefore, if a possible
matching is confirmed, all possible matchings, that are
conflicting with it, have to be discarded.
On the other hand, if a possible matching is "good enough", that
is, it fulfills all quality criteria (see below), and is not
conflicting with any other still possible matching, it can be
confirmed.
3.3.2 Automatic Selection
Automatic selection of matchings, that is confirming or
discarding of them, can be controled by rules. A special type of
rules are rules for checking quality criteria. Quality criteria are
measures for the resemblance of a single aspect of the matched
objects, e. g. length of matched lines, similarity of names etc..
There are different approaches to use quality criteria for
automatic selection of correct matchings.
One can compute all the measures of each criterion for all
possible matchings and then compute the "best combination" of
matching, that is the combination with the highest sum of
measures. This problem is equivalent to searching the best
complete subgraph (clique) in a graph with weighted vertices.
Another approach is to define a treshold value and discard all
matchings, with quality measure below the treshold. After
discarding, all remaining possible matchings, which are not in
conflict with any other any more, can be confirmed.
The latter approach can be refined to an iterative method, by
starting the selection with a high treshold value and then
decrease it stepwise until a minimum treshold value is reached.
In every step all matchings with quality measure below the
treshold are "temporarily" discarded and non-conflicting
matchings are confirmed. When confirming a matching, all