tion.
n of
nding
les in
there
end the
the
of the
state
s apply
ngs to
ule.
ed with
| not
essing
ors
large
r
ere is
e 3).
g
d a
ass
ssing.
ion for
tes of
r not
close,"
erent
vectorization rules.
Vector representation and processing
Rule | Neighborhood states to which rule applies
no.
1 0
2 2:34.6.7 810 11 12.14.18 16 18 19 20 22
23 24 26 27 28 30 31
3 121
4 32 42 43 46 47 48 56 58 59 60 62 63 96
106 107 110 111 112 120 122 123 124 126
127
5 64 74 75 78 79 82 83 84 86 87 88 90 91 92
94 95
6 128 129 131 135 138 139 142 143 146 147
148 149 150 151 154 155 158 159 192 193
195 199 202 203 206 207 210 211 212 213
214 215 218 219 222 223
7 5.9 13.17 25 29
8 34 35 36 38 39 40 44 50 51 52 54 55 98 99
100 102 103 104 108 114 115 116 118 119
9 66 67 68 70 71 72 76 80
10 130 132 133 134 136 137 140 141 144 145
152 153 156 157 194 196 197 198 200 201
204 205 208 209 216 217 220 221
11 33 49 97 113
12 65
13 160 161 176 177
14 37 41 45 53 57 61 101 105 109 117 121 125
15 69 73 77 81 85 89 93
16 162-175 178-191 224-255
Table 2. Neighborhood states related to
The sequential pass through the raster data set
associates each linework pixel with a vector line
string (except for cases where rule 1 is applied).
That is, the coordinates of the current pixel are
added to the coordinate list of some vector.
connects to
line string
temporary nodes
1
xu
)
©
connects to two
ine strings
true nodes
connects to three
line strings
Figure 3 Examples of node placement and line
string endpoint identification.
The output of the raster processing pass is a
vector data set that is not very useful.
It will
typically consist of a very large number of line
fragments that are tied together by temporary
Consolidating these fragments by removing
unnecessary nodes requires further processing of
the vector data.
nodes.
The vector processing uses a form of a depth-first
search to visit all the coordinates in the data
set.
to more than two vectors,
Nodes that connect to exactly one vector,
are true ends of lines.
These types of nodes are the starting points for
traversals of line strings.
Any node found during
such a traversal that connects to exactly two
or
vectors is a false node; the vectors that it
points to are combined and the traversal
continues. The traversal continues through nodes
recursively and ends when all lines that connect,
either directly or indirectly, to the start node
have been traversed.
Most cartographic data sets are not fully
connected graphs. The traversal of vectors is,
therefore, a series of graph traversals.
No linework coordinates have yet been removed from
the data set. The final step of the vector
processing is to remove coordinate values that are
not necessary to characterize the line. The
coordinates are filtered using the Douglas-Peucker
algorithm (Douglas and Peucker, 1973). The
resultant coordinates are sent to an output module
for storage.
RESULTS
The algorithms and structures described above were
implemented in software. The program is written
in ANSI C and is very portable. It has been
tested on DOS (v3.3 and higher) PC's and on two
UNIX workstations.
Implementation
The three factors (neighborhood state, rule set,
predecessor pixels) to the vectorization process
are implemented as distinct processing steps.
First, the neighborhood state of the current pixel
is determined by an evaluation function.
Second, a function that codes one of the rules is
called. Each of the rules in table 1 is coded in
a C function. Although there are only 16 rules,
there are 256 possible neighborhood states. The
implementation uses an array of pointers to
functions to minimize the search time for the
appropriate rule.
Each element of an array of 256 pointers to
functions is initialized to point to the
appropriate rule (Figure 4). The result of the
evaluation of the neighborhood state becomes an
array index which permits the appropriate rule to
be called with a single array access.
Third, a set of utility functions provides support
to the rule functions. These utility functions
supply the intelligence of the third input to the
vectorization decision.
Production environment
The U.S. Geological Survey (USGS) uses its graphic
products as data sources for a variety of digital
products. Map separates are scanned at
resolutions of 750 or 1000 lines per inch. Such
data sets typically contain 250 to 450 million
pixels. Run-length encoding will usually compress
such data sets to less than 20 Mb. The goal of
this project was to produce nonproprietary
vectorization software that would operate directly
on these run-length encoded data sets.
The software was designed to use relatively little
memory. In general, directory structures (e.g.
node-line and line-node lists) are kept in memory,
but at any one time the bulk of the coordinate
data are on the disk. The memory required by the
directory structures can be as high as 20 Mb for
large USGS maps scanned at high resolutions
(although less than 10 Mb is average), so it is
not practical to vectorize these very large data
sets on PC's. But UNIX workstations with 32 Mb or
more of memory can easily vectorize the largest
USGS data sets.
Keeping coordinate data on disk instead of in
memory results in slower performance. But
vectorization is a batch process that takes a
small amount of time compared with the total time
and cost of producing a digital data set from a
map. On a 17-mip UNIX workstation, this software
will typically vectorize a 1:24,000-scale
quadrangle overlay in less than 1 hour.