XVIIth ISPRS Congress: XVIIth ISPRS Congress

fritz, lawrence w.; lucas, james r.
  
  
  
  
  
  
  
  
  
  
  
  
  
tion. 
n of 
nding 
les in 
there 
end the 
the 
of the 
state 
s apply 
ngs to 
ule. 
ed with 
| not 
essing 
ors 
large 
r 
ere is 
e 3). 
g 
d a 
ass 
ssing. 
ion for 
tes of 
r not 
close," 
erent 
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
vectorization rules. 
Vector representation and processing 
Rule | Neighborhood states to which rule applies 
no. 
1 0 
2 2:34.6.7 810 11 12.14.18 16 18 19 20 22 
23 24 26 27 28 30 31 
3 121 
4 32 42 43 46 47 48 56 58 59 60 62 63 96 
106 107 110 111 112 120 122 123 124 126 
127 
5 64 74 75 78 79 82 83 84 86 87 88 90 91 92 
94 95 
6 128 129 131 135 138 139 142 143 146 147 
148 149 150 151 154 155 158 159 192 193 
195 199 202 203 206 207 210 211 212 213 
214 215 218 219 222 223 
7 5.9 13.17 25 29 
8 34 35 36 38 39 40 44 50 51 52 54 55 98 99 
100 102 103 104 108 114 115 116 118 119 
9 66 67 68 70 71 72 76 80 
10 130 132 133 134 136 137 140 141 144 145 
152 153 156 157 194 196 197 198 200 201 
204 205 208 209 216 217 220 221 
11 33 49 97 113 
12 65 
13 160 161 176 177 
14 37 41 45 53 57 61 101 105 109 117 121 125 
15 69 73 77 81 85 89 93 
16 162-175 178-191 224-255 
Table 2. Neighborhood states related to 
The sequential pass through the raster data set 
associates each linework pixel with a vector line 
string (except for cases where rule 1 is applied). 
That is, the coordinates of the current pixel are 
added to the coordinate list of some vector. 
  
  
   
connects to 
line string 
temporary nodes 
1 
xu 
) 
© 
connects to two 
ine strings 
true nodes 
connects to three 
line strings 
  
  
Figure 3 Examples of node placement and line 
string endpoint identification. 
The output of the raster processing pass is a 
vector data set that is not very useful. 
It will 
typically consist of a very large number of line 
fragments that are tied together by temporary 
Consolidating these fragments by removing 
unnecessary nodes requires further processing of 
the vector data. 
nodes. 
The vector processing uses a form of a depth-first 
search to visit all the coordinates in the data 
set. 
to more than two vectors, 
Nodes that connect to exactly one vector, 
are true ends of lines. 
These types of nodes are the starting points for 
traversals of line strings. 
Any node found during 
such a traversal that connects to exactly two 
or 
vectors is a false node; the vectors that it 
points to are combined and the traversal 
continues. The traversal continues through nodes 
recursively and ends when all lines that connect, 
either directly or indirectly, to the start node 
have been traversed. 
Most cartographic data sets are not fully 
connected graphs. The traversal of vectors is, 
therefore, a series of graph traversals. 
No linework coordinates have yet been removed from 
the data set. The final step of the vector 
processing is to remove coordinate values that are 
not necessary to characterize the line. The 
coordinates are filtered using the Douglas-Peucker 
algorithm (Douglas and Peucker, 1973). The 
resultant coordinates are sent to an output module 
for storage. 
RESULTS 
The algorithms and structures described above were 
implemented in software. The program is written 
in ANSI C and is very portable. It has been 
tested on DOS (v3.3 and higher) PC's and on two 
UNIX workstations. 
Implementation 
The three factors (neighborhood state, rule set, 
predecessor pixels) to the vectorization process 
are implemented as distinct processing steps. 
First, the neighborhood state of the current pixel 
is determined by an evaluation function. 
Second, a function that codes one of the rules is 
called. Each of the rules in table 1 is coded in 
a C function. Although there are only 16 rules, 
there are 256 possible neighborhood states. The 
implementation uses an array of pointers to 
functions to minimize the search time for the 
appropriate rule. 
Each element of an array of 256 pointers to 
functions is initialized to point to the 
appropriate rule (Figure 4). The result of the 
evaluation of the neighborhood state becomes an 
array index which permits the appropriate rule to 
be called with a single array access. 
Third, a set of utility functions provides support 
to the rule functions. These utility functions 
supply the intelligence of the third input to the 
vectorization decision. 
Production environment 
The U.S. Geological Survey (USGS) uses its graphic 
products as data sources for a variety of digital 
products. Map separates are scanned at 
resolutions of 750 or 1000 lines per inch. Such 
data sets typically contain 250 to 450 million 
pixels.  Run-length encoding will usually compress 
such data sets to less than 20 Mb. The goal of 
this project was to produce nonproprietary 
vectorization software that would operate directly 
on these run-length encoded data sets. 
The software was designed to use relatively little 
memory. In general, directory structures (e.g. 
node-line and line-node lists) are kept in memory, 
but at any one time the bulk of the coordinate 
data are on the disk. The memory required by the 
directory structures can be as high as 20 Mb for 
large USGS maps scanned at high resolutions 
(although less than 10 Mb is average), so it is 
not practical to vectorize these very large data 
sets on PC's. But UNIX workstations with 32 Mb or 
more of memory can easily vectorize the largest 
USGS data sets. 
Keeping coordinate data on disk instead of in 
memory results in slower performance. But 
vectorization is a batch process that takes a 
small amount of time compared with the total time 
and cost of producing a digital data set from a 
map. On a 17-mip UNIX workstation, this software 
will typically vectorize a 1:24,000-scale 
quadrangle overlay in less than 1 hour.
1
2
...
44
45
46
47
48
...
1020
1021
Full text: XVIIth ISPRS Congress (Part B4)

Access restriction

Copyright

Note to user