310 Prakt. Met. Sonderband 52 (2018)
different resolution are rescaled up or down to the base resolution with
bilinear interpolation to achieve a uniform meters-per-pixel ratio for all images. The images
are thus converted to be identical in scale but different in resolution.
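The rescaling step can be sketched as follows. This is a minimal NumPy implementation of bilinear resampling, not the authors' code; the parameter names `meters_per_pixel` and `base_mpp` are our assumptions, since the paper only states that bilinear interpolation to a common physical scale is used.

```python
import numpy as np

def rescale_to_base(img: np.ndarray, meters_per_pixel: float,
                    base_mpp: float) -> np.ndarray:
    """Bilinearly rescale a 2-D grayscale image so its physical scale
    matches the base resolution (uniform meters per pixel).
    Parameter names are hypothetical, for illustration only."""
    scale = meters_per_pixel / base_mpp
    h, w = img.shape
    nh, nw = round(h * scale), round(w * scale)
    # target pixel centers mapped back into source coordinates
    ys = np.linspace(0, h - 1, nh)
    xs = np.linspace(0, w - 1, nw)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]          # vertical interpolation weights
    wx = (xs - x0)[None, :]          # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

An image acquired at 2 m per pixel would, with a base scale of 1 m per pixel, be upsampled to twice its pixel dimensions while representing the same physical area.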
As shown in Fig. 2, the U-Net only accepts input images with a resolution of 572 x 572. To
be able to segment images of arbitrary resolution, each image is sliced into n quadratic
tiles in such a way that those tiles can later be concatenated to recover the original
resolution of the input image. The implementation is based on the overlap-tile strategy
presented by Ronneberger et al. [2].
Fig. 2 shows that the output segmentation map (388 x 388) of the U-Net is smaller than
the input image tile (572 x 572), which means that certain pixels of the input image will not
be part of the output. For the inner tiles (A in Fig. 4), pixels of the neighboring tiles are
simply attached to the input, giving a "view" across the tile borders. For the outer tiles (B in Fig. 4),
the missing green and blue areas are extrapolated by mirroring the content of the image.
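The tiling described above can be sketched as follows. This is our reading of the overlap-tile strategy, not the authors' implementation: the image is mirror-padded to a whole number of 388 x 388 output tiles plus a 92-pixel context border, and each 572 x 572 input tile is cut out with that context attached.

```python
import numpy as np

TILE_IN, TILE_OUT = 572, 388           # U-Net input / output tile sizes
BORDER = (TILE_IN - TILE_OUT) // 2     # 92 pixels of context per side

def tile_image(img: np.ndarray):
    """Slice a 2-D image into overlapping 572x572 tiles whose 388x388
    outputs concatenate back to the input resolution; borders are
    extrapolated by mirroring (np.pad mode='reflect')."""
    h, w = img.shape
    ny = -(-h // TILE_OUT)             # tiles per axis (ceiling division)
    nx = -(-w // TILE_OUT)
    pad_h = ny * TILE_OUT - h          # mirrored area for a whole number
    pad_w = nx * TILE_OUT - w          # of tiles, plus the context border
    padded = np.pad(img,
                    ((BORDER, BORDER + pad_h), (BORDER, BORDER + pad_w)),
                    mode="reflect")
    tiles = [padded[i * TILE_OUT:i * TILE_OUT + TILE_IN,
                    j * TILE_OUT:j * TILE_OUT + TILE_IN]
             for i in range(ny) for j in range(nx)]
    return tiles, (ny, nx)
```

For the 2560 x 1920 image of Fig. 4 this yields a 7 x 5 grid of 35 tiles, whose 388 x 388 output maps are cropped back to the original resolution after segmentation.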
[Fig. 4 legend: input image (2560 x 1920); mirrored area to have a whole number of tiles; necessary space for segmenting pixels in outer tiles; chessboard pattern]
Fig. 4: Overlap-tile strategy for an image (orange) with added regions
Unfortunately, mirroring can introduce new microstructures into an input tile, e.g. when a
dark patch lies only partially inside the input image; simple mirroring can then produce an
artificially enlarged dark patch. Since the FCN should be able to learn features based on
size and shape, combining those areas should be avoided. Therefore a border pattern
inspired by a chessboard was implemented, as this structure is very unlikely to occur in
the microstructure. It creates a twenty-pixel-wide band of alternating light pixels and dark
pixels.
4. Variations in Architecture
It turned out that some feature maps (feature depth of 1024) showed no systematic
segmentation at all. Removing these feature maps omits roughly 18 million parameters of
the whole U-Net, leaving a "Small U-Net with sliced images". This architecture was then
trained from scratch in approximately 300,000 iterations. The mIoU is 0.81, confirming that
the omitted convolutional layers do not contribute to the final segmentation map. Due to
the removed convolutions, the resolution of the output tile increases to 452 x 452. The use
of the Small U-Net also shortens the training time on our hardware configuration from
7 days (Large U-Net) to 5 days.
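A back-of-the-envelope count makes the order of magnitude of the omitted parameters plausible. Which layers of the original U-Net [2] count as removed versus merely replaced in the Small U-Net is our assumption, and biases are ignored, so this is a gross estimate rather than the paper's exact figure of roughly 18 million.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k convolution, biases ignored."""
    return c_in * c_out * k * k

# Layers of the 1024-feature-map stage in the original U-Net [2]:
removed = (
    conv_params(512, 1024, 3)     # encoder conv, 512 -> 1024
    + conv_params(1024, 1024, 3)  # encoder conv, 1024 -> 1024
    + conv_params(1024, 512, 2)   # 2x2 up-convolution, 1024 -> 512
    + conv_params(1024, 512, 3)   # decoder conv after concat, 1024 -> 512
)
print(f"{removed / 1e6:.1f} M weights (gross)")
```

The gross count of about 21 million weights is in the same ballpark as the roughly 18 million net reduction quoted above, the difference plausibly being the replacement layers that the Small U-Net still needs at its new bottleneck.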