[Figure 1: diagram of the linear array of processor stages (stage 0, stage 1, ..., stage n-1), with lines labelled "master, pipeline clock" and "load clock", the coefficients h held in the stages, and the partial results y passed between them.]
Fig. 1. Systolic convolver array (during cycle t).
The compression filter realizes the convolution computation as
follows (refer also to fig. 1): it consists of a linear pipelined
array of processor cells i, each of which contains a read/write
register which can be loaded with one filter kernel coefficient h_i.
Each input datum x_j is broadcast to all cells, where it is multiplied
by the coefficient held in each cell. This product is added to the
partial result y_j, which moves systolically through the processor
chain from left to right and has been initialised to zero
before entering the first stage. During each cycle, a partial result
accumulates one (datum * coefficient) product term before it is passed
to the following processor stage. Thus, the partial results have grown into
full convolution results by the time they leave the pipeline. Table 2
shows, for a number of computation cycles, what happens in each stage,
assuming a filter kernel length N = 3.
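The cycle-by-cycle behaviour described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware design itself. The function names, the test data and the ordering of the coefficients across the stages (the first stage holding the last kernel coefficient) are assumptions made here so that the output matches the usual convolution sum y_j = sum_i h_i * x_(j-i).

```python
def systolic_convolve(x, h):
    """Simulate a broadcast (semi-systolic) convolver, one input datum per cycle."""
    n = len(h)
    # Assumption: stage k holds h[n-1-k], so the last stage applies h[0].
    coeff = list(h)[::-1]
    # regs[k] is the partial result held at the output of stage k.
    regs = [0] * n
    out = []
    for datum in x:
        # A fresh partial result, initialised to zero, enters the first stage;
        # every other stage receives its left neighbour's previous value.
        incoming = [0] + regs[:-1]
        # The datum is broadcast: all stages add their product in the same cycle.
        regs = [p + c * datum for p, c in zip(incoming, coeff)]
        # The value leaving the last stage is the result for this cycle.
        out.append(regs[-1])
    return out


def direct_convolve(x, h):
    """Reference: y[j] = sum_i h[i] * x[j - i], with x taken as zero for j - i < 0."""
    return [
        sum(h[i] * x[j - i] for i in range(len(h)) if j - i >= 0)
        for j in range(len(x))
    ]


x = [1, 2, 3, 4, 5]
h = [1, 10, 100]  # kernel length N = 3; values chosen only for illustration
assert systolic_convolve(x, h) == direct_convolve(x, h)
```

Note that the model keeps only the N partial-result registers; no past input values are held anywhere, because each datum is fully consumed in the cycle in which it is broadcast.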
From the table, it is seen that all inner product terms in which
a particular input datum is involved are calculated while that datum
is broadcast to all cells. This avoids the need to store the
input data stream. Systolic architectures which make use of
broadcasting in order to distribute data are usually referred to as
semi-systolic [3].
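The observation about the inner product terms can be made explicit by writing out three consecutive results for N = 3. The indexing below assumes the usual convolution sum y_j = sum_i h_i * x_(j-i), with coefficients numbered h_0 to h_2; this convention is an assumption for illustration and may differ from the numbering used in table 2.

```latex
% Assumed convention: y_j = \sum_{i=0}^{N-1} h_i x_{j-i}, written out for N = 3.
\begin{align*}
y_j     &= h_0 x_j     + h_1 x_{j-1} + h_2 x_{j-2} \\
y_{j+1} &= h_0 x_{j+1} + h_1 x_j     + h_2 x_{j-1} \\
y_{j+2} &= h_0 x_{j+2} + h_1 x_{j+1} + h_2 x_j
\end{align*}
```

The datum x_j occurs exactly once in each of the three results, each time with a different coefficient. The three products h_0 x_j, h_1 x_j and h_2 x_j can therefore be formed in the single cycle in which x_j is broadcast, one in each cell, after which x_j is no longer needed.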