ther words, there is no
ional publish/subscribe
from data sources.
>, this paper proposes a
idaptor module in a
iy accept pushed data,
nat pull data from data
a to the next module in
e this proposed solution
UTION
adaptor module in a
sor data from pull-based
er. Other modules of a
us query engine, are out
'ee major components,
ve feeder, and (3) sensor
nit, the query aggregator
oid redundant requests.
et new data with the
Finally, the sensor data
sor data according to the
chitecture is shown in
ensor data source as it is
ds to share sensor data
nly supports pull-based
iajor issue we mentioned
: and workflow
present the details of the
T.
the query aggregator, we
ensor web context. Since
phenomenon (e.g. wind
ion and time point, each
following five elements,
entifier, a measurement
'raphical location, and a
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B4, 2012
XXII ISPRS Congress, 25 August — 01 September 2012, Melbourne, Australia
time point. Moreover, since sensor readings are pushed to a
sensor web service for users to retrieve, some additional
parameters are required to locate the sensor readings, such as
service location on the Internet (i.e., service URL) and the
observation offering ID in the OGC context.
Therefore, when users want to register a query for sensor data in
OGC SOS, they need to specify the service location, an
observation offering ID, an observed property URI (which is the
identifier for the physical phenomenon), a geographical
coverage (i.e., a bounding box), and a temporal coverage (i.e., a
time period). In addition, since the objective of this proposed
system is to retrieve “new” data in a timely manner, the
temporal coverage could move forward as time goes by, which
is called the sliding window. Besides the sliding window, there
are two other types of temporal window, namely, fixed window
(the temporal coverage will not change) and landmark window
(the start time point is fixed while the end time point is
moving). Therefore, in our system, users need to specify the
type of temporal window they want to use.
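The query definition and the three temporal window types described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and field names (SensorQuery, WindowType, registered_at, coverage_at) are hypothetical, and the sketch assumes that a moving window advances by the time elapsed since the query was registered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class WindowType(Enum):
    FIXED = "fixed"        # temporal coverage never changes
    SLIDING = "sliding"    # start and end both move forward
    LANDMARK = "landmark"  # start is fixed, end moves forward


@dataclass(frozen=True)
class SensorQuery:
    """Hypothetical container for the query parameters listed in the text."""
    service_url: str        # service location on the Internet
    offering_id: str        # observation offering ID
    observed_property: str  # URI of the physical phenomenon
    bbox: tuple             # (min_lon, min_lat, max_lon, max_lat)
    start: datetime         # start of the temporal coverage
    end: datetime           # end of the temporal coverage
    window: WindowType
    registered_at: datetime

    def coverage_at(self, now: datetime):
        """Return the (start, end) temporal coverage in effect at `now`.

        Assumption: moving windows advance by the time elapsed since
        registration; the paper does not specify the update rule.
        """
        elapsed = max(now - self.registered_at, timedelta(0))
        if self.window is WindowType.FIXED:
            return self.start, self.end
        if self.window is WindowType.SLIDING:
            return self.start + elapsed, self.end + elapsed
        return self.start, self.end + elapsed  # LANDMARK
```

With this sketch, a sliding query registered with a one-hour coverage still covers one hour thirty minutes later, whereas a landmark query registered with the same coverage then covers one and a half hours.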
After defining what a query is in the sensor web context, we now
present the functionality of the query aggregator. Since most
sensor web services are based on a pull interaction model, the
input adaptor needs to proactively request data from services.
However, since queries from users can have different but
overlapping geographical and temporal coverage, pulling data
from sensor web services separately for each query would
transmit the overlapping spatio-temporal coverage redundantly.
These redundant transmissions could place a large and unnecessary
burden on both the service side and the client side as the amount
of sensor data grows rapidly. Therefore, we propose the query
aggregator to aggregate requests and filter out unnecessary ones,
so that data is pulled from sensor web services efficiently. We
consider this query aggregator one of the major contributions of
this paper.
In the query aggregator, we utilize the LOading Spatio-Temporal
Indexing Tree (LOST-Tree) (Huang et al. 2011) as the data
loading management component to aggregate user queries
and avoid redundant data transmission. LOST-Tree uses two
key ideas to aggregate requests and specify the loaded portions.
First, LOST-Tree applies predefined hierarchical spatial and
temporal frameworks, so that both the spatial and temporal
extents of requests can be indexed for loading management.
Since the frameworks are predefined, LOST-Tree can simply
compare spatial and temporal indices between requests to filter
out redundant transmission. Also, because the frameworks are
hierarchical, LOST-Tree can aggregate several indices to attain
a smaller tree size, which consequently results in a smaller
memory footprint and lower query latency. In this paper, we use
a quadtree as the spatial framework and the Gregorian calendar as
the temporal framework. Second, LOST-Tree uses only the
spatio-temporal extent of requests to specify the loaded portions.
Since LOST-Tree only manages the spatio-temporal extent of
requests, it does not grow with the sensor data volume, which
also allows LOST-Tree to attain a small memory footprint and
low query latency.
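The two ideas above (predefined hierarchical indices and tracking only the extent of requests) can be illustrated with a much simplified sketch. It is not the LOST-Tree of Huang et al. (2011): the names quadkey and LoadedIndex are hypothetical, the temporal framework is reduced to single calendar days, and only spatial cells are aggregated. The sketch assumes a quadtree over the whole longitude/latitude extent, with one digit per level.

```python
from datetime import date


def quadkey(lon: float, lat: float, level: int) -> str:
    """Quadtree cell key of a point: one digit (0-3) per level."""
    min_x, min_y, max_x, max_y = -180.0, -90.0, 180.0, 90.0
    key = ""
    for _ in range(level):
        mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
        digit = 0
        if lon >= mid_x:
            digit += 1
            min_x = mid_x
        else:
            max_x = mid_x
        if lat >= mid_y:
            digit += 2
            min_y = mid_y
        else:
            max_y = mid_y
        key += str(digit)
    return key


class LoadedIndex:
    """Tracks which (quadtree cell, calendar day) pairs were already loaded."""

    def __init__(self):
        self.loaded = set()  # {(quadkey, day)}

    def is_loaded(self, key: str, day: date) -> bool:
        # A cell is covered if it or any ancestor (key prefix) is loaded.
        return any((key[:i], day) in self.loaded for i in range(len(key) + 1))

    def mark_loaded(self, key: str, day: date) -> None:
        if self.is_loaded(key, day):
            return
        # Drop descendants that the new cell makes redundant.
        self.loaded = {(k, d) for k, d in self.loaded
                       if not (d == day and k.startswith(key))}
        self.loaded.add((key, day))
        # Aggregate: replace four loaded sibling cells by their parent.
        while key:
            parent = key[:-1]
            siblings = [(parent + digit, day) for digit in "0123"]
            if not all(s in self.loaded for s in siblings):
                break
            self.loaded.difference_update(siblings)
            self.loaded.add((parent, day))
            key = parent

    def filter_new(self, cells):
        """Keep only the cells that still have to be pulled from the service."""
        return [(k, d) for k, d in cells if not self.is_loaded(k, d)]
```

Because only cell keys and days are stored, the size of the index depends on the extent of the requests rather than on the number of sensor readings, and merging sibling cells keeps the set small.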
2.3 Adaptive Feeder
After the query aggregator aggregates and filters out
unnecessary requests, the aggregated requests are forwarded to
the adaptive feeder. The major problem in retrieving sensor data
from a pull-based data source is that we do not know when
new data will be available in the service. A naïve solution is to
send requests to the SOS servers frequently and periodically.
However, this approach could generate many unnecessary
requests with empty-hit responses (i.e., the response contains
no data).
Therefore, in order to address this issue, the adaptive feeder
attempts to predict when new data will be available in SOS
servers. By detecting the sensor sampling frequency (i.e., the
frequency at which a sensor measures a phenomenon), the adaptive
feeder adjusts the requesting frequency accordingly. Although
the sampling time (the time at which the data was measured) and
the valid time (the time at which the data becomes available
online) are different, a client can only estimate the valid time
from the sampling time, as the valid time is not available to the
client.
In our current adaptive feeder design, the best scenario is that
the new sensor reading becomes available right after it is
measured (i.e., small difference between sampling time and
valid time). The adaptive feeder will be able to retrieve the data
in a timely manner as the prediction is close to reality.
However, sensor readings sometimes need to be buffered or
calibrated before being inserted into a web service. In this case,
even though the valid time could be very different from the
prediction, the adaptive feeder can still retrieve the data no
later than one sampling period after the data becomes available
online.
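The prediction step can be sketched as follows. The paper does not state how the sampling frequency is detected, so this sketch assumes the median gap between consecutive sampling times; the function names and the optional buffer parameter are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def detect_sampling_period(sampling_times):
    """Estimate the sampling period as the median gap between readings.

    Assumption: the median gap is used; the paper does not specify
    the detection method.
    """
    times = sorted(sampling_times)
    if len(times) < 2:
        raise ValueError("need at least two readings")
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    period = timedelta(seconds=median(gaps))
    if period <= timedelta(0):
        raise ValueError("sampling period must be positive")
    return period


def next_request_time(sampling_times, now, buffer=timedelta(0)):
    """Predict when the next reading should be available at the service."""
    period = detect_sampling_period(sampling_times)
    predicted = max(sampling_times) + period + buffer
    # If the prediction has already passed (e.g. the service delays
    # publication), fall back to the next multiple of the period.
    while predicted <= now:
        predicted += period
    return predicted
```

When the service publishes readings immediately, the first prediction is close to the valid time. When publication is delayed, the loop keeps polling once per sampling period, so new data is picked up at most one period after it becomes available.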
3. EXPERIMENTAL RESULTS
In this section, we present the preliminary experimental results
of the proposed system. We tested the proposed solution on two
existing sensor web services (referred to here as service A
and service B). While both services have the same sampling
frequency (around once every 15 minutes), they have
different data update behaviour. Service A makes the sensor
data available as soon as it receives data from sensors, which
corresponds to our best scenario. Service B first buffers or
calibrates sensor data before making them available online, so
the valid time is far from the sampling time.
It is worth noting that, in addition to the aforementioned
predicted time, we also add a buffer time of 30 seconds to
accommodate the possible delay when services make data
available online. As a consequence, our results would be 30
seconds worse than the best scenario. This buffer time will be
shortened once we have more testing results.
We record the difference between the time at which we get the
new data and the time at which the latest reading was
measured. This time difference indicates how close to "real time"
the proposed system can get. Table 1 shows the preliminary
experimental results, including the average and standard
deviation of the time difference, the number of unnecessary
requests (i.e., requests that do not retrieve any new data), and
the total number of feedings performed in this experiment.
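The reported quantities could be computed from a log of feedings along the following lines. This is only a sketch: the log format and the function name are hypothetical, and the paper does not say whether the population or the sample standard deviation is reported (the population form is assumed here).

```python
from statistics import mean, pstdev


def summarize_feedings(feedings):
    """Summarize a feeding log.

    feedings: list of (request_time, latest_sampling_time) pairs in
    seconds; latest_sampling_time is None when a request retrieved
    no new data.
    """
    delays = [req - sampled for req, sampled in feedings if sampled is not None]
    return {
        "mean_delay": mean(delays) if delays else None,
        "std_delay": pstdev(delays) if delays else None,  # assumed population std
        "unnecessary": sum(1 for _, sampled in feedings if sampled is None),
        "total": len(feedings),
    }
```

A feeding that returns no new data contributes to the unnecessary request count but not to the delay statistics.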
As we can see in the column for service A (i.e., the best
scenario), we can retrieve new data in slightly more than
30 seconds, which is the buffer time. In addition, all 21
feedings retrieve new data, which means there are no
unnecessary requests in the case of service A.
On the other hand, as we can see in the column for service B,
since service B does not make data available online as soon as it
is measured, the adaptive feeder sends a request at every
detected sampling period, which consequently causes many
unnecessary requests. As we can see from Table 1, there is a 90