more time. It cannot even load more than 100,000 metadata records at one time, because doing so may raise an "out of memory" error.
Consequently, the efficiency of the query and search service decreases. GeoNetwork uses the Lucene index engine for querying and searching, but its optimization and acceleration strategies are insufficient: when the amount of metadata grows large enough, queries become slow, and the huge volume of metadata also undermines stability and robustness. The final flaw is that the requirement of multi-user concurrent access to massive data over the Internet is not effectively satisfied.
The original GeoNetwork 2.1 cannot satisfy these requirements, so an optimization solution is needed.
2. HIERARCHICAL OPTIMIZATION MODEL
The hierarchical optimization model (HOM) consists of a software level and a deploy level. The software level covers methods that can be applied within the software itself, here GeoNetwork: it requires modifying the GeoNetwork project source code, so it works from the inside. The deploy level covers methods applied when deploying the metadata service system: it requires constructing a smart web deployment solution and using specific software to obtain functions such as disaster recovery, so it works from the outside.
[Figure: software level (cache, batch processing) and deploy level (server cluster, disaster recovery)]

Figure 1. HOM
2.1 Software Level
1. Batch Processing
GeoNetwork's data and index operations are performed per single record. This has almost no impact for a small amount of metadata, but when the amount grows to tens of thousands, hundreds of thousands or even millions, the impact is very large and the system becomes surprisingly slow. As the amount of metadata grows, the index also grows; for every operation on a single metadata record, such as inserting, updating or deleting, the system modifies the index library and then optimizes it for effective management and high-speed index searching. A single such operation can take several minutes to complete.
Batch processing is an effective and time-saving solution to this kind of repeated operation. Each operation first writes the modified metadata to the database and records the metadata id; when all the metadata have been written, the system rebuilds the index for these metadata once. This saves a lot of time.
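The contrast between the two paths can be sketched as follows. This is a minimal illustration, not GeoNetwork's actual API: the store and its rebuild counter are hypothetical stand-ins for the metadata database and the Lucene index.

```python
class MetadataStore:
    """Toy stand-in for the metadata database plus its search index."""
    def __init__(self):
        self.records = {}
        self.index_rebuilds = 0   # counts expensive index rebuild/optimize passes

    def insert_one(self, rec_id, xml):
        # Per-record path: every single insert triggers an index rebuild.
        self.records[rec_id] = xml
        self._rebuild_index()

    def insert_batch(self, items):
        # Batched path: write all records to the store first,
        # then rebuild the index once for the whole batch.
        for rec_id, xml in items:
            self.records[rec_id] = xml
        self._rebuild_index()

    def _rebuild_index(self):
        self.index_rebuilds += 1

# Per-record: 100 inserts cost 100 index rebuilds.
slow = MetadataStore()
for i in range(100):
    slow.insert_one(i, "<metadata/>")
print(slow.index_rebuilds)   # 100

# Batched: 100 inserts cost a single index rebuild.
fast = MetadataStore()
fast.insert_batch((i, "<metadata/>") for i in range(100))
print(fast.index_rebuilds)   # 1
```

Since index optimization dominates the cost of each operation, collapsing N rebuilds into one is where the time saving comes from.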
2. Cache
Cache technology has long been considered one of the effective ways to reduce server load, network congestion and client access delay (HE Chen, 2004). In the field of geo-information services, web cache technology is also widely used. Every major electronic map website uses tile-based caching for its map service; extensive caching on both the client side and the server side avoids redrawing maps on the map server, which reduces the processing time of requests and improves client response. OGC has also released the WMTS 1.0.0 implementation standard, which can be used to develop scalable, high-performance services that WMS cannot provide.
Cache technology can also be applied in a metadata service system. On one hand, user queries occur many times more often than metadata and index updates. On the other hand, users usually compare query results and even repeat queries, so the results are repeatable.
We design a result cache based on the database. When the system receives a query request for the first time, it applies a coding algorithm (such as the MD5 algorithm) to encode the query string into a unique value, then writes the query string, the coded value and the query result into the database. When the server receives the same request again, it encodes the query string, looks up the coded value in the database, and returns the stored result as the response. An index can be built on the coded value, which is unique, to speed up the lookup.
2.2 Deploy Level
1. Web Cluster
Web cluster technology is an important method for solving the capacity and scalability problems of web server systems (Li Shuangqing, 2002). A dispatcher-based request dispatching mechanism serves as the load balancing mechanism of our metadata service system.
The metadata service system for surveying and mapping results runs on a "4+1" service cluster, shown in Figure 2. The system is deployed on one hardware server, on which we build 5 virtual machines: 4 for normal use and 1 as a backup; when any of the 4 normal machines crashes, the backup takes its place.
The system uses a dispatcher-based request dispatching mechanism. The front-end node runs Nginx, a reverse proxy server, as the request dispatcher. Since the service system and portal rely on sessions, Nginx uses ip hash as the load balancing mechanism: each request is dispatched to a fixed server according to the hash of the client IP, which effectively solves the session problem.
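The dispatcher setup can be sketched as an Nginx configuration fragment. The upstream name, addresses and ports are assumptions for illustration, not the deployment's actual values. Note that Nginx's `ip_hash` directive does not allow a server marked `backup` in the same upstream, which matches the paper's design: the fifth virtual machine is kept as a standby and substituted for a crashed node rather than listed as a hot backup.

```nginx
# Illustrative dispatcher configuration (addresses/ports are assumed).
upstream metadata_cluster {
    ip_hash;                   # pin each client IP to one backend (session affinity)
    server 192.168.0.11:8080;  # normal node 1
    server 192.168.0.12:8080;  # normal node 2
    server 192.168.0.13:8080;  # normal node 3
    server 192.168.0.14:8080;  # normal node 4
    # The 5th VM is a standby: it replaces a crashed node's address,
    # since ip_hash does not support the "backup" parameter.
}

server {
    listen 80;
    location / {
        proxy_pass http://metadata_cluster;
    }
}
```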
The advantages of this technology are that it ensures system performance and service capability, it is extensible, and it overcomes the limits of Java on a single machine; the service capability scales with the number of machines in the cluster. The disadvantage is that background data synchronization becomes more complex, since data must be synchronized several times.