
Australian Spatial Data Directory (ASDD)

Modified: 2005-02-09

Implementing new ASDD nodes: Isite on UNIX


Overview

Isite was originally developed by the Center for Networked Information Discovery and Retrieval (CNIDR) and is now managed by A/WWW Enterprises. It can:

  • index, search and present XML document collections
  • connect to a separate program to present a full metadata record
  • handle various other structured text data formats, such as DIF, colon-delimited text, HTML and email archives

Documentation references

Further information is available at ...

Download the software

The current Isite2 release is available from A/WWW Enterprises (follow the link from the home page to "Isite Distribution", then to the "Isite2" directory).

Unpack the software into a preparation directory on your system. Follow the Isite2 instructions if you are building from source.

ASDD customisation

Configuration files and the Isearch doctype called "anzmeta", used for indexing and searching ANZLIC metadata documents, are already included in the Isite2 distribution.

Create some directories

Set up the following directories. They can be anywhere on your server and do not need to be under the HTTP server's document root, but they should probably be kept separate from any existing Isite installation ...

  • $ISITE_BUILD ... where you unpacked the Isite distribution
  • $ISITE_BIN ... where Isite binaries and configuration files will be installed
  • $ISITE_CONF ... where configuration files will be installed, if you prefer to keep them separate from the binaries
  • $ISITE_DATA ... where the collections of metadata documents are located
  • $ISITE_DB ... where Isite virtual databases will be built during indexing
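As a minimal sketch, the directories could be created with a few shell commands. The base directory ISITE_ROOT and the subdirectory names below are hypothetical; point each variable wherever suits your server (in particular, ISITE_BUILD should be wherever you actually unpacked the distribution).

```shell
#!/bin/sh
# Hypothetical base directory; any location outside the web root will do.
ISITE_ROOT=${ISITE_ROOT:-$HOME/isite}

# Each variable keeps any value already set in the environment.
ISITE_BUILD=${ISITE_BUILD:-$ISITE_ROOT/build}   # unpacked distribution
ISITE_BIN=${ISITE_BIN:-$ISITE_ROOT/bin}         # installed binaries
ISITE_CONF=${ISITE_CONF:-$ISITE_ROOT/conf}      # configuration files
ISITE_DATA=${ISITE_DATA:-$ISITE_ROOT/data}      # XML metadata collections
ISITE_DB=${ISITE_DB:-$ISITE_ROOT/db}            # databases built by Iindex
export ISITE_BUILD ISITE_BIN ISITE_CONF ISITE_DATA ISITE_DB

# Create any directories that do not exist yet (-p makes re-running safe).
mkdir -p "$ISITE_BUILD" "$ISITE_BIN" "$ISITE_CONF" "$ISITE_DATA" "$ISITE_DB"
```

Saving this to a file and sourcing it (for example ". ./isite-env.sh", a hypothetical file name) keeps the variables set in your current shell for the later steps.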

Compiling

If you are using the pre-compiled binaries, then you can skip to the next section.

To compile and build Isite you will need to have the "gcc/g++" compiler and libraries properly installed.

  • Change directory to where you unpacked Isite ($ISITE_BUILD)
  • You may need to edit the top-level Makefile to set the appropriate compiler flags for your platform
  • Type "make" (do not use "make install" unless you are very familiar with the Isite build)
  • The binaries that are produced will be in the $ISITE_BUILD/bin directory
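Before running make, a quick check that the compiler and make are on your PATH can save a failed build. This is a minimal sketch; the helper name missing_tools is hypothetical, and it only checks that the commands exist, not that the libraries are correctly installed.

```shell
#!/bin/sh
# Print the name of each command given as an argument that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools gcc g++ make)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Install these before building Isite:" $missing
else
    echo "gcc, g++ and make found; run make in \$ISITE_BUILD"
fi
```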

Installation

Installation is currently a manual process.

Copy the following binaries from $ISITE_BUILD/bin to $ISITE_BIN ...

  • zserver ... the Isite Z39.50 server
  • zbatch ... command-line client for conducting batch queries
  • zping ... command-line client for testing if a server is alive
  • zclient ... command-line client for conducting queries
  • izclient ... interactive text-based Z39.50 client
  • Iindex ... used to build the Isite index databases
  • Isearch ... command-line search client
  • Iutil ... utility for interrogating Isite databases

Copy all of the configuration files from $ISITE_BUILD/conf/anzmeta to $ISITE_CONF. They will be explained in the next section.
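Because installation is manual, a small helper can make it repeatable. This sketch assumes the layout described above ($ISITE_BUILD/bin and $ISITE_BUILD/conf/anzmeta); the function name install_isite is hypothetical. It warns about any listed binary the build did not produce and returns non-zero if anything failed to copy.

```shell
#!/bin/sh
# Copy the Isite binaries and the anzmeta configuration files into place.
# Usage: install_isite BUILD_DIR BIN_DIR CONF_DIR
install_isite() {
    build=$1 bin=$2 conf=$3
    status=0
    mkdir -p "$bin" "$conf" || return 1
    for prog in zserver zbatch zping zclient izclient Iindex Isearch Iutil; do
        if [ -f "$build/bin/$prog" ]; then
            # -p preserves the executable permissions set by the build.
            cp -p "$build/bin/$prog" "$bin/" || status=1
        else
            echo "warning: $build/bin/$prog not found" >&2
            status=1
        fi
    done
    # All configuration files for the anzmeta doctype.
    cp -p "$build"/conf/anzmeta/* "$conf/" || status=1
    return $status
}
```

A typical call would be install_isite "$ISITE_BUILD" "$ISITE_BIN" "$ISITE_CONF"; if you keep the configuration files with the binaries, pass $ISITE_BIN as the third argument as well.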

Configuration

There are example configuration files included in the Isite2 distribution at ./conf/anzmeta/

The only two configuration files that should differ from the distribution are sapi.ini and zserver.ini. All configuration files are described below ...

sapi.ini

  • Defines attributes of each document collection that is available to Isearch.
  • Defines the location of the database on the file system.
  • Identifies which method to use to search the collection:
    • The keyword ISEARCH selects an Isearch database created by running Iindex (which used the relevant Isearch doctype to parse the structured data files).
  • Defines which mapping files (see below) to use with the collection.
  • An example sapi.ini is included in the distribution at ./conf/anzmeta/

zserver.ini

  • Used for initialisation when the Zserver is started.
  • Defines which databases (that are described in sapi.ini) are to be mounted and made available for searching via the Zserver.
  • You could define a different port number in this file if you want to leave your existing Zserver running.
  • An example zserver.ini is included in the distribution at ./conf/anzmeta/

anzlic.fields

  • The Isearch doctype uses this file to define the data type of each particular metadata element (field) within the XML document.
  • The anzlic.fields file is specified in the Iindex command at indexing time.
  • Data types: num (numeric fields, e.g. <northbc>), date (a single date, e.g. <begdate>, <metd>), date-range (a range of dates, i.e. <timperd>) and gpoly (greater bounding polygon, e.g. <bounding>).
  • Any field that is not defined in this configuration file is assumed to be a text field.
  • An example anzlic.fields is included in the distribution at ./conf/anzmeta/
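Because fields declared num in anzlic.fields are treated as numbers, it may be worth checking before indexing that those elements actually contain numbers. The sketch below is a hypothetical helper, not part of Isite; it assumes each element and its value are on a single line (e.g. <northbc>-35.5</northbc>) and prints any value that does not look like a decimal number.

```shell
#!/bin/sh
# Usage: check_numeric FIELD FILE...
# Prints "file: <field> value" for each value that is not a plain number.
# Elements split across several lines are not checked.
check_numeric() {
    field=$1
    shift
    for file in "$@"; do
        # Extract the text between <field> and </field> on each line.
        sed -n "s:.*<$field>\([^<]*\)</$field>.*:\1:p" "$file" |
            # Keep only values that are NOT an optionally signed decimal.
            grep -Ev '^[[:space:]]*[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)[[:space:]]*$' |
            while IFS= read -r value; do
                printf '%s: <%s> %s\n' "$file" "$field" "$value"
            done
    done
    return 0
}
```

For example, for f in northbc southbc eastbc westbc; do check_numeric "$f" "$ISITE_DATA"/*.xml; done, assuming those bounding elements are the ones typed num in your anzlic.fields.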

"Use Attribute" maps

  • Z39.50 insulates the user from the actual schema (structure and field names) of each document management system.
  • These configuration files enable the standard interface "Use Attribute" numbers (the numbers behind the WWW interface pick-list names) to be mapped to the actual field names in each particular document collection.
  • Each collection of documents can have various different mapping files.
  • These mapping files are specified in the sapi.ini configuration file.
  • Examples for use with the Isearch doctype "anzmeta" and ANZMETA DTD XML files are included in the distribution at ./conf/anzmeta/

Prepare the metadata collection

Every node must be able to present HTML, SUTRS (plain text) and XML versions of the metadata. Geospatial metadata management facilities are used to prepare, store, and present the metadata.

The Z39.50 server can conduct searching and presentation either with a collection of XML metadata documents or by connecting to a database or other repository.

The XML metadata documents must conform to the ANZMETA Document Type Definition (DTD) and should have a .xml filename extension, so that the Z39.50 server can present the XML file when a capable client requests it.

The document collection will be indexed by an Isearch "doctype".

Index the metadata

This section explains how to index a collection of XML metadata documents. If you are using a database to store, search and access your metadata, then you need to configure relational database access.

The "Isearch" component of Isite uses software called "doctypes" to read and interpret the XML metadata documents. The doctypes have a dual role: to index the metadata to create a searchable database, and to conduct searches and present the results in whatever form the client requests.

In each collection, every dataset description has three files ...

  • basename.xml ... the structured metadata
  • basename.html ... the HTML file which is used at presentation time (could have .htm or .html extension)
  • basename.txt ... the plain text file (SUTRS) which is used at presentation time
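Since every XML document needs its HTML and plain-text companions at presentation time, it can help to check a collection for missing files before indexing. A minimal sketch (the function name check_companions is hypothetical), assuming all three files for a dataset sit side by side in one directory:

```shell
#!/bin/sh
# Usage: check_companions DIR
# Reports each basename.xml in DIR that lacks basename.html (or .htm)
# or basename.txt.
check_companions() {
    dir=$1
    for xml in "$dir"/*.xml; do
        [ -e "$xml" ] || continue   # the glob matched nothing
        base=${xml%.xml}
        if [ ! -f "$base.html" ] && [ ! -f "$base.htm" ]; then
            echo "missing HTML: $base.html"
        fi
        [ -f "$base.txt" ] || echo "missing text: $base.txt"
    done
    return 0
}
```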

Below are example Iindex commands to prepare a searchable database of your metadata. You could place the command in a shell script. The examples assume that all documents are in one data directory. If your data is in separate directories then you could use the UNIX commands "find" and "sed" to automatically prepare the list of pathnames to feed into Iindex.
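One way to build the list of pathnames with find and sed, sketched under assumptions: the function name list_xml_files is hypothetical, and the sed expression, which drops any path containing /draft/, only illustrates how the list can be filtered before indexing.

```shell
#!/bin/sh
# Usage: list_xml_files DIR...
# Prints the pathname of every .xml file under the given directories,
# one per line, skipping paths containing /draft/ (an illustrative filter).
list_xml_files() {
    find "$@" -type f -name '*.xml' -print | sed '\:/draft/:d' | sort
}
```

The list can then be piped to Iindex in the same way as the large-collection example further down, e.g. list_xml_files /data/agencyA /data/agencyB | $ISITE_BIN/Iindex -d $ISITE_DB/$DB_NAME -t anzmeta -m 8 -o fieldtype=$ISITE_BIN/anzlic.fields -f - (the two data directories are hypothetical).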

To index a collection of XML files that conform to the ANZMETA Document Type Definition (DTD) ...

# the name for your index database of dataset descriptions
DB_NAME=test1

# run Iindex to parse the XML files
# using the Isearch doctype called "anzmeta" 
$ISITE_BIN/Iindex -d $ISITE_DB/$DB_NAME -t anzmeta -m 4 \
-o fieldtype=$ISITE_BIN/anzlic.fields $ISITE_DATA/*.xml
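Since the command is easier to re-run from a shell script, here is a minimal sketch of such a wrapper. The function name index_collection and the DRY_RUN switch are hypothetical conveniences, not Isite features. It refuses to run if the directory variables are unset and passes the same Iindex options as the command above.

```shell
#!/bin/sh
# Usage: index_collection DB_NAME
# Set DRY_RUN=1 to print the Iindex command instead of running it.
index_collection() {
    db_name=$1
    # Stop early if any required directory variable is empty.
    for var in ISITE_BIN ISITE_DB ISITE_DATA; do
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "error: $var is not set" >&2
            return 1
        fi
    done
    set -- "$ISITE_BIN/Iindex" -d "$ISITE_DB/$db_name" -t anzmeta -m 4 \
        -o "fieldtype=$ISITE_BIN/anzlic.fields" "$ISITE_DATA"/*.xml
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```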

There are some issues with indexing large collections of documents ...

  • refer to the "Tutorial" in the Isite documentation
  • use the "find" command to feed a list of files to Iindex
  • the -m option tells Iindex how many megabytes of metadata to read into memory for indexing
  • if you use a small value for -m, then you will build a fragmented index (more than one .inx file) that will be slightly slower to search (we have not found this to be a problem)
  • do not use the "optimize" command anymore
  • here is an example indexing command for large collections ...
    find $ISITE_DATA -name "*.xml" -print | \
    $ISITE_BIN/Iindex -d $ISITE_DB/$DB_NAME -t anzmeta -m 8 \
    -o fieldtype=$ISITE_BIN/anzlic.fields -f -
    
  • be patient, it may take a long time
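To choose a value for -m, it helps to know how big the collection is. This hypothetical helper totals the size of the .xml files under a directory in megabytes, rounded up. As a rough guide only (an inference from the description of -m above), a collection larger than the -m value presumably cannot be read into memory in one pass, so expect a fragmented index with more than one .inx file.

```shell
#!/bin/sh
# Usage: xml_collection_mb DIR
# Prints the total size of all .xml files under DIR in MB, rounded up.
xml_collection_mb() {
    bytes=$(find "$1" -type f -name '*.xml' -exec cat {} + | wc -c | tr -d ' ')
    echo $(( (bytes + 1048575) / 1048576 ))
}
```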

Start the Z39.50 server

Start the Zserver by issuing the following UNIX command. The server will start up, mount the specified databases, and then listen on the specified port. You could place this command in a shell script.

# start the Z39.50 server and run it in the background
$ISITE_BIN/zserver -i$ISITE_BIN/zserver.ini &

If you want detailed connection and search information, for example to parse the logs and generate usage reports, redirect the server's output to a log file. The standard log file (specified in zserver.ini) records only very basic connection information.

# start the Z39.50 server,
# redirect STDOUT and STDERR to be appended to a log file,
# and run zserver in the background
$ISITE_BIN/zserver -i$ISITE_BIN/zserver.ini >> zserver-200102.log 2>&1 &
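The log file name in the example above suggests one log per month (zserver-200102.log). A sketch of a start script that derives that name from the current date and records the process ID for a later shutdown; LOG_DIR, the zserver.pid file and the function name start_zserver are hypothetical choices, and appending (>>) keeps earlier runs from the same month.

```shell
#!/bin/sh
# Start zserver in the background, appending its output to a monthly
# log such as zserver-200102.log, and save its PID in zserver.pid.
start_zserver() {
    log_dir=${LOG_DIR:-.}
    log="$log_dir/zserver-$(date +%Y%m).log"
    "$ISITE_BIN/zserver" -i"$ISITE_BIN/zserver.ini" >> "$log" 2>&1 &
    echo $! > "$log_dir/zserver.pid"
    echo "zserver started (PID $!), logging to $log"
}
```

With the default LOG_DIR of the current directory, kill "$(cat zserver.pid)" would later stop the server.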

Testing your node

The best way to test your node is with the Isite program "zbatch", which lets you specify a set of queries in a plain text file. Zbatch connects to the specified server and runs the queries sequentially.

See the document Testing Isite ASDD nodes.

Hosted collections

Any one node can also host a collection of geospatial metadata for another organisation. In that way, an organisation that does not have an actual Z39.50 server can appear to be a fully-fledged node of the ASDD. The collection of XML documents and the corresponding presentation documents need to be on the same machine on which the Zserver is running. Simply index the collection of documents using Iindex and define all of the collections in the sapi.ini and zserver.ini configuration files.
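Indexing several hosted collections follows the same pattern as indexing one. This sketch assumes a hypothetical layout with one subdirectory of $ISITE_DATA per organisation and uses each subdirectory name as the database name; index_hosted and DRY_RUN are hypothetical. Each resulting database still has to be defined in sapi.ini and mounted in zserver.ini.

```shell
#!/bin/sh
# Index every subdirectory of $ISITE_DATA into its own database.
# Set DRY_RUN=1 to print the Iindex commands instead of running them.
index_hosted() {
    for dir in "$ISITE_DATA"/*/; do
        [ -d "$dir" ] || continue          # no subdirectories found
        name=$(basename "$dir")
        set -- "$ISITE_BIN/Iindex" -d "$ISITE_DB/$name" -t anzmeta -m 8 \
            -o "fieldtype=$ISITE_BIN/anzlic.fields" -f -
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$name: $*"
        else
            # Feed this collection's file list to Iindex on stdin.
            find "$dir" -type f -name '*.xml' -print | "$@" || return 1
        fi
    done
}
```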

Register your node

Follow the instructions to register your new node with the ASDD gateway WWW interface.