Multi-Lingual Search - Overview

Components

Data Servers

Data servers contain the information that the user actually wants to obtain: everything else in the system is essentially scaffolding.

The content of each data server is maintained by an appropriate domain specialist institution (e.g. a museum or a government department). Tools are made available to these participating institutions so that they can maintain their content essentially independently of the rest of the system.

The content is made available to the broker by means of the standard Z39.50 protocol (see lcweb.loc.gov/z3950/agency/). Numerous mechanisms and toolkits exist for providing such access from pre-existing database systems, including some created under the aegis of previous European projects such as Aquarelle, ELISE and Decomate.

Metadata Repository

The metadata repository does not hold information that is of interest to the user - that is the job of the data servers - but rather information about the data servers. This information is only available, and only of interest, to the broker.

Information about a data server typically includes the set of natural languages in which its data is written, the application domain of its data, and possibly performance figures such as minimum, maximum and median response times.
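As an illustration only, one repository entry and the kind of lookup the broker might perform could be sketched as follows. The record layout, the field names and the candidates function are assumptions made for this sketch; the proposal does not define a schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataServerRecord:
    """One metadata-repository entry describing a data server (hypothetical layout)."""
    name: str                                  # hypothetical identifier for the server
    languages: frozenset                       # natural languages of the data, e.g. {"en", "fr"}
    domain: str                                # application domain of the data
    min_response_s: Optional[float] = None     # response-time figures are optional
    max_response_s: Optional[float] = None
    median_response_s: Optional[float] = None


def candidates(records, language, domain):
    """Return the names of servers holding data in `language` for `domain`."""
    return [r.name for r in records
            if language in r.languages and r.domain == domain]


# Invented sample entries, used only to exercise the lookup.
records = [
    DataServerRecord("museum-a", frozenset({"en", "fr"}), "museums", 0.2, 3.0, 0.8),
    DataServerRecord("ministry-b", frozenset({"de"}), "government"),
]

french_museum_servers = candidates(records, "fr", "museums")
```

Keeping the response-time fields optional reflects the fact that such figures may not be known when a server is first registered.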

The metadata repository itself consists of two sub-components: a database and a program, known as the metadata handler, which mediates the broker's access to the database. Multiple brokers may run with reference to the same repository via a single handler.

The broker's access to the metadata repository is predominantly read-only, but it may occasionally provide information to the handler for it to update the database - for example, search-performance statistics.

Multilingual Thesaurus

The multilingual thesaurus provides the framework for translation of search terms between languages and into more and less precise forms.

A thesaurus in the sense of ISO 2788 (Guidelines for the establishment and development of monolingual thesauri) and ISO 5964 (Guidelines for the establishment and development of multilingual thesauri) is a semantic hierarchy of concepts together with the words or phrases that represent them in some specific language or languages. Such a hierarchy is very suitable for traversal by computer systems, and de facto standard mechanisms already exist for computers to query thesauri across networks.
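The hierarchy described above can be sketched as a small tree of concepts, each carrying one term per language. This is a minimal sketch under stated assumptions: the Concept class, its field names, the find and translate helpers and the sample terms are all invented for illustration and are not taken from ISO 2788, ISO 5964 or any existing thesaurus tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Concept:
    """A node in the semantic hierarchy (hypothetical structure)."""
    concept_id: str
    labels: dict                                   # language code -> term
    broader: Optional["Concept"] = field(default=None, repr=False)
    narrower: list = field(default_factory=list)

    def add_narrower(self, child):
        """Attach `child` as a more precise concept and return it."""
        child.broader = self
        self.narrower.append(child)
        return child


def find(root, term, language):
    """Depth-first search for the concept whose term in `language` is `term`."""
    if root.labels.get(language) == term:
        return root
    for child in root.narrower:
        hit = find(child, term, language)
        if hit is not None:
            return hit
    return None


def translate(root, term, source, target):
    """Translate `term` via its concept; None if the term or target label is missing."""
    concept = find(root, term, source)
    return None if concept is None else concept.labels.get(target)


# Invented sample hierarchy: "car" is a narrower concept of "vehicle".
vehicle = Concept("c1", {"en": "vehicle", "de": "Fahrzeug"})
car = vehicle.add_narrower(Concept("c2", {"en": "car", "de": "Auto"}))

german_term = translate(vehicle, "car", "en", "de")
less_precise = find(vehicle, "car", "en").broader.labels["de"]
```

Translation between languages is a lookup on the same concept, while moving to a more or less precise form is a step down or up the hierarchy.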

Like the metadata repository, the multilingual thesaurus consists of two sub-components: a thesaurus database and a program, the thesaurus handler, which mediates the broker's access to the database. Multiple brokers may run with reference to the same thesaurus via a single handler.

The broker's access to the multilingual thesaurus is exclusively read-only. Maintenance and update of the thesaurus lies beyond the scope of this proposal: there already exist tools for thesaurus editing.

Broker

The broker lies at the heart of the system: it receives requests from users via the front end; interprets and translates them by reference to the multilingual thesaurus; determines which data servers might contain appropriate data by reference to the metadata repository; forwards the various translated versions of the queries to the appropriate data servers; receives and collates the responses; and returns those responses to the front end for display to the user.
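The sequence of steps above can be sketched as a single function. This is an illustrative outline only: the function name and the three callables standing in for the thesaurus handler, the metadata handler and the data servers are assumptions, the Z39.50 exchanges are replaced by plain function calls, and selection by application domain is omitted for brevity.

```python
def broker_search(term, source_lang, translate, select_servers, search):
    """Translate a query, fan it out to suitable servers and collate the results.

    All three callables are hypothetical stand-ins:
      translate(term, source_lang) -> dict mapping language code to translated term
      select_servers(language)     -> list of server names holding data in that language
      search(server, term)         -> list of hits returned by that server
    """
    collated = []
    for language, translated in translate(term, source_lang).items():
        for server in select_servers(language):
            for hit in search(server, translated):
                collated.append({"server": server, "language": language, "hit": hit})
    return collated


# Invented sample data standing in for the thesaurus, the repository and two servers.
translations = {"car": {"en": "car", "de": "Auto"}}
servers_by_language = {"en": ["museum-a"], "de": ["archiv-b"]}
holdings = {("museum-a", "car"): ["Record 1"], ("archiv-b", "Auto"): ["Datensatz 7"]}

results = broker_search(
    "car",
    "en",
    translate=lambda t, lang: translations.get(t, {lang: t}),
    select_servers=lambda lang: servers_by_language.get(lang, []),
    search=lambda server, t: holdings.get((server, t), []),
)
```

Passing the collaborators in as callables keeps the sketch independent of any particular protocol toolkit; a real broker would issue these requests over its network connections instead.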

All of the broker's network connections use the Z39.50 protocol, giving a degree of homogeneity to its implementation.

Front End

The front end is the only part of the system that the user sees: it presents a simple interface for searching and retrieval, and optionally lets the user specify the languages and application domains of a search.
