The architecture message read as follows:
From: Mike Taylor
To: William.BONNET@tcc.thomson-csf.com
Cc: dinosearch@egroups.com
Subject: Re: Info Request : Dino place in paris

> Date: Thu, 18 Jan 2001 11:36:04 +0100
> From: William.BONNET@tcc.thomson-csf.com
>
> I'm really interested by the message you posted about dinosearch. I
> had answered to Ekaterina Amalitzkaya about the finding of reprints
> in this way. I was thinking about the idea to set up such an online
> database.
>
> If I could be any help let me know. I'm a beginner here about
> paleontology (amongst others fields...). It's one of my
> passions. So I saying many stupid things :) but I'm better at
> computer science (at least I hope :) ).

Well, because it's a European-funded thing, and they like to deal with
companies and other institutions rather than individuals, I think that
the most helpful thing you could do at this stage would be to stir up
interest in your employer (or any other organisation that you're
affiliated with). Do you work for someone relevant? And what country
are they in?

Beyond that, at some stage we'll be looking to launch a volunteer
program to get papers into the system: this could be a matter of
scanning and OCRing (whole papers or just abstracts), or re-typing
abstracts, or choosing keywords, or any number of other activities.
At some stage, we'll need someone to co-ordinate that work.
Interested? And of course, we'll need people to actually _do_ the
work!

> I actually don't know how your project will be set up. I was
> expecting to build an online database with MySql, Apache and a Linux
> Server I can access at will (as long as what I put online is
> non-commercial, legal, etc.).

We're planning bigger than that! :-)

The architecture is a federated hierarchy of brokers. Here's what I
mean:

Any number of institutions -- individual publishers, reprint agencies
and other organisations -- can maintain their own archive servers.
Each server presents a standardised Z39.50 interface, which can be
searched uniformly by a single, standard client (accessed via the
web).

Rather than connect a client directly to a server, what you'll more
often do is connect it to a broker that forwards the query to a bunch
of servers that it knows about, and synthesises all the responses into
a result set that it forwards back to the client. From the point of
view of the client, the broker is Just Another Server; and from the
point of view of the server, the broker is Just Another Client.

Because this system is "plug-compatible", brokers may in fact delegate
a search not directly to a server, but via another broker: a broker
neither knows nor cares whether any given server is "real" or a
broker. So one can build arbitrary hierarchies, allowing autonomy to
departments, institutions, provinces, countries, etc.

Consider the following simplified example topology (hope you're
reading this with a fixed-width font):

                           Client
                             |
                             |
                           global
                           broker
               ______________/\______________
              /                              \
            USA                           European
          broker                           broker
      ______/\______                 ________/|\________
     /              \               /         |         \
  AMNH             NMNH         France     Germany    England  ...
 server           broker        broker     broker     broker
              ______/\______                 /|\   ______/\______
             /              \               ..... /              \
           NMNH            NMNH                Oxford            NHM
          fossils       cladistics             server          broker
          server          server                           ______/\______
                                                          /              \
                                                         NHM             NHM
                                                     Kensington        Glasgow
                                                       broker          server
                                                   ______/\______
                                                  /              \
                                                 NHM             NHM
                                             Saurischia      Ornithischia
                                               broker          server
                                           ______/\______
                                          /              \
                                         NHM             NHM
                                      Sauropod         Theropod
                                       server           server

So if the client issues a search for Baryonyx, then that search will
get propagated down through the hierarchy, and if it gets a hit on a
paper held by (say) the English Natural History Museum's conjectural
sub-department of Theropoda, then it will be passed up through the NHM
Saurischia broker, then the NHM Kensington broker, the NHM broker, the
England broker, the European broker and the global broker, which
returns it to the client _exactly_ as though the global broker had
been an archive server containing the paper directly.

In effect, it behaves as a union catalogue for the entire world; or
the client could instead speak directly to the USA broker, or indeed
(if you know exactly what you want already) the NHM Ornithischia
server.

You can bet, though, that Linux, Apache and MySQL (and Perl) will be
right in the heart of what gets built. If you have money to spend on
tools, you still want to use what's best, right? Even if it's free :-)

The original architecture message is stored in the DinoSearch mailing list's archive, at www.egroups.com/message/dinosearch/2
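
The "plug-compatible" broker arrangement described in the message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's design: the class names (ArchiveServer, Broker), the search method and the paper title are all hypothetical, and a plain method call stands in for the Z39.50 interface, which is not implemented here. The point it shows is that a broker exposes the same search operation as a server, so brokers can be nested to any depth.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Searchable(Protocol):
    """Anything a client can query. Stands in for the Z39.50 interface."""

    name: str

    def search(self, term: str) -> list[str]: ...


@dataclass
class ArchiveServer:
    """A leaf node that holds paper titles directly (hypothetical)."""

    name: str
    papers: list[str] = field(default_factory=list)

    def search(self, term: str) -> list[str]:
        # Case-insensitive substring match on the title.
        needle = term.lower()
        return [p for p in self.papers if needle in p.lower()]


@dataclass
class Broker:
    """Forwards a query to its targets and merges their result sets."""

    name: str
    targets: list[Searchable] = field(default_factory=list)

    def search(self, term: str) -> list[str]:
        merged: list[str] = []
        for target in self.targets:  # a target may be a server or a broker
            for hit in target.search(term):
                if hit not in merged:  # drop duplicates across targets
                    merged.append(hit)
        return merged


# Part of the example topology; the paper title is a made-up placeholder.
theropod = ArchiveServer("NHM Theropod server", ["Placeholder paper on Baryonyx"])
sauropod = ArchiveServer("NHM Sauropod server")
ornithischia = ArchiveServer("NHM Ornithischia server")
glasgow = ArchiveServer("NHM Glasgow server")
oxford = ArchiveServer("Oxford server")

saurischia = Broker("NHM Saurischia broker", [sauropod, theropod])
kensington = Broker("NHM Kensington broker", [saurischia, ornithischia])
nhm = Broker("NHM broker", [kensington, glasgow])
england = Broker("England broker", [oxford, nhm])
european = Broker("European broker", [england])
global_broker = Broker("global broker", [european])

hits = global_broker.search("baryonyx")
```

A client could equally call search on england or directly on theropod; in this sketch the result for the same term is the same list, which is the sense in which the global broker behaves like a single union catalogue.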