DRAFT                      5 March 1992                      DRAFT

Non-Bibliographic Applications of Z39.50

John A. Kunze
jak@violet.berkeley.edu, 510-642-1530
Information Systems and Technology
293 Evans Hall, UC Berkeley
Berkeley, CA 94720

5 March 1992

Although the Z39.50 Information Retrieval protocol is rapidly gaining
acceptance as a standard for interoperability among networked library
automation systems, it has not been obvious how to make it work for
non-bibliographic applications. This article describes how Z39.50 is
being used as the basis for a networked campus information system
called Infocal [Kunz90] at the University of California at Berkeley.*

The datasets we are making available include phone directories, class
schedules, library catalogs, public announcements, and computer
documentation and software. As Infocal is being developed within the
Information Systems and Technology department, the latter two sets
provided the original implementation incentive. The majority of the
information will originate on campus, though some will be purchased or
licensed. Most of the non-proprietary information will be available
over the Internet.

We avoid describing the project as a Campus Wide Information System
(CWIS) because CWISses often come with extra baggage such as
electronic mail, bulletin boards, and student registration systems.
Ours is a straightforward read-only, client-server-based campus
information system with two extra goals: (a) accommodating non-textual
data and (b) interoperating with other information systems.

As a result, we went looking for a protocol, and Z39.50 seemed the
logical choice. The challenge became how to use it to support all our
non-bibliographic requirements, which is what this article addresses
from the point of view of the designer of the client program. This is
the program sitting between the user and the network, whose job it is
to translate user commands into Z39.50 protocol requests for the
server and to translate server protocol responses into information to
display back to the user.

_________________________
* This work was partially supported by Digital Equipment Corporation
and Sun Microsystems, Inc.

1. Interface Features Involving Z39.50

Figure 1 lists the interface features whose support will involve
Z39.50.

    + bibliographic databases
    + non-bibliographic databases
    + data of unknown, but learnable, semantics
    + full text documents
    + non-textual documents
    + hierarchical browsing
    + hypermedia links
    + retrieval by object/document ID

    Figure 1.  Client Features to Support

The protocol, having come out of the library automation community, is
straightforward to apply to the first feature, namely, retrieval from
databases of bibliographic citations, where records have highly
predictable structure and semantics. What about databases whose record
structure and semantics are non-bibliographic? What about retrieval
from databases whose record structure and semantics are known to the
server programmer but not known to the client programmer; is there a
way for the protocol to transmit meta-information that would allow the
client to retrieve data from such a database?

Then there's the problem of full text documents; should the protocol
view a document as a database? Should it view a document as a record?
As an element of a record? Or, as a set of records? The protocol's
retrieval model requires that records have a fixed maximum size for
the duration of a session; this means that the programmer, knowing
that documents can have highly skewed size distributions, must decide
whether to set the maximum record size ridiculously high and forgo any
optimization that might have otherwise been obtained, or to fragment
documents across record boundaries.
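The second option, fragmenting a document across record boundaries,
can be sketched as follows. This is only an illustration in Python and
is not part of Z39.50 or Infocal: the names fragment_document and
Fragment, and the 4096-byte limit used in the example, are assumptions
made for the sketch.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """One record-sized piece of a document (hypothetical structure)."""
    start: int      # byte offset of this fragment within the document
    data: bytes     # at most max_record_size bytes
    is_last: bool   # True when this fragment reaches the document's end


def fragment_document(document: bytes, max_record_size: int) -> List[Fragment]:
    """Split a document into fragments no larger than max_record_size."""
    if max_record_size <= 0:
        raise ValueError("max_record_size must be positive")
    fragments = []
    for start in range(0, len(document), max_record_size):
        data = document[start:start + max_record_size]
        is_last = start + max_record_size >= len(document)
        fragments.append(Fragment(start, data, is_last))
    return fragments


if __name__ == "__main__":
    # A 10000-byte document under an assumed 4096-byte session limit.
    for piece in fragment_document(b"x" * 10000, 4096):
        print(piece.start, len(piece.data), piece.is_last)
```

With a fixed limit, a small document costs one record and a large one
costs many, so the maximum record size can stay modest even when
document sizes are highly skewed.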

The retrieval model also requires that records belong to the result
set of a query, which is to say that you are not allowed to retrieve a
record unless it has satisfied some set of search criteria in advance.
Therefore, if you are retrieving document fragments, you may have to
invent a way to conduct a ``fragment search''.

One problem in retrieving non-textual data, such as images or video,
is that the sizes involved require the support of data compression.
There are other more serious complications, such as real-time protocol
needs of video data, that are not addressed here.

Hierarchical browsing, however unfashionable, is still an important
access method for campus information system users. It seems that there
will always be a very popular subset of information on a server that
people prefer to get at by exploring a small, shallow tree of
information rather than by doing an index search. This is analogous to
going to a restaurant and preferring to browse the menu instead of
asking the waiter to list all dishes containing potatoes but not
anchovies. To support hierarchies, the protocol will have to allow
clients to discover tree node names inside menus, and to retrieve node
contents using the search facility.

The next feature, hypermedia links, would get a great boost from
getting hierarchical browsing to work. The last feature listed,
retrieval by object (or document) ID, would likewise benefit greatly
from being approached at the same time as the previous two features.
If we had unique, server-qualified document IDs, users could re-use
them to retrieve a known document without having to re-construct the
entire search leading to its original discovery. Document IDs would be
valid across sessions, could be shared with other users, and could be
submitted in relevance feedback queries without incurring the overhead
of sending entire documents in the query.
The Wide Area Information Server (WAIS) project has made a proposal on
this [Kahl90], and the Digital Object ID Project under the Coalition
for Networked Information is just getting under way.

2. Mainstream Bibliographic Z39.50

As a basis for later discussion, it is useful to outline how to apply
Z39.50 to straightforward bibliographic information retrieval. The
protocol has seven separate services, of which only three need mention
here: the Search, Present, and Explain facilities (Explain is
currently under development [Lync90]). The Search facility defines how
a query is represented and Present defines how information and
diagnostic records are represented. Although there are four kinds of
queries, we need only consider the Type-1 query, since it is the only
one currently being used for interoperability testing.

Semantic Modules.

One of the claims of Z39.50 is that it does not restrict the kind of
query you submit or the kind of data you get back. On the other hand,
it is easy to see how an implementor might think, having just read the
standard [NISO91], that too few details are given to build an
interoperable system. This is because Z39.50 is modularized in such a
way as to isolate those parts of the protocol that would require
data-dependent assumptions. Modules are identified and can be replaced
to accommodate new sets of assumptions.

Each module is assigned an unambiguous tag or identifier by the
National Information Standards Organization (NISO) and the protocol
guarantees that the client and server always understand what modules
are currently assumed during a session. The way it pulls this off is
to make sure that the module identifiers get tucked into protocol
control messages whenever they are needed.

Figure 2 identifies four protocol modules that completely define the
structure and semantics for protocol control, queries, information,
and error messages.

    Module Name                Describes        Bibliographic Instance
    Z39.50 kernel              control
    query type/attribute set   queries          type-1/bib-1
    record syntax              information      MARC
    diagnostic set             error messages   bib-1

    Figure 2.  Z39.50 Semantic Modules

The official Z39.50 terminology for these modules is the query
type/attribute set pair, record syntax, and diagnostic set. The
attribute set applies only to the Type-1 query, which we are already
assuming. It defines several concepts about the search term and
assigns a number to each one. A list of these numbers accompanies the
search term so that the server can understand what element is to be
searched, what relational operator to use, what position in the
element, etc. The record syntax, often called transfer syntax, is much
more than simple lexical structure; it embodies intimate structural
and semantic knowledge about a sequence of bits. A diagnostic set is
just a set of messages and message numbers that the two sides agree
on.

The standard document goes only so far as to define effectively one
set of modules (and a few close variants), the minimum required to
build an implementation, in particular, a bibliographic
implementation. The specific bibliographic modules it defines or
refers to are listed in Figure 2: the Type-1 query with bib-1
attribute set, the MARC (machine-readable cataloging) record format
[MARC88], and the bib-1 diagnostic set. It also defines the module
identifiers, armed with which the programmer keeps track of the
current semantic context.

Software Modules.

The client's user interface is written with code hardwired to
understand whatever combinations of modules it cares to support. The
control module makes sure that neither client nor server ever has to
deal with semantics that it did not agree to (through lower level
negotiation).
For the bibliographic case, the client programmer could construct
menus for the user interface that list searchable elements such as
author, date, and title, or valid relational operators, such as
equals, less than, or greater than, since they are all in the semantic
module known as bib-1. Code can also be written that understands how
to read a record as a sequence of bits in the MARC format and display
it to the user.

Non-bibliographic Modules.

This completes the outline of how to apply Z39.50 to the
implementation of the bibliographic case. Now, what do you do
differently to support new, non-bibliographic semantics? The answer
is,

    (1)  define new semantics for search, display, and diagnostics,

    (2)  get the new attribute set, record syntax, and diagnostic set
         registered with NISO,

    (3)  write code to support searching and displaying these
         semantics, and

    (4)  plug in the new code modules to support the new semantic
         modules.

For example, consider supporting a student directory database. You
need a new attribute set which might include element names such as
NAME and EMAIL ADDRESS, but not PHONE NUMBER; note that even though we
might have record semantics that support the display of phone numbers,
we might be using an attribute set that does not support searching on
them. In the example in Figure 3, the record syntax is the same one
used in the Campus Wide Information System Protocol (CWISP) [CWIS91],
which has a printable ASCII format. As for error messages, you might
just choose to keep on using the bib-1 diagnostic set as it is fairly
generic.

    + record syntax = CWISP
          name: Kunze, John A.
          email: jak@violet.berkeley.edu
          phone: 510-642-1530
    + attribute set = student-1
    + diagnostic set = bib-1

    Figure 3.  Example: Student directory database

This approach to non-bibliographic data comes with problems. There is
a lot of work in writing code to support each new set of semantics.
Imagine defining a new set for each one of your database formats,
going through the registration process with NISO, then writing code to
support each. If a database format or user search requirement changes,
you need to re-register with NISO and stop interoperating with the
systems you used to work with until the new format gets approved.

If you want to support more than one set of semantic modules, your
code will have to link in all your corresponding program modules. With
code hardwired to each attribute set and record format, there is
nothing to prevent one user interface to ten database formats from
looking, in the worst case, like ten different user interfaces, which
is exactly the situation that Z39.50 was conceived to avoid.
Berkeley's Infocal system will have half a dozen database formats at
least, and they will each change probably once a year, so this scheme
will not work for us.

Fortunately, there is another approach to non-bibliographic
information that simultaneously addresses the problem of retrieving
from databases whose format is unknown until the database is first
accessed. What is needed, as before, is an attribute set, a record
syntax, and a diagnostic set, but they must be general-purpose and
dynamic.

3. The Dynamic General-Purpose Attribute Set info-1

For the purposes of discussion it will be valuable to define some
terms.

Use Attributes and Elements.

Server information records are structured so as to allow searching on
and retrieval of different combinations of individual record
components. These processes, completely defined by a server, are
mapped onto canonical Z39.50 processes so that clients have a common
language to access information. In order to give servers adequate
flexibility, Z39.50 distinguishes between searchable and retrievable
record components. Searchable components are called Use attributes and
retrievable components are called elements.
A server is free, for example, to map a Use attribute called ``name''
onto five different record components and return records containing a
corresponding element that is mapped from a sixth component, or
perhaps containing no corresponding element at all.

Explain Facility.

The Explain facility is a set of mechanisms integrated with the Search
and Present facilities that retrieves information, both human and
machine readable, about an information server. For example, searches
against the reserved database name, ``_IR_Explain_'', can return a
schedule of server availability, a list of database names, and a list
of browsable hierarchies. Each database name may be searched on the
reserved Use attribute, ``_IR_Explain_'', to return a list of valid
attribute combinations, various levels of human readable record
component descriptions, element set names, etc. When the client
encounters an unknown database, it uses the Explain facility to build
menus for searching, together with help text for explaining the query
semantics.

Objects.

In this article, an object is anything (book, painting, person,
digitized image, electronic document, etc.) having an associated
electronic information record that may be accessed through a unique
identifier, called the object identifier (objectID), about which more
will be said later. This record may be thought of as an object
citation, the elements of which describe aspects of the object such as
its name, what kind of thing it is, where it comes from, when it came
into being, where it is currently located, and a brief description.
An object citation maps easily onto the Z39.50 record model and is a
natural access point for many, if not all, aspects of the object.
Therefore, if the object exists primarily in electronic form, a
citation element may contain a copy of the object itself.
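As a rough illustration of the object citation just described, the
record can be pictured as a small structure with one slot per aspect.
The sketch below is in Python; the class name ObjectCitation, its
field names, and the sample values are all invented for the example
and are not defined by Z39.50 or by Infocal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectCitation:
    """Hypothetical in-memory form of an object citation record."""
    object_id: str                     # unique identifier for the record
    name: str                          # what the object is called
    kind: str                          # what kind of thing it is
    origin: str                        # where it comes from
    date: str                          # when it came into being
    location: str                      # where it is currently located
    description: str                   # brief description
    content: Optional[bytes] = None    # copy of the object, if electronic

    def has_embedded_object(self) -> bool:
        """True when the citation carries a copy of the object itself."""
        return self.content is not None


if __name__ == "__main__":
    # Sample values are made up purely for illustration.
    memo = ObjectCitation(
        object_id="example-1", name="Example memo", kind="document",
        origin="Example department", date="1992", location="online",
        description="A short electronic memo.", content=b"Memo text...",
    )
    print(memo.name, memo.has_embedded_object())
```

A citation for a physical object, such as a painting, would simply
leave the content slot empty.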

In cases where Explain is not used and where sophisticated processing
is not required, a server may map these record components to a small
set of generic, pre-defined Use attribute values and element names for
search and retrieval, respectively. To be useful most of the elements
would be printable ASCII suitable for human consumption. This provides
a kind of base level service from databases of otherwise unknown
semantics without requiring the Explain facility. The table in Figure
4 below shows the generic names (given symbolically here, although the
protocol uses integer tags) in the first column, and examples of how
four different databases might be mapped to them. The same tags can be
used to identify returned elements.

    Tag       Biblio db       Personnel db   Course db         Art db
    type      "book"          "employee"     "class"           "objet d'art"
    name      title           empl. name     class title       title
    by        author          organization   instructor/dept   artist
    location  publisher       phone/address  room/building     owner
    date      pub. date       hire date      meeting time      creat. date
    abstract  table of cont.  job descript.  course descript.  descript.
    object    text or         bitmap or

    Figure 4.  Generic Use Attributes (Elements) and Example Mappings

The attribute set info-1 (registered with NISO as
1.2.840.10003.3.1000.2.1) was created by first making a copy of the
bib-1 attribute set, complete with its broad categories, or types, of
attributes, and erasing all the pre-defined bib-1 Use attribute names.
A small set of Use attribute names was then reserved, combining the
above tags with those in Figure 5.

    Attribute Types

    Type           Values
    Use            [ statically undefined except for generic tags ]
    Relation       =, !=, >, >=, <, <=
    Position       first in element, last in element, ...
    Structure      date, name, word, ...
    Truncation     right, left, ...
    Completeness   ...

    Other Pre-defined Use Attribute (Element) Tags

    userinfo       anything not covered elsewhere
    any            any element(s) the server wants
    keywords       extra index terms
    record size    estimate in bytes
    record update  last update of this record
    provider       who maintains this record
    objectID       for efficient object access
    _IR_Explain_   to bootstrap definition of other tags

    Figure 5.  Dynamic Attribute Set info-1

This sketches out a minimal generic query interface. What about the
problem of displaying server responses to the user? A generic
diagnostic set is not hard to define, and the bib-1 diagnostic set
with a few additions would be an adequate start. Coming up with a
generic record syntax is a little more complicated.

4. The Dynamic General-Purpose Record Syntax info-1

Referring back to the list of features in Figure 1 helps motivate some
of the design decisions for a dynamic general-purpose record syntax.
The client still needs a way to discover what is being returned from a
database of dynamically defined or unknown semantics. It also needs to
support documents, images, hierarchies, and hypertext. There is no
obvious way to do all this within the existing protocol control
kernel, so the next place to look is in the semantic modules for
queries and information.

The semantics of information returned by a server is given in three
ways. In classical Z39.50, a registered record syntax, such as MARC,
informs the client that a stream of bits is structured into specific
elements containing well-defined types of data. In a second scheme
(the flip side of the generic Use attribute tags in Figures 4 and 5),
generic element tags accompany returned data elements. This involves
using the record syntax info-1 (registered with NISO as
1.2.840.10003.5.1000.2.1) to hold elements that for the most part
contain visible ASCII. The third scheme uses the info-1 syntax for
records with semantics dynamically defined with Explain and the
element set name parameter (below).
This allows for tagged data or, when efficient volume transfer is
called for, positional, untagged data.

Documents.

As mentioned earlier, an object citation record provides a natural way
to access the object itself, provided it is stored in an electronic
form. In considering online documents, probably the most important
example, several issues come up. How does a client request a document
citation minus the full text? How does a full text data element
fragment across records? How do citations relate to links needed for
hierarchical browsing and hypertext? How does a client select the form
of a document that varies in several dimensions, such as word
processing format, language, compression technique, and version? How
does it even find out what these variant forms are? These questions
generalize easily to other electronic objects such as digitized
images.

Variants.

It is not always feasible to index a group of closely related objects
separately. This means that there will be cases when a single Z39.50
result set record is the only access point for multiple underlying
variant objects. Objects can vary in four dimensions, called
composition, encoding, language, and version. For example, a pamphlet
may be available with compositions of TeX, PostScript, and troff,
compression encodings of JPEG and ``compress'', and in languages
French, Spanish, and English. One citation for this pamphlet would
cover 18 variant pamphlets, that is, 3 compositions times 2 encodings
times 3 languages (note that some variants, and objects for that
matter, may not need to be stored but are generated on demand).

Registered tags known as qualifiers identify variants in client
requests using the element set name parameter described below. With
the info-1 record syntax, server responses can include qualifiers with
returned data. The version qualifier together with a variant message
allows a server to define a variant dimension of its own.
Clients may find out what variants exist for an element also using the
element set name parameter; the server responds with a record, each
element of which is empty except for a combination of variant
qualifiers. The current qualifiers for info-1 appear in Figure 6.

    Composition      Encoding           Language
    1 Text           1 UNIX compress    1 English
    2 Hytext         2 UNIX tar         2 French
    3 PostScript     3 JPEG             3 Spanish
    4 TIFF
    5 MARC

    Figure 6.  Currently Defined info-1 Qualifier Tags

The composition Text refers to an ASCII text variant that would
display reasonably in an 80-column window, containing lines terminated
by ASCII NL and possibly CR. The composition Hytext refers to a
hypertext variant similar to Text that may contain short or long
objectID references of the form

    @(shortref@ objectID @)shortref@
or
    @(longref@ objectID @)longref@

Although the generality afforded by these variant qualifiers is
indispensable, it is unlikely that any given Z39.50 dialogue will make
use of more than a handful of them. Instead of including them in each
Present, in the long term it makes sense to let them default to values
determined when the dialogue is first established, but at the moment
the protocol does not support this.

ObjectIDs.

Central to any system that supports hypermedia and hierarchical
browsing (the second being a special case of the first) is a robust,
generalized way to reference a networked object. As pointers to
objects, objectIDs are efficient to exchange and remain valid as the
underlying objects are updated. If individual records in a search
result set have objectIDs, they provide a short cut to accessing those
records the next time.

Earlier an objectID was defined as a unique identifier used to access
an electronic information record associated with an object. An object
may have multiple associated information records, as long as each has
a unique identifier.
Identical copies of an object (e.g., for redistribution) may have
different objectIDs for routine access, even though this poses
problems for clients that collect objects from disparate servers and
need to know when they have more than one copy of the same object.
For this reason the original objectID is carried within the objectIDs
of copies.

Each element of an information record contains an aspect of an object,
including, for example, an electronic instance of it. In order to
represent this with the info-1 record syntax, we need to define a
field as an info-1 component containing either an entire element or a
fragment of it.

In order to access an element fragment, a minimal objectID must
contain the server name, internal object control number, and fragment
address. In turning such an objectID into a Z39.50 retrieval, many
parameters would have to be supplied by convention and much of the
generality of Search and Present would go unused. For example, a more
generalized Z39.50 objectID might contain server, port, database,
attributes, term, element, and fragment.

Beyond the Z39.50 context, a proposal for Universal Document (not
restricted to text) Identifiers [UDI92] has made a compelling argument
for UDIs (objectIDs) encoded in visible ASCII that transcend protocol
(e.g., FTP, WAIS, Z39.50). The visible ASCII requirement allows
objectIDs to be exchanged in e-mail messages and to appear in printed
publications.

An important idea missing from the UDI proposal is optional
descriptive information. Normally stored as elements of the associated
info-1 record, this information needs to be bound closely to objectIDs
that will appear in menus, since a separate server access will be
required to return elements that the user needs to see (e.g., document
title) before a selection will be possible. The close binding makes it
easy to update remote menus through a kind of re-linking process.
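One way to picture this close binding is a menu entry that carries its
descriptive information alongside the objectID, so a menu can be drawn
without contacting the server. The Python sketch below is purely
illustrative: the class MenuReference, its methods, and the sample
identifier are assumptions made for the example, and it does not
reflect the objectID format, which is not yet finalized.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MenuReference:
    """Hypothetical pairing of an objectID with descriptive information."""
    object_id: str
    description: Dict[str, str] = field(default_factory=dict)

    def menu_label(self) -> str:
        """Build a menu line from local data only, with no server access."""
        return self.description.get("title", self.object_id)

    def relink(self, fresh_description: Dict[str, str]) -> None:
        """Replace stale descriptive information with a fresh copy."""
        self.description = dict(fresh_description)


if __name__ == "__main__":
    entry = MenuReference("example-id", {"title": "Old title"})
    print(entry.menu_label())
    # Descriptive information obtained later from the server (assumed).
    entry.relink({"title": "New title"})
    print(entry.menu_label())
```

When no descriptive information is bound to the reference, the sketch
falls back to showing the bare identifier, which is the case the
article argues is not good enough for menus.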

The proposal also requires that no whitespace characters appear in a
UDI, but given the length of Z39.50 objectIDs, non-significant
whitespace needs to be allowed to assist readability. To restrict even
one of the many objectID components to, say, the number of characters
that fit on one text line is not feasible. It is worth noting the
similarity between this objectID format and a text-based query
language.

Equipped with hybrid UDI-style objectIDs, the client programmer has a
powerful tool for building hypermedia systems. A few operations on
objectIDs have to be defined:

    Open       begin accessing object
    Read       sequential or random access to object
    Close      end accessing object
    Sync       get fresh copy of objectID
    Compare    client test for identical objectIDs

In an object-oriented sense, these operations will depend heavily on
the server and the object's type. Open and Close are merely
advice-giving operations; for example, they might be ignored on
stateless servers that do not keep track of whether an object is open.
The Read operation will likely resemble file I/O for document and
image objects, with a mix of sequential and random access capability
depending on the server. On the other hand, if an object is a
database, a server, or a person, even sequential access is unlikely.
The Sync operation returns an updated objectID, providing a mechanism
for replacing stale descriptive information and obtaining a new
forwarding address. One reason for not putting copyright disposition
inside an objectID is that even with the Sync operation, an objectID
could not in general be trusted to be either current or authoritative.
Operations are specified using the element set name parameter.

ObjectIDs are hierarchical in that they consist of a sequence of
increasingly specific components. The shorter the sequence, the higher
the level of the object (e.g., a fragment has a long sequence).
Some of the high level components may be inferred, so that, for
example, every object in a database need not be stored with its full
objectID. Not only servers but also clients must be able to parse
Z39.50 objectIDs in order to understand what level of object (e.g.,
server, fragment) is implied. A received objectID could then be
modified to imply lower or higher level objects.

Here is an example of a Z39.50 objectID identifying the first 4096
bytes of a simple ASCII text version of this article:

    {ir infocal.berkeley.edu 210 DB_docid objectID kunz92
        object_text 1-4096 {}
        {title "Non-Bibliographic Applications of Z39.50" date 05/03/92}}

Rather than use the special purpose notation of the UDI proposal, this
example uses a different structure (offered without further
explanation at this time as it has not been finalized) expressed with
Tcl [Oust90]. This was an easy choice since Tcl expresses hierarchical
lists with quoting and non-ASCII capability, and the Tcl software to
read and build such lists is freely available for UNIX, Macintosh, and
PC platforms.

Satisfying Sections.

Sometimes a search locates a record based on criteria not immediately
apparent to the user. For example, a server may match on synonyms
generated from the user's term, or using a relevance feedback
mechanism. Particularly in cases where an element that triggered a
match is large and matches at multiple locations, there is a need to
return a list of satisfying sections or ``hits'' within an element.
While a satisfying section may indicate a simple segment of bytes, for
some objects a server may disallow byte addressing and offer section
addressing instead, where an element (e.g., a document) is divided at
the server into a sequence of variable length sections (e.g., physical
page frames, SGML elements).
In this case, each section has its own objectID, called a sectionID,
that a client submits as the ``current'' section, relieving the server
of having to maintain state information needed for the client to
retrieve the next, previous, or current section.

Element Set Names.

The element set name parameter is used by clients to request that
returned records be composed of a particular combination of elements.
It has been provisionally extended to a composite of several
hierarchically arranged parameters that currently do not have a
well-defined role within Z39.50 and that use the element set name
parameter as a temporary home. They specify element set, fragment,
variant, and more, as listed in Figure 7. Until better solutions are
found through experience, a Tcl-based list format for this parameter
would be the natural choice, with the provisional format selected
whenever the first character of the parameter is `{'.

At the top level, clients can request an element set name understood
through the Explain facility to refer to a particular combination of
elements. By default, info-1 elements are returned in a tagged format,
but an efficient untagged (positional) format may be requested. An
ordered sequence of field parameters can be used to request a
particular record makeup. Each field parameter identifies an element
and optional operation, variant, and fragment information.

    Top Level Element Set Name Parameters

    esname        set name discovered via Explain
    zsname        pre-defined classical Z39.50 set name ("F")
    untagged      if non-zero, do not tag returned elements
    field         per field specifiers and qualifiers

    Per Field Specifiers and Qualifiers

    name          element (tag) requested
    objectID      element or fragment identifier
    operation     Open, Close, Read, Sync
    fragment      fragment specifier
    variants      get list of variants if non-zero
    composition   format (text, hytext, TIFF, MIDI, etc.)
    encoding      archive/compression (tar, JPEG, MPEG, etc.)
    language      for text (French, English, Spanish, etc.)
    version       server-defined variant (VMS, Ultrix, DOS, etc.)

    Fragment Specifiers

    start         offset of beginning of fragment (bytes)
    end           offset of end of fragment (bytes)
    length        length of fragment (bytes)
    section       Next, Previous, First, Last, Current, Best

    Figure 7.  Element Set Name Parameters

A fragment may be requested by relative section or byte offset,
depending on the element variant. Info-1 field fragments come with
flags indicating if the beginning or ending of an element was
returned. An element or fragment objectID (once the format is
finalized) is capable of expressing all the per field requests and can
be used instead of the per field parameters. It is worth mentioning
that the ``Best'' section specification above makes most sense when
the server is returning a record from a result set built by the user;
otherwise, it may mean anything the server likes.

Extensions to Present.

Some extensions to the Present facility to support object retrieval
would be extremely desirable, but formal extensions may benefit from a
few short-term conventions. Current proposals for document retrieval
call for an internal control number search with a piggy-backed Present
of what is expected to be a single record result set. This method was
partly chosen to allow for stateless servers that do not keep result
sets. In terms of objects, this sort of search is so common and so
specialized (in that the result count must be zero or one), that it
really belongs as a special kind of Present. Using the objectID
element set name parameter in a Present against the reserved result
set name, ``_IR_NoResultSet_'', a server could be truly stateless and
not have to support Search at all, let alone the piggy-backed document
Present kludge.

Another urgently required extension is batched Present Responses.
If a single request for an entire object (e.g., an image) could
be answered by the server with multiple Present Responses, each
carrying one fragment, data transfer would be much more efficient
than when the client must take receipt of each fragment before it
can request the next one. In the short term, a client Present
against the reserved result set name, ``_IR_BatchPresent_'',
could authorize a server to fragment the requested element and
send the pieces in multiple responses. A new temporary fragment
flag could indicate when the batched responses are done, or that
the server refuses to batch them.

Those clients that need to retrieve scattered records (e.g., in
result set browsing), or those that need the server to send a
sampling from a result set, also need protocol extensions.
Currently, record numbers are not returned, nor is it possible to
request non-contiguous records or records from multiple result
sets in a single request. Temporary solutions using new element
set name parameters and a new reserved result set name may be
called for.

Copyrights.  When the copyright component of an info-1 field is
present, it contains an objectID that can be used to obtain a
copyright statement. Once the client has retrieved and displayed
the statement, it may not need to display it again for later
occurrences of the same copyright identifier during the same
session, depending on the legal obligations. When the copyright
component is not present, the client may assume that the element
is freely redistributable.

Formal info-1 Structure.  Figure 8 gives the formal ASN.1
[ASN187] structure of the general-purpose record format being
described. Its main job is to contain a sequence of data fields.
A substantial number of diagnostics still need to be added to the
bib-1 diagnostic set to support the new functionality that info-1
promises.
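The per-session rule for copyright statements described under
Copyrights above can be sketched in a few lines of Python. This
is an illustration only, not part of Infocal or of Z39.50. The
class name CopyrightSession, the fetch_statement callback (which
stands in for whatever request retrieves the statement for a
given copyright objectID), and the always_display switch (which
stands in for legal obligations that require showing the
statement every time) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class CopyrightSession:
    """Remembers which copyright statements were shown in this session."""

    # Hypothetical callback: given a copyright objectID, return the
    # statement text (for example, by retrieving that objectID).
    fetch_statement: Callable[[str], str]
    # Assumption: some obligations may require showing the statement
    # every time; True disables the per-session shortcut.
    always_display: bool = False
    shown: Set[str] = field(default_factory=set)

    def statement_to_display(self, copyright_id: Optional[str]) -> Optional[str]:
        """Return statement text to show now, or None if nothing is due."""
        if copyright_id is None:
            # No copyright component: the element is assumed to be
            # freely redistributable, so there is nothing to show.
            return None
        if copyright_id in self.shown and not self.always_display:
            # Already displayed once during this session.
            return None
        self.shown.add(copyright_id)
        return self.fetch_statement(copyright_id)


if __name__ == "__main__":
    session = CopyrightSession(fetch_statement=lambda oid: "statement for " + oid)
    # The identifiers below are made up for the demonstration.
    for cid in ["cr-17", "cr-17", None, "cr-42"]:
        print(cid, "->", session.statement_to_display(cid))
```

Keeping the set of displayed identifiers on the client side is
consistent with the goal, stated earlier for fragments, of not
requiring the server to hold per-client state.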
    BEGIN

    -- Note that lots of things are VisibleString because the client may
    -- need to resubmit them in a Present as an element set name parameter

    GenericRecord ::= SEQUENCE {
        elementSetName    ElementSetName OPTIONAL,
        fieldCount        INTEGER OPTIONAL,
            -- these fields allow client shortcuts in record parsing
        userMessage       VisibleString OPTIONAL,
            -- anything that might not be covered elsewhere
        rank              INTEGER OPTIONAL,
            -- for weighted result sets

        -- large data (fields and records) go at the end so clients doing
        -- partial parsing of records see headers first

        positionalFields  SEQUENCE OF OCTET STRING OPTIONAL,
            -- lightweight for efficient bulk transfer (e.g. tables, files)
            -- positional semantics gotten from Explain and elementSetName
        taggedFields      SEQUENCE OF TaggedField OPTIONAL,
            -- generalized fields
        records           SEQUENCE OF GenericRecord OPTIONAL
            -- composite/hierarchical record, e.g., holdings records
            -- records at end allows tail recursion elimination
    }

    TaggedField ::= SEQUENCE {
        tag               INTEGER,
            -- Tagged fields are for variable or unExplained element sets.
            -- The same tag may occur in more than one field, e.g., for
            -- element variants or multiple abstracting levels.
        value             OCTET STRING,
            -- data element fragment
        fragmentID        VisibleString OPTIONAL,
            -- objectID identifying current fragment; server need not
            -- maintain client's location within an element
        hits              SEQUENCE OF SatisfyingSection OPTIONAL,
            -- byte offsets and lengths of hits within this element
            -- not necessarily in this fragment
        copyrightID       VisibleString OPTIONAL,
            -- means element is copyrighted; contains a special objectID
            -- used to retrieve actual copyright on demand since we don't
            -- want to ship a legal document with each element
        flags             BIT STRING { endOfElement (0),
                                       beginningOfElement (1) },
        qualifiers        SEQUENCE OF Qualifier OPTIONAL,
        variantMessage    VisibleString OPTIONAL
    }

    SatisfyingSection ::= SEQUENCE {
        fragmentID        VisibleString OPTIONAL,
        offset            INTEGER,
        length            INTEGER
    }

    Qualifier ::= SEQUENCE {
        qualifierType     INTEGER,
            -- Composition, Encoding, Language, Version
        qualifierValue    INTEGER
            -- Composition: TEX, TIFF, MIDI, ...
            -- Encoding: compress, tar, JPEG, MPEG, ...
            -- Language: French, English, ...
            -- Version: [ statically undefined ]
    }

    ElementSetName ::= [103] IMPLICIT VisibleString

    END

            Figure 8.  Formal info-1 Record Structure

5.  Conclusion

Z39.50 is in fact a workable protocol for more than bibliographic
information retrieval, even though the client programmer still
has some details to work out. Steady growth in the number of
non-bibliographic implementors has flushed out some weaknesses in
the protocol. Active development and interest from the computer
industry and educational institutions are providing exactly the
kind of cross-pollination the protocol needs to become more
robust. Key to this evolutionary process will be containing a
potential explosion in the number of semantic contexts while
making sure that a few contexts are rich enough to support
compelling general-purpose user interfaces.

6.  References

[ASN187]
    ISO, ``OSI Specification of Abstract Syntax Notation One
    (ASN.1)'', 1987, Omnicom, Inc., Vienna, VA

[CWIS91]
    CWISP Working Group, ``Campus Wide Information System
    Protocol - Version 0.50 RFC, Draft 4'', October 1991, from
    Tim McGovern, MIT

[Kahl91]
    Kahle, Brewster, ``Document Identifiers, or International
    Standard Book Numbers for the Electronic Age'', September
    1991, Z39.50 Implementors Group Document ZIG91-46, available
    via anonymous ftp from think.com

[Kunz90]
    Kunze, John A., ``UCB Network Information Server - Project
    Overview'', January 1990, unpublished paper prepared for 1989
    U.C. Academic Computing Conference, available via anonymous
    ftp in help/dist/nis.txt from mailhost.berkeley.edu

[Lync90]
    Lynch, Clifford, ``Extensions to ISO DP 10162/10163 to
    Support an Explain Service'', December 1989, ZIG90-9,
    available via anonymous ftp from think.com

[MARC88]
    Network Development and MARC Standards Office, ``USMARC
    Concise Formats for Bibliographic, Authority, and Holdings
    Data'', June 1988, Cataloging Distribution Service, Library
    of Congress, Washington, DC

[NISO91]
    ANSI/NISO Z39.50-199X, ``Proposed ANSI Information Retrieval
    Application Service Definition and Protocol Specification for
    OSI'', December 1991

[Oust90]
    Ousterhout, John, ``Tcl: An Embeddable Command Language'',
    Usenix Conference Proceedings, January 1990

[UDI92]
    Berners-Lee, Tim, et al., ``Universal Document Identifiers on
    the Network'', February 1992, memo in preparation for
    Internet RFC, available via anonymous ftp from
    file://info.cern.ch./pub/www/doc/udi1.ps

The following trade names were used in this article: Ultrix and
VMS, trademarks of Digital Equipment Corporation; and UNIX, a
trademark of AT&T.