Non-Bibliographic Applications of Z39.50


                          John A. Kunze
              jak@violet.berkeley.edu, 510-642-1530
               Information Systems and Technology
                   293 Evans Hall, UC Berkeley
                       Berkeley, CA  94720

                          5 March 1992






Although the Z39.50 Information  Retrieval  protocol  is  rapidly
gaining  acceptance  as  a  standard  for  interoperability among
networked library automation systems, it has not been obvious how
to make it work for non-bibliographic applications.  This article
describes how Z39.50 is being used as the basis for  a  networked
campus  information system called Infocal [Kunz90] at the Univer-
sity of California at Berkeley.*

The datasets we are making available include  phone  directories,
class schedules, library catalogs, public announcements, and com-
puter documentation and software.  As Infocal is being  developed
within  the  Information  Systems  and Technology department, the
latter two sets provided the original  implementation  incentive.
The  majority of the information will originate on campus, though
some will be purchased or licensed.  Most of the  non-proprietary
information will be available over the Internet.

We avoid describing the project as a Campus Wide Information
System (CWIS) because CWISes often come with extra baggage such as
electronic mail, bulletin boards, and student  registration  sys-
tems.   Ours  is a straightforward read-only, client-server-based
campus information system with two extra goals:

     (a) accommodating non-textual data and
     (b) interoperating with other information systems.

As a result, we went looking for a protocol, and Z39.50 seemed
the logical choice.  The challenge became how to use it to
support all our non-bibliographic requirements, which is what
this article addresses from the point of view of the designer of
the client program.  This is the program sitting between the user
and the network, whose job it is to translate user commands into
Z39.50 protocol requests for the server and to translate server
protocol responses into information to display back to the user.

_________________________
* This work was partially supported by Digital Equipment
Corporation and Sun Microsystems, Inc.

DRAFT                     5 March 1992                      DRAFT

1.  Interface Features Involving Z39.50

In Figure 1 is a list of interface features  whose  support  will
involve Z39.50.


                + bibliographic databases
                + non-bibliographic databases
                + data of unknown, but learnable, semantics
                + full text documents
                + non-textual documents
                + hierarchical browsing
                + hypermedia links
                + retrieval by object/document ID

        Figure 1.  Client Features to Support

The protocol, having come out of the library  automation  commun-
ity,  is  straightforward  to apply to the first feature, namely,
retrieval  from  databases  of  bibliographic  citations,   where
records have highly predictable structure and semantics.  What
about databases whose record structure  and  semantics  are  non-
bibliographic?   What about retrieval from databases whose record
structure and semantics are known to the  server  programmer  but
not known to the client programmer; is there a way for the proto-
col to transmit meta-information that would allow the  client  to
retrieve data from such a database?

Then there's the problem of full text documents; should the  pro-
tocol  view  a document as a database?  Should it view a document
as a record?  As an element  of  a  record?   Or,  as  a  set  of
records?   The  protocol's  retrieval model requires that records
have a fixed maximum size for the duration  of  a  session;  this
means that the programmer, knowing that documents can have highly
skewed size distributions, must decide whether to set the maximum
record  size  ridiculously  high  and forgo any optimization that
might have otherwise been  obtained,  or  to  fragment  documents
across record boundaries.  The retrieval model also requires that
records belong to the result set of a query, which is to say that
you  are not allowed to retrieve a record unless it has satisfied
some set of search criteria in advance.   Therefore  if  you  are
retrieving  document  fragments,  you may have to invent a way to
conduct a ``fragment search''.
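
The fragmenting option can be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not part
of Z39.50: the session's maximum record size is a fixed byte
count, and `Fragment`, `fragment_document`, and `fragment_search`
are hypothetical names.  The ``fragment search'' is modelled as a
query whose result set is a range of fragment numbers within one
document, since only records in a result set can be retrieved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One record-sized piece of a document (hypothetical record layout)."""
    doc_id: str
    number: int  # 1-based position of this fragment within the document
    data: bytes


def fragment_document(doc_id: str, text: bytes, max_record_size: int) -> list[Fragment]:
    """Split a document so that no record exceeds the session's maximum size."""
    if max_record_size <= 0:
        raise ValueError("max_record_size must be positive")
    return [
        Fragment(doc_id, start // max_record_size + 1, text[start:start + max_record_size])
        for start in range(0, len(text), max_record_size)
    ]


def fragment_search(fragments: list[Fragment], doc_id: str,
                    first: int, last: int) -> list[Fragment]:
    """A 'fragment search': the result set is fragments first..last of one document.

    Only records in a result set can be retrieved, so the client searches
    for the fragments it wants before asking for them.
    """
    return [f for f in fragments
            if f.doc_id == doc_id and first <= f.number <= last]


# A 10,000-byte document with a 4096-byte record limit needs three records.
frags = fragment_document("kunz92", b"x" * 10_000, 4096)
print([len(f.data) for f in frags])  # [4096, 4096, 1808]
```

The skewed size distribution shows up directly: a short document
costs one small record, while a long one costs many records unless
the maximum is raised for the whole session.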

One problem in retrieving non-textual data,  such  as  images  or
video,  is  that  the  sizes involved require the support of data
compression.  There are other more serious complications, such as
real-time  protocol  needs  of video data, that are not addressed
here.

Hierarchical browsing, however unfashionable, is still an  impor-
tant access method for campus information system users.  It seems
that there will always be a very popular subset of information on
a server that people prefer to get at by exploring a small, shal-
low tree of information rather than by  doing  an  index  search.
This  is  analogous  to  going  to a restaurant and preferring to
browse the menu instead of asking the waiter to list  all  dishes
containing  potatoes  but not anchovies.  To support hierarchies,
the protocol will have to allow clients  to  discover  tree  node
names  inside  menus,  and  to  retrieve  node contents using the
search facility.
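
A minimal Python sketch of browsing by search, assuming a
hypothetical layout in which every menu record names its parent
node: retrieving a node's contents is then just a search on the
parent's name.  The record fields `node`, `parent`, and `title`
are illustrative assumptions, not registered Use attributes.

```python
# Hypothetical menu records held by a campus information server.
MENU_RECORDS = [
    {"node": "top", "parent": None, "title": "Infocal"},
    {"node": "phones", "parent": "top", "title": "Phone directories"},
    {"node": "classes", "parent": "top", "title": "Class schedules"},
    {"node": "staff", "parent": "phones", "title": "Staff directory"},
]


def search_children(records: list[dict], node_name: str) -> list[dict]:
    """Retrieve a node's contents as the result of a search: parent == node_name."""
    return [r for r in records if r["parent"] == node_name]


def browse(records: list[dict], node_name: str, depth: int = 0) -> list[str]:
    """Explore the shallow tree by repeated searches, returning an indented menu."""
    lines = []
    for child in search_children(records, node_name):
        lines.append("  " * depth + child["title"])
        lines.extend(browse(records, child["node"], depth + 1))
    return lines


for line in browse(MENU_RECORDS, "top"):
    print(line)
```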

The next feature, hypermedia links, would get a great boost  from
getting  hierarchical browsing to work.  The last feature listed,
retrieval by object (or document) ID, would likewise benefit
greatly  from  being  approached at the same time as the previous
two features.  If we had unique, server-qualified  document  IDs,
users could re-use them to retrieve a known document without hav-
ing to re-construct the entire search  leading  to  its  original
discovery.  Document IDs would be valid across sessions, could be
shared with other users,  and  submitted  in  relevance  feedback
queries  without  incurring  the overhead of sending entire docu-
ments in the query.  The Wide Area Information Server (WAIS) pro-
ject has made a proposal on this [Kahl90], and the Digital Object
ID Project under the Coalition for Networked Information is  just
getting under way.

2.  Mainstream Bibliographic Z39.50

As a basis for later discussion, it is useful to outline  how  to
apply   Z39.50   to   straightforward  bibliographic  information
retrieval.  The protocol has seven separate  services,  of  which
only  three  need  mention here: the Search, Present, and Explain
facilities (Explain is  currently  under  development  [Lync90]).
The  Search  facility  defines  how  a  query  is represented and
Present  defines  how  information  and  diagnostic  records  are
represented.   Although  there are four kinds of queries, we need
only consider  the  Type-1  query,  since  it  is  the  only  one
currently being used for interoperability testing.

Semantic Modules.

One of the claims of Z39.50 is that it does not restrict the kind
of  query  you  submit  or the kind of data you get back.  On the
other hand, it is easy to see how  an  implementor  might  think,
having  just read the standard [NISO91], that too few details are
given to build an interoperable system.  This is  because  Z39.50
is  modularized  in  such  a way as to isolate those parts of the
protocol that would require data-dependent assumptions.   Modules
are  identified  and  can  be replaced to accommodate new sets of
assumptions.

Each module is assigned an unambiguous tag or identifier by the
National Information Standards Organization (NISO), and the
protocol guarantees that the client and server always understand
what modules are currently assumed during a session.  The way it
pulls this off is to make sure that the module identifiers get
tucked into protocol control messages whenever they are needed.
Figure 2 identifies four protocol modules that completely  define
the structure and semantics for protocol control, queries, infor-
mation, and error messages.


Module Name                  Describes         Bibliographic Instance
Z39.50 kernel                control           <n/a>
query type/attribute set     queries           type-1/bib-1
record syntax                information       MARC
diagnostic set               error messages    bib-1

Figure 2.  Z39.50 Semantic Modules

The official Z39.50 terminology for these modules  is  the  query
type/attribute  set pair, record syntax, and diagnostic set.  The
attribute set applies only to the  Type-1  query,  which  we  are
already  assuming.   It defines several concepts about the search
term and assigns a number to each one.  A list of  these  numbers
accompanies  the  search  term  so that the server can understand
what element is to be searched, what relational operator to  use,
what  position  in  the  element,  etc.  The record syntax, often
called transfer syntax, is much more than simple  lexical  struc-
ture;  it  embodies  intimate  structural  and semantic knowledge
about a sequence of bits.  A diagnostic set is just a set of mes-
sages and message numbers that the two sides agree on.
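
The attribute list is easiest to picture as (type, value) integer
pairs attached to the search term.  The Python sketch below uses
bib-1's numbering of the attribute types (Use = 1 through
Completeness = 6), the bib-1 Relation values, and bib-1 Use value
4 (title); the class and method names are hypothetical, and only
the abstract structure is modelled, not the encoding that actually
travels over the wire.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class AttrType(IntEnum):
    """Attribute types, numbered as in the bib-1 attribute set."""
    USE = 1
    RELATION = 2
    POSITION = 3
    STRUCTURE = 4
    TRUNCATION = 5
    COMPLETENESS = 6


class Relation(IntEnum):
    """Values of the bib-1 Relation attribute."""
    LESS_THAN = 1
    LESS_OR_EQUAL = 2
    EQUAL = 3
    GREATER_OR_EQUAL = 4
    GREATER_THAN = 5
    NOT_EQUAL = 6


BIB1_USE_TITLE = 4  # bib-1 Use value for "title"


@dataclass
class Operand:
    """A Type-1 query operand: a search term plus (type, value) attribute pairs."""
    term: str
    attributes: list[tuple[int, int]] = field(default_factory=list)

    def with_attr(self, attr_type: AttrType, value: int) -> "Operand":
        self.attributes.append((int(attr_type), int(value)))
        return self

    def describe(self) -> dict[str, int]:
        """How a server reads the numbers: which concept each one refers to."""
        return {AttrType(t).name: v for t, v in self.attributes}


query = (Operand("information retrieval")
         .with_attr(AttrType.USE, BIB1_USE_TITLE)
         .with_attr(AttrType.RELATION, Relation.EQUAL))
print(query.attributes)   # [(1, 4), (2, 3)]
print(query.describe())   # {'USE': 4, 'RELATION': 3}
```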

The standard document goes only so far as to  define  effectively
one  set  of  modules  (and  a  few  close variants), the minimum
required to build an implementation,  in  particular,  a  biblio-
graphic  implementation.   The  specific bibliographic modules it
defines or refers to are listed in Figure  2:  the  Type-1  query
with bib-1 attribute set, the MARC (Machine-Readable Cataloging)
record format [MARC88], and the bib-1 diagnostic set.  It also
defines  the  module identifiers, armed with which the programmer
keeps track of the current semantic context.

Software Modules.

The client's user interface is written  with  code  hardwired  to
understand  whatever combinations of modules it cares to support.
The control module makes sure that neither client nor server ever
has  to  deal  with  semantics  that it did not agree to (through
lower level negotiation).  For the bibliographic case, the client
programmer could construct menus for the user interface that list
searchable elements such as author, date,  and  title,  or  valid
relational operators, such as equals, less than, or greater than,
since they are all in the semantic module known as  bib-1.   Code
can  also  be  written that understands how to read a record as a
sequence of bits in the MARC format and display it to the user.

Non-bibliographic Modules.

This completes the outline of how to apply Z39.50 to  the  imple-
mentation  of  the  bibliographic case.  Now, what do you do dif-
ferently to support new, non-bibliographic semantics?  The answer
is,

  (1) define new semantics for search, display, and diagnostics,
  (2) get the new attribute set, record  syntax,  and  diagnostic
  set registered with NISO,
  (3) write code to support searching and displaying these seman-
  tics, and
  (4) plug in the new code modules to support  the  new  semantic
  modules.
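
Step (4) amounts to a dispatch table from semantic-module
identifiers to code.  A minimal Python sketch, assuming display
routines that take a record's raw bytes; the identifier strings,
`plug_in`, and `display` are hypothetical placeholders rather
than NISO-registered identifiers or any real library's API.

```python
from typing import Callable

Displayer = Callable[[bytes], str]

# Hypothetical table of plugged-in code modules, keyed by the identifier of the
# semantic module (here, a record syntax) each one understands.  Real keys would
# be the object identifiers registered with NISO; these strings are placeholders.
_DISPLAYERS: dict[str, Displayer] = {}


def plug_in(record_syntax_id: str) -> Callable[[Displayer], Displayer]:
    """Decorator that registers a display routine for one record syntax."""
    def register(fn: Displayer) -> Displayer:
        _DISPLAYERS[record_syntax_id] = fn
        return fn
    return register


@plug_in("cwisp-printable")  # placeholder identifier, not a registered one
def show_printable(record: bytes) -> str:
    return record.decode("ascii", errors="replace")


def display(record_syntax_id: str, record: bytes) -> str:
    """Dispatch on the record syntax identifier carried by the protocol."""
    handler = _DISPLAYERS.get(record_syntax_id)
    if handler is None:
        raise LookupError(f"no code module plugged in for {record_syntax_id!r}")
    return handler(record)


print(display("cwisp-printable", b"name: Kunze, John A."))
```

Every new set of semantics adds one more entry and one more
routine to a table like this one.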

For example, consider supporting a  student  directory  database.
You  need  a  new attribute set which might include element names
such as NAME and EMAIL ADDRESS, but not PHONE NUMBER;  note  that
even  though  we  might  have  record  semantics that support the
display of phone numbers, we might be using an attribute set that
does  not support searching on them.  In the example in Figure 3,
the record syntax is the same one used in the Campus Wide  Infor-
mation  System  Protocol  (CWISP) [CWIS91], which has a printable
ASCII format.  As for error messages, you might  just  choose  to
keep on using the bib-1 diagnostic set as it is fairly generic.


                + record syntax = CWISP
                        name: Kunze, John A.
                        email: jak@violet.berkeley.edu
                        phone: 510-642-1530

                + attribute set = student-1
                + diagnostic set = bib-1

        Figure 3.  Example:  Student directory database
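
To make the split between searching and display concrete, here is
a Python sketch that reads the Figure 3 record and searches it.
It assumes the printable format is one `tag: value` pair per line,
as the figure suggests (the actual CWISP syntax may differ), and
treats student-1 as containing only NAME and EMAIL ADDRESS; the
function names are hypothetical.

```python
def parse_printable_record(text: str) -> dict[str, list[str]]:
    """Parse 'tag: value' lines into a tag -> values mapping (tags may repeat)."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line has no tag: {line!r}")
        fields.setdefault(tag.strip().lower(), []).append(value.strip())
    return fields


# Searchable elements of the hypothetical student-1 attribute set: phone is absent.
STUDENT_1_USE = {"name", "email"}


def search(records: list[dict[str, list[str]]], use: str, term: str) -> list[dict]:
    """Case-insensitive substring search restricted to the attribute set."""
    if use not in STUDENT_1_USE:
        raise ValueError(f"unsupported Use attribute: {use}")
    return [r for r in records
            if any(term.lower() in v.lower() for v in r.get(use, []))]


record = parse_printable_record("""
name: Kunze, John A.
email: jak@violet.berkeley.edu
phone: 510-642-1530
""")
print(record["phone"])                         # ['510-642-1530'] (displayable)
print(len(search([record], "name", "kunze")))  # 1
```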

This approach to  non-bibliographic  data  comes  with  problems.
There is a lot of work in writing code to support each new set of
semantics.  Imagine defining a new set for each one of your data-
base  formats,  going through the registration process with NISO,
then writing code to support each.  If a database format or  user
search requirement changes, you need to re-register with NISO and
stop interoperating with the systems you used to work with  until
the  new  format gets approved.  If you want to support more than
one set of semantic modules, your code will have to link  in  all
your  corresponding program modules.  With code hardwired to each
attribute set and record format, there is nothing to prevent  one
user interface to ten database formats from looking, in the worst
case, like ten different user interfaces, which  is  exactly  the
situation that Z39.50 was conceived to avoid.

Berkeley's Infocal system will have at least half a dozen database
formats, and each will probably change about once a year, so this
scheme will not work for us.  Fortunately, there is another
approach to non-bibliographic information that simultaneously
addresses the problem of retrieving from databases whose format
is unknown until the database is first accessed.  What is needed,
as before, is an attribute set, a record syntax, and a diagnostic
set, but they must be general-purpose and dynamic.

3.  The Dynamic General-Purpose Attribute Set info-1

For the purposes of discussion it will be valuable to define some
terms.

Use Attributes and Elements.

Server information records are structured so as to allow  search-
ing  on  and  retrieval  of  different combinations of individual
record components.  These  processes,  completely  defined  by  a
server,  are  mapped  onto  canonical  Z39.50  processes  so that
clients have a common language to access information.   In  order
to   give  servers  adequate  flexibility,  Z39.50  distinguishes
between searchable and retrievable record components.  Searchable
components  are  called Use attributes and retrievable components
are called elements.  A server is free, for example, to map a Use
attribute  called  ``name'' onto five different record components
and return records containing a  corresponding  element  that  is
mapped   from   a  sixth  component,  or  perhaps  containing  no
corresponding element at all.
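
A small Python sketch of the example just given, with invented
data: the component names, `USE_MAP`, and `ELEMENT_MAP` are
assumptions for illustration, since Z39.50 leaves these mappings
entirely to the server.

```python
# Hypothetical server record with six components.
RECORD = {
    "surname": "Doe", "given": "Jane", "nickname": "JD",
    "former_surname": "Smith", "alias": "J. Doe", "display_name": "Doe, Jane",
}

# Searching: the Use attribute "name" fans out over five components ...
USE_MAP = {"name": ["surname", "given", "nickname", "former_surname", "alias"]}
# ... while the retrievable element "name" comes from a sixth.
ELEMENT_MAP = {"name": "display_name"}


def satisfies(record: dict, use: str, term: str) -> bool:
    """True if the term equals any component the Use attribute is mapped to."""
    return any(record[c].lower() == term.lower() for c in USE_MAP.get(use, []))


def element(record: dict, name: str):
    """Return the named element, or None if the server offers no such element."""
    component = ELEMENT_MAP.get(name)
    return None if component is None else record[component]


print(satisfies(RECORD, "name", "smith"))  # True: matched via a former surname
print(element(RECORD, "name"))             # Doe, Jane
```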

Explain Facility.

The Explain facility is a set of mechanisms integrated  with  the
Search  and  Present  facilities that retrieves information, both
human and machine readable, about  an  information  server.   For
example,   searches   against   the   reserved   database   name,
``_IR_Explain_'', can return a schedule of server availability, a
list  of  database  names,  and  a list of browsable hierarchies.
Each database name may be searched on the reserved Use attribute,
``_IR_Explain_'',  to  return  a list of valid attribute combina-
tions, various levels of human readable record component descrip-
tions, element set names, etc.  When the client encounters an un-
known database, it uses the Explain facility to build  menus  for
searching,  together  with  help  text  for  explaining the query
semantics.

Objects.

In this article, an object is anything (book,  painting,  person,
digitized  image, electronic document, etc.) having an associated
electronic information record that  may  be  accessed  through  a
unique identifier, called the object identifier (objectID), about
which more will be said later.  This record may be thought of  as
an object citation, the elements of which describe aspects of the
object such as its name, what kind of thing it is, where it comes
from, when it came into being, where it is currently located, and
a brief description.  An object citation maps easily onto the
Z39.50 record model and is a natural access point for many, if
not all, aspects of the object.  In particular, if the object
exists primarily in electronic form, a citation element may
contain a copy of the object itself.

In cases where Explain is not used and where  sophisticated  pro-
cessing is not required, a server may map these record components
to a small set of generic, pre-defined Use attribute  values  and
element names for search and retrieval, respectively.  To be
useful, most of the elements should be printable ASCII suitable for
human  consumption.   This  provides a kind of base level service
from databases of otherwise unknown semantics  without  requiring
the Explain facility.  The table in Figure 4 below shows the gen-
eric names (given symbolically here, although the  protocol  uses
integer  tags) in the first column, and examples of how four dif-
ferent databases might be mapped to them.  The same tags  can  be
used to identify returned elements.


Tag       Biblio db      Personnel db   Course db         Art db
type      "book"         "employee"     "class"           "objet d'art"
name      title          empl. name     class title       title
by        author         organization   instructor/dept   artist
location  publisher      phone/address  room/building     owner
date      pub. date      hire date      meeting time      creat. date
abstract  table of cont. job descript.  course descript.  descript.
object    text or <n/a>  <n/a>          <n/a>             bitmap or <n/a>

Figure 4.  Generic Use Attributes (Elements) and Example Mappings
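
A Python sketch of how a server might apply one column of Figure
4, assuming its local records are simple field/value maps; the
local field names and `to_generic` are hypothetical, and the tags
are shown symbolically rather than as the integer tags the
protocol uses.

```python
GENERIC_TAGS = ("type", "name", "by", "location", "date", "abstract", "object")

# Personnel column of Figure 4; the local field names are hypothetical.
PERSONNEL_MAP = {
    "name": "empl_name",
    "by": "organization",
    "location": "phone_address",
    "date": "hire_date",
    "abstract": "job_description",
}


def to_generic(local: dict, mapping: dict, type_value: str) -> dict:
    """Expose a local record under generic tags; unmapped fields stay hidden."""
    generic = {"type": type_value}
    for tag in GENERIC_TAGS:
        local_field = mapping.get(tag)
        if local_field is not None and local_field in local:
            generic[tag] = local[local_field]
    return generic


employee = {"empl_name": "Doe, Jane", "organization": "Payroll Office",
            "hire_date": "1990-07-01", "salary_grade": "III"}
print(to_generic(employee, PERSONNEL_MAP, "employee"))
```

Fields with no generic tag, such as `salary_grade` here, are
simply never exposed, which is one way a server keeps control of
what its base-level service reveals.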


The   attribute   set   info-1   (registered   with    NISO    as
1.2.840.10003.3.1000.2.1)  was  created by first making a copy of
the bib-1 attribute set, complete with its broad  categories,  or
types,  of  attributes, and erasing all the pre-defined bib-1 Use
attribute names.  A small set of Use  attribute  names  was  then
reserved, combining the above tags with those in Figure 5.


Attribute Types

     Type          Values
     Use           [ statically undefined except for generic tags ]
     Relation      =, !=, >, >=, <, <=
     Position      first in element, last in element, ...
     Structure     date, name, word, ...
     Truncation    right, left, ...
     Completeness  ...

Other Pre-defined Use Attribute (Element) Tags

     userinfo       anything not covered elsewhere
     any            any element(s) the server wants
     keywords       extra index terms
     record size    estimate in bytes
     record update  last update of this record
     provider       who maintains this record
     objectID       for efficient object access
     _IR_Explain_   to bootstrap definition of other tags

Figure 5.  Dynamic Attribute Set info-1


This sketches out a minimal generic query interface.  What  about
the  problem  of displaying server responses to the user?  A gen-
eric diagnostic set is not hard to define, and the bib-1 diagnos-
tic  set with a few additions would be an adequate start.  Coming
up with a generic record syntax is a little more complicated.

4.  The Dynamic General-Purpose Record Syntax info-1

Referring back to the list of features in Figure 1 helps motivate
some of the design decisions for a dynamic general-purpose record
syntax.  The client still needs a way to discover what  is  being
returned from a database of dynamically defined or unknown seman-
tics.  It also needs to support documents,  images,  hierarchies,
and hypertext.  There is no obvious way to do all this within the
existing protocol control kernel, so the next place to look is in
the semantic modules for queries and information.

The semantics of information returned by a server are conveyed in
one of three ways.  In classical Z39.50, a registered record
syntax, such as MARC, informs the client that a stream of bits is
structured into specific elements containing well-defined types
of data.  In a second scheme (the flip side of the generic Use
attribute tags in Figures 4 and 5), generic element tags
accompany returned data elements.  This involves using the record
syntax info-1 (registered with NISO as 1.2.840.10003.5.1000.2.1)
to hold elements that for the most part contain visible ASCII.
The third scheme uses the info-1 syntax for records with
semantics dynamically defined with Explain and the element set
name parameter (below).  This allows for tagged data or, when
efficient volume transfer is called for, positional, untagged
data.
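
The trade-off between tagged and positional data can be sketched
in Python, assuming records are flat maps of element names to
text; `ELEMENT_SET` and the encoding functions are hypothetical
and say nothing about the actual info-1 wire format.

```python
# Element order agreed in advance (for example, learned through Explain);
# this particular element set is hypothetical.
ELEMENT_SET = ("name", "date", "abstract")


def encode_tagged(record: dict) -> list[tuple[str, str]]:
    """Each value travels with its tag, so absent elements are simply omitted."""
    return [(tag, record[tag]) for tag in ELEMENT_SET if tag in record]


def encode_untagged(record: dict) -> list[str]:
    """Position alone identifies each value, so every slot must be filled.
    In this sketch an empty string stands for an absent element."""
    return [record.get(tag, "") for tag in ELEMENT_SET]


def decode_untagged(values: list[str]) -> dict:
    if len(values) != len(ELEMENT_SET):
        raise ValueError("positional record does not match the element set")
    return {tag: v for tag, v in zip(ELEMENT_SET, values) if v}


rec = {"name": "Class schedules", "abstract": "Fall 1992 classes"}
print(encode_tagged(rec))    # [('name', 'Class schedules'), ('abstract', 'Fall 1992 classes')]
print(encode_untagged(rec))  # ['Class schedules', '', 'Fall 1992 classes']
```

The untagged form saves the tags on every record but only works
when both sides already share the element order.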

Documents.

As mentioned  earlier,  an  object  citation  record  provides  a
natural way to access the object itself, provided it is stored in
an electronic form.  In considering  online  documents,  probably
the  most  important example, several issues come up.  How does a
client request a document citation minus the full text?  How does
a  full  text data element fragment across records?  How do cita-
tions relate to links needed for hierarchical browsing and hyper-
text?   How  does  a  client  select  the form of a document that
varies in several dimensions, such  as  word  processing  format,
language,  compression  technique, and version?  How does it even
find out what these variant forms are?  These questions  general-
ize easily to other electronic objects such as digitized images.

Variants.

It is not always feasible to index a  group  of  closely  related
objects  separately.   This means that there will be cases when a
single Z39.50 result set record is the only access point for mul-
tiple  underlying  variant  objects.   Objects  can  vary in four
dimensions, called composition, encoding, language, and  version.
For example, a pamphlet may be available with

        compositions of TeX, PostScript, and Troff,
        compression encodings of JPEG and ``compress'', and in
        languages French, Spanish, and English.

One citation for this pamphlet would cover 18  variant  pamphlets
(note  that  some  variants, and objects for that matter, may not
need to be stored but are generated on demand).  Registered  tags
known  as  qualifiers  identify variants in client requests using
the element set name parameter described below.  With the  info-1
record  syntax,  server  responses  can  include  qualifiers with
returned data.  The version qualifier  together  with  a  variant
message allows a server to define a variant dimension of its own.
Clients may also use the element set name parameter to find out
what variants exist for an element; the server responds with a
record, each element of which is empty except for  a  combination
of  variant qualifiers.  The current qualifiers for info-1 appear
in Figure 6.


        Composition    Encoding          Language
        1 Text         1 UNIX compress   1 English
        2 Hytext       2 UNIX tar        2 French
        3 PostScript   3 JPEG            3 Spanish
        4 TIFF
        5 MARC

        Figure 6.  Currently Defined info-1 Qualifier Tags
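
Counting the pamphlet's variants is a simple product of its
dimensions.  The Python sketch below, a hypothetical illustration,
also checks which combinations can be named with the qualifier
tags of Figure 6; it assumes the pamphlet's ``compress'' is the
tag UNIX compress, and it shows that TeX and Troff have no
composition tag in the current list.

```python
from itertools import product

# Qualifier tags from Figure 6 (info-1).
COMPOSITION = {"Text": 1, "Hytext": 2, "PostScript": 3, "TIFF": 4, "MARC": 5}
ENCODING = {"UNIX compress": 1, "UNIX tar": 2, "JPEG": 3}
LANGUAGE = {"English": 1, "French": 2, "Spanish": 3}

# The pamphlet example; "UNIX compress" stands for the text's ``compress''.
pamphlet = {
    "composition": ["TeX", "PostScript", "Troff"],
    "encoding": ["JPEG", "UNIX compress"],
    "language": ["French", "Spanish", "English"],
}


def variants(dims: dict) -> list[tuple[str, str, str]]:
    """Every combination of composition, encoding, and language."""
    return list(product(dims["composition"], dims["encoding"], dims["language"]))


def qualifier_tags(variant: tuple[str, str, str]):
    """Numeric qualifier tags for a variant, or None if any dimension lacks a tag."""
    comp, enc, lang = variant
    tags = (COMPOSITION.get(comp), ENCODING.get(enc), LANGUAGE.get(lang))
    return None if None in tags else tags


all_variants = variants(pamphlet)
expressible = [v for v in all_variants if qualifier_tags(v) is not None]
print(len(all_variants), len(expressible))  # 18 6
```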

The composition Text refers to an ASCII text variant  that  would
display  reasonably in an 80-column window, containing lines ter-
minated by ASCII NL and  possibly  CR.   The  composition  Hytext
refers  to  a  hypertext variant similar to Text that may contain
short or long objectID references of the form

     @(shortref@ objectID @)shortref@    or
     @(longref@ objectID @)longref@
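
Extracting these references is a small pattern-matching job.  A
Python sketch using the standard `re` module, assuming the
objectID between the markers never itself contains an `@)`
sequence; `find_refs` and `render` are hypothetical names, and
the numbered-marker rendering is just one way a client might
show the links.

```python
import re

# @(shortref@ objectID @)shortref@  or  @(longref@ objectID @)longref@
# The backreference \1 forces the closing marker to match the opening kind.
HYTEXT_REF = re.compile(r"@\((shortref|longref)@\s*(.*?)\s*@\)\1@", re.DOTALL)


def find_refs(hytext: str) -> list[tuple[str, str]]:
    """Return (kind, objectID) for every reference, in order of appearance."""
    return [(m.group(1), m.group(2)) for m in HYTEXT_REF.finditer(hytext)]


def render(hytext: str) -> tuple[str, list[str]]:
    """Replace each reference with a numbered marker and collect the objectIDs."""
    targets: list[str] = []

    def marker(m: re.Match) -> str:
        targets.append(m.group(2))
        return f"[{len(targets)}]"

    return HYTEXT_REF.sub(marker, hytext), targets


page = "See @(shortref@ {ir infocal.berkeley.edu 210} @)shortref@ for details."
print(find_refs(page))   # [('shortref', '{ir infocal.berkeley.edu 210}')]
print(render(page)[0])   # See [1] for details.
```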

Although the generality afforded by these variant  qualifiers  is
indispensable, it is unlikely that any given Z39.50 dialogue will
make use of more than a handful of them.   Instead  of  including
them in each Present, in the long term it makes sense to let them
default to values determined when the  dialogue  is  first  esta-
blished, but at the moment the protocol does not support this.

ObjectIDs.

Central to any system that supports hypermedia  and  hierarchical
browsing  (the  second  being  a  special case of the first) is a
robust, generalized way to  reference  a  networked  object.   As
pointers  to  objects,  objectIDs  are  efficient to exchange and
remain valid as the underlying objects are updated.   If  indivi-
dual  records in a search result set have objectIDs, they provide
a short cut to accessing those records the next time.

Earlier an objectID was defined as a unique  identifier  used  to
access  an  electronic  information  record  associated  with  an
object.  An  object  may  have  multiple  associated  information
records,  as  long  as  each  has a unique identifier.  Identical
copies of an object (e.g., for redistribution) may have different
objectIDs for routine access, even though this poses problems for
clients that collect objects from disparate servers and  need  to
know  when  they have more than one copy of the same object.  For
this reason the original objectID is carried within the objectIDs
of copies.

Each element of an information record contains an  aspect  of  an
object, including, for example, an electronic instance of it.  In
order to represent this with the info-1 record syntax, we need to
define a field as an info-1 component containing either an entire
element or a fragment of it.  In order to access an element frag-
ment,  a  minimal objectID must contain the server name, internal
object control number, and fragment address.  In turning such  an
objectID  into  a Z39.50 retrieval, many parameters would have to
be supplied by convention and much of the  generality  of  Search
and  Present  would  go  unused.  For example, a more generalized
Z39.50 objectID might contain

     server, port, database, attributes, term, element, and frag-
     ment.
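
Because these components run from the most general (server) to
the most specific (fragment), a client can treat an objectID as an
ordered record and name a higher-level object by dropping
components from the end.  A Python sketch under that assumption;
the class and method names are hypothetical, and the sample
values are borrowed from this article's own example objectID,
read as if its fields followed the order listed above.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class Z3950ObjectID:
    """Hypothetical holder for the generalized objectID components,
    ordered from most general (server) to most specific (fragment)."""
    server: str
    port: Optional[int] = None
    database: Optional[str] = None
    attributes: Optional[tuple] = None
    term: Optional[str] = None
    element: Optional[str] = None
    fragment: Optional[str] = None

    def components(self) -> list:
        """The leading run of filled-in components; its length is the depth."""
        present = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                break
            present.append(value)
        return present

    def parent(self) -> "Z3950ObjectID":
        """Drop the most specific component to name the next higher object."""
        depth = len(self.components())
        if depth <= 1:
            raise ValueError("a server-level objectID has no parent")
        last_name = fields(self)[depth - 1].name
        return replace(self, **{last_name: None})


oid = Z3950ObjectID("infocal.berkeley.edu", 210, "DB_docid", ("objectID",),
                    "kunz92", "object_text", "1-4096")
print(len(oid.components()))   # 7: a fragment-level object
print(oid.parent().element)    # object_text (the whole element)
```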

Beyond the Z39.50 context, a proposal for Universal Document (not
restricted to text) Identifiers [UDI92] has made a compelling
argument for UDIs (objectIDs) encoded in visible ASCII that
transcend any particular protocol (e.g., FTP, WAIS, Z39.50).  The
visible ASCII requirement allows objectIDs to be exchanged in
e-mail messages and to appear in printed publications.

An important idea missing  from  the  UDI  proposal  is  optional
descriptive  information.   Normally  stored  as  elements of the
associated info-1 record, this  information  needs  to  be  bound
closely  to objectIDs that will appear in menus, since a separate
server access will be required to return elements that  the  user
needs  to  see  (e.g., document title) before a selection will be
possible.  The close binding makes it easy to update remote menus
through a kind of re-linking process.  The proposal also requires
that no whitespace characters appear in  a  UDI,  but  given  the
length  of  Z39.50 objectIDs, non-significant whitespace needs to
be allowed to assist readability.  To restrict even  one  of  the
many  objectID  components to, say, the number of characters that
fit on one text line is not feasible.  It  is  worth  noting  the
similarity  between  this  objectID format and a text-based query
language.

Equipped with hybrid UDI-style objectIDs, the  client  programmer
has  a  powerful  tool  for  building  hypermedia systems.  A few
operations on objectIDs have to be defined:

     Open      begin accessing object
     Read      sequential or random access to object
     Close     end accessing object
     Sync      get fresh copy of objectID
     Compare   client test for identical objectIDs

In an object-oriented sense, these operations will depend heavily
on  the  server  and  object's  type.   Open and Close are merely
advice-giving operations; for example, they might be  ignored  on
stateless  servers that do not keep track of whether an object is
open.  The Read operation will likely resemble file I/O for docu-
ment  and  image  objects,  with  a  mix of sequential and random
access capability depending on the server.  On the other hand, if
an  object  is a database, a server, or a person, even sequential
access is unlikely.  The Sync operation returns an updated objec-
tID, providing a mechanism for replacing stale descriptive infor-
mation and obtaining a new forwarding address.   One  reason  for
not putting copyright disposition inside an objectID is that even
with the Sync operation, an objectID  could  not  in  general  be
trusted  to  be  either current or authoritative.  Operations are
specified using the element set name parameter.
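
The operations map naturally onto an abstract interface whose
behaviour varies with the object's type.  A Python sketch with
hypothetical class names: Open and Close are no-ops because they
are only advisory, Sync returns the objectID unchanged because no
server is involved here, and Compare assumes, following the
earlier remark that copies carry the original objectID, that two
handles are identical when their original objectIDs match.

```python
from abc import ABC, abstractmethod


class ObjectHandle(ABC):
    """Hypothetical client-side view of an object reached through an objectID."""

    def __init__(self, object_id: str, original_id: str):
        self.object_id = object_id      # routine-access objectID (may differ per copy)
        self.original_id = original_id  # objectID of the original, carried by copies

    def open(self) -> None:
        """Advisory only: a stateless server may ignore it."""

    def close(self) -> None:
        """Advisory only, like open()."""

    @abstractmethod
    def read(self, offset: int, length: int) -> bytes:
        """Sequential or random access, where the object supports it."""

    def sync(self) -> str:
        """Return a fresh objectID; with no server in this sketch, it is unchanged."""
        return self.object_id

    def compare(self, other: "ObjectHandle") -> bool:
        """True if both handles reach copies of the same original object."""
        return self.original_id == other.original_id


class DocumentCopy(ObjectHandle):
    """A document whose bytes allow random access."""

    def __init__(self, object_id: str, original_id: str, data: bytes):
        super().__init__(object_id, original_id)
        self._data = data

    def read(self, offset: int, length: int) -> bytes:
        return self._data[offset:offset + length]


class Person(ObjectHandle):
    """A person is an object too, but offers nothing to read."""

    def read(self, offset: int, length: int) -> bytes:
        raise NotImplementedError("Read is not meaningful for this object")


a = DocumentCopy("mirror-a/kunz92", "infocal/kunz92", b"Non-Bibliographic Applications")
b = DocumentCopy("mirror-b/kunz92", "infocal/kunz92", b"Non-Bibliographic Applications")
print(a.compare(b))    # True: two copies of one original
print(a.read(18, 12))  # b'Applications'
```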

ObjectIDs are hierarchical in that they consist of a sequence of
increasingly specific components.  The shorter the sequence, the
higher the level of the object (e.g., a fragment has a long
sequence).  Some of the high level components may be inferred, so
that, for example, every object in a database need not be stored
with its full objectID.  Not only servers but also clients must be
able to parse Z39.50 objectIDs in order to understand what level
of object (e.g., server, fragment) is implied.  A received
objectID could then be modified to imply lower or higher level
objects.  Here is an example of a Z39.50 objectID identifying the
first 4096 bytes of a simple ASCII text version of this article:

        {ir infocal.berkeley.edu 210
                DB_docid objectID kunz92 object_text 1-4096 {}
                {title "Non-Bibliographic Applications of Z39.50"
                date 05/03/92}}

Rather than use the special purpose notation of the UDI proposal,
this  example uses a different structure (offered without further
explanation at this time as it has not been finalized)  expressed
with  Tcl  [Oust90].  This was an easy choice since Tcl expresses
hierarchical lists with quoting and non-ASCII capability, and the
Tcl software to read and build such lists is freely available for
UNIX, Macintosh, and PC platforms.
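
Reading such a list needs only a small recursive parser.  The
Python sketch below handles just the subset of Tcl list syntax the
example uses (whitespace-separated words, nested braces, and
double-quoted words); it is not a full Tcl parser, ignores
backslash escapes, and turns every braced word into a sublist,
whereas Tcl itself keeps a braced word as a string until it is
parsed again.  The function names are hypothetical.

```python
def parse_tcl_list(text: str) -> list:
    """Parse a subset of Tcl list syntax into nested Python lists.

    Supported: whitespace-separated words, {braced} words (parsed recursively
    into sublists), and "double-quoted" words (kept as strings).  Backslash
    escapes, substitution, and braces inside bare words are not handled.
    """
    items, _ = _parse_items(text, 0, nested=False)
    return items


def _parse_items(text: str, pos: int, nested: bool) -> tuple[list, int]:
    items: list = []
    n = len(text)
    while True:
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            if nested:
                raise ValueError("unbalanced '{'")
            return items, pos
        ch = text[pos]
        if ch == "}":
            if not nested:
                raise ValueError(f"unexpected '}}' at offset {pos}")
            return items, pos + 1
        if ch == "{":
            sublist, pos = _parse_items(text, pos + 1, nested=True)
            items.append(sublist)
        elif ch == '"':
            end = text.find('"', pos + 1)
            if end < 0:
                raise ValueError("unterminated quote")
            items.append(text[pos + 1:end])
            pos = end + 1
        else:
            start = pos
            while pos < n and not text[pos].isspace() and text[pos] not in "{}":
                pos += 1
            items.append(text[start:pos])


EXAMPLE = """{ir infocal.berkeley.edu 210
        DB_docid objectID kunz92 object_text 1-4096 {}
        {title "Non-Bibliographic Applications of Z39.50"
        date 05/03/92}}"""

oid = parse_tcl_list(EXAMPLE)[0]
print(oid[:3])   # ['ir', 'infocal.berkeley.edu', '210']
descriptive = oid[-1]
print(dict(zip(descriptive[::2], descriptive[1::2]))["title"])
```

The last sublist carries the optional descriptive information, so
a client can label a menu entry without another server access.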

Satisfying Sections.

Sometimes a search locates a record based on criteria not
immediately apparent to the user. For example, a server may match
on synonyms generated from the user's term, or through a relevance
feedback mechanism. Particularly when the element that triggered a
match is large and matches at multiple locations, the server needs
to return a list of satisfying sections, or ``hits'', within that
element. A satisfying section may indicate a simple segment of
bytes, but for some objects a server may disallow byte addressing
and offer section addressing instead: the server divides an
element (e.g., a document) into a sequence of variable-length
sections (e.g., physical page frames, SGML elements). In this case
each section has its own objectID, called a sectionID, which a
client submits as the ``current'' section; this relieves the
server of having to maintain the state information the client
needs to retrieve the next, previous, or current section.
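A minimal sketch of how a server might compute the byte-addressed
hits it returns as satisfying sections. It assumes a plain
case-insensitive substring match over ASCII text, with any
synonyms supplied by the caller; real servers may match in quite
different ways. SatisfyingSection mirrors the offset and length
components of the info-1 type of that name, and find_hits is a
hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SatisfyingSection:
    """Offset/length pair modelled on the info-1 SatisfyingSection type
    (the optional fragmentID is left out of this sketch)."""
    offset: int
    length: int


def find_hits(element: bytes, terms: list[bytes]) -> list[SatisfyingSection]:
    """Return every case-insensitive match of any term as a byte range.

    `terms` would hold the user's term plus any server-generated
    synonyms; how a real server expands terms is not modelled here.
    """
    haystack = element.lower()          # bytes.lower() folds ASCII only
    hits = []
    for term in terms:
        needle = term.lower()
        if not needle:
            continue                    # an empty term would match everywhere
        start = haystack.find(needle)
        while start != -1:
            hits.append(SatisfyingSection(start, len(needle)))
            start = haystack.find(needle, start + len(needle))
    return sorted(hits, key=lambda h: h.offset)


doc = b"Search and retrieve; RETRIEVE again, or fetch it."
for hit in find_hits(doc, [b"retrieve", b"fetch"]):
    print(hit.offset, hit.length, doc[hit.offset:hit.offset + hit.length])
```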

Element Set Names.

The element set name parameter is used by clients to request that
returned records be composed of a particular combination of
elements. It has been provisionally extended into a composite of
several hierarchically arranged parameters that currently have no
well-defined role within Z39.50 and so use the element set name
parameter as a temporary home. They specify element set,
fragment, variant, and more, as listed in Figure 7. Until better
solutions are found through experience, a Tcl-based list format is
the natural choice for this parameter, with the provisional format
selected whenever the first character of the parameter is `{'.

At the top level, a client can request an element set name,
discovered through the Explain facility, that refers to a
particular combination of elements. By default, info-1 elements
are returned
in a tagged format, but an efficient untagged (positional) format
may be requested. An ordered sequence of field parameters can be
used to request a particular record makeup. Each field parameter
identifies an element and optional operation, variant, and
fragment information.


Top Level Element Set Name Parameters

     esname       set name discovered via Explain
     zsname       pre-defined classical Z39.50 set name ("F")
     untagged     if non-zero, do not tag returned elements
     field        per field specifiers and qualifiers

Per Field Specifiers and Qualifiers

     name         element (tag) requested
     objectID     element or fragment identifier
     operation    Open, Close, Read, Sync
     fragment     fragment specifier
     variants     get list of variants if non-zero
     composition  format (text, hytext, TIFF, MIDI, etc.)
     encoding     archive/compression (tar, JPEG, MPEG, etc.)
     language     for text (French, English, Spanish, etc.)
     version      server-defined variant (VMS, Ultrix, DOS, etc.)

Fragment Specifiers

     start        offset of beginning of fragment (bytes)
     end          offset of end of fragment (bytes)
     length       length of fragment (bytes)
     section      Next, Previous, First, Last, Current, Best

Figure 7.  Element Set Name Parameters
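To make the provisional format concrete, here is a Python sketch
that composes an element set name from some of the Figure 7
parameters and tests for the leading `{'. The nesting shown (a
key/value list whose field values are themselves key/value lists)
is our own assumption, as are the helper names and the element
name ``body''; only the leading-brace rule comes from the text.

```python
def tcl_quote(item: str) -> str:
    """Brace an element that is empty or contains whitespace, braces or
    quotes (assumes any braces inside it are balanced)."""
    if item == "" or any(c.isspace() or c in '{}"' for c in item):
        return "{" + item + "}"
    return item


def tcl_list(*items: str) -> str:
    """Join elements into one level of a Tcl-style list."""
    return " ".join(tcl_quote(i) for i in items)


def build_esn(esname=None, untagged=False, fields=()):
    """Compose a provisional element set name (hypothetical layout)."""
    parts = []
    if esname:
        parts += ["esname", esname]      # set name discovered via Explain
    if untagged:
        parts += ["untagged", "1"]       # non-zero: do not tag elements
    for field in fields:                 # one dict of specifiers per field
        flat = [x for kv in field.items() for x in kv]
        parts += ["field", tcl_list(*flat)]
    return "{" + tcl_list(*parts) + "}"


def is_provisional(esn: str) -> bool:
    # Provisional format is selected when the first character is `{'.
    return esn.startswith("{")


frag = tcl_list("start", "0", "length", "4096")
esn = build_esn(fields=[{"name": "body", "fragment": frag,
                         "language": "French"}])
print(esn)
print(is_provisional(esn), is_provisional("F"))
```

A classical set name such as "F" never begins with a brace, which
is what lets the two formats share one parameter.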

A fragment may be requested by relative section or by byte
offset, depending on the element variant. Info-1 field fragments
come with flags indicating whether the beginning or end of an
element was returned. An element or fragment objectID (once its
format is finalized) can express all the per-field requests and
can be used instead of the per-field parameters. The ``Best''
section specification in Figure 7 makes the most sense when the
server is returning a record from a result set built by the user;
otherwise, it may mean anything the server likes.
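The following sketch shows one way a server might turn the start,
end, and length fragment specifiers into a byte range and set the
two info-1 fragment flags. It assumes 0-based offsets with the end
exclusive (the objectID example's 1-4096 suggests a 1-based,
inclusive notation there, so the convention is genuinely open),
and lets missing bounds default to the element's beginning or end.
The function names are hypothetical.

```python
def resolve_fragment(size, start=None, end=None, length=None):
    """Map start/end/length specifiers to a (start, end) byte range.

    Assumed convention: 0-based offsets, end exclusive, clamped to the
    element; if all three are given, length is ignored.
    """
    if start is None:
        # Derive start from end and length when both are given.
        start = end - length if end is not None and length is not None else 0
    if end is None:
        end = start + length if length is not None else size
    start, end = max(0, start), min(size, end)
    if start > end:
        raise ValueError("inverted fragment")
    return start, end


def fragment_with_flags(element: bytes, **spec):
    """Return the requested fragment and its info-1 style flags."""
    start, end = resolve_fragment(len(element), **spec)
    flags = {
        "beginningOfElement": start == 0,
        "endOfElement": end == len(element),
    }
    return element[start:end], flags


element = b"x" * 10000
piece, flags = fragment_with_flags(element, start=0, length=4096)
print(len(piece), flags)
```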

Extensions to Present.

Some extensions to the Present facility to support object
retrieval would be extremely desirable, but formal extensions may
benefit from a few short-term conventions. Current proposals for
document retrieval call for an internal control number search
with a piggy-backed Present of what is expected to be a
single-record result set. This method was partly chosen to allow for
stateless servers that do not keep result sets. In terms of
objects, this sort of search is so common and so specialized (in
that the result count must be zero or one) that it really belongs
as a special kind of Present. If a client used the objectID
element set name parameter in a Present against the reserved
result set name ``_IR_NoResultSet_'', a server could be truly
stateless and would not have to support Search at all, let alone
the piggy-backed document Present kludge.
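A toy sketch of the stateless lookup this convention would permit.
The server keeps only a table of objects keyed by objectID; a
Present against ``_IR_NoResultSet_'' returns zero or one record and
needs no prior Search. Extracting the objectID parameter from the
composite element set name is omitted, and the class and method
names are hypothetical.

```python
NO_RESULT_SET = "_IR_NoResultSet_"   # reserved result set name from the text


class StatelessServer:
    """Toy server holding objects keyed by objectID; keeps no client state."""

    def __init__(self, objects: dict[str, bytes]):
        self._objects = objects

    def present(self, result_set_name: str, object_id: str) -> list[bytes]:
        """Handle a Present; object_id is the objectID parameter already
        extracted from the element set name (parsing omitted)."""
        if result_set_name != NO_RESULT_SET:
            raise NotImplementedError("this sketch keeps no result sets")
        record = self._objects.get(object_id)
        # The result count is always zero or one, so no result set is built.
        return [] if record is None else [record]


server = StatelessServer({"kunz92": b"Non-Bibliographic Applications of Z39.50"})
print(server.present(NO_RESULT_SET, "kunz92"))
```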

Another urgently required extension is batched Present Responses.
Letting the server fragment a request for an entire object (e.g.,
an image) into multiple Present Responses would allow much more
efficient data transfer than having the client take receipt of
each fragment before it can request the next one. In the short
term, a client Present against the reserved result set name
``_IR_BatchPresent_'' could authorize a server to fragment the
requested element and send the pieces in multiple responses. A new
temporary fragment flag could indicate when the batched responses
are done, or whether the server refuses to batch them.
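A sketch of the server side of such a batch, assuming the client
has already sent a Present against ``_IR_BatchPresent_''. Each
response carries one fragment; batch_done stands in for the
proposed temporary flag marking the final response, and
batch_refused for a server that declines to batch. The field names
and the max_octets parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class PresentResponse:
    fragment: bytes
    batch_done: bool             # stands in for the proposed temporary flag
    batch_refused: bool = False  # server declined to batch


def batch_present(element: bytes, max_octets: int,
                  willing: bool = True) -> Iterator[PresentResponse]:
    """Yield the responses a server might send for one batched Present,
    each carrying at most max_octets of the element."""
    if max_octets <= 0:
        raise ValueError("max_octets must be positive")
    if not willing:
        # Refuse batching: send only the first fragment and say so.
        yield PresentResponse(element[:max_octets], True, batch_refused=True)
        return
    if not element:
        yield PresentResponse(b"", True)
        return
    for start in range(0, len(element), max_octets):
        end = start + max_octets
        yield PresentResponse(element[start:end], end >= len(element))


for response in batch_present(b"abcdefghij", 4):
    print(response)
```

Because the client need not acknowledge each piece, the server can
stream the whole element without waiting a round trip per fragment.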

Clients that need to retrieve scattered records (e.g., when
browsing a result set), or that need the server to send a sample
of a result set, also need protocol extensions. Currently, record
numbers are not returned, nor is it possible to request
non-contiguous records or records from multiple result sets in a
single request. Temporary solutions using new element set name
parameters and a new reserved result set name may be called for.

Copyrights.

When the copyright component of an info-1 field  is  present,  it
contains  an  objectID  that  can  be  used to obtain a copyright
statement.  Once the  client  has  retrieved  and  displayed  the
statement, future occurrences of that copyright identifier during
the same session may  obviate  the  need  to  display  it  again,
depending on the legal obligations.  When the copyright component
is not present, the client may assume that the element is  freely
redistributable.
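A client might track displayed copyright statements per session
roughly as follows. The fetch callback (which would retrieve the
statement through its copyright objectID) and the repeat_each_time
option, for cases where legal obligations require re-display, are
assumptions of this sketch.

```python
from typing import Callable, Optional


class CopyrightTracker:
    """Per-session record of copyright statements already displayed."""

    def __init__(self, fetch_statement: Callable[[str], str],
                 repeat_each_time: bool = False):
        self._fetch = fetch_statement      # hypothetical retrieval callback
        self._repeat = repeat_each_time    # set if the law requires re-display
        self._shown: set[str] = set()

    def statement_to_show(self, copyright_id: Optional[str]) -> Optional[str]:
        """Return the statement to display, or None if nothing is due."""
        if copyright_id is None:
            return None        # no copyright component: freely redistributable
        if copyright_id in self._shown and not self._repeat:
            return None        # already displayed during this session
        self._shown.add(copyright_id)
        return self._fetch(copyright_id)


tracker = CopyrightTracker(lambda cid: f"statement for {cid}")
print(tracker.statement_to_show("c1"))
print(tracker.statement_to_show("c1"))
```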

Formal info-1 Structure.

Figure 8 gives the formal ASN.1 [ASN187] structure of the
general-purpose record format described here. Its main job is to
contain a sequence of data fields. A substantial number of
diagnostics still need to be added to the bib-1 diagnostic set to
support the new functionality that info-1 promises.


BEGIN
       -- Note that lots of things are VisibleString because the client may
       -- need to resubmit them in a Present as an element set name parameter

GenericRecord ::= SEQUENCE {

       elementSetName       ElementSetName OPTIONAL,
       fieldCount           INTEGER OPTIONAL,
              -- these fields allow client shortcuts in record parsing
       userMessage          VisibleString OPTIONAL,
              -- anything that might not be covered elsewhere
       rank                 INTEGER OPTIONAL,
              -- for weighted result sets

       -- large data (fields and records) go at the end so clients doing
       -- partial parsing of records see headers first

       positionalFields     SEQUENCE OF OCTET STRING OPTIONAL,
              -- lightweight for efficient bulk transfer (e.g. tables, files)
              -- positional semantics gotten from Explain and elementSetName
       taggedFields         SEQUENCE OF TaggedField OPTIONAL,
              -- generalized fields

       records              SEQUENCE OF GenericRecord OPTIONAL
              -- composite/hierarchical record, e.g., holdings records
              -- placing records at the end allows tail recursion elimination
}

TaggedField ::= SEQUENCE {

       tag                  INTEGER,
              -- Tagged fields are for variable or unExplained element sets.
              -- The same tag may occur in more than one field, e.g., for
              -- element variants or multiple abstracting levels.
       value                OCTET STRING,
              -- data element fragment
       fragmentID           VisibleString OPTIONAL,
              -- objectID identifying current fragment; server need not
              -- maintain client's location within an element
       hits                 SEQUENCE OF SatisfyingSection OPTIONAL,
              -- byte offsets and lengths of hits within this element
              -- not necessarily in this fragment
       copyrightID          VisibleString OPTIONAL,
              -- means element is copyrighted; contains a special objectID
              -- used to retrieve actual copyright on demand since we don't
              -- want to ship a legal document with each element
       flags                BIT STRING {
                                   endOfElement (0),
                                   beginningOfElement (1)
                            },
       qualifiers           SEQUENCE OF Qualifier OPTIONAL,
       variantMessage       VisibleString OPTIONAL
}

SatisfyingSection ::= SEQUENCE {

       fragmentID           VisibleString OPTIONAL,
       offset               INTEGER,
       length               INTEGER
}

Qualifier ::= SEQUENCE {

       qualifierType        INTEGER,
              -- Composition, Encoding, Language, Version
       qualifierValue       INTEGER
              -- Composition:  TEX, TIFF, MIDI, ...
              -- Encoding:  compress, tar, JPEG, MPEG, ...
              -- Language:  French, English, ...
              -- Version:  [ statically undefined ]
}

ElementSetName ::= [103] IMPLICIT VisibleString

END

Figure 8.  Formal info-1 Record Structure
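Because Z39.50 records travel in the ASN.1 Basic Encoding Rules
(BER), a client must unpack the flags BIT STRING bit by bit. In BER
the content octets of a BIT STRING begin with a count of unused
trailing bits, and named bit 0 is the most significant bit of the
first data octet. The sketch below decodes only those content
octets (tag and length handling are omitted); decode_flags is a
hypothetical helper.

```python
def decode_flags(content: bytes) -> dict[str, bool]:
    """Decode the content octets of the info-1 `flags` BIT STRING (BER).

    content[0] is the number of unused bits in the last octet; named
    bit n lives in octet 1 + n // 8 at mask 0x80 >> (n % 8).
    """
    if not content:
        raise ValueError("missing unused-bits octet")
    unused, data = content[0], content[1:]
    if unused > 7 or (not data and unused):
        raise ValueError("invalid unused-bits count")
    nbits = len(data) * 8 - unused

    def bit(n: int) -> bool:
        # Bits beyond the encoded length are absent, i.e. false.
        return n < nbits and bool(data[n // 8] & (0x80 >> (n % 8)))

    return {"endOfElement": bit(0), "beginningOfElement": bit(1)}


# Both flags set: 2 significant bits, 6 unused, data octet 0b11000000.
print(decode_flags(bytes([0x06, 0xC0])))
```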

5.  Conclusion

Z39.50 is in fact a workable protocol for more than bibliographic
information retrieval even though the client programmer still has
some details to work out.  Steady growth in the  number  of  non-
bibliographic implementors has flushed out some weaknesses in the
protocol.  Active development  and  interest  from  the  computer
industry  and  educational institutions are providing exactly the
kind of cross-pollination  the  protocol  needs  to  become  more
robust.  Key to this evolutionary process will be the containment
of a potential explosion in the number of semantic contexts while
at  the same time making sure that a few contexts are rich enough
to build compelling general-purpose user interfaces from them.

6.  References

[ASN187]
     ISO, ``OSI Specification of  Abstract  Syntax  Notation  One
     (ASN.1)'', 1987, Omnicom, Inc., Vienna, VA

[CWIS91]
     CWISP Working Group, ``Campus Wide Information System Protocol
     - Version 0.50 RFC, Draft 4'', October 1991, from Tim McGovern,
     MIT

[Kahl91]
     Kahle, Brewster, ``Document  Identifiers,  or  International
     Standard  Book  Numbers  for the Electronic Age'', September
     1991, Z39.50 Implementors Group Document ZIG91-46, available
     via anonymous ftp from think.com

[Kunz90]
     Kunze, John A., ``UCB Network Information Server  -  Project
     Overview'',  January  1990,  unpublished  paper prepared for
     1989 U.C.   Academic  Computing  Conference,  available  via
     anonymous       ftp      in      help/dist/nis.txt      from
     mailhost.berkeley.edu

[Lync90]
     Lynch, Clifford, ``Extensions to ISO DP 10162/10163 to Support
     an Explain Service'', December 1989, ZIG90-9, available via
     anonymous ftp from think.com

[MARC88]
     Network Development and MARC Standards Office, ``USMARC Concise
     Formats for Bibliographic, Authority, and Holdings Data'', June
     1988, Cataloging Distribution Service, Library of Congress,
     Washington, DC

[NISO91]
     ANSI/NISO Z39.50-199X, ``Proposed ANSI Information Retrieval
     Application  Service  Definition  and Protocol Specification
     for OSI'', December 1991

[Oust90]
     Ousterhout, John, ``Tcl: An Embeddable  Command  Language'',
     Usenix Conference Proceedings, January 1990

[UDI92]
     Berners-Lee, Tim, et al., ``Universal  Document  Identifiers
     on  the  Network'',  February  1992, memo in preparation for
     Internet   RFC   available   via    anonymous    ftp    from
     file://info.cern.ch./pub/www/doc/udi1.ps

The following trade names were used in this article:  Ultrix  and
VMS,  trademarks  of  Digital  Equipment Corporation; and UNIX, a
trademark of AT&T.
