Building A Z39.50 Client
                              Ralph LeVan
               OCLC Online Computer Library Center Inc.
                            6565 Frantz Rd.
                           Dublin, OH  43017
                          email: rrl@oclc.org
                                   
Abstract
The core functionality for a Z39.50 Client Application is described.
This core functionality consists of Connection, Initialization,
Search, Present and Disconnection. A Z39.50 Client API is described
which provides the core functionality. Also included are brief
descriptions of TCP/IP, the abstract syntax ASN.1, BER records and
USMARC records. Code for implementing the Client API, TCP/IP access,
encoding/decoding BER records and decoding USMARC records is freely
available.

1.   Introduction
Z39.50, the ANSI/NISO Information Retrieval Protocol, is perceived by
potential implementors as being difficult to implement. I will
demonstrate that this is not so by developing a Z39.50 client during
the course of this article. The code produced, while copyrighted, is
freely available for anyone to use.

In this article, I will stick to the “core” functionality of Z39.50;
features that are widely implemented and have the greatest chance of
interoperability. You will learn how to initialize a Z39.50 session,
how to do searches using simple Boolean operators (type-1 queries)
and how to retrieve USMARC and simple text (SUTRS) records. To do
this, I will show you how to build a Z39.50 Client Application
Program Interface (API) which will allow you to embed Z39.50 client
functionality in your applications. I will show you how to build
Z39.50 messages and how to send and receive them using standard
TCP/IP socket protocols. I will also give you a simple tool for
displaying USMARC records. Finally, I will wrap all these tools up in
a simple Z39.50 client (zdemo).

This article is intended primarily for implementors. It is sprinkled
liberally with C code fragments. The complete source code is
available at OCLC’s anonymous FTP site. (See the section on Source
Code Availability at the end of the article.)

2.   The Z39.50 Standard

2.1  Who Developed It?
The Z39.50 standard was initially developed in the library community.
It was built to satisfy a requirement to search and retrieve
USMARC-formatted bibliographic records. Those roots still show today:
the core attribute set for Z39.50 (which includes the list of types
of things that can be searched for) is named bib-1 and the most
widely interoperable record syntax is still USMARC. However, the
standard has grown considerably beyond the original modest
requirements. Today there are organizations using Z39.50 to deliver
full-text documents based on natural language queries. Other
organizations support complex chemical structure searching and
display.

2.2  Who Maintains It?
The Z39.50 standard started life as the product of a standards
committee. The committee considered its work complete with the
successful balloting of the original 1988 version of the standard. At
that point a Maintenance Agency was appointed by the National
Information Standards Organization (NISO) and the original committee
was
disbanded. Members of the Z39.50 committee met occasionally to
discuss possible implementation of the standard and in 1990 the
Z39.50 Implementors Group (ZIG) was founded. Today, changes to the
standard are developed jointly by the ZIG and the Maintenance Agency.
Because the standard is being enhanced by real implementors, the
standard now reflects their real-world requirements.

2.3  Where Can I Get It?
The Maintenance Agency for the Z39.50 standard is the Library of
Congress. It maintains an anonymous FTP server at ftp.loc.gov where
many documents related to Z39.50 are available. Among those documents
is the latest version of the standard. Paper copies of the standard
can be purchased directly from NISO. Contact them by phone at (800)
282-NISO.

3.   Z39.50 Overview
Unlike other Internet protocols such as HTTP or WAIS, Z39.50 is a
session oriented protocol. That means that a connection to a Z39.50
server is made and a persistent session is started. The connection
with the server is not closed until the session is completed. Session
oriented applications are often called “stateful” applications and
transaction oriented applications are often called “stateless”.

A session oriented protocol is considerably more efficient than a
transaction oriented protocol that requires that the connection with
the server be reestablished with every message. Session orientation
also allows a client to refine search result sets iteratively and to
make multiple record retrieval requests against the same result set.
It also allows the client and server to negotiate behavior, such as
the kinds of services the client needs, and to have that negotiation
persist for the duration of the session. In HTTP, much of the message
traffic from the client contains descriptions of preferred server
behavior that need to be repeated with every transaction.

In its simplest form, Z39.50 is a synchronous protocol. That is, the
client sends a message to the server and waits for the server to
respond. The client that is developed in this article (zdemo) will
use this form. It is possible to negotiate much more complex
behavior. The client can have multiple outstanding requests to the
Z39.50 server and the Z39.50 server can interrupt those client
requests with requests of its own that must be responded to before
the original client request can be completed. The Client API will not
negotiate for that functionality, but it can be readily extended to
provide it.

4.   Z39.50 Messages
There are two logical parts to the definition of Z39.50 messages
(called Protocol Data Units or PDU’s in the standard). First is the
definition of the content of the messages and second is the encoding
rules for converting the logical content into a physical message that
can be transmitted. In Z39.50, the messages are defined in the
Abstract Syntax Notation 1 (ASN.1) grammar and the encoding rules are
defined by the Basic Encoding Rules (BER).

4.1  Defining The Message:
Abstract Syntax Notation 1
ASN.1 is an ISO standard (ISO 8824) for defining the content of
messages. It is used to define all the ISO protocol messages and is
used in the Internet world to define Simple Network Management
Protocol (SNMP) messages. ASN.1 is a very rich language. What follows
is a simple description of ASN.1; seek a higher authority for a more
definitive description.

ASN.1 defines records as being composed of combinations of atomic and
constructed data types. The atomic data types are things like INTEGER
and BITSTRING. You will recognize them in ASN.1, because they are
usually in capital letters. Constructed data types are things like
Queries and Options. They always begin with an initial capital
letter.

All data types have a number (usually called a tag) assigned to them.
The tags for atomic data types are assigned by the BER encoding
rules. The tags for constructed data types are assigned in the ASN.1
where they are defined and are specified inside square brackets.

Because tags are simply numbers, there is the possibility that two
applications will choose the same tag to mean different things. One
possible way to avoid this would be to reserve ranges of tags for
ASN.1 data types. Instead, ASN.1 defines four types of tags:
UNIVERSAL, APPLICATION, CONTEXT and PRIVATE. UNIVERSAL tags are
expected to be recognized wherever they are used in a record. (For
example, a tag of [UNIVERSAL 2] is always an INTEGER.) CONTEXT tags
can have different meanings in different contexts. A tag of
[CONTEXT 1] might be a query in one part of a record and a count in
another. The meaning of the tag is defined by its context.

For example, the ASN.1 definition ReferenceId ::= [2] IMPLICIT
OCTETSTRING defines a constructed data type named ReferenceId, whose
tag is 2. The type of tag was not specified and defaults to CONTEXT.
The ReferenceId is composed of the atomic data type OCTETSTRING. The
IMPLICIT in that statement says that the tag for the OCTETSTRING must
not be included inside the ReferenceId. If IMPLICIT had been omitted
from the above definition (i.e., ReferenceId ::= [2] OCTETSTRING)
then both the context tag ([2]) and the UNIVERSAL tag ([UNIVERSAL 4])
would have been encoded in the message. Thus, the use of the IMPLICIT
keyword in the definition allows for smaller encodings.

ASN.1 includes constructs for grouping data types together. These
constructs include CHOICE (pick one of the things that follows),
SEQUENCE (the things that follow must be provided in the order
specified) and SET (the things that follow can be provided in any
order).

4.1.1     EXTERNAL’s, OBJECT ID’s and ISO Registration
ASN.1 allows the developer to specify that a constructed datatype
being referenced is not defined in the current body of the ASN.1. The
keyword for specifying this is EXTERNAL. EXTERNALs are used
throughout the Z39.50 standard. They are the mechanism used to
provide extensibility and flexibility in the standard. Saying that a
field is defined externally to the standard allows a company to use
private data in that field
that only their clients and servers will understand. (This is an
interoperability problem for other clients and servers, but there are
often good reasons for wanting to do this.) It also allows the ZIG to
agree on extensions to the standard simply by agreeing on the
contents of fields defined EXTERNAL to the standard.

EXTERNALs provide flexibility by allowing Object Identifiers to be
used to select from a broad range of possible choices. For example,
RecordSyntax is defined as EXTERNAL in Z39.50, which means that any
of a number of possible choices (e.g., USMARC, SUTRS, GRS) can be
specified.

EXTERNAL objects, when they arrive in a message, have an OBJECT
IDENTIFIER. The OBJECT IDENTIFIER provides an identification number
that allows the message decoder to understand the contents of the
object. OBJECT IDENTIFIERs are represented symbolically as strings of
numbers, separated by periods (‘.’). 1.2.840.10003 is the OBJECT
IDENTIFIER for the Z39.50 standard itself.

Object Identifiers are controlled by the International Organization
for Standardization (ISO). Object Identifiers would have no value as
identifiers if they were not unique. Normally, ISO issues Object
Identifiers, but once ISO issued an Object Identifier for Z39.50, the
Z39.50 Maintenance Agency was authorized to issue subordinate Object
Identifiers for Z39.50 objects. Thus, all Z39.50 Object Identifiers
begin with the Object Identifier for the standard itself.

4.2  Encoding the Message: The Basic Encoding Rules
Z39.50 messages are encoded according to the Basic Encoding Rules
(BER), ISO 8825. BER defines records as being composed of a triple of
values: a tag, a length and a value (TLV). The tag portion of the
triple includes bits that specify the type of tag (for example,
UNIVERSAL or CONTEXT) and whether the value portion of the triple is
primitive data or is composed of more TLV triples. This recursive
definition of a record allows for the construction of arbitrarily
complex hierarchical records.

I know of two ways to construct BER records. The first way is with an
ASN.1 compiler. The compiler reads the ASN.1 definition and produces
source code in a programming language such as C or C++. The
programmer can then fill in a structure in that language with the
values that are to be encoded and the code produced by the ASN.1
compiler reads that structure and builds the BER record. The strong
advantage of this method is that you’re reasonably confident that the
resulting BER record does in fact encode the ASN.1 properly.

OCLC chose not to use an ASN.1 compiler, but instead produced
utilities to construct the BER records directly. OCLC has made those
utilities publicly available, as well as the Z39.50 Client API. The
reasons for choosing not to use an ASN.1 compiler stem mostly from
the immaturity of the compilers when OCLC first started implementing
Z39.50 in 1988. Those reasons are given in greater detail in the
documentation accompanying the BER utilities. Directions for getting
the BER utilities can be found at the end of this article.
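
The tag-length-value triple and the effect of IMPLICIT can be made
concrete with a small sketch. This is not the OCLC BER utilities; the
names ber_tlv(), encode_refid_implicit() and encode_refid_explicit()
are hypothetical, and the sketch assumes tags below 31 and values
shorter than 128 bytes so that the tag and the length each fit in one
octet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The tag class occupies the top two bits of the identifier octet. */
enum { BER_UNIVERSAL = 0x00, BER_APPLICATION = 0x40,
       BER_CONTEXT   = 0x80, BER_PRIVATE     = 0xC0 };
#define BER_CONSTRUCTED 0x20  /* set when the value holds more TLVs */

/* Write one TLV triple into out and return the bytes written.
   Sketch only: single-octet tag and short-form length. */
size_t ber_tlv(unsigned char *out, int tag_class, int constructed,
               unsigned tag, const unsigned char *value, size_t len)
{
    assert(tag < 31 && len < 128);
    out[0] = (unsigned char)(tag_class |
                             (constructed ? BER_CONSTRUCTED : 0) | tag);
    out[1] = (unsigned char)len;
    memcpy(out + 2, value, len);
    return 2 + len;
}

/* ReferenceId ::= [2] IMPLICIT OCTETSTRING
   The context tag replaces the UNIVERSAL 4 tag. */
size_t encode_refid_implicit(unsigned char *out,
                             const unsigned char *id, size_t len)
{
    return ber_tlv(out, BER_CONTEXT, 0, 2, id, len);
}

/* ReferenceId ::= [2] OCTETSTRING
   The context tag wraps a complete UNIVERSAL 4 triple. */
size_t encode_refid_explicit(unsigned char *out,
                             const unsigned char *id, size_t len)
{
    unsigned char inner[130];
    size_t n = ber_tlv(inner, BER_UNIVERSAL, 0, 4, id, len);
    return ber_tlv(out, BER_CONTEXT, 1, 2, inner, n);
}
```

For the two-byte value "AB", the implicit form produces the four
bytes 82 02 41 42 and the explicit form the six bytes
A2 04 04 02 41 42. The two extra bytes are the nested UNIVERSAL 4 tag
and its length.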
                                     
4.2.1     The BER Utilities
The BER utilities allow the programmer to build a tree structure that
describes the contents of the record, instead of filling in a
record-specific structure and having a record-specific routine
construct the BER record. Each node in the tree contains the tag for
the data it describes and either a pointer to data or a pointer to
another node in the tree. A node in the tree is a C structure of type
DATA_DIR. Routines are provided to construct the tree and to encode
the primitive data types such as BITSTRING and INTEGER. Once the tree
is built, a utility routine (bld_rec()) is called to construct the
BER record.

When a BER record is received and decoded by an application, one of
these tree structures is produced. To examine the contents of the BER
record, simply traverse the tree. This puts the interpretation of the
record much more in the hands of the programmer.

5.   ZDEMO and the Client API
Zdemo is going to be a simple client. It will establish a connection
to the Z39.50 server, send an InitRequest and wait for an
InitResponse. It will then sit in a loop waiting for the user to
enter searches, record display requests or a Quit command. Commands
will consist of a single letter (S for Search, D for record Display
and Q for Quit). Arguments to the commands can follow the command.
The default command, when the command letter is omitted, is Search
(e.g., S DOG and DOG are equivalent commands).

The Client API is nearly as simple. It consists of the routines
InitRequest() and InitResponse(), SearchRequest() and
SearchResponse() and PresentRequest() and PresentResponse(). The
request routines take parameters that correspond to the fields in the
Z39.50 requests. The response routines take a BER record as their
only parameter and return a pointer to a response-specific structure
with fields in it that correspond to the fields in the Z39.50
response. The encoding and decoding of the requests and responses
will depend on the BER utilities.

6.   Establishing the Z39.50 Connection
The vast majority of Z39.50 servers are accessible via TCP/IP, so our
client will need to know how to connect to a server via TCP/IP. The
usual way to perform TCP/IP functions is with “sockets”. Sockets
provide the tools and structures for establishing TCP/IP connections
and for sending and receiving messages. Sockets have some of the
characteristics of files, in that they are opened, read from and
written to. In the UNIX world, the relationship between files and
sockets is very close; it is less so in the MS Windows world.

For our purposes, only the simplest features of sockets will be used.
We will need to know how to convert a host name into an IP address,
open a socket, send a message, wait for a return message, determine
how many bytes of message are waiting, read a message and close the
socket. The complete code for opening and closing a connection to a
Z39.50 server is contained in irpconn.c at OCLC’s anonymous FTP site.
(See the section on Source Code Availability at the end of this
article.) The code for writing a Z39.50 request, waiting for
the response and then reading the response is contained in doirp.c.

Windows Sockets are similar enough to standard UNIX sockets that I
have provided support for them as well. Sprinkled throughout
irpconn.c and doirp.c you will see fragments surrounded with “#ifdef
WINDOWS” and “#endif”. These sections contain the support for Windows
Sockets.

The routine to make the connection is named irp_connect(). It gets
passed the name of the host machine for the Z39.50 server and the
port where the server is listening. The standard port for Z39.50 is
210, but few of the servers actually listen at that port, so zdemo
(our client program) will need to accept the port number as an
argument. In turn, zdemo will get the host name and port as arguments
that are passed to it, though, with modification, zdemo could read
this information from a configuration file.

For MS Windows applications, the first step is to initialize
winsock.dll, the dynamic link library that contains the sockets
routines. This is done by calling WSAStartup(), passing it the
highest version number of the Windows Sockets standard that the
caller can use. In our code, zdemo will ask for version 1.1. If
either there is no winsock.dll available or it does not support
version 1.1 of the Windows Sockets standard, then irp_connect() will
write a diagnostic message and return a failure indication.

The next step in establishing the connection will be to convert the
host name into an IP address. This is done by calling
gethostbyname(), passing it the host name. If successful, it will
return a structure which contains data that will be used in creating
the socket. If gethostbyname() fails, then irp_connect() will write a
diagnostic message and return a failure indication.

Next, the socket is created. This is done by calling socket(),
telling it that the client will be using it to communicate via
TCP/IP. If socket() fails, then irp_connect() will write a diagnostic
message and return a failure indication.

Next, the connection to the server is established by calling the
sockets routine connect(), passing it the socket and a structure
containing the IP address and port number. If connect() fails, then
irp_connect() will write a diagnostic message and return a failure
indication. If it succeeds, then irp_connect() returns a pointer to
the socket and is done. A TCP/IP connection has been made to the
Z39.50 server.

6.1  ZDEMO
So far, our source code for zdemo looks like this:
    void  *socket;

    int main(int argc, char *argv[])
    {
        char  password[20], server_name[100], userid[20],
              *usage="usage: zdemo -h[hostname] [-pport#] "
                     "[-uuserid/password]";
        int   i, port=210;

        get_args(argc, argv, server_name, &port, userid, password);
        printf("Talking to Z39.50 server on port %d of host '%s'\n",
            port, server_name);
        /* initialization code */
        if( (socket=irp_connect(server_name, port))==0 )
        {
            printf("unable to connect to server %s\n",
                server_name?server_name:"");
            exit(1);
        }
    }
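
The connection steps described in section 6 (resolve the host name,
create the socket, connect) can be sketched for POSIX systems as
follows. This is not the irp_connect() routine from irpconn.c: the
name tcp_connect() is hypothetical, it returns a socket descriptor
rather than a pointer, and it uses getaddrinfo() in place of the
gethostbyname() call described in the text. A Windows Sockets build
would also need the WSAStartup() step.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a connected TCP socket descriptor,
   or -1 after writing a diagnostic message. */
int tcp_connect(const char *host, int port)
{
    struct addrinfo hints, *list, *ai;
    char service[16];
    int fd = -1, rc;

    assert(host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    snprintf(service, sizeof service, "%d", port);

    /* Step 1: convert the host name into one or more addresses. */
    rc = getaddrinfo(host, service, &hints, &list);
    if (rc != 0) {
        fprintf(stderr, "cannot resolve %s: %s\n",
                host, gai_strerror(rc));
        return -1;
    }

    /* Steps 2 and 3: create a socket and connect, trying each
       address in turn until one succeeds. */
    for (ai = list; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(list);

    if (fd < 0)
        fprintf(stderr, "cannot connect to %s port %d\n", host, port);
    return fd;
}
```

A caller would treat a return value of -1 as the failure indication
and pass any other value to the send and receive routines, closing it
with close() when the session ends.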
                                   

7.   Initialization
The first Z39.50 service is Initialization. The client and server use
this service to negotiate the other Z39.50 services and options that
are to be provided. They also get to negotiate the preferred message
size and exceptional record size. In addition, the client can provide
a userid and password.

7.1  Negotiation
Z39.50 supports a simple negotiation mechanism. The client proposes
values in the InitRequest and the server responds with the actual
values. If the client is unhappy with the returned values, its only
option is to close the session.

7.1.1     Version
There are now three versions of Z39.50. Version 1 was defined in
1988. It was implemented at only a few sites and was completely
superseded by Version 2, which introduced ASN.1 and BER encoding to
the standard. Version 2 was defined in 1992. The 1995 version of the
standard defines both Version 2 and Version 3. The reason for this is
that the ZIG wanted Version 3 to be backward compatible with Version
2 and wanted a single document that defined both. The ZIG did not
want developers to have to have two documents to develop a server
capable of interoperating with either Version 2 or Version 3 clients.
So, both versions are defined in Z39.50-1995 and all the
compatibility rules for the two versions are defined there as well.

The version of the standard that the client wants to use is one of
the things that is negotiated. The client sends a bitstring with a
bit turned on for each version of the standard that the client
understands. The server responds with a similar bitstring. The
highest version of the standard that the client and server have in
common is the version in effect for the session. If the client and
server have no supported version in common, then the server will
return an empty bitstring and fail the InitRequest. The client can
deduce the reason for the failure from the empty Version bitstring in
the InitResponse.

7.1.2     Options
The client and server negotiate the services and options that they
want through the Options bitstring. These are specified by turning on
the appropriate bits in the bitstring. All of the Z39.50 services can
be negotiated; that is, the client can request that they be made
available by the server. The server can deny these services by
turning off the appropriate bit in the bitstring when it is returned
in the InitResponse. Options that can be negotiated include such
things as support for named result sets or concurrent operations.

7.1.3     Message Sizes
The client also specifies a Preferred-message-size and an
Exceptional-record-size. The Preferred-message-size will be exceeded
by the server only when the client requests a single record and its
size exceeds the Preferred-message-size, but not the
Exceptional-record-size. The purpose of this is to allow the client
to control the maximum size of a normal message from the server, but
to allow it to occasionally accept large records.
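
The negotiation rules in 7.1.1 through 7.1.3 can be summarized in a
few lines of C. This is only a sketch: the function names are
hypothetical, the bitstrings are held in unsigned integers with bit 0
standing for Version 1 (an in-memory convention chosen here, not the
BER encoding and not the "yy" strings used by the Client API), and
the precondition that the Preferred-message-size does not exceed the
Exceptional-record-size is an assumption of the sketch.

```c
#include <assert.h>

/* Bit 0 stands for Version 1, bit 1 for Version 2, and so on. */
#define VERSION_BIT(v) (1u << ((v) - 1))

/* Highest version both sides support, or 0 when they share none
   (the empty-bitstring case that fails the InitRequest). */
int negotiated_version(unsigned client_bits, unsigned server_bits)
{
    unsigned common = client_bits & server_bits;
    int version = 0;

    while (common != 0) {   /* position of the highest set bit */
        version++;
        common >>= 1;
    }
    return version;
}

/* The server can only turn requested option bits off. */
unsigned negotiated_options(unsigned requested, unsigned supported)
{
    return requested & supported;
}

/* Nonzero if the server may send a message of the given size. */
int response_size_allowed(long size, int single_record,
                          long preferred, long exceptional)
{
    assert(preferred <= exceptional);  /* assumption of this sketch */
    if (size <= preferred)
        return 1;
    return single_record && size <= exceptional;
}
```

For example, a client offering Versions 1 and 2 to a server that
supports Versions 1, 2 and 3 ends up with Version 2 in effect, and a
5000-byte record is acceptable under a 4096-byte preferred size only
when it is the single record requested.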
The server may respond to the proposed values with alternative values
in the InitResponse.

7.2  Other Initialization Parameters
The client can provide a userid and password in the InitRequest and
can also provide information identifying the client software itself.
Lastly, the InitRequest contains a placeholder for information
defined externally to the standard.

All Z39.50 request definitions include an optional referenceId. This
is an arbitrary string of bytes that the client can send that the
server is required to return with the response. Its intent is to help
the client identify the returning response in an asynchronous message
environment. While referenceId can hold any number of bytes, the
Z39.50 Client API allows only a C language long value to be used.

7.3  The InitRequest
The InitRequest is created by a call to the InitRequest() routine. It
takes a referenceId, a preferredMessageSize, an
exceptionalRecordSize, an id and a password as parameters. It does
not accept options as a parameter, since the Client API always
negotiates for the most functionality that it can handle.

InitRequest() returns a pointer to an allocated area in memory that
contains the BER encoded InitRequest.

The prototype for InitRequest() looks like this:

    unsigned char *InitRequest(
       long referenceId,
       long preferredMessageSize,
       long exceptionalRecordSize,
       char *id,
       char *password);

7.3.1     Encoding the Request
The easiest way to understand the InitRequest() routine is to walk
through it line by line, showing the ASN.1 that is being encoded and
providing commentary. In the listing, each fragment of C code is
followed by the ASN.1 that it encodes and by commentary enclosed in C
comments.

Normally when I code using the BER utilities, I use preprocessor
variables to hold the tag values. The preprocessor variable
InitRequest would be defined as 20. I do this for readability. But in
the code below, the commentary explains what is going on in the code,
and I want you to be able to see the correlation between the code and
the ASN.1, so I am omitting the preprocessor variables. If you get
the code from our FTP server, you will see proper preprocessor
variables instead of constants.
                                   
    CHAR *Init_Request(long referenceId, long preferredMessageSize,
         long  exceptionalRecordSize, char *id, char *password, long
                                 *len)
                                  /*
    referenceId has no particular meaning to the Client API.  You can
                             put whatever
     value you want into it, and it will be returned in the response.
                            id and password
      can be either NULL or “”.  len will contain the length of the
                            encoded request
                       when InitRequest() returns.
                                  */
                                   {
       static char *protocol_version=”yy”;  /* versions 1 and 2 */
                                  /*
    When you want Version 2, you have to ask for Version 1 too.  (This
                              is to allow
                    interoperability with ISO 10163).
                                  */
    static char *options_supported=”yy”; /* search and present only */
         /*****************************************************/
       /*                      build an IRP Init request        */
         /*****************************************************/
                     dir=dmake(20, ASN1_CONTEXT, 30);
             initRequest  [20] IMPLICIT InitializeRequest,
                                  /*
    Make a DATA_DIR tree for assembling the parts of our message.  The
                               first two
     arguments specify the tag and tag type for the root of our tree.
                            They correspond
   to the first tag in the ASN.1 definition of an InitRequest.  The 30
                             tells dmake()
      that we expect to see 30 nodes in our tree.  If that number is
                          exceeded, then the
    BER utilities will automatically increment the size of the tree by
                             that amount.
   dir, the value returned by dmake(), is a pointer to the root of the
                                 tree.
                                  */
                             if(referenceId)
             daddchar(dir, 2, ASN1_CONTEXT, (CHAR*)&referenceId,
                         sizeof(referenceId));
                  referenceId  ReferenceId OPTIONAL,
                                  /*
             ReferenceId is defined later in the standard as:
                  ReferenceId ::= [2] IMPLICIT OCTETSTRING
     If a non-zero referenceId has been provided, then add it to the
                          request.  The first
      argument to daddchar() is a pointer to the parent of the field
                           being added.  The
     next 2   arguments are the tag and tag type of the referenceId.
                             The last two
     arguments are a pointer to the referenceId and its length.  The
                            referenceId is
    being passed to the server as a string of bytes (an OCTETSTRING in
                                ASN.1.)
                                  */
            daddbits(dir, 3, ASN1_CONTEXT, protocol_version);
                   protocolVersion  ProtocolVersion,
                                  /*
           protocolVersion is defined later in the standard as:
                 protocolVersion ::= [3] IMPLICIT BITSTRING
      daddbits() encodes ASN.1 BITSTRINGs.  Here, we’re encoding the
                           ProtocolVersion.
                                  */
            daddbits(dir, 4, ASN1_CONTEXT, options_supported);
                           options  Options,
                                  /*
               Options is defined later in the standard as:
                     Options ::= [4] IMPLICIT BITSTRING
                                  */
       daddnum(dir, 5, ASN1_CONTEXT, (CHAR*)&preferredMessageSize,
                       sizeof(preferredMessageSize));
              preferredMessageSize  [5] IMPLICIT INTEGER,
                                  /*
       daddnum() encodes ASN.1 INTEGERs.  Here, we’re encoding the
                         preferredMessageSize.
                                  */
       daddnum(dir, 6, ASN1_CONTEXT, (CHAR*)&exceptionalRecordSize,
                       sizeof(exceptionalRecordSize));
             exceptionalRecordSize  [6] IMPLICIT INTEGER,
                              if(id && *id)
                                    {
                                  char *t;
                              DATA_DIR *subdir;
                                  /*
      We’ll use subdir to keep track of subtrees in our DATA_DIR tree.
                                  */
                            int len=strlen(id)+1;
                                  /*
       We need to figure out how long the id and password are and then
                             add 1 for the
                          ‘/’ separator character.
                                  */
                          if(password && *password)
                            len+=strlen(password)+1;
                                    else
                                   password="";
                        t=(char*)dmalloc(dir, len+1);
                                  /*
        dmalloc() malloc’s space that is freed automatically when the
                             DATA_DIR tree
       is freed.  In this case, the “+1” leaves room for the NUL character
                        that terminates the string.
                                  */
                               strcpy(t, id);
                          if(password && *password)
                      sprintf(t+strlen(t), "/%s", password);
                    subdir=daddtag(dir, 7, ASN1_CONTEXT);
                  idAuthentication  [7] ANY OPTIONAL,
                                  /*
       daddtag() adds a tag without any data.  It returns a pointer to
                             the node that
                   was added to the tree to hold the tag.
                                  */
       daddchar(subdir, ASN1_VISIBLESTRING, ASN1_UNIVERSAL, (CHAR*)t,
                                len-1);
                                  /*
       The ANY is recommended later in the standard to be encoded as a
                                CHOICE,
                           one option of which is:
                              open VisibleString,
         Add the id and password with an IMPLICIT ASN.1 data type of
                               VISIBLESTRING.
                                  */
                                    }
            daddchar(dir, 110, ASN1_CONTEXT, (CHAR*)"1995", 4);
    implementationId  [110] IMPLICIT InternationalString OPTIONAL,
        daddchar(dir, 111, ASN1_CONTEXT, (CHAR*)"OCLC IRP API", 12);
   implementationName  [111] IMPLICIT InternationalString OPTIONAL,
             daddchar(dir, 112, ASN1_CONTEXT, (CHAR*)"1.0", 3);
  implementationVersion  [112] IMPLICIT InternationalString OPTIONAL,
                                  /*
          Tell the server what kind of client is talking to it.
                                  */
                        return bld_rec(dir, len);
                                  /*
      bld_rec() malloc’s the amount of space needed to hold the BER
                           record, assembles
     the BER record in that area and returns a pointer to that area,
                           which is finally
                        returned by InitRequest().
                                  */
                                   }
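
To make the byte-level result of a call such as daddnum(dir, 5, ASN1_CONTEXT, ...) concrete, here is a minimal sketch of how a context-tagged IMPLICIT INTEGER is laid out in BER.  It is not the OCLC BER utilities' code: encode_context_integer() is a hypothetical helper written only for illustration, and it assumes a tag number below 31 and a caller-supplied buffer of at least 2 + sizeof(long) bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the OCLC BER utilities).
   Writes "[tag] IMPLICIT INTEGER value" into out as BER and returns
   the number of bytes written.  Assumes tag < 31 and that out holds
   at least 2 + sizeof(long) bytes. */
size_t encode_context_integer(unsigned tag, long value, unsigned char *out)
{
    unsigned char content[sizeof(long)];
    unsigned long u = (unsigned long)value;   /* two's complement bit pattern */
    size_t n = sizeof(long);
    size_t i, start = 0;

    assert(out != NULL);
    assert(tag < 31);   /* larger tags need the multi-byte identifier form */

    /* Lay the value out most significant byte first. */
    for (i = 0; i < n; i++)
        content[i] = (unsigned char)(u >> (8 * (n - 1 - i)));

    /* BER wants the shortest form: drop leading 0x00 (or 0xFF) bytes
       as long as the sign of the remaining bytes stays the same. */
    while (start < n - 1 &&
           ((content[start] == 0x00 && !(content[start + 1] & 0x80)) ||
            (content[start] == 0xFF &&  (content[start + 1] & 0x80))))
        start++;

    out[0] = (unsigned char)(0x80 | tag);     /* context class, primitive */
    out[1] = (unsigned char)(n - start);      /* short-form length */
    for (i = start; i < n; i++)
        out[2 + i - start] = content[i];

    return 2 + (n - start);
}
```

With this sketch, a preferredMessageSize of 16384 under tag 5 comes out as the four bytes 85 02 40 00.  Tags of 31 and above, such as the 110, 111 and 112 used for the implementation fields, need BER's multi-byte identifier form, which the sketch deliberately leaves out.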
                                   
                                   
7.3.2     Transmitting the Request
Zdemo transmits the BER requests by calling doirp(), passing it the
pointer to the BER request and the pointer to the socket returned by
connect().  Doirp() sends the request to the Z39.50 server, waits for
the response to the request from the server and returns a pointer to
that response.

Doirp() starts by determining the length of the request.  It does
this by calling the BER utility asn1len().  It uses that length to
drive a while loop where the length represents the number of bytes of
the request waiting to be sent.

Doirp() sends data to the server by calling the socket routine send()
and passing it the socket, a pointer to the request and the number of
bytes to send.  Send() returns the number of bytes actually sent.  The
pointer to the request is incremented by that amount and the length is
decremented by that amount.  If the length goes to zero, then the
complete request has been sent and doirp() falls out of the while
loop.  If send() indicates an error, then doirp() prints an error
message and quits, returning an error indication.

Next, doirp() needs to wait for the response from the server.  The
socket utilities are prepared to handle much more complicated tasks
than zdemo is requiring of them, so some of the tools that it uses
seem overly complicated for this purpose.  The mechanism for waiting
for a message is one of those tools.  The socket utilities allow an
application to have many active sockets open and allow you to wait
until any of them has a message.  To do this, the application has to
construct a list of sockets to be waited on.  Two preprocessor macros
are used to construct the list: FD_ZERO() and FD_SET().  FD_ZERO()
initializes an empty list, and FD_SET() adds sockets to the list.
After the list is built, the routine select() is called, passing it
the list of sockets to be waited on.  The select() call sits inside a
while loop; sometimes select() returns with an indication that it has
not received anything yet.

After doirp() has gotten the indication that a message is available,
it calls ioctl() to determine the amount of data that has been
received.  It then calls recv() to read the data.  It passes recv()
the socket, a pointer to a buffer to hold the incoming message, and
the number of bytes it wants to read (which it got from ioctl()).
Recv() returns a count of the number of bytes that it actually read.
If that count is zero, then there was probably some failure in the
connection and doirp() prints an error message and returns with an
error indication.

Often, TCP/IP has to break large messages into smaller messages to
transmit them.  That means that when doirp() gets a message, it might
be the first of many messages that make up a complete Z39.50
response.  The BER utilities provide a routine, IsCompleteBER(), which
gets passed a pointer to a buffer with a BER encoded message and a
count of the number of bytes in the buffer.  IsCompleteBER() returns
an indication of whether a complete message is in the buffer.  If the
message is complete, then IsCompleteBER() also returns the actual size
of the message, which might be less
than the amount of data in the buffer, since it is possible for more
than one message to have been received at one time.

If the message was not complete, then IsCompleteBER() also returns
the number of bytes remaining to be read to complete the message.
Sometimes IsCompleteBER() reports that the message is not complete
and that zero bytes remain to be read.  This means that
IsCompleteBER() cannot determine the remaining length and doirp()
should just wait for more data to arrive.  Either way, doirp() sits in
a loop, reading more data, until IsCompleteBER() reports that a
complete message has arrived.  When that happens, doirp() returns a
pointer to the buffer containing the message.

At this point, zdemo has sent our InitRequest and received an
InitResponse.

7.4  The InitResponse
The most important field in an InitResponse is the result field.  It
tells the client whether its InitRequest has been accepted by the
Z39.50 server.  If it has a non-zero value, then a Z39.50 session has
been successfully established.  If it is zero, then the Z39.50 server
has rejected our session.  Unfortunately, there is no explicit
mechanism for the server to tell why it is rejecting our InitRequest.
We’ll have to deduce the reason from the other values returned in the
InitResponse.

7.4.1     Decoding the Response
The Z39.50 Client API provides the routine InitResponse() to decode
the InitResponse from the Z39.50 server.  It is passed a pointer to
the InitResponse and returns a pointer to a structure containing
information from the InitResponse.

The first step in decoding any Z39.50 response is to decode the BER
encoded message.  The BER utility bld_dir() does this.  Its job is to
build a DATA_DIR tree that reflects the structure of the message.
Typically, to decode the message, we’ll just traverse the tree.  I use
a for loop to do this.  I set the loop variable to the first child in
the tree and loop through all its siblings.  Inside the loop I use a
switch statement to test for the possible tags that might have been
in the message.

Again, as with the InitRequest(), the easiest way to understand the
InitResponse() routine is to walk through it line by line, showing the
ASN.1 that is being decoded and providing commentary.  The C code is
indented and in bold.  The ASN.1 is in italics and the commentary is
in normal text.  I have also repeated the practice of replacing
preprocessor variables with constants to emphasize the correspondence
between the C code and the ASN.1.
                                   
              INIT_RESPONSE *InitResponse(CHAR *response)
                                   {
                          DATA_DIR far *subdir;
                      INIT_RESPONSE *init_response;
                 if(!response || !bld_dir(response, dir))
                                return NULL;
                                  /*
      If a response was not provided or we were unable to decode the
                            response, then
      return a failure indication.  The dir that is being passed to
                         bld_dir() is the same
     one that was created in InitRequest() to hold the message being
                         built there.  Dir is
    a global variable and will be used by all the request and response
                               routines.
                                  */
                            if(dir->fldid!=21)
                                return NULL;
            initResponse  [21] IMPLICIT InitializeResponse,
                                  /*
      If the response wasn’t an InitResponse, then return a failure
                         indication.  The tag
             in the root node of the tree is the message tag.
                                  */
              if( (init_response=(INIT_RESPONSE*) calloc(1,
                    sizeof(INIT_RESPONSE)))==NULL)
                                return NULL;
                                  /*
     If we can’t allocate space to hold the structure describing the
                          InitResponse, then
                       return a failure indication.
                                  */
          for(subdir=dir->ptr.child; subdir; subdir=subdir->next)
                                     {
                                  /*
      This is our driving loop.  The loop variable is initialized to
                       point at the first child
     off the root.  As long as there is such a child, process it and
                           then point at its
                                 sibling.
                                  */
                            switch(subdir->fldid)
                                  /*
               Test for the value of the tag in this node.
                                  */
                                      {
                                    case 2:
                  referenceId  ReferenceId OPTIONAL,
                                  /*
                 ReferenceId is defined later in the standard as:
                       ReferenceId ::= [2] IMPLICIT OCTETSTRING
                                  */
              memcpy((char*)&init_response->referenceId,
                     (char*)subdir->ptr.data, (int)subdir->count);
                                  /*
   Just save the referenceId in the INIT_RESPONSE structure.  Only the
            calling  application will be interested in it.
                                  */
                                      break;
                                    case 4:
                           options  Options,
                                  /*
                   Options is defined later in the standard as:
                          Options ::= [4] IMPLICIT BITSTRING
                                  */
                     init_response->options=dgetbits(subdir);
                                  /*
               dgetbits() decodes encoded BITSTRINGs.  It returns a
                           character string
             with a ‘y’ for every bit that was turned on, and an ‘n’ for
                            every bit that
                                  was turned off.
                                  */
                                      break;
                                    case 5:
              preferredMessageSize  [5] IMPLICIT INTEGER,
               init_response->preferredMessageSize=dgetnum(subdir);
                                  /*
              dgetnum() decodes encoded INTEGERs.  It returns a long,
                             which we will
                       save in the INIT_RESPONSE structure.
                                  */
                                      break;
                                    case 6:
             exceptionalRecordSize  [6] IMPLICIT INTEGER,
                 init_response->maximumRecordSize=dgetnum(subdir);
                                      break;
                                    case 12:
                     result [12] IMPLICIT BOOLEAN,
                  init_response->result = (int)dgetnum (subdir);
                                  /*
              A BOOLEAN is encoded as a single byte, just like a small
              INTEGER, so dgetnum() is used to decode it.  A non-zero
              value means TRUE and a zero value means FALSE.
                                  */
                                      break;
                                      }
                                    }
                          return init_response;
                                   }
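
The ‘y’/‘n’ string described for dgetbits() can be pictured with the following sketch.  It is not the OCLC routine: bits_to_flags() is a hypothetical stand-in that works directly on the content bytes of a BER BITSTRING, where the first byte gives the number of unused bits in the last byte and the remaining bytes hold the bits, most significant bit first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the behaviour described for dgetbits().
   content points at the content bytes of a primitive BER BITSTRING and
   count is their number.  Returns a malloc'ed string holding 'y' for
   every bit that is on and 'n' for every bit that is off, or NULL if
   the content is malformed or memory runs out.  The caller frees it. */
char *bits_to_flags(const unsigned char *content, size_t count)
{
    size_t unused, nbits, i;
    char *flags;

    assert(content != NULL);

    if (count == 0 || content[0] > 7)
        return NULL;                 /* first byte must be 0..7 */
    unused = content[0];
    if (count == 1 && unused != 0)
        return NULL;                 /* no data bytes, so nothing can be unused */
    nbits = (count - 1) * 8 - unused;

    flags = (char *)malloc(nbits + 1);
    if (flags == NULL)
        return NULL;

    for (i = 0; i < nbits; i++) {
        unsigned char octet = content[1 + i / 8];
        /* Bit 0 of the BITSTRING is the top bit of the first data byte. */
        flags[i] = (octet & (0x80 >> (i % 8))) ? 'y' : 'n';
    }
    flags[nbits] = '\0';
    return flags;
}
```

For example, the content bytes 05 E0 (five unused bits, then 1110 0000) yield "yyy".  In the standard's Options definition bit 0 is search and bit 1 is present, so a client would expect at least the first two positions of the returned string to be ‘y’.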
                                   

7.5  ZDEMO
The following code gets added to
zdemo:
                                   
                     INIT_RESPONSE *init_response;
                             long        len;
                   unsigned char  *request, *response;
                                   
                                  /*
                          Build the InitRequest.
                                  */
     request=InitRequest(0, 16384, 500000L, userid, password, &len);
                                  /*
                  Send the request and get the response.
                                  */
                   response = do_irp(request, socket);
      if(!response)  /* If we did not get a response, then quit. */
                                    {
                   printf("unable to send init request\n");
                                  exit(2);
                                    }
                                  /*
                           Decode the response.
                                  */
                  init_response=InitResponse(response);
               if(!init_response || !init_response->result)
      {  /* If the response was not decodable, or if the InitRequest
                         failed, then quit. */
                           printf("init failed\n");
                                  exit(3);
                                    }
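
The do_irp() call above can only return once a whole BER message has arrived, which section 7.3.2 attributes to IsCompleteBER().  The sketch below shows one way such a test could work.  It is an assumption-laden illustration, not the OCLC routine: ber_complete() and its return codes are hypothetical names, and the sketch handles only definite-length encodings.

```c
#include <assert.h>
#include <stddef.h>

#define BER_COMPLETE   1
#define BER_INCOMPLETE 0

/* Hypothetical completeness test (not the OCLC IsCompleteBER()).
   buf holds have bytes received so far.
   Returns BER_COMPLETE and sets *size to the size of the first message,
   which may be less than have.
   Returns BER_INCOMPLETE and sets *size to the number of bytes still
   missing, or to 0 when that cannot be determined yet (the header has
   not fully arrived, or the length is indefinite or too large). */
int ber_complete(const unsigned char *buf, size_t have, size_t *size)
{
    size_t pos = 0, len = 0, nlen, i;

    assert(buf != NULL && size != NULL);
    *size = 0;

    /* Identifier: one byte, or more when the tag number is 31 or above. */
    if (have < 1)
        return BER_INCOMPLETE;
    if ((buf[pos++] & 0x1F) == 0x1F) {
        do {
            if (pos >= have)
                return BER_INCOMPLETE;
        } while (buf[pos++] & 0x80);   /* high bit set: more tag bytes follow */
    }

    /* Length: short form (one byte) or long form (count, then bytes). */
    if (pos >= have)
        return BER_INCOMPLETE;
    if (buf[pos] < 0x80) {
        len = buf[pos++];
    } else {
        nlen = buf[pos++] & 0x7F;
        if (nlen == 0 || nlen > sizeof(size_t))
            return BER_INCOMPLETE;     /* indefinite or oversized: unknown */
        if (have - pos < nlen)
            return BER_INCOMPLETE;
        for (i = 0; i < nlen; i++)
            len = (len << 8) | buf[pos++];
    }

    /* Content: compare what has arrived with what the header promised. */
    if (have - pos >= len) {
        *size = pos + len;
        return BER_COMPLETE;
    }
    *size = len - (have - pos);
    return BER_INCOMPLETE;
}
```

A receive loop would call this after every recv(), keep reading while it reports BER_INCOMPLETE, and hand the buffer back to the caller once it reports BER_COMPLETE.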
                                   
8.   Searching
Z39.50 allows highly specific searching of databases.  The
specificity of Z39.50 queries is one of the standard’s great
strengths.  Other protocols, such as WAIS or Gopher, support “magical”
searching.  The user enters some kind of free text query and “magic”
happens.  The same query on another server might produce completely
different results, because different “magic” happened.  The user is at
a loss to determine why the records were retrieved.  The user is also
unable to control the search.  The user is unable to specify that she
wants to find records where the word SMITH appeared in the title, but
not as an author.  These weaknesses have all been overcome with
Z39.50.

Another strength of Z39.50 queries is the persistence of their
results for the duration of the Z39.50 session.  With other protocols,
the results of the query must be sent immediately to the client.
That’s fine, if the database is small and the result sets are always
small.  When the databases are large, that is not practical.  The user
needs the ability to fetch and examine some of the records and still
be able to ask for other records later.  Better yet, if the result set
is large, the user would like to be able to apply restrictors to the
result set and produce a smaller, hopefully more pertinent, result
set.

8.1  Result Sets
In order to reference a result set after it has been produced, the
result set must have a name.  In Z39.50, the client provides the name
of the result set with the query: the client names the result set.
Every query can have a different result set name, allowing the client
to reference any number of previous result sets.  But few, if any,
servers allow an unlimited number of result sets.  When a client has
exceeded the number of supported result sets, the server might delete
old result sets arbitrarily.

In fact, some servers allow a client to have only one result set.  In
that case, they do not really support named result sets.  To get
around the apparent contradiction of the client being able to name
result sets and the server being unable to support named result sets,
the ZIG agreed on the result set name “default”.  This is the result
set name that must be accepted by servers that do not otherwise
support named result sets.  If all queries sent to such a server are
named “default”, then the client has only one result set that it can
refer to.

Unfortunately, in Version 2 of the standard, the client cannot tell
whether the server will allow result set names other than “default”.
The only way to tell is to use a different result set name.  If the
server cannot support named result sets, it will fail the search and
return an error code indicating the problem.  The client will then
know that “default” will be the only acceptable result set name.  In
Version 3, support for named result sets is one of the options that
can be negotiated at initialization time.

If the client uses the same result set name twice, the server should
replace the previous result set of the same name with the new result
set.  To keep that from happening
accidentally, the client is required to set a flag in the
SearchRequest indicating that the result set is to be replaced.

8.2  Attributes
In “magic” searching systems, query terms are unqualified.  That is,
the user types in a term, but provides no extra information about the
term to indicate its semantic meaning.  Systems that provide more
specific searching usually provide the concept of an “index”.  So the
user can say that the term provided should be considered to be an
author or a word from a title.  But this is only a single piece of
qualifying information that can be provided with the term.

The Z39.50 developers wanted a richer mechanism than simply indexes.
They wanted to provide many dimensions of qualification to the term.
The word they chose to describe these additional qualifications on a
term is “attribute”.  A term can have many attributes.  One of those
attributes could be Use, which roughly corresponds with indexes.  The
Use attribute allows the client to specify how the term would have
been used in the records to be retrieved.  For example, the term was
Used as an AUTHOR or TITLE.  Another attribute is Structure; the term
is supplied according to a particular structure.  The structure might
be that the term is a WORD or a PHRASE.

8.2.1     Attribute Sets
Since the developers understood that they could not predict all the
attributes that implementors would want, they created the idea of an
attribute set.  An attribute set defines a collection of attributes.
Implementors are free to invent their own attribute sets, but the
developers provided a starter set of attributes and packaged them in
an attribute set named bib-1.

Attribute sets are identified by an Attribute Set ID, which is just
an Object Identifier.  All Attribute Set IDs begin with
1.2.840.10003.3; the Attribute Set ID for the bib-1 attribute set is
1.2.840.10003.3.1.

The bib-1 attribute set contains six types of attributes: Use,
Relation, Position, Structure, Truncation and Completeness.  These
attributes are explained in great detail in the bib-1 attributes
documents, available at the Library of Congress’ FTP site.  The only
attributes discussed in this article will be Use and Structure.

Attribute types in an attribute set are identified by a number.  In
the bib-1 attribute set, Use is attribute type 1 and Structure is
attribute type 4.  The values that an attribute can have are also
identified by a number.  This means that it takes two numbers to
specify an attribute for a term: the attribute type and the attribute
value.  For example, every Use attribute, such as AUTHOR or TITLE, has
a number.  (AUTHOR is 1003 and TITLE is 4.)  These numbers are
specified in the Attribute Sets appendix of the standard.  At last
count, there were 98 different Use attributes specified, and that list
can be extended at any time.

8.3  Query Terms and Attributes
Terms can have one or more attributes associated with them.  In the
ASN.1 for the standard, this association is called AttributesPlusTerm
and consists
of an AttributeList and a Term.  An AttributeList is defined as a
SEQUENCE of AttributeElements, each of which is in turn defined as a
pair of INTEGERs consisting of attributeType and attributeValue.
These pairs of numbers are exactly the numbers described above.

In Version 2, all the attributes in the query have to come from the
same attribute set.  During the development of Version 3, it soon
became clear that this was a problem.  How could the user formulate a
query asking about AUTHORs (a bib-1 Use attribute) and BOILINGPOINTs
(a Use attribute from a chemical attribute set)?  In Version 3, the
attribute set ID can be specified for every AttributeElement.  That
means that you can mix attributes from a number of attribute sets.

8.4  Query Grammars
Z39.50 defines several query grammars, each one identified by a
number.  Type-0 queries are for private query grammars.  Sometimes
clients and servers from the same organization prefer to use that
organization’s own query grammar.  At OCLC, a number of our clients
know how to use the query grammar of our database engine and pass
those queries to the Z39.50 server as type-0 queries.

Type-1 queries are the only widely accepted queries.  Support for
them is mandatory in Z39.50.  Type-1 queries are described in more
detail later.

Type-2 queries use the query grammar from the ISO Common Command
Language (ISO 8777).  This grammar has severe extensibility
limitations and probably should not be used.  ISO CCL queries can
always be sent as type-0 queries.

Type-100 queries use the query grammar from the ANSI/NISO Common
Command Language (Z39.58).  This grammar is closely related to, and
has the same problems as, the ISO Common Command Language.

Type-101 queries are an extension of type-1 queries to support
proximity searching.  With Version 3 of the standard, type-1 queries
are identical with type-101; but they remain distinct in Version 2.

Type-102 queries are still being defined.  They are intended to
support some of the features of query grammars that support ranking.

8.5  Reverse Polish Notation Queries (type-1)
Type-1 queries are called Reverse Polish Notation (RPN) queries.
Reverse Polish Notation is a way of representing Boolean queries by
specifying first the operands and then the operator.  Normal query
grammars let you specify an operand, then an operator and another
operand.  This is called an infix notation.  The problem with infix
notations is that you end up having to use parentheses to specify the
order of evaluation of the operators and operands.  Reverse Polish
Notation does not have that problem.

The search (DOG OR CAT) AND HOUSE would be expressed as DOG CAT OR
HOUSE AND in Reverse Polish Notation and the search DOG OR (CAT AND
HOUSE) would be expressed as DOG CAT HOUSE AND OR in RPN.  The query
is evaluated left to right.  Every time you encounter an operator you
process the two operands to the left and replace the
operator and operands with the result of evaluating them.  In the
first example, the OR is associated with DOG and CAT.  After DOG OR
CAT is evaluated, the result is put back into the query.  The AND then
has that result and HOUSE as its operands.

Reverse Polish Notation queries can be easily represented as trees,
with the operators as roots and branches and the operands as leaves.
That is the sense in which type-1 queries are Reverse Polish Notation.
They are not text strings as in the examples above.  They are trees
defined recursively in ASN.1.  A type-1 query can either be an operand
or an operator with two operands.  An operand can either be a term or
a type-1 query.  This recursive definition allows for arbitrarily
complex queries.

We need some way to pass a query into our Z39.50 Client API.  To do
this, we’ll use real Reverse Polish Notation.  Terms will be
optionally followed by a slash ‘/’ and then a Use attribute value.
They can also be followed by an optional slash and a Structure
attribute value.  Terms can be surrounded by double-quotes.  The
following are all examples of legal query terms:  DOG (no Use or
Structure attribute specified), DOG/21 (dog as a subject heading),
DOG/21/2 (dog as a subject heading and a structure of WORD) and “DOG
HOUSE”/21/1 (dog house as a subject heading and a structure of
PHRASE).

8.6  Database Names
The client must specify what database or databases the server is to
search.  The Z39.50 standard allows multiple databases to be specified
in a search request.  Unfortunately, this is another feature that
cannot be determined at initialization time.  One way the client can
find out if the server supports multiple database names is to try it
and see if a diagnostic is returned, but the lack of a diagnostic does
not necessarily mean that all the databases were searched.  Some of
the servers just ignore the extra database names.  This feature is not
available in the Client API.

8.7  Piggy-backed Presents
It is possible to request that records be returned automatically with
the SearchResponse.  This is called a piggy-backed Present.
Piggy-backed Presents are supported in the Client API but are not
supported by zdemo and are beyond the scope of this article.  Zdemo
will provide hard-coded values for those parameters in its call to
SearchRequest().

8.8  The SearchRequest
The SearchRequest is created by a call to the SearchRequest()
routine.  It takes a referenceId, a replaceIndicator, a resultSetName,
a databaseName, a query, and a query_type.

The referenceId is a C language long value and has the same meaning
as in InitRequest().  The replaceIndicator is an integer and has
either a zero or non-zero value for FALSE and TRUE respectively.  The
resultSetName can be any character string.  The databaseName is a
character string whose value is determined by the server.

The conversion of the query parameter into a Z39.50 query is probably
the trickiest code in
the Client API.  The query is passed as a character string, but its
evaluation is dependent on the query-type.  If the query-type is 0,
then the query is assumed to be in a private query grammar and is
passed through to the Z39.50 server exactly as received by
SearchRequest().  If the query-type is 1, then SearchRequest() is
expecting a string with a Reverse Polish Notation query in it.  The
terms can be surrounded with double-quotes.  This is important if the
term consists of multiple words, as in a phrase search.  The term can
also be followed by an optional slash (‘/’) and a Use attribute value.
The Use attribute value can also be followed by another optional slash
and a Structure attribute value.  There is no default Use attribute
value and the default Structure attribute value is WORD.
For example: to search for books about slavery by Mark Twain, you
could enter the search:
slavery/21 "twain, mark"/1003/1 and
which asks for records with “slavery” as a subject heading and
“twain, mark” as an author phrase.
As in InitRequest(), SearchRequest() returns a pointer to an allocated
area in memory that contains the BER encoded SearchRequest.
The prototype for SearchRequest() is:
unsigned char *SearchRequest(
   long referenceId,
   int replaceIndicator,
   char *resultSetName,
   char *databaseName,
   char *query);
I will not walk through the code this time.  You have already seen BER
encoded messages produced; the searches are not any more exciting.
The code is provided if you want to examine it.

8.9  The SearchResponse
The SearchResponse is processed by SearchResponse() and it, like
InitResponse(), takes the BER record returned by the Z39.50 server as
its only parameter and returns a pointer to an allocated structure
which contains the fields of the SearchResponse.  The prototype for
SearchResponse() is:
SEARCH_RESPONSE *SearchResponse(
   CHAR *response);
and the SEARCH_RESPONSE structure looks like this:
typedef struct
{
   long referenceId;
   int searchStatus;
   long resultCount;
   long resultSetStatus;
   long error_code;
   char *error_msg;
} SEARCH_RESPONSE;
The referenceId is the same one provided to SearchRequest().
searchStatus contains either a zero to indicate that the search failed
or a non-zero value to indicate success.
If searchStatus indicates that the search succeeded then resultCount
will contain the count of the number of records that satisfy the
search and the value of resultSetStatus will be undefined.  A value of
zero in resultCount is not an indication that the search failed, only
that there are no records in the database that meet the search
criteria.
If searchStatus indicates that the search failed, then the value of
resultCount is undefined and resultSetStatus will indicate if there
are any records available for retrieval.  Typically resultSetStatus
will contain the value 3 which indicates that there is no result set
available, but other values are potentially available and defined in
the standard.  error_code and error_msg should contain values;
otherwise they will contain 0 and NULL respectively.  The values for
error_code and error_msg are described in the Error Diagnostics
appendix of the standard.
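
The rules for reading a SEARCH_RESPONSE can be collected into one small
helper.  This is only a sketch: the structure is copied from the
definition given for SearchResponse(), while describe_search() and the
wording of its summary line are invented for this illustration and are
not part of the Client API.

```c
#include <stdio.h>
#include <stddef.h>

typedef struct
{
   long referenceId;
   int searchStatus;
   long resultCount;
   long resultSetStatus;
   long error_code;
   char *error_msg;
} SEARCH_RESPONSE;

/* Hypothetical helper: writes a one-line summary into buf and returns
   1 if the search succeeded or 0 if it failed. */
int describe_search(const SEARCH_RESPONSE *r, char *buf, size_t buflen)
{
   if(r->searchStatus)
   {
      /* resultCount is only meaningful on success; zero is not an error */
      snprintf(buf, buflen, "%ld records found", r->resultCount);
      return 1;
   }

   /* on failure only resultSetStatus and the diagnostics are meaningful */
   snprintf(buf, buflen,
      "search failed: resultSetStatus=%ld, error_code=%ld, message='%s'",
      r->resultSetStatus, r->error_code,
      r->error_msg ? r->error_msg : "None provided");
   return 0;
}
```

The point of the helper is that resultCount is never read unless
searchStatus is non-zero, which is the mistake that is easiest to make
when printing a response.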

8.10 ZDEMO
Before zdemo can generate a
search, it needs a simple
command processor.  Remember
that commands to zdemo are going
to be single letters, so parsing
the commands will be easy. Zdemo
will need a loop for getting
commands from the user.  A
command of ‘q’ or an end-of-file
indication from the input stream
will end the loop.  Inside that
loop, zdemo will test for a
single letter command and if
there is none, then it will
assume that a search is being
requested.  It will then switch
on the value of the command and
call a routine to handle the
command.
Our driving loop looks like
this:
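/* needs <stdio.h>, <string.h> and <ctype.h> */
char cmd, input[1000];
while(fgets(input, sizeof input, stdin))  /* unlike gets(), cannot overrun input */
{
   char *p;
   input[strcspn(input, "\n")]='\0';  /* drop the trailing newline */
   for(p=input; *p; p++)  /* portable stand-in for strlwr() */
      *p=(char)tolower((unsigned char)*p);

   if(!input[0])  /* did we get any input? */
      cmd=' ';  /* no command */
   else if(input[1]==' ')  /* was the second character a blank? */
      cmd=input[0];
   else if(strcmp(input, "q")==0)  /* a bare q has no blank after it */
      cmd='q';
   else
      cmd='S';  /* assume that they want to search */

   if(cmd=='q')
      break;  /* exit the loop */

   switch(cmd)
   {
      case 's':  /* explicit search command */
         zsearch(input+2);  /* +2 to skip command and blank */
         break;
      case 'S':  /* implicit search command */
         zsearch(input);
         break;
   }
}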
                                   
In addition, the routines that
zdemo calls will need some clues
about the behavior of the Z39.50
server.  For instance, some
servers will not accept any
resultSetNames except “default”.
Zdemo will be told this through
arguments that are passed to it
at startup time.  In the case of
the “default” resultSetName,
zdemo will look for an argument
of “-d” to indicate that it must
use the “default” resultSetName.
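
One way the startup flag might be handled is sketched here.
MustUseDefault is the global variable that zsearch() tests; the
parse_args() and next_result_set_name() helpers are hypothetical names
introduced only for this illustration and do not appear in zdemo.

```c
#include <stdio.h>
#include <string.h>

int MustUseDefault = 0;   /* set when the server only accepts "default" */

/* Hypothetical helper: scan the command line for "-d". */
void parse_args(int argc, char **argv)
{
   int i;
   for(i=1; i<argc; i++)
      if(strcmp(argv[i], "-d")==0)
         MustUseDefault = 1;
}

/* Hypothetical helper: choose the name for the next result set. */
void next_result_set_name(char *name, size_t len)
{
   static int search_num = 1;

   if(MustUseDefault)
      snprintf(name, len, "default");
   else
      snprintf(name, len, "Search%d", search_num++);
}
```

Using snprintf() with the buffer length keeps a long run of searches
from overflowing the small resultSetName buffer.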
                                   
char resultSetName[20];

void zsearch(char *query)
{
   long              len;
   SEARCH_RESPONSE   *search_response;
   unsigned char     *request, *response;
   static int        search_num=1;

   if(MustUseDefault)  /* global variable */
      strcpy(resultSetName, "default");
   else
      sprintf(resultSetName, "Search%d", search_num++);

   request=SearchRequest(0, TRUE, resultSetName, database_name, query,
      &len);

   response=do_irp(request, socket);
   search_response=SearchResponse(response);
   if(!search_response)
   {
      printf("Did not get a SearchResponse!\n");
      free(response);
      return;
   }

   if(search_response->searchStatus)
   {
      /* resultCount is only defined when the search succeeded */
      printf("%ld records found.\n", search_response->resultCount);
      printf("Search Successful! :-)\n");
   }
   else
   {
      puts("Search Failed! :-(");
      printf("Error_code=%ld, message='%s'\n",
         search_response->error_code,
         search_response->error_msg ? search_response->error_msg :
            "None provided");

      if(search_response->error_code==22)  /* result set naming not supported */
      {
         puts("Must use ResultSetName of \"default\"");
         puts("Resetting internal flags; please try again");
         MustUseDefault=TRUE;
      }
   }
   free(search_response->error_msg);  /* free(NULL) is harmless */
   free(search_response);
   free(response);
}
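
zsearch() hands the query string to SearchRequest() untouched, so the
term syntax (an optionally quoted term, then an optional /Use and
/Structure) is parsed inside the Client API.  The sketch below shows
one way a single term could be picked apart.  It is not the parser used
by the Client API: QUERY_TERM and parse_term() are hypothetical names,
-1 is used here to mean "no Use attribute given", and the default
Structure of 2 assumes that 2 is the value for WORD, as the DOG/21/2
example suggests.

```c
#include <stdlib.h>
#include <string.h>

typedef struct
{
   char term[100];
   long use;         /* -1 when no Use attribute was given */
   long structure;   /* defaults to 2 (WORD) */
} QUERY_TERM;

/* Hypothetical parser for one term such as DOG/21 or "DOG HOUSE"/21/1.
   Returns a pointer just past the term, or NULL on a syntax error. */
const char *parse_term(const char *s, QUERY_TERM *out)
{
   size_t n;
   char *after;

   out->use = -1;
   out->structure = 2;

   if(*s=='"')   /* quoted term: everything up to the closing quote */
   {
      const char *close = strchr(++s, '"');
      if(!close)
         return NULL;
      n = (size_t)(close-s);
      if(n >= sizeof out->term)
         return NULL;
      memcpy(out->term, s, n);
      s = close+1;
   }
   else          /* bare term: up to a slash, a blank or the end */
   {
      n = strcspn(s, "/ ");
      if(n==0 || n >= sizeof out->term)
         return NULL;
      memcpy(out->term, s, n);
      s += n;
   }
   out->term[n] = '\0';

   if(*s=='/')   /* optional Use attribute */
   {
      out->use = strtol(s+1, &after, 10);
      if(after==s+1)
         return NULL;
      s = after;
      if(*s=='/')   /* optional Structure attribute */
      {
         out->structure = strtol(s+1, &after, 10);
         if(after==s+1)
            return NULL;
         s = after;
      }
   }
   return s;
}
```

Calling parse_term() repeatedly, skipping the blank between terms,
walks a Reverse Polish query from left to right; anything that does not
parse as a term would be treated as an operator such as "and".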
                                   
9.   Retrieval
The Z39.50 implementors clearly saw retrieval as a weakness in Version
2 of the standard.  Many of the enhancements in Version 3 center
around retrieval.  Included in these enhancements are the ability to
ask for specific parts of a record, to ask about the contents of a
record and to specify a prioritized list of desired record syntaxes.
But, even without these enhancements, Z39.50 supplies perfectly
acceptable mechanisms for retrieving records.  Since this article is
concentrating on core functionality, the Client API will only use
those retrieval features available in Version 2.
Version 2 allows clients to ask for a specific range of records from a
result set in full or brief forms and to specify a single record
syntax.  The most common record syntaxes are USMARC and SUTRS.  USMARC
is the record syntax used in the U.S. library community to exchange
cataloging information and SUTRS is a Simple Unstructured Text Record
Syntax, invented by the ZIG.  Both of these record syntaxes will be
discussed in greater detail later.

9.1  Result Sets Revisited
In Z39.50, result sets are modeled as containing ordered lists of
pointers to records.  This does not mean that a server is actually
supposed to create lists like that; it means that the client can act
as if that were true.  The ordering of the result set is important,
although the type of ordering is not.  Whether the records are in rank
order or chronological order or sorted by title is unimportant.  What
is important is that the client can ask for the n’th record in a
result set and always get the same record from the same result set.
To retrieve records from a result set, the client specifies the name
of the result set and the relative record number of the record in the
result set.  The first record in a result set is record number 1.  In
the C programming language the first record would naturally be record
number 0, so it is important to remember that that is not true here.
To ask for several records, the client can specify a single relative
record number for the first desired record and a count of the number
of records to be returned.  This only allows for a single list of
adjacent records to be returned.  With Version 3 comes the ability to
specify multiple ranges of records in a single request.  This will
allow the user to request the first, third and ten thousandth records
from a result set and the client will be able to satisfy the request
in a single transaction with the server.

9.2  Element Sets and Element Set Names
The fields in a record are called elements in Z39.50.  A collection of
elements would be an element set and if that collection of elements
had a name, it would be an element set name.  In Version 2, element
set names are the only mechanism available to specify the elements
desired from a record.  Version 3 includes rich mechanisms for
identifying and specifying the elements in a record, but element set
names are sufficient for many purposes.
The standard only specifies two
element set names: “F” for Full records (all elements included) and
“B” for Brief records.
Brief records are a problem.  The standard is rightly silent on the
elements that constitute a brief record.  But, that leaves the client
developer at the whims of the server developers as to the fields that
can be displayed in a brief record.  Unless I am sure that a
particular server returns all the fields that I want to display in a
brief record, I usually ask for full USMARC records and throw away the
fields that I do not need.  That technique will not work if SUTRS
records have been requested, since they consist of a single field.

9.3  Record Syntaxes
A record syntax is simply the way that records are encoded.  There are
a number of record syntaxes recognized in Z39.50.  Object identifiers
are used to specify record syntaxes, so record syntaxes must be either
registered with the maintenance agency or be registered as nodes of an
implementor’s private object identifier tree.  As mentioned above,
there are two widely recognized record syntaxes: USMARC and SUTRS.
I’ll describe them in detail below, but it is worth mentioning the
other record syntaxes listed in the standard.  Understanding what
these other syntaxes are and where they are intended to be used is
useful in understanding where the implementors of the standard are
taking it.

9.3.1     Non-core Record Syntaxes

9.3.1.1   Other MARC Syntaxes
There are a number of variants on the MARC record syntax.  In the
United States, the Z39.50 developers tend to forget that fact and
refer to USMARC as simply MARC.  But, there are 14 other MARC record
syntaxes recognized by the standard and they will be supported by many
of the commercial servers as Z39.50 services are implemented in
Europe.  For the most part, these are national MARC syntaxes (e.g.,
UKMARC, CANMARC and FINMARC) which encode support for local cataloging
standards, but there are also some internationally recognized MARC
syntaxes (e.g., UNIMARC and INTERMARC.)

9.3.1.2   Explain
Successful interoperation of Z39.50 clients and servers in Version 2
is based on a priori agreements between the two parties.  The client
had no mechanism for determining what Use attributes were going to be
supported by the server for searching nor what record syntaxes were
going to be supported for retrieval.  The client had to be told this
information through some process outside of the standard.  Currently,
most of the server hosts provide human readable documentation that can
be used to statically configure a client.  The Explain service
provides the mechanism that allows those things to be determined
dynamically.
The Explain service is implemented as a database that can be queried
by the client.  Access to the records in this database is primarily
gained through search keys defined by the standard.  The contents of
these records, which contain things like Use attributes and record
syntaxes supported, are defined by the Explain record syntax.

9.3.1.3   OPAC
OPAC (Online Public Access Catalog) records were an attempt to allow
holdings information to be transmitted along with bibliographic
records (usually sent in USMARC format.)  They were not widely
implemented and a number of non-standard mechanisms for transmitting
holdings information were developed instead.

9.3.1.4   Summary
Summary records were developed as part of an effort to bring the WAIS
retrieval software into compliance with Z39.50.  WAIS was based on the
1988 version of Z39.50, with a number of private extensions.  Among
these extensions was the ability to provide brief record information
in a more standardized way than the simple Brief Element Set Name
provided by the standard.

9.3.1.5   GRS
The Generic Record Syntax is at the heart of most of the growth areas
of Z39.50 implementation.  The other record syntaxes described so far
have limited structural flexibility (you cannot have really complex
fields) and rigid semantics (everyone knows what to expect in every
field.)  What was needed was a record syntax with great flexibility
and the ability to transmit both elements with semantic understanding
and elements with no semantic understanding.
GRS was invented for this purpose.  It supports arbitrarily complex
hierarchical records and elements that can carry numeric tags from any
number of well-known name spaces as well as string tags intended to
carry field “names” that might be of use to a human viewing them, if
not of use to the software receiving them.
GRS is being heavily used by the Chemical Abstracts Service to provide
their complex chemical records which include things like chemical
structure information.  In addition, the GILS (Government Information
Locator Service) profile uses GRS records as the most flexible way to
transmit Information Locator records and the CIMI (Coalition for the
Interchange of Museum Information) group is looking to use GRS records
to transmit their information.

9.3.2     USMARC
USMARC can be quite daunting, at first.  Fields are tagged numerically
and there is little pattern to the tagging.  If you do not know what
the tags mean, you are out of luck.  To complicate things more, some
of the fields can repeat and others cannot: but some of the
non-repeatable fields have other, repeatable, fields that the extra
data can go into.  (e.g., The first author of a book might be placed
in a 100 field, a non-repeating field, but subsequent authors would be
put into 700 fields.)
There are actually three different sets of rules combined to form
USMARC records.  The first is the encoding standard: ANSI Z39.2.  It
describes the physical encoding of all MARC records (at least that is
the theory.)  The second is the tagging rules: what data goes in what
fields.  Finally come the formatting rules for the data (e.g. names
should be entered last name first with a comma separator.)
Fortunately, as
client developers, it is not necessary to worry about the formatting
rules.
The encoding rules are straightforward.  The records are theoretically
encoded as 7-bit ASCII, but I’ve seen many private character set
extensions that use 8-bit ASCII.  The record begins with a fixed
format leader that describes the length and type of the MARC record as
well as describing some of the encoding options that will be used in
the record.  The leader is followed by a directory that describes what
fields are contained in the record, the offset from the beginning of
the data that the field can be found at and the length of the field.
Fields can have tags in the range 1 through 999.
Finally comes the data itself.  Fields with tags 1 through 9 have a
fixed format.  Fields with tags 10 through 999 have subfields.  The
subfields do not have additional subfields.  Subfields have single
character tags and the tags are primarily alphabetic, but digits and
even punctuation characters are sometimes used.  The fields and
subfields are separated by separator characters.
I have provided a routine to help with the decoding of the USMARC
records: marc2dir().  It takes a USMARC record and decodes it as if it
were a BER record.  Even if you decide that you do not want to use the
BER Utilities, this routine will give you a leg up on the decoding of
USMARC records.  In addition, I have provided a table at the end of
this article that lists a large number of USMARC fields and their
subfields and the labels that are commonly put on them when displaying
them to non-librarians.

9.3.3     SUTRS
The Simple Unstructured Text Record Syntax exists to provide a minimal
level of data communication.  SUTRS records are essentially
preformatted records.  The intent is to allow the client to ask the
server to format its data in a manner suitable for display to a human.
The assumption is that the server probably has a better idea of how
its data should be formatted than the client does, especially if they
have no other record syntaxes in common.
SUTRS records are simply a single field of ASCII characters with a
newline character at least every 72 characters.  As the name states,
there is no structure within that single field.  The client should not
try to parse the field looking for subfields.

9.4  The PresentRequest
The PresentRequest is created by a call to the PresentRequest()
routine.  It takes a referenceId, a resultSetName, a
resultSetStartPoint and numberOfRecordsRequested, an ElementSetName
and a preferredRecordSyntax.
The referenceId is a long and has the same meaning as in
InitRequest().  The resultSetName will be one of the resultSetNames
used in a previous successful call to SearchRequest().  The
resultSetStartPoint is the relative record number from the resultSet
of the first desired record.
numberOfRecordsRequested is the count of the number of sequential
records requested.  The sum of resultSetStartPoint and
numberOfRecordsRequested
minus 1 should be less than or
equal to the resultCount for the
resultSet.  ElementSetNames will
be set to “F” or “B”, depending
on whether Full or Brief records
are desired.
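
Because record numbers start at 1, the range rule is easy to get wrong
by one.  A minimal check might look like the following;
present_range_ok() is a hypothetical helper written for this
illustration, not a routine in the Client API.

```c
/* Hypothetical helper: returns 1 if records resultSetStartPoint through
   resultSetStartPoint+numberOfRecordsRequested-1 all lie inside a
   result set of resultCount records, otherwise 0. */
int present_range_ok(long resultSetStartPoint,
                     long numberOfRecordsRequested,
                     long resultCount)
{
   /* the first record is number 1, and at least one must be requested */
   if(resultSetStartPoint < 1 || numberOfRecordsRequested < 1)
      return 0;

   /* same test as start+count-1 <= resultCount, rearranged so that
      the addition cannot overflow */
   return numberOfRecordsRequested <= resultCount - resultSetStartPoint + 1;
}
```

For a result set of 10 records, a start point of 10 with a count of 1
passes, while a start point of 11 or a count of 11 from record 1 does
not.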
preferredRecordSyntax is set to
the Object ID of either USMARC
or SUTRS.  Preprocessor
variables of MARC_SYNTAX and
SIMPLETEXT_SYNTAX are provided
for this purpose.
As in SearchRequest(),
PresentRequest() returns a
pointer to an allocated area in
memory that contains the BER
encoded PresentRequest.
The prototype for
PresentRequest() is:
unsigned char *PresentRequest(
   long referenceId,
   char *resultSetName,
   long resultSetStartPoint,
   long numberOfRecordsRequested,
   char *ElementSetNames,
   char *preferredRecordSyntax);

9.5  The PresentResponse
The PresentResponse is processed by PresentResponse() and it, like
SearchResponse(), takes the BER record returned by the Z39.50 server
as its only parameter and returns a pointer to an allocated structure
which contains the fields of the PresentResponse.  The prototype for
PresentResponse() is:
PRESENT_RESPONSE *PresentResponse(
   CHAR *response);
and the PRESENT_RESPONSE structure looks like this:
typedef struct
{
   long referenceId;
   long presentStatus;
   long numberOfRecordsReturned;
   long nextResultSetPosition;
   char recordSyntax[50];
   struct record
   {
      long len;
      char *record;
   } *records;
   long error_code;
   char *error_msg;
} PRESENT_RESPONSE;
The referenceId is the same one provided to PresentRequest().
presentStatus contains either a zero to indicate that there was no
error during the PresentRequest or it contains a status code
describing the type of problem encountered during the PresentRequest.
A value of 5 in presentStatus means that no records were returned and
the PresentRequest completely failed.  If this happens, there should
be an error_code and possibly an error_msg explaining why the
PresentRequest failed.  The other possible values indicate why fewer
records than requested were returned.  Those values are described in
detail in the standard.  The values for error_code and error_msg are
described in the Error Diagnostics appendix of the standard.
The numberOfRecordsReturned contains the count of records returned by
the server.  It should be equal to the numberOfRecordsRequested from
the PresentRequest().  If it is not, then presentStatus should have
had a value other than 0.
The nextResultSetPosition is set to the value that should be used as
the resultSetStartPoint in the next PresentRequest() to retrieve the
next sequential record.
recordSyntax will be set to the Object ID of the record syntax used by
the server for the records returned.  It should be the same as the
preferredRecordSyntax used in the PresentRequest().
records will contain an array of pointers to and the lengths of the
records returned.  The number of pointers in the array will be equal
to numberOfRecordsReturned, even if the server accidentally returns
fewer records than it claims.  If this happens then the pointer will
be set to NULL.

9.6  ZDEMO
Zdemo needs four things to allow it to do PresentRequests.  It needs a
way for the user to specify the resultSetStartPoint
and numberOfRecordsRequested, a
way to specify the
preferredRecordSyntax, a way to
specify the ElementSetName and a
way to display the records
returned.
The preferredRecordSyntax is
specified with a new command (r)
that takes as its single
argument either the word USMARC
or the word SUTRS.  A global
variable is set based on the
argument.  The default value for
preferredRecordSyntax is USMARC.
The ElementSetName is specified
with a new command (e) that
takes as its single argument
either the word FULL or the word
BRIEF.  A global variable is set
based on the argument.  The
default value for ElementSetName
is FULL.
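
A sketch of how the r and e commands might set their global variables
follows.  MARC_SYNTAX and SIMPLETEXT_SYNTAX are the preprocessor names
provided with the Client API; they are defined here, as dotted strings
holding the object identifiers registered for USMARC and SUTRS, only so
that the fragment stands alone, and the exact form used by the Client
API may differ.  set_syntax() and set_element_set() are hypothetical
names, and the argument is assumed to have been lowercased already by
the command loop.

```c
#include <string.h>

#ifndef MARC_SYNTAX
#define MARC_SYNTAX       "1.2.840.10003.5.10"    /* USMARC (assumed form) */
#define SIMPLETEXT_SYNTAX "1.2.840.10003.5.101"   /* SUTRS (assumed form) */
#endif

const char *preferredRecordSyntax = MARC_SYNTAX;  /* default is USMARC */
const char *ElementSetName = "F";                 /* default is FULL */

/* Hypothetical handler for the r command; returns 0 for a bad argument. */
int set_syntax(const char *arg)
{
   if(strcmp(arg, "usmarc")==0)
      preferredRecordSyntax = MARC_SYNTAX;
   else if(strcmp(arg, "sutrs")==0)
      preferredRecordSyntax = SIMPLETEXT_SYNTAX;
   else
      return 0;   /* leave the current setting alone */
   return 1;
}

/* Hypothetical handler for the e command; returns 0 for a bad argument. */
int set_element_set(const char *arg)
{
   if(strcmp(arg, "full")==0)
      ElementSetName = "F";
   else if(strcmp(arg, "brief")==0)
      ElementSetName = "B";
   else
      return 0;   /* leave the current setting alone */
   return 1;
}
```

Rejecting an unknown word without changing the global keeps a typing
mistake from silently switching the record syntax or element set.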
The PresentRequest is initiated
and the numberOfRecordsRequested
and resultSetStartPoint are
specified with a new command (d)
that takes two optional numbers
representing the
resultSetStartPoint and
numberOfRecordsRequested
respectively.  The default value
for both numbers is 1.
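
The defaulting rule for the d command can be sketched with strtol(),
which reports how much of the string it consumed and so makes a missing
number easy to detect.  parse_display_args() is a hypothetical helper
shown only to illustrate the rule.

```c
#include <stdlib.h>

/* Hypothetical helper: parse "[start [count]]".  A number that is
   missing, or that is less than 1, is replaced by the default of 1. */
void parse_display_args(const char *parms, long *start, long *count)
{
   char *end;
   long v;

   *start = 1;
   *count = 1;

   v = strtol(parms, &end, 10);
   if(end == parms)      /* no first number, so keep both defaults */
      return;
   if(v >= 1)
      *start = v;

   parms = end;
   v = strtol(parms, &end, 10);
   if(end != parms && v >= 1)   /* second number is optional too */
      *count = v;
}
```

With this helper "d" asks for record 1, "d 5" asks for record 5, and
"d 5 10" asks for ten records starting at record 5.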
The code in zdemo for parsing
these new commands is trivial
and looks much like the code
added to handle the search (s)
command, so it will not be shown
here.
Zdemo will call a new routine,
zread(), to handle the
PresentRequest.  The code for
zread() looks like this:
void zread(char *parms)
{
   long              i, numrecs=1, whichrec=1;
   PRESENT_RESPONSE  *present_response;
   unsigned char     *request, *response;

   if(*parms)  /* were any arguments provided */
   {
      char *t;
      whichrec=atol(parms);
      if( (t=strchr(parms, ' ')) != NULL)
         numrecs=atol(t);
   }

   request=PresentRequest(0, resultSetName, whichrec, numrecs,
      ElementSetName, preferredRecordSyntax);

   response=do_irp(request, socket);

   present_response=PresentResponse(response);
   if(!present_response)
   {
      printf("Did not get a PresentResponse!\n");
      free(response);
      return;
   }

   numrecs=present_response->numberOfRecordsReturned;
   printf("%ld records returned\n", numrecs);
   switch(present_response->presentStatus)
   {
      case IRP_success:
         printf("Present successful\n");
         break;
      case IRP_partial_1:
      case IRP_partial_2:
      case IRP_partial_3:
      case IRP_partial_4:
         printf("Partial results returned\n");
         break;
      case IRP_failure:
         printf("Present failed\n");
         break;
   }

   for(i=0; i<numrecs; i++)
      if(present_response->records[i].record)  /* did a record really get returned? */
      {
         char *end=NULL, *ptr=NULL;  /* stay NULL for an unknown syntax */
         if(strcmp(present_response->recordSyntax,
            SIMPLETEXT_SYNTAX)==0)
         {  /* SUTRS records have a BER wrapper around them */
            DATA_DIR *temp=dalloc(3);
            bld_dir(present_response->records[i].record, temp);
            ptr=(char*)temp->ptr.data;
            end=ptr+(int)temp->count;
            dfree(temp);
         }
         else if(strcmp(present_response->recordSyntax, MARC_SYNTAX)==0)
         {  /* convert the MARC record to a SUTRS-like record */
            ptr=formatmarc(present_response->records[i].record);
            end=ptr+strlen(ptr);
         }

         while(ptr && ptr<end)
         {  /* print each line in the record, never reading past end */
            char *t=memchr(ptr, '\n', (size_t)(end-ptr));
            char *stop=t ? t : end;
            printf("%.*s\n", (int)(stop-ptr), ptr);
            ptr=t ? t+1 : end;
         }

         free(present_response->records[i].record);
      }

   if(present_response->error_code)
   {
      printf("Error_code=%ld, message='%s'\n",
         present_response->error_code,
         present_response->error_msg ?
            present_response->error_msg : "None provided");
   }
   free(present_response->error_msg);  /* free(NULL) is harmless */

   free(response);
   free(present_response);
}

9.6.1     Displaying USMARC Records
Decoding USMARC records is beyond the scope of this article, but the
code to accomplish it is provided as part of zdemo at OCLC's anonymous
FTP site.  (See the section on Source Code Availability at the end of
this article.)

10.  Terminating the Z39.50 session
In Version 2 of Z39.50, both the client and the server are allowed to
terminate the session at any time, simply by dropping the TCP/IP
connection between them.  The routine disconnect() has been provided
to do this.  It closes the socket with a call to the fclose() routine
(one of the standard C I/O routines).

11.  Summary
This article has described the elements of Z39.50 necessary to create
a simple client.  Many of the more complex elements have been
mentioned in enough detail that you should have some idea whether you
need them.  I hope the code provided and its discussion have shown you
that while it is not trivial to build Z39.50 applications, neither is
it terribly complex.

12.  Source Code Availability
The source code for the Z39.50 Client API and zdemo is available via
anonymous FTP at ftp.rsch.oclc.org in the
pub/SiteSearch/z39.50_client_api directory.  A copy of this article,
all the source code and user documentation for the Client API can also
be found in that directory.
The BER utilities used by the Client API can be found on the same host
in the pub/BER_utilities directory.  OCLC retains its copyright to all
these materials, but they have been made freely available to all
developers.

12.1 License
©1995 OCLC Online Computer Library Center, Inc., 6565 Frantz Road,
Dublin, Ohio 43017-0702.  OCLC is a registered trademark of OCLC
Online Computer Library Center, Inc.
NOTICE TO USERS:  The Z39.50 Client API (“Software”) has been
developed by OCLC Online Computer Library Center, Inc.  Subject to the
terms and conditions set forth below, OCLC grants to user a perpetual,
non-exclusive, royalty-free license to use, reproduce, alter, modify,
and create derivative works from Software, and to sublicense Software
subject to the following terms and conditions:
SOFTWARE IS PROVIDED AS IS.  OCLC MAKES NO WARRANTIES,
REPRESENTATIONS, OR GUARANTEES WHETHER EXPRESS OR IMPLIED REGARDING
SOFTWARE, ITS FITNESS FOR ANY PARTICULAR PURPOSE, OR THE ACCURACY OF
THE INFORMATION CONTAINED THEREIN.
User agrees that: 1) OCLC shall have no liability to user arising
therefrom, regardless of the basis of the action, including liability
for special, consequential, exemplary, or incidental damages,
including lost profits, even if it has been advised of the possibility
thereof; and 2) user will indemnify and hold OCLC harmless from any
claims arising from the use of the Software by user’s sublicensees.
User shall cause the copyright notice of OCLC to appear on all copies
of Software, including derivative works made therefrom.