Building A Z39.50 Client

Ralph LeVan
OCLC Online Computer Library Center Inc.
6565 Frantz Rd.
Dublin, OH 43017
email: rrl@oclc.org

Abstract

The core functionality for a Z39.50 Client Application is described. This core functionality consists of Connection, Initialization, Search, Present and Disconnection. A Z39.50 Client API is described which provides the core functionality. Also included are brief descriptions of TCP/IP, the abstract syntax ASN.1, BER records and USMARC records. Code for implementing the Client API, TCP/IP access, encoding/decoding BER records and decoding USMARC records is freely available.

1. Introduction

Z39.50, the ANSI/NISO Information Retrieval Protocol, is perceived by potential implementors as being difficult to implement. I will demonstrate that this is not so by developing a Z39.50 client during the course of this article. The code produced, while copyrighted, is freely available for anyone to use.

In this article, I will stick to the “core” functionality of Z39.50: the features that are widely implemented and have the greatest chance of interoperability. You will learn how to initialize a Z39.50 session, how to do searches using simple Boolean operators (type-1 queries) and how to retrieve USMARC and simple text (SUTRS) records. To do this, I will show you how to build a Z39.50 Client Application Program Interface (API) which will allow you to embed Z39.50 client functionality in your applications. I will show you how to build Z39.50 messages and how to send and receive them using standard TCP/IP socket protocols. I will also give you a simple tool for displaying USMARC records. Finally, I will wrap all these tools up in a simple Z39.50 client (zdemo).

This article is intended primarily for implementors. It is sprinkled liberally with C code fragments. The complete source code is available at OCLC’s anonymous FTP site. (See the section on Source Code Availability at the end of the article.)

2. The Z39.50 Standard

2.1 Who Developed It?

The Z39.50 standard was initially developed in the library community. It was built to satisfy a requirement to search and retrieve USMARC-formatted bibliographic records. Those roots still show today: the core attribute set for Z39.50 (which includes the list of types of things that can be searched for) is named bib-1 and the most widely interoperable record syntax is still USMARC. However, the standard has grown considerably beyond the original modest requirements. Today there are organizations using Z39.50 to deliver full-text documents based on natural language queries. Other organizations support complex chemical structure searching and display.

2.2 Who Maintains It?

The Z39.50 standard started life as the product of a standards committee. The committee considered its work complete with the successful balloting of the original 1988 version of the standard. At that point a Maintenance Agency was appointed by the National Information Standards Organization (NISO) and the original committee was disbanded. Members of the Z39.50 committee met occasionally to discuss possible implementation of the standard and in 1990 the Z39.50 Implementors Group (ZIG) was founded. Today, changes to the standard are developed jointly by the ZIG and the Maintenance Agency. Because the standard is being enhanced by real implementors, the standard now reflects their real-world requirements.

2.3 Where Can I Get It?

The Maintenance Agency for the Z39.50 standard is the Library of Congress. It maintains an anonymous FTP server at ftp.loc.gov where many documents related to Z39.50 are available. Among those documents is the latest version of the standard. Paper copies of the standard can be purchased directly from NISO. Contact them by phone at (800) 282-NISO.

3. Z39.50 Overview

Unlike other Internet protocols such as HTTP or WAIS, Z39.50 is a session oriented protocol. That means that a connection to a Z39.50 server is made and a persistent session is started. The connection with the server is not closed until the session is completed. Session oriented applications are often called “stateful” applications and transaction oriented applications are often called “stateless”.

A session oriented protocol is considerably more efficient than a transaction oriented protocol that requires that the connection with the server be reestablished with every message. Session orientation also allows clients to refine search result sets iteratively and to make multiple record retrieval requests against the same result set. It also allows the client and server to negotiate behavior, such as the kinds of services the client needs, and to have that negotiation persist for the duration of the session. In HTTP, much of the message traffic from the client contains descriptions of preferred server behavior that need to be repeated with every transaction.

In its simplest form, Z39.50 is a synchronous protocol. That is, the client sends a message to the server and waits for the server to respond. The client that is developed in this article (zdemo) will use this form. It is possible to negotiate much more complex behavior. The client can have multiple outstanding requests to the Z39.50 server and the Z39.50 server can interrupt those client requests with requests of its own that must be responded to before the original client request can be completed. The Client API will not negotiate for that functionality, but it can be readily extended to provide it.

4. Z39.50 Messages

There are two logical parts to the definition of Z39.50 messages (called Protocol Data Units or PDUs in the standard). First is the definition of the content of the messages and second is the encoding rules for converting the logical content into a physical message that can be transmitted. In Z39.50, the messages are defined in the Abstract Syntax Notation 1 (ASN.1) grammar and the encoding rules are defined by the Basic Encoding Rules (BER).

4.1 Defining The Message: Abstract Syntax Notation 1

ASN.1 is an ISO standard (ISO 8824) for defining the content of messages. It is used to define all the ISO protocol messages and is used in the Internet world to define Simple Network Management Protocol (SNMP) messages. ASN.1 is a very rich language. What follows is a simple description of ASN.1; seek a higher authority for a more definitive description.

ASN.1 defines records as being composed of combinations of atomic and constructed data types. The atomic data types are things like INTEGER and BITSTRING. You will recognize them in ASN.1, because they are usually in capital letters. Constructed data types are things like Queries and Options. They always begin with an initial capital letter.

All data types have a number (usually called a tag) assigned to them. The tags for atomic data types are assigned by the BER encoding rules. The tags for constructed data types are assigned in the ASN.1 where they are defined and are specified inside square brackets.

Because tags are simply numbers, there is the possibility that two applications will choose the same tag to mean different things. One possible way to avoid this would be to reserve ranges of tags for ASN.1 data types. Instead, ASN.1 defines four types of tags: UNIVERSAL, APPLICATION, CONTEXT and PRIVATE. UNIVERSAL tags are expected to be recognized wherever they are used in a record. (For example, a tag of [UNIVERSAL 2] is always an INTEGER.) CONTEXT tags can have different meanings in different contexts. A tag of [CONTEXT 1] might be a query in one part of a record and a count in another. The meaning of the tag is defined by its context.

For example, the ASN.1 definition

    ReferenceId ::= [2] IMPLICIT OCTETSTRING

defines a constructed data type named ReferenceId, whose tag is 2. The type of tag was not specified and defaults to CONTEXT. The ReferenceId is composed of the atomic data type OCTETSTRING. The IMPLICIT in that statement says that the tag for the OCTETSTRING must not be included inside the ReferenceId. If IMPLICIT had been omitted from the above definition (i.e., ReferenceId ::= [2] OCTETSTRING) then both the context tag ([2]) and the UNIVERSAL tag ([UNIVERSAL 4]) would have been encoded in the message. Thus, the use of the IMPLICIT keyword in the definition allows for smaller encodings.

ASN.1 includes constructs for grouping data types together. These constructs include CHOICE (pick one of the things that follow), SEQUENCE (the things that follow must be provided in the order specified) and SET (the things that follow can be provided in any order).

4.1.1 EXTERNAL’s, OBJECT ID’s and ISO Registration

ASN.1 allows the developer to specify that a constructed data type being referenced is not defined in the current body of the ASN.1. The keyword for specifying this is EXTERNAL. EXTERNALs are used throughout the Z39.50 standard. They are the mechanism used to provide extensibility and flexibility in the standard. Saying that a field is defined externally to the standard allows a company to use private data in that field that only their clients and servers will understand. (This is an interoperability problem for other clients and servers, but there are often good reasons for wanting to do this.) It also allows the ZIG to agree on extensions to the standard simply by agreeing on the contents of fields defined EXTERNAL to the standard.

EXTERNALs provide flexibility by allowing Object Identifiers to be used to select from a broad range of possible choices. For example, RecordSyntax is defined as EXTERNAL in Z39.50, which means that any of a number of possible choices (e.g., USMARC, SUTRS, GRS) can be specified.

EXTERNAL objects, when they arrive in a message, have an OBJECT IDENTIFIER. The OBJECT IDENTIFIER provides an identification number that allows the message decoder to understand the contents of the object. OBJECT IDENTIFIERs are represented symbolically as strings of numbers, separated by periods (‘.’). 1.2.840.10003 is the OBJECT IDENTIFIER for the Z39.50 standard itself.

Object Identifiers are controlled by the International Organization for Standardization (ISO). Object Identifiers would have no value as identifiers if they were not unique. Normally, ISO issues Object Identifiers, but once ISO issued an Object Identifier for Z39.50, the Z39.50 Maintenance Agency was authorized to issue subordinate Object Identifiers for Z39.50 objects. Thus, all Z39.50 Object Identifiers begin with the Object Identifier for the standard itself.

4.2 Encoding the Message: The Basic Encoding Rules

Z39.50 messages are encoded according to the Basic Encoding Rules (BER), ISO 8825. BER defines records as being composed of a triple of values: a tag, a length and a value (TLV). The tag portion of the triple includes bits that specify the type of tag (UNIVERSAL or CONTEXT) and whether the value portion of the triple is primitive data or is composed of more TLV triples. This recursive definition of a record allows for the construction of arbitrarily complex hierarchical records.

I know of two ways to construct BER records. The first way is with an ASN.1 compiler. The compiler reads the ASN.1 definition and produces source code in a programming language such as C or C++. The programmer can then fill in a structure in that language with the values that are to be encoded and the code produced by the ASN.1 compiler reads that structure and builds the BER record. The strong advantage of this method is that you’re reasonably confident that the resulting BER record does in fact encode the ASN.1 properly.

OCLC chose not to use an ASN.1 compiler, but instead produced utilities to construct the BER records directly. OCLC has made those utilities publicly available, as well as the Z39.50 Client API. The reasons for choosing not to use an ASN.1 compiler stem mostly from the maturity of the compilers when OCLC first started implementing Z39.50 in 1988. Those reasons are given in greater detail in the documentation accompanying the BER utilities. Directions for getting the BER utilities can be found at the end of this article.

4.2.1 The BER Utilities

The BER utilities allow the programmer to build a tree structure that describes the contents of the record, instead of filling in a record-specific structure and having a record-specific routine construct the BER record. Each node in the tree contains the tag for the data it describes and either a pointer to data or a pointer to another node in the tree. A node in the tree is a C structure of type DATA_DIR. Routines are provided to construct the tree and to encode the primitive data types such as BITSTRING and INTEGER. Once the tree is built, a utility routine (bld_rec()) is called to construct the BER record.

When a BER record is received and decoded by an application, one of these tree structures is produced. To examine the contents of the BER record, simply traverse the tree. This puts the interpretation of the record much more in the hands of the programmer.

5. ZDEMO and the Client API

Zdemo is going to be a simple client. It will establish a connection to the Z39.50 server, send an InitRequest and wait for an InitResponse. It will then sit in a loop waiting for the user to enter searches, record display requests or a Quit command. Commands will consist of a single letter (S for Search, D for record Display and Q for Quit). Arguments to the commands can follow the command, and the default command, when the command is omitted, is Search (i.e., S DOG and DOG are equivalent commands).

The Client API is nearly as simple. It consists of the routines InitRequest() and InitResponse(), SearchRequest() and SearchResponse() and PresentRequest() and PresentResponse(). The request routines take parameters that correspond to the fields in the Z39.50 requests. The response routines take a BER record as their only parameter and return a pointer to a response-specific structure with fields in it that correspond to the fields in the Z39.50 response. The encoding and decoding of the requests and responses will depend on the BER utilities.

6. Establishing the Z39.50 Connection

The vast majority of Z39.50 servers are accessible via TCP/IP, so our client will need to know how to connect to a server via TCP/IP. The usual way to perform TCP/IP functions is with “sockets”. Sockets provide the tools and structures for establishing TCP/IP connections and for sending and receiving messages. Sockets have some of the characteristics of files, in that they are opened, read from and written to. In the UNIX world, the relationship between files and sockets is very close; it is less so in the MS Windows world.

For our purposes, only the simplest features of sockets will be used. We will need to know how to convert a host name into an IP address, open a socket, send a message, wait for a return message, determine how many bytes of message are waiting, read a message and close the socket. The complete code for opening and closing a connection to a Z39.50 server is contained in irpconn.c at OCLC’s anonymous FTP site. (See the section on Source Code Availability at the end of this article.) The code for writing a Z39.50 request, waiting for the response and then reading the response is contained in doirp.c.

Windows Sockets are similar enough to standard UNIX sockets that I have provided support for them as well. Sprinkled throughout irpconn.c and doirp.c you will see fragments surrounded with “#ifdef WINDOWS” and “#endif”. These sections contain the support for Windows Sockets.

The routine to make the connection is named connect(). It gets passed the name of the host machine for the Z39.50 server and the port where the server is listening. The standard port for Z39.50 is 210, but few of the servers actually listen at that port, so zdemo (our client program) will need to accept the port number as an argument. In turn, zdemo will get the host name and port as arguments that are passed to it, though, with modification, zdemo could read this information from a configuration file.

For MS Windows applications, the first step is to initialize winsock.dll, the dynamic link library that contains the sockets routines. This is done by calling WSAStartup(), passing it the lowest acceptable version number of the Windows Sockets standard. In our code, zdemo will ask for version 1.1. If either there is no winsock.dll available or it does not support version 1.1 of the Windows Sockets standard, then connect() will write a diagnostic message and return a failure indication.

The next step in establishing the connection will be to convert the host name into an IP address. This is done by calling gethostbyname(), passing it the host name. If successful, it will return a structure which contains data that will be used in creating the socket. If gethostbyname() fails, then connect() will write a diagnostic message and return a failure indication.

Next, the socket is created. This is done by calling socket(), telling it that the client will be using it to communicate via TCP/IP. If socket() fails, then connect() will write a diagnostic message and return a failure indication.

Next, the connection to the server is established by calling the socket routine connect(), passing it the socket and a structure containing the IP address and port number. If that call fails, then our connect() routine will write a diagnostic message and return a failure indication. If it succeeds, then our connect() routine returns a pointer to the socket and is done. A TCP/IP connection has been made to the Z39.50 server.

6.1 ZDEMO

So far, our source code for zdemo looks like this:

    void *socket;

    int main(int argc, char *argv[])
    {
        char password[20], server_name[100], userid[20],
             *usage="usage: zdemo -h[hostname] [-pport#] "
                    "[-uuserid/password]";
        int i, port=210;

        get_args(argc, argv, server_name, &port, userid, password);
        printf("Talking to Z39.50 server on port %u of host '%s'\n",
            port, server_name);

        /* initialization code */
        if( (socket=irp_connect(server_name, port))==0 )
        {
            printf("unable to connect to server %s\n",
                server_name?server_name:"");
            exit(1);
        }
    }

7. Initialization

The first Z39.50 service is Initialization. The client and server use this service to negotiate the other Z39.50 services and options that are to be provided. They also get to negotiate the preferred message size and exceptional record size. In addition, the client can provide a userid and password.

7.1 Negotiation

Z39.50 supports a simple negotiation mechanism. The client proposes values in the InitRequest and the server responds with the actual values. If the client is unhappy with the returned values, its only option is to close the session.

7.1.1 Version

There are now three versions of Z39.50. Version 1 was defined in 1988. It was implemented at only a few sites and was completely superseded by Version 2, which introduced ASN.1 and BER encoding to the standard. Version 2 was defined in 1992. The 1995 version of the standard defines both Version 2 and Version 3. The reason for this is that the ZIG wanted Version 3 to be backward compatible with Version 2 and wanted a single document that defined both. The ZIG did not want developers to need two documents to develop a server capable of interoperating with either Version 2 or Version 3 clients. So, both versions are defined in Z39.50-1995 and all the compatibility rules for the two versions are defined there as well.

The version of the standard that the client wants to use is one of the things that is negotiated. The client sends a bitstring with a bit turned on for each version of the standard that the client understands. The server responds with a similar bitstring. The highest version of the standard that the client and server have in common is the version in effect for the session. If the client and server have no supported version in common, then the server will return an empty bitstring and fail the InitRequest. The client can deduce the reason for the failure from the empty Version bitstring in the InitResponse.

7.1.2 Options

The client and server negotiate the services and options that they want through the Options bitstring. These are specified by turning on the appropriate bits in the bitstring. All of the Z39.50 services can be negotiated; that is, the client can request that they be made available by the server. The server can deny these services by turning off the appropriate bit in the bitstring when it is returned in the InitResponse. Options that can be negotiated include such things as support for named result sets or concurrent operations.

7.1.3 Message Sizes

The client also specifies a Preferred-message-size and an Exceptional-record-size. The Preferred-message-size will be exceeded by the server only when the client requests a single record and its size exceeds the Preferred-message-size, but not the Exceptional-record-size. The purpose of this is to allow the client to control the maximum size of a normal message from the server, but to allow it to occasionally accept large records.
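The version and message-size rules just described can be sketched in a few lines of C. This is only an illustration of the rules, not part of the OCLC Client API: the function names highest_common_version() and server_may_send() are hypothetical, the bitstrings are assumed to be held as character strings of 'y' and 'n' (one character per bit, first character for Version 1), and the size check is a simplification that ignores any per-message overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the highest version (1-based) that is
   turned on ('y') in both the client's and the server's version
   bitstrings, or 0 if the two have no version in common. */
int highest_common_version(const char *client_bits, const char *server_bits)
{
    size_t n, m, i;
    int best = 0;

    assert(client_bits != NULL && server_bits != NULL);
    n = strlen(client_bits);
    m = strlen(server_bits);
    if (m < n)
        n = m;                      /* only compare bits both sides sent */
    for (i = 0; i < n; i++)
        if (client_bits[i] == 'y' && server_bits[i] == 'y')
            best = (int)i + 1;      /* remember the highest match so far */
    return best;
}

/* Hypothetical helper: may the server return a record of record_size
   bytes?  A record that fits within the preferred message size is
   always acceptable.  A larger record is acceptable only when the
   client asked for exactly one record and the record does not exceed
   the exceptional record size. */
int server_may_send(long record_size, long preferred_message_size,
                    long exceptional_record_size, int records_requested)
{
    assert(record_size >= 0);
    if (record_size <= preferred_message_size)
        return 1;
    return records_requested == 1 &&
           record_size <= exceptional_record_size;
}
```

With the sizes that zdemo proposes later in this article (16384 and 500000), a 20000-byte record would be acceptable under this sketch when it is the only record requested, but not as one of several records in the same request.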
The server may respond to the char *id, proposed values with alternative char *password); values in the InitResponse. 7.3.1 Encoding the Request 7.2 Other Initialization The easiest way to understand Parameters the InitRequest() routine is to The client can provide a userid walk through it line by line, and password in the InitRequest showing the ASN.1 that is being and can also provide information encoded and providing identifying the client software commentary. The C code is itself. Lastly, the InitRequest indented and in bold. The ASN.1 contains a placeholder for is in italics and the commentary information defined externally is in normal text. to the standard. Normally when I code using the All Z39.50 request definitions BER utilities, I use include an optional referenceId. preprocessor variables to hold This is an arbitrary string of the tag values. The bytes that the client can send preprocessor variable that the server is required to InitRequest would be defined as return with the response. Its 20. I do this for readability. intent is to help the client But in the code below, the identify the returning response commentary explains what is in an asynchronous message going on in the code, and I want environment. While referenceId you to be able to see the can hold any number of bytes, correlation between the code and the Z39.50 Client API allows the ASN.1, so I am omitting the only a C language long value to preprocessor variables. If you be used. get the code from our FTP server, you will see proper 7.3 The InitRequest preprocessor variables instead The InitRequest is created by a of constants. call to the InitRequest() routine. It takes a referenceId, a preferredMessageSize, an exceptionalRecordSize, an id and a password as parameters. It does not accept options as a parameter, since the Client API always negotiates for the most functionality that it can handle. 
InitRequest() returns a pointer to an allocated area in memory that contains the BER encoded InitRequest. The prototype for InitRequest() looks like this: unsigned char *InitRequest( long referenceId, long preferredMessageSize, long exceptionalRecordSize, CHAR *Init_Request(long referenceId, long preferredMessageSize, long exceptionalRecordSize, char *id, char *password, long *len) /* referenceId has no particular meaning to the Client API. You can put whatever value you want into it, and it will be returned in the response. id and password can be either NULL or “”. len will contain the length of the encoded request when InitRequest() returns. */ { static char *protocol_version=”yy”; /* versions 1 and 2 */ /* When you want Version 2, you have to ask for Version 1 too. (This is to allow interoperability with ISO 10163). */ static char *options_supported=”yy”; /* search and present only */ /*****************************************************/ /* build an IRP Init request */ /*****************************************************/ dir=dmake(20, ASN1_CONTEXT, 30); initRequest [20] IMPLICIT InitializeRequest, /* Make a DATA_DIR tree for assembling the parts of our message. The first two arguments specify the tag and tag type for the root of our tree. They correspond to the first tag in the ASN.1 definition of an InitRequest. The 30 tells dmake() that we expect to see 30 nodes in our tree. If that number is exceeded, then the BER utilities will automatically increment the size of the tree by that amount. dir, the value returned by dmake(), is a pointer to the root of the tree. */ if(referenceId) daddchar(dir, 2, ASN1_CONTEXT, (CHAR*)&referenceId, sizeof(referenceId)); referenceId ReferenceId OPTIONAL, /* ReferenceId is defined later in the standard as: ReferenceId ::= [2] IMPLICIT OCTETSTRING If a non-zero referenceId has been provided, then add it to the request. The first argument to daddchar() is a pointer to the parent of the field being added. 
The next 2 arguments are the tag and tag type of the referenceId. The last two arguments are a pointer to the referenceId and its length. The referenceId is being passed to the server as a string of bytes (an OCTETSTRING in ASN.1.) */ daddbits(dir, 3, ASN1_CONTEXT, protocol_version); protocolVersion ProtocolVersion, /* protocolVersion is defined later in the standard as: protocolVersion ::= [3] IMPLICIT BITSTRING daddbits() encodes ASN.1 BITSTRINGs. Here, we’re encoding the ProtocolVersion. */ daddbits(dir, 4, ASN1_CONTEXT, options_supported); options Options, /* Options is defined later in the standard as: Options ::= [4] IMPLICIT BITSTRING */ daddnum(dir, 5, ASN1_CONTEXT, (CHAR*)&preferredMessageSize, sizeof(preferredMessageSize)); preferredMessageSize [5] IMPLICIT INTEGER, /* daddnum() encodes ASN.1 INTEGERs. Here, we’re encoding the preferredMessageSize. */ daddnum(dir, 6, ASN1_CONTEXT, (CHAR*)&exceptionalRecordSize, sizeof(exceptionalRecordSize)); exceptionalRecordSize [6] IMPLICIT INTEGER, if(id && *id) { char *t; DATA_DIR *subdir; /* We’ll use subdir to keep track of subtrees in our DATA_DIR tree. */ int len=strlen(id)+1; /* We need to figure out how long the id and password are and then add 1 for the ‘/’ separator character. */ if(password && *password) len+=strlen(password)+1; else password=””; t=(char*)dmalloc(dir, len+1); /* dmalloc() malloc’s space that is freed automatically when the DATA_DIR tree is freed. In this case, the “+1” is for the NULL that sprintf() will put at the end of the string. */ strcpy(t, id); if(password && *password) sprintf(t+strlen(t), “/%s”, password); subdir=daddtag(dir, 7, ASN1_CONTEXT); idAuthentication [7] ANY OPTIONAL, /* daddtag() adds a tag without any data. It returns a pointer to the node that was added to the tree to hold the tag. 
*/ daddchar(subdir, ASN1_VISIBLESTRING, ASN1_UNIVERSAL, (CHAR*)t, len-1); /* The ANY is recommended later in the standard to be encoded as a CHOICE, one option of which is: open VisibleString, Add the id and password with an IMPLICIT ASN.1 data type of VISIBLESTRING. */ } daddchar(dir, 110, ASN1_CONTEXT, (CHAR*)”1995”, 4); implementationId [110] IMPLICIT InternationalString OPTIONAL, daddchar(dir, 111, ASN1_CONTEXT, (CHAR*)”OCLC IRP API”, 12); implementationName [111] IMPLICIT InternationalString OPTIONAL, daddchar(dir, 112, ASN1_CONTEXT, (CHAR*)”1.0”, 3); implementationVersion [112] IMPLICIT InternationalString OPTIONAL, /* Tell the server what kind of client is talking to it. */ return bld_rec(dir, len); /* bld_rec() malloc’s the amount of space needed to hold the BER record, assembles the BER record in that area and returns a pointer to that area, which is finally returned by InitRequest(). */ } 7.3.2 of them have a message. To do 7.3.2 Transmitting the this, the application has to Request construct a list of sockets to Zdemo transmits the BER requests be waited on. Two preprocessor by calling doirp(), passing it macros are used to construct the the pointer to the BER request list: FD_ZERO() and FD_SET(). and the pointer to the socket FD_ZERO() initializes an empty returned by connect(). Doirp() list, and FD_SET() adds sockets sends the request to the Z39.50 to the list. After the list is server, waits for the response built, the routine select() is to the request from the server called, passing it the list of and returns a pointer to that sockets to be waited on. The response. select() call sits inside a Doirp() starts by determining while loop; sometimes select() the length of the request. It returns with an indication that does this by calling the BER it has not received anything utility asn1len(). It uses that yet. 
length to drive a while loop where the length represents the number of bytes of the request waiting to be sent.

Doirp() sends data to the server by calling the socket routine send() and passing it the socket, a pointer to the request and the number of bytes to send. Send() returns the number of bytes actually sent. The pointer to the request is incremented by that amount and the length is decremented by that amount. If the length goes to zero, then the complete request has been sent and zdemo falls out of the while loop. If send() indicates an error, then doirp() prints an error message and quits, returning an error indication.

Next, doirp() needs to wait for the response from the server. The socket utilities are prepared to handle much more complicated tasks than zdemo is requiring of them, so some of the tools that it uses seem overly complicated for this purpose. The mechanism for waiting for a message is one of those tools. The socket utilities allow an application to have many active sockets open and allow you to wait until any of them has a message available.

After doirp() has gotten the indication that a message is available, it calls ioctl() to determine the amount of data that has been received. It then calls recv() to read the data. It passes recv() the socket, a pointer to a buffer to hold the incoming message, and the number of bytes it wants to read (which it got from ioctl()). Recv() returns a count of the number of bytes that it actually read. If that count is zero, then there was probably some failure in the connection and doirp() will print an error message and return with an error indication.

Often, TCP/IP has to break large messages into smaller messages to transmit them. That means that when doirp() gets a message, it might be the first of many messages that comprise a complete Z39.50 response. The BER utilities provide a routine, IsCompleteBER(), which gets passed a pointer to a buffer with a BER encoded message and a count of the number of bytes in the buffer. IsCompleteBER() returns an indication of whether a complete message is in the buffer. If the message is complete, then IsCompleteBER() also returns the actual size of the message, which might be less than the amount of data in the buffer, since it is possible for more than one message to have been received at one time.

If the message was not complete, then IsCompleteBER() also returns the number of bytes remaining to be read to complete the message. Sometimes IsCompleteBER() reports that the message is not complete and there are zero bytes waiting to be read. This means that IsCompleteBER() cannot determine the remaining length and doirp() should just wait for more data to arrive. Either way, doirp() sits in a loop, reading more data, until IsCompleteBER() reports that a complete message has arrived. When that happens, doirp() returns a pointer to the buffer containing the message.

At this point, zdemo has sent our InitRequest and received an InitResponse.

7.4 The InitResponse

The most important field in an InitResponse is the result field. It tells the client whether its InitRequest has been accepted by the Z39.50 server. If it has a non-zero value, then a Z39.50 session has been successfully established. If it is zero, then the Z39.50 server has rejected our session. Unfortunately, there is no explicit mechanism for the server to tell why it is rejecting our InitRequest. We’ll have to deduce the reason from the other values returned in the InitResponse.

7.4.1 Decoding the Response

The Z39.50 Client API provides the routine InitResponse() to decode the InitResponse from the Z39.50 server. It is passed a pointer to the InitResponse and returns a pointer to a structure containing information from the InitResponse.

The first step in decoding any Z39.50 response is to decode the BER encoded message. The BER utility bld_dir() does this. Its job is to build a DATA_DIR tree that reflects the structure of the message. Typically, to decode the message, we’ll just traverse the tree. I use a for loop to do this. I set the loop variable to the first child in the tree and loop through all its siblings. Inside the loop I use a switch statement to test for the possible tags that might have been in the message.

Again, as with InitRequest(), the easiest way to understand the InitResponse() routine is to walk through it line by line, showing the ASN.1 that is being decoded and providing commentary. The C code is indented. The ASN.1 is set flush left on its own line and the commentary appears in C comments. I have also repeated the practice of replacing preprocessor variables with constants to emphasize the correspondence between the C code and the ASN.1.

    INIT_RESPONSE *InitResponse(CHAR *response)
    {
        DATA_DIR far *subdir;
        INIT_RESPONSE *init_response;

        if(!response || !bld_dir(response, dir))
            return NULL;
        /* If a response was not provided or we were unable to decode
           the response, then return a failure indication. The dir that
           is being passed to bld_dir() is the same one that was created
           in InitRequest() to hold the message being built there. Dir
           is a global variable and will be used by all the request and
           response routines. */

        if(dir->fldid!=21)
            return NULL;
initResponse [21] IMPLICIT InitializeResponse,
        /* If the response wasn't an InitResponse, then return a failure
           indication. The tag in the root node of the tree is the
           message tag. */

        if( (init_response=(INIT_RESPONSE*)
            calloc(1, sizeof(INIT_RESPONSE)))==NULL)
            return NULL;
        /* If we can't allocate space to hold the structure describing
           the InitResponse, then return a failure indication. */

        for(subdir=dir->ptr.child; subdir; subdir=subdir->next)
        /* This is our driving loop. The loop variable is initialized
           to point at the first child off the root. As long as there
           is such a child, process it and then point at its sibling. */
        {
            switch(subdir->fldid)
            /* Test for the value of the tag in this node. */
            {
                case 2:
referenceId ReferenceId OPTIONAL,
                    /* ReferenceId is defined later in the standard as:
                       ReferenceId ::= [2] IMPLICIT OCTETSTRING */
                    memcpy((char*)&init_response->referenceId,
                        (char*)subdir->ptr.data, (int)subdir->count);
                    /* Just save the referenceId in the INIT_RESPONSE
                       structure. Only the calling application will be
                       interested in it. */
                    break;
                case 4:
options Options,
                    /* Options is defined later in the standard as:
                       Options ::= [4] IMPLICIT BITSTRING */
                    init_response->options=dgetbits(subdir);
                    /* dgetbits() decodes encoded BITSTRINGs. It returns
                       a character string with a 'y' for every bit that
                       was turned on, and a 'n' for every bit that was
                       turned off. */
                    break;
                case 5:
preferredMessageSize [5] IMPLICIT INTEGER,
                    init_response->preferredMessageSize=dgetnum(subdir);
                    /* dgetnum() decodes encoded INTEGERs. It returns a
                       long, which we will save in the INIT_RESPONSE
                       structure. */
                    break;
                case 6:
exceptionalRecordSize [6] IMPLICIT INTEGER,
                    init_response->maximumRecordSize=dgetnum(subdir);
                    break;
                case 12:
result [12] IMPLICIT BOOLEAN,
                    init_response->result = (int)dgetnum(subdir);
                    /* BOOLEANs are encoded as INTEGERs, so dgetnum() is
                       used to decode them. A non-zero value means TRUE
                       and a zero value means FALSE. */
                    break;
            }
        }
        return init_response;
    }

7.5 ZDEMO

The following code gets added to zdemo:

    INIT_RESPONSE *init_response;
    long len;
    unsigned char *request, *response;

    /* Build the InitRequest. */
    request=InitRequest(0, 16384, 500000L, userid, password, &len);

    /* Send the request and get the response. */
    response = do_irp(request, socket);
    if(!response)
    /* If we did not get a response, then quit. */
    {
        printf("unable to send init request\n");
        exit(2);
    }

    /* Decode the response. */
    init_response=InitResponse(response);
    if(!init_response || !init_response->result)
    {
        /* If the response was not decodable, or if the InitRequest
           failed, then quit. */
        printf("init failed\n");
        exit(3);
    }
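The options string that InitResponse() saves can be inspected before zdemo goes on to search. The following is only a minimal sketch of such a check. It assumes that the first character of the string returned by dgetbits() corresponds to bit 0 of the BITSTRING; the function name option_is_set() and the enum names are hypothetical and are not part of the Client API. The bit positions themselves are the ones the standard assigns to search, present and (in Version 3) namedResultSets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit positions in the Z39.50 Options BITSTRING. The enum names are
   hypothetical; the positions are those assigned by the standard. */
enum {
    OPT_SEARCH = 0,
    OPT_PRESENT = 1,
    OPT_NAMED_RESULT_SETS = 14   /* defined in Version 3 only */
};

/* Returns 1 if position `bit` holds a 'y' in a y/n options string of
   the kind dgetbits() is described as producing, and 0 otherwise.
   A string that is too short to contain the bit counts as "off".
   Assumption: character 0 of the string is bit 0. */
int option_is_set(const char *options, int bit)
{
    assert(bit >= 0);
    if (options == NULL || strlen(options) <= (size_t)bit)
        return 0;
    return options[bit] == 'y';
}
```

With a helper like this, a client could test option_is_set(init_response->options, OPT_SEARCH) right after a successful InitResponse() and quit early if the server did not agree to searching.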
8. Searching

Z39.50 allows highly specific searching of databases. The specificity of Z39.50 queries is one of the standard’s great strengths. Other protocols, such as WAIS or Gopher, support “magical” searching. The user enters some kind of free text query and “magic” happens. The same query on another server might produce completely different results, because different “magic” happened. The user is at a loss to determine why the records were retrieved. The user is also unable to control the search. The user is unable to specify that she wants to find records where the word SMITH appeared in the title, but not as an author. These weaknesses have all been overcome with Z39.50.

Another strength of Z39.50 queries is the persistence of their results for the duration of the Z39.50 session. With other protocols, the results of the query must be sent immediately to the client. That’s fine, if the database is small and the result sets are always small. When the databases are large, that is not practical. The user needs the ability to fetch and examine some of the records and still be able to ask for other records later. Better yet, if the result set is large, the user would like to be able to apply restrictors to the result set and produce a smaller, hopefully more pertinent, result set.

8.1 Result Sets

In order to reference a result set after it has been produced, the result set must have a name. In Z39.50, the client provides the name of the result set with the query: the client names the result set. Every query can have a different result set name, allowing the client to reference any number of previous result sets. But few, if any, servers allow an unlimited number of result sets. When a client has exceeded the number of supported result sets, the server might delete old result sets arbitrarily.

In fact, some servers allow a client to have only one result set. In that case, they do not really support named result sets. To get around the apparent contradiction of the client being able to name result sets and the server being unable to support named result sets, the ZIG agreed on the result set name “default”. This is the result set name that must be accepted by servers that do not otherwise support named result sets. If all queries sent to such a server are named “default”, then the client has only one result set that it can refer to.

Unfortunately, in Version 2 of the standard, the client cannot tell whether the server will allow result set names other than “default”. The only way to tell is to use a different result set name. If the server cannot support named result sets, it will fail the search and return an error code indicating the problem. The client will then know that “default” will be the only acceptable result set name. In Version 3, support for named result sets is one of the options that can be negotiated at initialization time.

If the client uses the same result set name twice, the server should replace the previous result set of the same name with the new result set. To keep that from happening accidentally, the client is required to set a flag in the SearchRequest indicating that the result set is to be replaced.

8.2 Attributes

In “magic” searching systems, query terms are unqualified. That is, the user types in a term, but provides no extra information about the term to indicate its semantic meaning. Systems that provide more specific searching usually provide the concept of an “index”. So the user can say that the term provided should be considered to be an author or a word from a title. But this is only a single piece of qualifying information that can be provided with the term.

The Z39.50 developers wanted a richer mechanism than simply indexes. They wanted to provide many dimensions of qualification to the term. The word they chose to describe these additional qualifications on a term is “attribute”. A term can have many attributes. One of those attributes could be Use, which roughly corresponds with indexes. The Use attribute allows the client to specify how the term would have been used in the records to be retrieved. For example, the term was Used as an AUTHOR or TITLE. Another attribute is Structure; the term is supplied according to a particular structure. The structure might be that the term is a WORD or a PHRASE.

8.2.1 Attribute Sets

Since the developers understood that they could not predict all the attributes that implementors would want, they created the idea of an attribute set. An attribute set defines a collection of attributes. Implementors are free to invent their own attribute sets, but the developers provided a starter set of attributes and packaged them in an attribute set named bib-1.

Attribute sets are identified by an Attribute Set ID, which is just an Object Identifier. All Attribute Set IDs begin with 1.2.840.10003.3; the Attribute Set ID for the bib-1 attribute set is 1.2.840.10003.3.1.

The bib-1 attribute set contains 6 types of attributes: Use, Relation, Position, Structure, Truncation and Completeness. These attributes are explained in great detail in the bib-1 attributes documents, available at the Library of Congress’ FTP site. The only attributes discussed in this article will be Use and Structure.

Attribute types in an attribute set are identified by a number. In the bib-1 attribute set, Use is attribute type 1 and Structure is attribute type 4. The values that an attribute can have are also identified by a number. This means that it takes two numbers to specify an attribute for a term: the attribute type and the attribute value. For example, every Use attribute, such as AUTHOR or TITLE, has a number. (AUTHOR is 1003 and TITLE is 4.) These numbers are specified in the Attribute Sets appendix of the standard. At last count, there were 98 different Use attributes specified, and that list can be extended at any time.

8.3 Query Terms and Attributes

Terms can have one or more attributes associated with them. In the ASN.1 for the standard, this association is called AttributesPlusTerm and consists of an AttributeList and a Term. An AttributeList is defined as a SEQUENCE of AttributeElement, each of which is in turn defined as a pair of INTEGERs consisting of attributeType and attributeValue. These pairs of numbers are exactly the numbers described above.

In Version 2, all the attributes in the query have to come from the same attribute set. During the development of Version 3, it soon became clear that this was a problem. How could the user formulate a query asking about AUTHORs (a bib-1 Use attribute) and BOILINGPOINTs (a Use attribute from a chemical attribute set)? In Version 3, the attribute set ID can be specified for every AttributeElement. That means that you can mix attributes from a number of attribute sets.

8.4 Query Grammars

Z39.50 defines several query grammars, each one identified by a number.

Type-0 queries are for private query grammars. Sometimes clients and servers from the same organization prefer to use that organization’s own query grammar. At OCLC, a number of our clients know how to use the query grammar of our database engine and pass those queries to the Z39.50 server as type-0 queries.

Type-1 queries are the only widely accepted queries. Support for them is mandatory in Z39.50. Type-1 queries are described in more detail later.

Type-2 queries use the query grammar from the ISO Common Command Language (ISO 8777). This grammar has severe extensibility limitations and probably should not be used. ISO CCL queries can always be sent as type-0 queries.

Type-100 queries use the query grammar from the ANSI/NISO Common Command Language (Z39.58). This grammar is closely related to, and has the same problems as, the ISO Common Command Language.

Type-101 queries are an extension of type-1 queries to support proximity searching. With Version 3 of the standard, type-1 queries are identical with type-101; but they remain distinct in Version 2.

Type-102 queries are still being defined. They are intended to support some of the features of query grammars that support ranking.

8.5 Reverse Polish Notation Queries (type-1)

Type-1 queries are called Reverse Polish Notation (RPN) queries. Reverse Polish Notation is a way of representing Boolean queries by specifying first the operands and then the operator. Normal query grammars let you specify an operand, then an operator and another operand. This is called an infix notation. The problem with infix notations is that you end up having to use parentheses to specify the order of evaluation of the operators and operands. Reverse Polish Notation does not have that problem.

The search (DOG OR CAT) AND HOUSE would be expressed as DOG CAT OR HOUSE AND in Reverse Polish Notation and the search DOG OR (CAT AND HOUSE) would be expressed as DOG CAT HOUSE AND OR in RPN. The query is evaluated left to right. Every time you encounter an operator you process the two operands to the left and replace the operator and operands with the result of evaluating them. In the first example, the OR is associated with DOG and CAT. After DOG OR CAT is evaluated, the result is put back into the query. The AND then has that result and HOUSE as its operands.

Reverse Polish Notation queries can be easily represented as trees, with the operators as roots and branches and the operands as leaves. That is the sense in which type-1 queries are Reverse Polish Notation. They are not text strings as in the examples above. They are trees defined recursively in ASN.1. A type-1 query can either be an operand or an operator with two operands. An operand can either be a term or a type-1 query. This recursive definition allows for arbitrarily complex queries.

We need some way to pass a query into our Z39.50 Client API. To do this, we’ll use real Reverse Polish Notation. Terms will be optionally followed by a slash ‘/’ and then a Use attribute value. They can also be followed by an optional slash and a Structure attribute value. Terms can be surrounded by double-quotes. The following are all examples of legal query terms: DOG (no Use or Structure attribute specified), DOG/21 (dog as a subject heading), DOG/21/2 (dog as a subject heading and a structure of WORD) and “DOG HOUSE”/21/1 (dog house as a subject heading and a structure of PHRASE).

8.6 Database Names

The client must specify what database or databases the server is to search. The Z39.50 standard allows multiple databases to be specified in a search request. Unfortunately, this is another feature that cannot be determined at initialization time. One way the client can find out if the server supports multiple database names is to try it and see if a diagnostic is returned, but the lack of a diagnostic does not necessarily mean that all the databases were searched. Some of the servers just ignore the extra database names. This feature is not available in the Client API.

8.7 Piggy-backed Presents

It is possible to request that records be returned automatically with the SearchResponse. This is called a piggy-backed Present. Piggy-backed Presents are supported in the Client API but are not supported by zdemo and are beyond the scope of this article. Zdemo will provide hard-coded values for those parameters in its call to SearchRequest().

8.8 The SearchRequest

The SearchRequest is created by a call to the SearchRequest() routine. It takes a referenceId, a replaceIndicator, a resultSetName, a databaseName, a query, and a query_type.

The referenceId is a C language long value and has the same meaning as in InitRequest(). The replaceIndicator is an integer and has either a zero or non-zero value for FALSE and TRUE respectively. The resultSetName can be any character string. The databaseName is a character string whose value is determined by the server.

The conversion of the query parameter into a Z39.50 query is probably the trickiest code in the Client API. The query is passed as a character string, but its evaluation is dependent on the query_type. If the query_type is 0, then the query is assumed to be in a private query grammar and is passed through to the Z39.50 server exactly as received by SearchRequest().

If the query_type is 1, then SearchRequest() is expecting a string with a Reverse Polish Notation query in it. The terms can be surrounded with double-quotes. This is important if the term consists of multiple words, as in a phrase search. The term can also be followed by an optional slash (‘/’) and a Use attribute value. The Use attribute value can also be followed by another optional slash and a Structure attribute value. There is no default Use attribute value and the default Structure attribute value is WORD.

For example: to search for books about slavery by Mark Twain, you could enter the search:

    slavery/21 "twain, mark"/1003/1 and

which asks for records with “slavery” as a subject heading and “twain, mark” as an author phrase.

As in InitRequest(), SearchRequest() returns a pointer to an allocated area in memory that contains the BER encoded SearchRequest.

The prototype for SearchRequest() is:

    unsigned char *SearchRequest(
        long referenceId,
        int replaceIndicator,
        char *resultSetName,
        char *databaseName,
        char *query);

I will not walk through the code this time. You have already seen BER encoded messages produced; the searches are not any more exciting. The code is provided if you want to examine it.

8.9 The SearchResponse

The SearchResponse is processed by SearchResponse() and it, like InitResponse(), takes the BER record returned by the Z39.50 server as its only parameter and returns a pointer to an allocated structure which contains the fields of the SearchResponse. The prototype for SearchResponse() is:

    SEARCH_RESPONSE *SearchResponse(
        CHAR *response);

and the SEARCH_RESPONSE structure looks like this:

    typedef struct
    {
        long referenceId;
        int searchStatus;
        long resultCount;
        long resultSetStatus;
        long error_code;
        char *error_msg;
    } SEARCH_RESPONSE;

The referenceId is the same one provided to SearchRequest(). searchStatus contains either a zero to indicate that the search failed or a non-zero value to indicate success.

If searchStatus indicates that the search succeeded then resultCount will contain the count of the number of records that satisfy the search and the value of resultSetStatus will be undefined. A value of zero in resultCount is not an indication that the search failed, only that there are no records in the database that meet the search criteria.

If searchStatus indicates that the search failed, then the value of resultCount is undefined and resultSetStatus will indicate if there are any records available for retrieval. Typically resultSetStatus will contain the value 3 which indicates that there is no result set available, but other values are potentially available and defined in the standard. When the search fails, error_code and error_msg should contain values; otherwise they will contain 0 and NULL respectively. The values for error_code and error_msg are described in the Error Diagnostics appendix of the standard.

8.10 ZDEMO

Before zdemo can generate a search, it needs a simple command processor. Remember that commands to zdemo are going to be single letters, so parsing the commands will be easy. Zdemo will need a loop for getting commands from the user. A command of ‘q’ or an end-of-file indication from the input stream will end the loop. Inside that loop, zdemo will test for a single letter command and if there is none, then it will assume that a search is being requested.
It will then switch on the value of the command and call a routine to handle the command. Our driving loop looks like this: char cmd, input[1000]; while(gets(input)) { strlwr(input); if(input[0]) /* did we get any input? */ if(input[1]==‘ ‘) /* was the second character a blank? */ cmd=input[0]; else cmd=‘S’; /* assume that they want to search */ else cmd=‘ ‘; /* no command */ if(cmd==‘q’) break; /* exit the loop */ switch(cmd) { case ‘s’: /* explicit search command */ zsearch(input+2); /* +2 to skip command and blank */ break; case ‘S’: /* implicit search command */ zsearch(input); } } In addition, the routines that zdemo calls will need some clues about the behavior of the Z39.50 server. For instance, some servers will not accept any resultSetNames except “default”. Zdemo will be told this through arguments that are passed to it at startup time. In the case of the “default” resultSetName, zdemo will look for an argument of “-d” to indicate that it must use the “default” resultSetName. char resultSetName[20]; void zsearch(char *query) { long len; SEARCH_RESPONSE *search_response; unsigned char *request, *response; static int search_num=1; if (MustUseDefault) /* global variable */ strcpy(resultSetName, “default”); else sprintf(resultSetName, “Search%d”, search_num++) request=SearchRequest(0, TRUE, resultSetName, database_name, query, &len); response = do_irp(request, socket); search_response=SearchResponse(response); printf(“%ld records found.\n”, search_response->resultCount); if(search_response->searchStatus) printf(“Search Successful! :-)\n”); else { puts(“Search Failed! :-(“); printf(“Error_code=%ld, message=’%s’\n”, search_response- >error_code, search_response->error_msg ? 
search_response->error_msg : ”None provided”); if(search_response->error_code==22) { puts(“Must use ResultSetName of \”default\””); puts(“Resetting internal flags; please try again”); MustUseDefault=TRUE; } if(search_response->error_msg) free(search_response->error_msg); } free(search_response); free(response); } 9. the n’th record in a result set 9. Retrieval and always get the same record The Z39.50 implementors clearly from the same result set. saw retrieval as a weakness in To retrieve records from a Version 2 of the standard. Many result set, the client specifies of the enhancements in Version 3 the name of the result set and center around retrieval. the relative record number of Included in these enhancements the record in the result set. are the ability to ask for The first record in a result set specific parts of a record, to is record number 1. In the C ask about the contents of a programming languages the first record and to specify a record would naturally be record prioritized list of desired number 0, so it is important to record syntaxes. But, even remember that that is not true without these enhancements, here. Z39.50 supplies perfectly To ask for several records, the acceptable mechanisms for client can specify a single retrieving records. Since this relative record number for the article is concentrating on core first desired record and a count functionality, the Client API of the number of records to be will only use those retrieval returned. This only allows for features available in Version 2. a single list of adjacent Version 2 allows clients to ask records to be returned. With for a specific range of records Version 3 comes the ability to from a result set in full or specify multiple ranges of brief forms and to specify a records in a single request. single record syntax. The most This will allow the user to common record syntaxes are request the first, third and ten USMARC and SUTRS. 
USMARC is the thousandth records from a result record syntax used in the U.S. set and the client will be able library community to exchange to satisfy the request in a cataloging information and SUTRS single transaction with the is a Simple Unstructured Text server. Record Syntax, invented by the ZIG. Both of these record 9.2 Element Sets and Element syntaxes will be discussed in Set Names greater detail later. The fields in a record are called elements in Z39.50. A 9.1 Result Sets Revisited collection of elements would be In Z39.50, result sets are an element set and if that modeled as containing ordered collection of elements had a lists of pointers to records. name, it would be an element set This does not mean that a server name. In Version 2, element set is actually supposed to create names are the only mechanism lists like that; it means that available to specify the the client can act as if that elements desired from a record. were true. The ordering of the Version 3 includes rich result set is important, mechanisms for identifying and although the type of ordering is specifying the elements in a not. Whether the records are in record, but element set names rank order or chronological are sufficient for many order or sorted by title is purposes. unimportant. What is important The standard only specifies two is that the client can ask for element set names: “F” for Full on the MARC record syntax. In records (all elements included) the United States, the Z39.50 and “B” for Brief records. developers tend to forget that Brief records are a problem. fact and refer to USMARC as The standard is rightly silent simply MARC. But, there are 14 on the elements that constitute other MARC record syntaxes a brief record. 
But, that recognized by the standard and leaves the client developer at they will be supported by many the whims of the server of the commercial servers as developers as to the fields that Z39.50 services are implemented can be displayed in a brief in Europe. For the most part, record. Unless I am sure that a these are national MARC syntaxes particular server returns all (e.g., UKMARC, CANMARC and the fields that I want to FINMARC) which encode support display in a brief record, I for local cataloging standards, usually ask for full USMARC but there are also some records and throw away the internationally recognized MARC fields that I do not need. That syntaxes (e.g., UNIMARC and technique will not work if SUTRS INTERMARC.) records have been requested, since they consist of a single 9.3.1.2 Explain field. Successful interoperation of Z39.50 clients and servers in 9.3 Record Syntaxes Version 2 is based on a priori A record syntax is simply the agreements between the two way that records are encoded. parties. The client had no There are a number of record mechanism for determining what syntaxes recognized in Z39.50. Use attributes were going to be Object identifiers are used to supported by the server for specify record syntaxes, so searching nor what record record syntaxes must be either syntaxes were going to be registered with the maintenance supported for retrieval. The agency or be registered as nodes client had to be told this of an implementor’s private information through some process object identifier tree. As outside of the standard. mentioned above, there are two Currently, most of the server widely recognized record hosts provide human readable syntaxes; USMARC and SUTRS. documentation that can be used I’ll describe them in detail to statically configure a below, but it is worth client. The Explain service mentioning the other record provides the mechanism that syntaxes listed in the standard. 
allows those things to be Understanding what these other determined dynamically. syntaxes are and where they are The Explain service is intended to be used is useful in implemented as a database that understanding where the can be queried by the client. implementors of the standard are Access to the records in this taking it. database is primarily gained through search keys defined by 9.3.1 Non-core Record the standard. The contents of Syntaxes these records, which contain things like Use attributes and 9.3.1.1 Other MARC Syntaxes record syntaxes supported are There are a number of variants defined by the Explain record syn as well as string tags intended to carry field “names” that 9.3.1.3 OPAC might be of use to a human OPAC (Online Public Access viewing them, if not of use to Catalog) records were an attempt the software receiving them. to allow holdings information to GRS is being heavily used by the be transmitted along with Chemical Abstract Service to bibliographic records (usually provide their complex chemical sent in USMARC format.) They records which include things were not widely implemented and like chemical structure a number of non-standard information. In addition, the mechanisms for transmitting GILS (Government Information holdings information were Locator Service) profile uses developed instead. GRS records as the most flexible way to transmit Information 9.3.1.4 Summary Locator records and the CIMI Summary records were developed (Coalition for the Interchange as part of an effort to bring of Museum Information) group is the WAIS retrieval software into looking to use GRS records to compliance with Z39.50. WAIS transmit their information. was based on the 1988 version of Z39.50, with a number of private 9.3.2 USMARC extensions. Among these USMARC can be quite daunting, at extensions was the ability to first. Fields are tagged provide brief record information numerically and there is little in a more standardized way than pattern to the tagging. 
the simple Brief Element Set Name provided by the standard.

9.3.1.5 GRS
The Generic Record Syntax is at the heart of most of the growth areas of Z39.50 implementation. The other record syntaxes described so far have limited structural flexibility (you cannot have really complex fields) and rigid semantics (everyone knows what to expect in every field). What was needed was a record syntax with great flexibility and the ability to transmit both elements with semantic understanding and elements with no semantic understanding. GRS was invented for this purpose. It supports arbitrarily complex hierarchical records and elements that can carry numeric tags from any number of well-known name spaces

If you do not know what the tags mean, you are out of luck. To complicate things more, some of the fields can repeat and others cannot, but some of the non-repeatable fields have other, repeatable, fields that the extra data can go into. (For example, the first author of a book might be placed in a 100 field, a non-repeating field, but subsequent authors would be put into 700 fields.)

There are actually three different sets of rules combined to form USMARC records. The first is the encoding standard, ANSI Z39.2. It describes the physical encoding of all MARC records (at least that is the theory). The second is the tagging rules: what data goes in what fields. Finally come the formatting rules for the data (e.g., names should be entered last name first with a comma separator). Fortunately, as client developers, we do not need to worry about the formatting rules.

The encoding rules are straightforward. The records are theoretically encoded as 7-bit ASCII, but I’ve seen many private character set extensions that use 8-bit ASCII. The record begins with a fixed-format leader that describes the length and type of the MARC record as well as some of the encoding options that will be used in the record. The leader is followed by a directory that describes what fields are contained in the record, the offset from the beginning of the data at which each field can be found and the length of each field. Fields can have tags in the range 1 through 999.

Finally comes the data itself. Fields with tags 1 through 9 have a fixed format. Fields with tags 10 through 999 have subfields. The subfields do not have additional subfields. Subfields have single-character tags and the tags are primarily alphabetic, but digits and even punctuation characters are sometimes used. The fields and subfields are separated by separator characters.

I have provided a routine to help with the decoding of USMARC records: marc2dir(). It takes a USMARC record and decodes it as if it were a BER record. Even if you decide that you do not want to use the BER Utilities, this routine will give you a leg up on the decoding of USMARC records. In addition, I have provided a table at the end of this article that lists a large number of USMARC fields and their subfields and the labels that are commonly put on them when displaying them to non-librarians.

9.3.3 SUTRS
The Simple Unstructured Text Record Syntax exists to provide a minimal level of data communication. SUTRS records are essentially preformatted records. The intent is to allow the client to ask the server to format its data in a manner suitable for display to a human. The assumption is that the server probably has a better idea of how its data should be formatted than the client does, especially if they have no other record syntaxes in common.

SUTRS records are simply a single field of ASCII characters with a newline character at least every 72 characters. As the name states, there is no structure within that single field. The client should not try to parse the field looking for subfields.

9.4 The PresentRequest
The PresentRequest is created by a call to the PresentRequest() routine. It takes a referenceId, a resultSetName, a resultSetStartPoint, a numberOfRecordsRequested, an ElementSetName and a preferredRecordSyntax.

The referenceId is a long and has the same meaning as in InitRequest(). The resultSetName will be one of the resultSetNames used in a previous successful call to SearchRequest(). The resultSetStartPoint is the relative record number, within the resultSet, of the first desired record.

numberOfRecordsRequested is the count of sequential records requested.

The sum of resultSetStartPoint and numberOfRecordsRequested minus 1 should be less than or equal to the resultCount for the resultSet.

ElementSetNames will be set to “F” or “B”, depending on whether Full or Brief records are desired.

preferredRecordSyntax is set to the Object ID of either USMARC or SUTRS. The preprocessor symbols MARC_SYNTAX and SIMPLETEXT_SYNTAX are provided for this purpose.

As in SearchRequest(), PresentRequest() returns a pointer to an allocated area in memory that contains the BER encoded PresentRequest. The prototype for PresentRequest() is:

    unsigned char *PresentRequest(
        long referenceId,
        char *resultSetName,
        long resultSetStartPoint,
        long numberOfRecordsRequested,
        char *ElementSetNames,
        char *preferredRecordSyntax);

9.5 The PresentResponse
The PresentResponse is processed by PresentResponse() and it, like SearchResponse(), takes the BER record returned by the Z39.50 server as its only parameter and returns a pointer to an allocated structure which contains the fields of the PresentResponse. The prototype for PresentResponse() is:

    PRESENT_RESPONSE *PresentResponse(
        CHAR *response);

and the PRESENT_RESPONSE structure looks like this:

    typedef struct
    {
        long referenceId;
        long presentStatus;
        long numberOfRecordsReturned;
        long nextResultSetPosition;
        char recordSyntax[50];
        struct record
        {
            long len;
            char *record;
        } *records;
        long error_code;
        char *error_msg;
    } PRESENT_RESPONSE;

The referenceId is the same one provided to PresentRequest().

presentStatus contains either a zero, to indicate that there was no error during the PresentRequest, or a status code describing the type of problem encountered during the PresentRequest. A value of 5 in presentStatus means that no records were returned and the PresentRequest completely failed. If this happens, there should be an error_code and possibly an error_msg explaining why the PresentRequest failed. The other possible values indicate why fewer records than requested were returned. Those values are described in detail in the standard. The values for error_code and error_msg are described in the Error Diagnostics appendix of the standard.

The numberOfRecordsReturned contains the count of records returned by the server. It should be equal to the numberOfRecordsRequested from the PresentRequest(). If it is not, then presentStatus should have a value other than 0.

The nextResultSetPosition is set to the value that should be used as the resultSetStartPoint in the next PresentRequest() to retrieve the next sequential record.

recordSyntax will be set to the Object ID of the record syntax used by the server for the records returned. It should be the same as the preferredRecordSyntax used in the PresentRequest().

records will contain an array of pointers to, and the lengths of, the records returned. The number of entries in the array will be equal to numberOfRecordsReturned, even if the server accidentally returns fewer records than it claims. If this happens, the pointer for each missing record will be set to NULL.

9.6 ZDEMO
Zdemo needs four things to allow it to do PresentRequests. It needs a way for the user to specify the resultSetStartPoint and numberOfRecordsRequested, a way to specify the preferredRecordSyntax, a way to specify the ElementSetName and a way to display the records returned.

The preferredRecordSyntax is specified with a new command (r) that takes as its single argument either the word USMARC or the word SUTRS.
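The record syntax selection just described can be sketched roughly as follows. This is only an illustration, not the zdemo source: the helper name set_record_syntax() is hypothetical, and the two #define values are the registered Z39.50 record syntax OIDs written as dotted strings, which is an assumption about how the API's MARC_SYNTAX and SIMPLETEXT_SYNTAX symbols are represented.

```c
#include <assert.h>
#include <string.h>

/* Assumed stand-ins for the API's MARC_SYNTAX and SIMPLETEXT_SYNTAX:
   the registered Z39.50 record syntax OIDs written in dotted form. */
#define MARC_SYNTAX       "1.2.840.10003.5.10"
#define SIMPLETEXT_SYNTAX "1.2.840.10003.5.101"

/* The global that is later handed to PresentRequest(); USMARC is the default. */
const char *preferredRecordSyntax = MARC_SYNTAX;

/* Hypothetical handler for the argument of the r command.
   Returns 1 if the word was recognized, 0 if it was not
   (in which case the current setting is left unchanged). */
int set_record_syntax(const char *arg)
{
    assert(arg != NULL);
    while (*arg == ' ')          /* skip blanks between the command and its argument */
        arg++;
    if (strcmp(arg, "USMARC") == 0) {
        preferredRecordSyntax = MARC_SYNTAX;
        return 1;
    }
    if (strcmp(arg, "SUTRS") == 0) {
        preferredRecordSyntax = SIMPLETEXT_SYNTAX;
        return 1;
    }
    return 0;
}
```

A command loop would call set_record_syntax() with whatever follows the r on the input line, after stripping the trailing newline, and could print a short usage message when it returns 0.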
The r command sets a global variable based on its argument. The default value for preferredRecordSyntax is USMARC.

The ElementSetName is specified with a new command (e) that takes as its single argument either the word FULL or the word BRIEF. A global variable is set based on the argument. The default value for ElementSetName is FULL.

The PresentRequest is initiated with a new command (d) that takes two optional numbers representing the resultSetStartPoint and numberOfRecordsRequested respectively. The default value for both numbers is 1.

The code in zdemo for parsing the three new commands is trivial and looks much like the code added to handle the search (s) command, so it will not be shown here.

Zdemo will call a new routine, zread(), to handle the PresentRequest. The code for zread() looks like this:

    void zread(char *parms)
    {
        long i, numrecs=1, whichrec=1;
        PRESENT_RESPONSE *present_response;
        unsigned char *request, *response;

        if(*parms)  /* were any arguments provided */
        {
            char *t;
            whichrec=atol(parms);
            if( (t=strchr(parms, ' ')) != NULL)
                numrecs=atol(t);
        }
        request=PresentRequest(0, resultSetName, whichrec, numrecs,
            ElementSetName, preferredRecordSyntax);
        response = do_irp(request, socket);
        present_response=PresentResponse(response);
        if(!present_response)
        {
            printf("Did not get a PresentResponse!\n");
            return;
        }
        numrecs=present_response->numberOfRecordsReturned;
        printf("%ld records returned\n", numrecs);
        switch(present_response->presentStatus)
        {
            case IRP_success:
                printf("Present successful\n");
                break;
            case IRP_partial_1:
            case IRP_partial_2:
            case IRP_partial_3:
            case IRP_partial_4:
                printf("Partial results returned\n");
                break;
            case IRP_failure:
                printf("Present failed\n");
                break;
        }
        for(i=0; i<numrecs; i++)
            if(present_response->records[i].record)
            /* did a record really get returned? */
            {
                char *end=NULL, *ptr=NULL;
                if(strcmp(present_response->recordSyntax,
                    SIMPLETEXT_SYNTAX)==0)
                {
                    /* SUTRS records have a BER wrapper around them */
                    DATA_DIR *temp=dalloc(3);
                    bld_dir(present_response->records[i].record, temp);
                    ptr=(char*)temp->ptr.data;
                    end=ptr+(int)temp->count;
                    dfree(temp);
                }
                if(strcmp(present_response->recordSyntax,
                    MARC_SYNTAX)==0)
                {
                    /* convert the MARC record to a SUTRS-like record */
                    ptr=formatmarc(present_response->records[i].record);
                    end=ptr+strlen(ptr);
                }
                /* assumed loop body: write the record text to the display */
                while(ptr<end)
                    putchar(*ptr++);
                free(present_response->records[i].record);
            }
        if(present_response->error_code)
        {
            printf("Error_code=%ld, message='%s'\n",
                present_response->error_code,
                present_response->error_msg ?
                    present_response->error_msg : "None provided");
            if(present_response->error_msg)
                free(present_response->error_msg);
        }
        free(response);
        free(present_response);
    }

9.6.1 Displaying USMARC Records
Decoding USMARC records is beyond the scope of this article, but the code to accomplish it is provided as part of zdemo at OCLC’s anonymous FTP site. (See the section on Source Code Availability at the end of this article.)

10. Terminating the Z39.50 session
In Version 2 of Z39.50, both the client and the server are allowed to terminate the session at any time, simply by dropping the TCP/IP connection between them. The routine disconnect() has been provided to do this. It accomplishes this by closing the socket with a call to the fclose() routine (one of the standard C i/o routines).

11. Summary
This article has described the elements of Z39.50 necessary to create a simple client. Many of the more complex elements have been mentioned in enough detail that you should have some idea whether you need them. Hopefully the code provided and its discussion have shown you that while it is not trivial to build Z39.50 applications, neither is it terribly complex.

12. Source Code Availability
The source code for the Z39.50 Client API and zdemo is available via anonymous FTP at ftp.rsch.oclc.org in the pub/SiteSearch/z39.50_client_api directory. A copy of this article, all the source code and user documentation for the Client API can also be found in that directory.

The BER utilities used by the Client API can be found on the same host in the pub/BER_utilities directory.

OCLC maintains its copyright on all these materials, but they have been made freely available to all developers.

12.1 License
©1995 OCLC Online Computer Library Center, Inc., 6565 Frantz Road, Dublin, Ohio 43017-0702. OCLC is a registered trademark of OCLC Online Computer Library Center, Inc.

NOTICE TO USERS: The Z39.50 Client API (“Software”) has been developed by OCLC Online Computer Library Center, Inc. Subject to the terms and conditions set forth below, OCLC grants to user a perpetual, non-exclusive, royalty-free license to use, reproduce, alter, modify, and create derivative works from Software, and to sublicense Software subject to the following terms and conditions:

SOFTWARE IS PROVIDED AS IS. OCLC MAKES NO WARRANTIES, REPRESENTATIONS, OR GUARANTEES WHETHER EXPRESS OR IMPLIED REGARDING SOFTWARE, ITS FITNESS FOR ANY PARTICULAR PURPOSE, OR THE ACCURACY OF THE INFORMATION CONTAINED THEREIN.

User agrees that: 1) OCLC shall have no liability to user arising therefrom, regardless of the basis of the action, including liability for special, consequential, exemplary, or incidental damages, including lost profits, even if it has been advised of the possibility thereof; and 2) user will indemnify and hold OCLC harmless from any claims arising from the use of the Software by user’s sublicensees.

User shall cause the copyright notice of OCLC to appear on all copies of Software, including derivative works made therefrom.