Information Retrieval Design (paperback edition)



Product Details

Ever since humankind learned how to record messages on portable long-lasting media—clay tablets, papyrus, much later paper and more recently various electronic media—we have devised ways to describe and organize these messages so that they could be found, used, and enjoyed later on. This ancient practice has evolved, over the millennia, into the ancient and honorable profession of librarianship and related specializations such as cataloging and indexing. In the twentieth century, this basic human need to analyze and organize messages for later retrieval has become the main preoccupation of information science, under the rubric of “information retrieval.”

Thus this book is about the design of databases that will help retrieve messages. Its purpose is to help the designer consider all the relevant factors and to choose the best available options. In most cases, there are no single correct or right answers, only better and worse choices for given purposes and persons. What this book opposes is simply accepting designs and procedures without considering alternative possibilities and matching them with the needs, desires, preferences, and resources of the persons who will use IR databases, or be served by them.

James D. Anderson and José Pérez-Carballo

Purpose of this book; definition of IR databases

This book is for our students, and for others who aspire to design the best possible information retrieval (IR) databases for every type of clientele and every type of message, in whatever medium or format. The overall objective is maximum effective retrieval of useful messages for each particular user.

I hope that this detailed concentration on the fundamental decision points of IR database design will help members of the information professions to consider all the options, and then to design and create better IR databases. Our society needs the best possible IR databases to cope with the ever-growing explosion of information on the internet and the world-wide web, as well as in older print formats, video, film, audio, and electronic formats.

Scope of this book

The scope of this book is determined by the features of modern IR databases. The term “information retrieval database” or “IR database” is used in the broadest sense. Increasingly, IR databases are designed for and implemented in digital media, but the design principles addressed in this book apply just as much to all media, including print on paper, microfilm or fiche, or even card catalogs. They certainly apply to modern digital libraries. The basic definition for the term “IR database” as used in this book is any database in any medium designed or created for the purpose of discovering and retrieving messages, texts, and documents. Thus, it includes the whole gamut of IR databases presented to users via online connections, the world-wide web, CD-ROMs, or in print on paper: indexing and abstracting services (regardless of medium), library catalogs (including OPACs: online public access catalogs), bibliographies, and indexes, including back-of-the-book indexes (which can now be presented electronically with electronic books!). This and related definitions are expanded in the first part of the book.

Organization of this book; fundamental issues of IR database design

This book is organized around twenty fundamental design issues in IR database design. See the table of contents for a summary of these issues. These are issues that I have identified during twenty-five years of investigation, teaching, design, evaluation, and creation of IR databases. These issues were further refined between 1991 and 1997 by Committee YY of the National Information Standards Organization (NISO). This committee, which I chaired, focused primarily on indexing and indexes, which are fundamental components for all IR databases. But in fact, the committee addressed all of the fundamental design issues for IR databases.


Prof. Anderson has distilled his decades of experience in teaching and design of information retrieval databases to produce a work that covers every aspect of design. He spent a number of years as the chair of a National Information Standards Organization committee charged with revising the standard for indexes. His experience with that committee has enriched this, his magnum opus.

Sound textbooks for information retrieval database design have been very few, and none have been as comprehensive as this. I particularly enjoyed the practice of defining some examples, and following them as example cases in every chapter. With these example cases it becomes possible to see how the principles discussed in the chapter would be applied in an actual database.

Prof. Anderson's work will serve a variety of users. It synthesizes much that is known, while bringing to bear its author's insights on issues. The case studies make the work particularly useful as a textbook, but it will also serve as a refresher for those already in the field, and as a reference for all audiences.

– Jessica Milstead, Information Scientist


About the Authors

JAMES D. ANDERSON (B.A., Harvard College, M.S.L.S., D.L.S., Columbia University) is professor emeritus of library and information science in the School of Communication, Information, and Library Studies at Rutgers, the State University of New Jersey. He was associate dean of this school from 1983 through 1997. His library career included service at Sheldon Jackson College, Sitka, Alaska, and the Multnomah County Public Library (Portland, Oregon). He taught at Columbia, St. John's, and the City University of New York before coming to Rutgers in 1977, where he specialized in the design of information retrieval databases. Major projects included the international bibliography and database of the Modern Language Association of America and the bilingual (French and English) Bibliography of the History of Art, sponsored by the J. Paul Getty Trust and the French Centre National de la Recherche Scientifique in Paris. At Rutgers he also chaired the President's Select Committee for Lesbian and Gay Concerns, and fought for equal benefits for lesbian and gay employees, without success. He left Rutgers in 2003 to protest the new president's proclamation that less than half benefits for lesbian and gay employees was a "reasoned response." For the Presbyterian Church (U.S.A.) he edited and published the journal More Light Update on lesbian, gay, bisexual, and transgender issues from 1980 to 2003. See the Bibliography for his relevant publications.

JOSÉ PÉREZ-CARBALLO (B.A., Universidad Nacional Autónoma de México, Ph.D., New York University) is associate professor of computer information systems at California State University, Los Angeles. He specializes in IR systems for academia and industry. He has participated in several collaborations in TREC (Text Retrieval Conferences) focusing on the interactive and natural language processing tracks. After five years at Rutgers University, where he taught and pursued research in human information behavior and web resource design in the School of Communication, Information, and Library Studies, he joined a company in Cupertino, California, as a knowledge architect. He worked there on the design and implementation of domain-specific knowledge representation systems for facilitating user interaction with large IR databases and on the application of natural language processing to enhance the performance of IR systems. His publications related to this book are listed in the Bibliography.
Paperback, 630 pages
Size, 8.5" x 11"
© 2005 Ometeca Institute