Acquiring Copyright Permission
To Digitize and Provide Open Access to Books


By Denise Troll Covey
October 2005
Digital Library Federation and Council on Library and Information Resources

About the Author
Denise Troll Covey, principal librarian for special projects at Carnegie Mellon University, is responsible for conducting research to inform library administration and strategic planning. She manages Carnegie Mellon University Libraries' performance measures and keeps abreast of technological developments and their social implications and the laws, policies, practices, and standards relevant to digital libraries. Her current projects are engaging Carnegie Mellon faculty members in developing an institutional repository for their scholarly work and conducting an analysis of the public comments and public hearing transcripts regarding the U.S. Copyright Office's investigation of orphan works. Ms. Covey serves on the National Information Standards Organization Standards Development Committee, where she is leading an initiative to develop rights expression and management for scholarly information. She is also secretary of the Measurement, Assessment, and Evaluation Section of the Library Administration and Management Association. Ms. Covey was a Distinguished Fellow at the Digital Library Federation in 2000-2001.

Acknowledgments
Many people were involved in the copyright-permission work reported here. I thank Lily Waters and Leigh Caskey Schenk of the U.S. Army for doing the groundwork for the feasibility study, Tracey Connelly for continuing the work, and Carole George for seeing the study through to completion and conducting the preliminary data analysis. George also created the database for tracking the data in the Posner project and contributed to the design of the publisher database for the Million Book Project. I thank Ruth Ann Schmidt for her help with the Posner permissions and Cynthia Brown for her work on the Thousand Book Project, her assistance in designing the publisher database, and her help finding publisher addresses for the Million Book Project. The time and effort of the librarians and students who helped find publisher addresses are also much appreciated, as are the reading and editing suggestions for this report provided by Cindy Carroll.
Special thanks go to Erin Rhodes, who did the bulk of the permissions work on the Posner and Million Book projects. Without her efforts, diligence, and persistence, this report would not exist. Though her task was sometimes tedious, she persevered. Though she often felt confused and frustrated, she persevered. Despite the inadequacy of our mechanisms for tracking the data, she persevered. And she never complained. I could have had no better assistant.
Special thanks are also extended to Kathlin Smith of the Council on Library and Information Resources for her careful reading and editing suggestions and to the copyright attorney that she recruited to ensure the accuracy of my overview of copyright law.
Those who funded this work must also be thanked: Henry Posner, Jr., and his wife Helen for funding the Posner copyright-permission work and Bruce Miller at the University of California Libraries at Merced for funding the Million Book Project copyright-permission work.
Finally, I thank Gloriana St. Clair, dean of University Libraries at Carnegie Mellon and one of the directors of the Universal Library Project. Her vision and substantial allocation of my time were essential to what we have accomplished.

Foreword
The contemporary academic library and its users have an appetite for digital copies of books that far outstrips the willingness and ability of publishers to provide such access. In the science disciplines, contemporary and historical journal literature is becoming widely available in digital format, albeit at considerable cost. Access to the scholarly record in digital form is already transforming the manner in which science disciplines communicate, publish, research, and review excellence.
This widespread access is not the case for the mass of works in the humanities, arts, and social sciences. Yet it is in these disciplines that the utility of older scholarly books and journal articles tends to be the greatest. Scholars have great interest in digital access to even the very earliest primary works of literature, history, philosophy, religion, and culture that have appeared in print.
While some of this primary material is available in commercial databases, much of it is not. As a result, libraries are increasingly seeking to negotiate noncommercial, free, public, digital access-open access-to copyrighted and noncopyrighted materials that are not available from scholarly publishers. These materials are typically out of print and have little promise for commercial exploitation, yet they are very much alive to scholarly inquiry. Compounding the problem is that nineteenth- and twentieth-century materials are often in a state of physical decay. This only adds urgency to the library's desire to save these materials for current and future scholarship.
What are the stumbling blocks to digitization? Is copyright law a major barrier? Is it easier to negotiate with some types of publishers than with others? To what extent does the age of the material influence permission decisions? This report, by Denise Troll Covey, principal librarian for special projects at Carnegie Mellon University, responds to many of these questions. It begins with a brief, cogent overview of U.S. copyright laws, licensing practices, and technological developments in publishing that serve as the backdrop for the current environment. It then recounts in detail three efforts undertaken at Carnegie Mellon University to secure copyright permission to digitize and provide open access to books with scholarly content.
The results of this well-documented, meticulous survey are illuminating. The responses to the author's carefully designed inquiries reveal a picture of confusion and chaos in the face of a significant opportunity and growing need. The range of publisher responses and their requests for fees, restrictions, and caveats show a publishing industry that has in no way reached a consensus on how to respond to libraries' growing desire to provide digital access to scholarly materials. Indeed, some publishers are not even aware of what rights they actually own.
From the expense and difficulty of determining copyright status and locating the owner to the struggle to get a response from a publisher when seeking permission to digitize for scholarly use, this timely report provides a detailed account of the challenges facing libraries today. It should be of practical use to publishers and librarians alike as we try to navigate the current situation and work to improve it, through such innovations as the "orphaned works" legislation that is currently under discussion. The lessons learned and reported will inform and aid the rest of us as we wrestle with the same problems.
David Seaman
Executive Director
Digital Library Federation

Introduction
Information users increasingly look for materials on the Web. Many scholars and librarians dream of creating a "universal digital library," where high-quality resources are accessible from their desktops. Realizing this dream-creating a digital library that is comparable to an excellent traditional library and providing open access to it-requires negotiating copyright permission.
This report focuses on three efforts at Carnegie Mellon University to acquire copyright permission to digitize and provide open access to books-that is, to make books freely available on the Internet for public use. [1] To provide a context for the studies that form the basis of this report, the report begins with an overview of copyright laws, licensing practices, and technological developments that have brought about dramatic changes in the cost and dissemination of scholarly information. This section also describes the impact that these changes have had on research, learning, and libraries. The three studies, including data analyses that explore the response and success rates with different types of publishers and publications and transaction costs, are then presented in detail. Anecdotes illuminate the effort required and problems encountered in trying to acquire copyright permission for open access, from the difficulty of determining copyright status and ownership and locating copyright owners to the questions, concerns, record-keeping methods, and changing contractual practices that constrain publishers' embrace of open access. The report describes how lessons learned in each study were applied in the next study and the benefits of flexible and innovative approaches to acquiring copyright permission.


A Brief History of Law and Practice

In the late eighteenth century, James Madison wanted the newly formed United States to offer temporary monopolies to creators as incentives to continue to create, after which their works would become common property-part of what came to be known as the public domain-to foster creativity in others. Thomas Jefferson had reservations about such monopolies based on the history of copyright as an instrument of censorship in England (Vaidhyanathan 2002; Thibadeau 2004). [2] Nevertheless, our founding fathers gave Congress the power "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries" (United States Constitution 1789, I, 8, 8). Soon thereafter, the first Congress passed this country's first copyright law as a bargain between creators and users of intellectual property designed to balance the private interest of creators with the public good of others (Copyright Act of 1790).

The initial term of U.S. copyright, legislated in 1790, was 14 years, with the right to renew copyright for another 14 years during the last year of the initial term if the author was still living. [3] Federal copyright protection initially applied only to maps, charts, and books. It granted authors or those to whom they transferred their copyrights the sole right to print, reprint, publish or sell these works (Hirtle 2004). [4] Over the course of the next two centuries, however, the duration and scope of copyright protection were extended, the requirements for acquiring it were changed, and the rights associated with it were redefined. More recently, new technologies evolved that changed scholarly communication and raised questions about the interpretation and application of copyright.

Table 1 shows significant changes in the copyright term. [5] The Copyright Act of 1870 doubled the duration of the initial copyright term. The Act of 1909 doubled the duration of the renewal period. It required that works be marked with a standard copyright notice to acquire copyright protection and be deposited and promptly registered with the Copyright Office. The 1909 act recognized the right of owners to reproduce, distribute, perform, or make derivatives of intellectual property and acknowledged works for hire as a category of works able to acquire copyright protection. It also codified the doctrine of first sale, which allows the owner of a lawful copy of a copyrighted work to sell or otherwise dispose of the possession of that copy.

Table 1. Overview of selected extensions of the copyright term

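| Year of Copyright Act | Works | Author | Initial term (years) | Renewal term (years) | Total years |
|---|---|---|---|---|---|
| 1790 | All copyrighted works | All | 14 | 14 | 28 |
| 1870 | All copyrighted works | All | 28 | 14 | 42 |
| 1909 | All copyrighted works | All | 28 | 28 | 56 |
| 1976 | Works copyrighted prior to 1978 | All | 28 | 47 | 75 |
| 1976 | Works copyrighted 1978 or after | Personal | Life + 50 | | varies |
| 1976 | Works copyrighted 1978 or after | Corporate | Publication + 75 or creation + 100, whichever is shorter | | varies |
| 1998 | All copyrighted works | Personal | Life + 70 | | varies |
| 1998 | All copyrighted works | Corporate | Publication + 95 or creation + 120, whichever is shorter | | varies |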

Among the many changes to U.S. copyright law, the Copyright Act of 1976 stands out as one of the most dramatic. That act set the duration of copyright for all works created on or after January 1, 1978, to 50 years following the death of the author [6] or, for works for hire, 75 years after publication or 100 years after creation, whichever expired first. For works copyrighted prior to 1978, the renewal period was extended by 19 years (the initial 28-year term plus the 28-year renewal period plus 19 years, for a total of 75 years). The 1976 Copyright Act clarified or modified the definition of the rights to reproduce, distribute, perform, or make derivatives of intellectual property, and recognized the right of public display. It preempted state copyright laws, which in some cases had provided copyright protection for unpublished works in perpetuity, [7] distinguished two types of work for hire, and implemented a variety of compulsory licenses. With a few specified exceptions, the 1976 act required works to be marked with a standard copyright notice to acquire copyright protection-a requirement eliminated in the Berne Convention Implementation Act of 1988. The 1976 act eliminated the requirement of "prompt" registration with the Copyright Office, but provided incentives for doing so. Despite these incentives, many works are not registered today.

The 1976 Copyright Act also defined copyright infringement, its defenses and remedies, and exemptions from liability. Section 107 of the act codified for the first time the doctrine of fair use of copyrighted works, wherein use "for purposes of criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship or research, is not an infringement of copyright." In determining whether a use is fair, a court considers the purpose of the use, the nature of the work, the amount and substantiality of the use in relation to the entire work, and the effect of the use on the market for or value of the work. Section 108 includes limited privileges that allow libraries under certain circumstances to make copies for preservation, replacement, or distribution directly to patrons or through interlibrary loan. Section 109 confirms the doctrine of first sale.

Once the 1976 Copyright Act went into effect, copyright protection of both published and unpublished work began the moment that an original work was rendered or fixed in tangible form. Over the next two decades, additional laws were enacted to confirm or extend the scope of copyright, for example, to confirm copyright protection for software (1980), to include the moral rights of creators of selected visual arts (1990), and to protect constructed architectural works (1990) (Copyright Law of the United States of America 2003, iii-viii). Copyright protection now applies to any work "fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device" (Copyright Law of the United States of America 2003, 2). It does not apply to ideas, facts, titles, names, slogans, procedures, processes, methods, concepts, principles, blank forms, or works produced by the U.S. government.

A subsequent law in 1992, the Copyright Renewal Act, automatically renewed all copyrights secured between 1964 and 1977 and not renewed by the copyright owner, the rationale being that inadvertent failure to comply with formalities such as renewal could result in loss of copyright (Copyright Renewal Act 1992). Six years later, the Sonny Bono Copyright Term Extension Act (CTEA) extended the copyright to the life of the creator plus 70 years or, for works for hire, 95 years from the date of publication or 120 years from the date of creation, whichever expires first (Copyright Term Extension Act 1998). The CTEA extension applied to all works that were copyright protected at the time the law went into effect. Critics of the CTEA dubbed it the "Mickey Mouse Act" because of the Walt Disney Corporation's active support of the legislation, which prevented Mickey Mouse from entering the public domain for another 20 years (see, for example, Ellis 1999).
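The term rules described above reduce to simple arithmetic, which can be illustrated with a short sketch. The Python below is an illustration only, not legal guidance. The names TermRule, ACT_1976, CTEA_1998, and the three functions are hypothetical and were chosen for this example. The sketch also simplifies: it works in whole years and ignores the statutory rule that terms run through the end of the calendar year, along with every other condition that affects whether a particular work is protected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermRule:
    """Year offsets for works created on or after January 1, 1978 (hypothetical type)."""
    life_plus: int          # years after the author's death (personal authorship)
    publication_plus: int   # years after publication (works for hire)
    creation_plus: int      # years after creation (works for hire)


# Offsets as stated in the text for the 1976 act and the 1998 CTEA.
ACT_1976 = TermRule(life_plus=50, publication_plus=75, creation_plus=100)
CTEA_1998 = TermRule(life_plus=70, publication_plus=95, creation_plus=120)


def personal_expiry_year(rule: TermRule, death_year: int) -> int:
    """Approximate expiry year for a personally authored work."""
    return death_year + rule.life_plus


def work_for_hire_expiry_year(rule: TermRule, creation_year: int,
                              publication_year: int) -> int:
    """Approximate expiry year for a work for hire: whichever limit expires first."""
    return min(publication_year + rule.publication_plus,
               creation_year + rule.creation_plus)


def pre_1978_total_term() -> int:
    """Total term under the 1976 act for works copyrighted before 1978:
    28-year initial term + 28-year renewal + 19-year extension."""
    return 28 + 28 + 19
```

For example, under these assumptions a work for hire created in 1980 and published in 1990 would expire around 2065 under the 1976 rule and around 2085 under the CTEA, which matches the 20-year extension described in the text.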

In 2002, attempts were made in the case of Eldred versus Ashcroft to have the CTEA declared unconstitutional (Downes 2002). The amicus brief argued that repeated retroactive extensions of the copyright term threatened to enact perpetuity by means of installments and to undermine the system of free expression protected by the First Amendment to the Constitution (Brief for Petitioners 2002, 6). [8] In 2003, the U.S. Supreme Court upheld the CTEA by a vote of seven to two, noting that although the extension was perhaps unwise on policy grounds, it was nevertheless constitutional (537 U.S. 2003, 17).

Before proceeding further with this discussion of copyright, it is necessary to interject a discussion of two other significant phenomena that have had a strong impact on copyright legislation and practice. First, in the 1960s and 1970s, commercial publishers began acquiring copyright ownership of more and more scholarly work (Pew Higher Education Roundtable 1998). Second, in the 1980s and 1990s, new technologies precipitated dramatic changes in how people create, access, and use intellectual property. The collision of these phenomena has had a profound impact on scholarly communication.

Information technologies enabled scholarly resources to be distributed on the Internet. With the invention of the World Wide Web and the provision of full-text resources online, academic users quickly came to prefer the convenience of Web access to going to the library. The shift to online distribution of scholarly information was accompanied by a shift in library acquisitions, from purchased ownership to licensed access. Many publishers charge a significantly higher price for online access than for traditional access to their content. As a result of their desire to satisfy increasing user demand for online access, libraries now spend more money for materials, but acquire fewer materials, than they did previously. In response to extraordinary increases in the prices of scholarly journals, many libraries have canceled subscriptions-to print, and even to some extent to online journals. As the number of subscriptions decreases, some publishers raise the prices of their journals, which only leads to more cancellations. The spiraling cycle of decreased subscriptions and increased prices is untenable over the long haul.

Thus, we face a paradox. On the one hand, the Web offers easy, speedy, convenient access to abundant content, more content than was ever readily available before. On the other hand, canceled subscriptions and the acquisition of fewer materials by libraries suggest a decline in scholarly resources available to a particular community. This affects not only the research conducted but also the impact of the published research results, as fewer libraries can afford to license the journals or purchase the books. The trend to restrict access, enabled by copyright and contract law, has been referred to as "the progressive commoditization of knowledge" (De Rosa, Dempsey, and Wilson 2003).

Licenses restrict access to members of the licensing community. In terms of the Web, commercially licensed materials reside in the deep Web, inaccessible using popular Internet search engines such as Google, which index only materials on the surface Web. Furthermore, licenses are covered by contract law: In practice, licenses need not grant public rights such as fair use or interlibrary loan. In conjunction with digital rights management (DRM) technologies, commercially licensed systems control who can access a resource and what they can do with it. When libraries license access to a resource, they agree to the terms of the license and the restrictions of DRM implemented in the delivery system. DRM systems cannot recognize public rights such as fair use. Furthermore, they can take the approach that if a right is not explicitly granted, it is denied, thereby prohibiting any innovative use in the future.

The online distribution of commercial information precipitated new laws. Perhaps the most striking is the 1998 Digital Millennium Copyright Act (DMCA), which made it illegal to circumvent, remove, impair, or deactivate technological protections against unlawful access, and illegal to manufacture, sell, or distribute code-cracking devices that would enable unauthorized access or copying (Digital Millennium Copyright Act 1998). [9] Critics argue that, in effect, the DMCA legalized whatever rights or restrictions copyright holders implemented in computer code (Electronic Privacy Information Center 2002; Lohmann 2002).

Current licensing practices and technological protections, in conjunction with the anticircumvention law, in many cases make it impossible to exercise the first-sale doctrine in the digital environment. In a required follow-up study of the DMCA in 2001, the U.S. Copyright Office concluded that the doctrine of first sale does not apply to online resources. The explanation given was that the doctrine was designed as a distribution right applicable to tangible works where distribution is limited by geography and the natural degradation of the physical work. Digital works are intangible and their distribution infringes the reproduction right of the copyright holder. [10]

Fair use is also at risk in the digital realm. Efforts to develop guidelines for fair use of digital works in education and libraries, initiated at the Conference on Fair Use in 1994, failed for the most part (Conference on Fair Use 1998). Little progress has been made in this arena, with the exception of the Technology, Education, and Copyright Harmonization (TEACH) Act passed in June 2001. The TEACH Act legalized the temporary storage and transmission of limited portions of a performance or display, comparable to what could be done in the timeframe of a live classroom session, by educational institutions without their having to acquire permission from the copyright holder (S. 487 2001).

The flurry of proposed legislation and current litigation pertaining to copyright law and related public policies is beyond the scope of this report. Interested readers are encouraged to visit the Web sites of the U.S. Copyright Office, the Electronic Frontier Foundation, and Public Knowledge as starting points for keeping informed. [11]


The Implications

All the copyright term extensions described in the previous section of this report diminish the rate at which creative works enter the public domain. Under current copyright law, if a work is in the public domain, anyone can reproduce, distribute, make derivative works of, or perform or display the work publicly without permission or payment. Legal allowances for use of copyrighted works without permission, such as the doctrine of fair use, the TEACH Act, and library copying privileges, are limited, and the circumstances of their application are sufficiently ambiguous to deter their use. [12] While a work is copyright protected, people more often than not must request permission and often pay the copyright holder a fee for the right to reproduce, distribute, make derivative works, or perform or display a work. Copyright holders can grant one or more of these rights, and they may do so exclusively or nonexclusively. The many retroactive extensions of the copyright term since 1962 keep "substantially all works with otherwise-expiring copyrights out of the public domain for a generation" (Moglen 2002, 12).

Given the mushrooming volume of publishing over the past century and the current duration of the copyright term, we can assume that the number of books currently in the public domain is relatively small in comparison with the number of books still protected by copyright. [13] We can also safely assume that most of those books are no longer in print. [14] The commercial marketplace offers limited access to out-of-print books. Libraries supposedly provide access to these books (Lessig 2004).

So what is happening to these millions of out-of-print books presumably residing on shelves in a library or offsite storage facility? If not weeded from the collection, books printed on acidic paper are slowly turning to dust. As fewer copies remain and as they become more brittle, these books cease to circulate or to be available for interlibrary loan, making them virtually inaccessible to potential readers. Copyright law allows libraries to make up to three physical copies of a deteriorating book if it is not otherwise available. Given user preferences for online access, however, libraries are not likely to invest their limited resources in making and storing physical copies. Copyright law also allows digitization for preservation purposes in certain circumstances, but access to the online copy must be restricted to users physically in the library that created the digital copy. To provide open access, or even authenticated remote access, to these digitized works requires permission from the copyright owner of each title. [15] It is no wonder that according to a recent survey, 89 percent of librarians agree or strongly agree with the statement: "Copyright issues are one of the major challenges to the building of the digital library" (Carroll 2004, 9).

In 2004, Brewster Kahle and Richard Prelinger challenged the constitutionality of existing copyright laws on grounds that the copyright system denies public access to works protected by copyright but no longer available in print without benefiting the creator or the public. The argument raised questions about the constitutional bargain between private interest and public good and focused on the fact that "the copyright system contains no mechanisms to create and maintain useful records of copyright ownership" (Stanford Law School Center for Internet and Society 2004). In the absence of such records, "people who would like to distribute or use orphaned works-digital libraries, or creators who would like to include the work in their own creative expression-often are unable to clear rights" (Stanford Law School Center for Internet and Society 2004). A federal district court in California dismissed the case, but it is currently on appeal to the Ninth Circuit Court of Appeals. [16]

However, on January 26, 2005, the U.S. Copyright Office issued a notice of inquiry regarding orphan works, tentatively defined as "copyrighted works whose owners are difficult or even impossible to locate." Prompted by the Senate Judiciary Committee and with support from the House Judiciary Committee, the inquiry is part of an investigation to determine "whether orphaned works are being needlessly removed from public access and their dissemination inhibited" (U.S. Copyright Office 2005). The Copyright Office received 721 initial comments and 146 reply comments in response to its notice, many of which provided detailed answers to the specific questions posed in regard to the age, identification, and designation of orphan works, and the nature of the problems faced by people who want to use them. Many responses also elaborated what remedies should be available to copyright owners who later come forward to challenge the orphan status of their work and the infringement by users.

Recalling the era when U.S. copyright law required renewal to retain or extend copyright for a longer term, one might think that data on copyright renewals could shed light on the rate at which copyrighted works were abandoned by their owners. Research conducted by the Copyright Office in 1961 revealed that less than 15 percent of all registered copyrights were renewed, and that the renewal rate for books was only 7 percent (Ringer 1961, 220). Michael Lesk's recent analysis of two million books published in the United States from 1923 through 1963 revealed that less than 10 percent had their copyrights renewed (Lesk 2004b). The unanswered and unanswerable question is whether the low rate of renewal was inadvertent or intentional. Were these books abandoned because the copyright owners no longer wanted to exercise their rights or because they failed to comply with the formality of copyright renewal in the requisite timeframe? Another compelling and unanswerable question is whether past practice (i.e., the low rate of copyright renewal 40 or more years ago) is necessarily predictive of current or future behavior in a radically different technological environment for the creation and dissemination of copyrighted work.


The Response

Although capitalism has historically trusted the marketplace to be self-correcting over time, by the mid-1990s there were serious problems in the market for scholarly communication. This had several significant results.

Faculty members began putting their work on the surface Web, where access is free, scholarly or educational use is unrestricted, and their work can easily be found using popular Internet search engines. Over time, this grassroots phenomenon became known as the open-access movement. In 1997, the Association of Research Libraries initiated the Scholarly Publishing and Academic Resources Coalition (SPARC), which aims to lower the cost and expand the online dissemination and use of peer-reviewed scholarly work by contributing to the development of open-access journals and competitive alternatives to expensive commercial journals, promoting fundamental changes in the system and culture of scholarly communication, and raising awareness of the relevant issues. [17] The movement to provide free online access to scholarly articles was aided significantly by the international Budapest Open Access Initiative in 2002. [18] Since then, substantial research has been conducted to determine the impact of open access and to address the concerns of various stakeholders in the scholarly information supply chain. [19] Perhaps the most significant research conducted, in terms of promoting the open-access movement, is the research confirming that open access increases use of material and need not decrease sales when a given work also appears in a commercial publication (see, for example, Pope 1999, Lawrence 2001, Antelman 2004, Harnad and Brody 2004).

The efforts of SPARC and other organizations and individuals engaged in the open-access movement have yielded results. Although definitions of what constitutes open access, in terms of how promptly after publication a work must be made available on the surface Web, vary somewhat among the players, the movement to liberate scholarly work from the deep Web is afoot with intensity. The number of agencies and foundations that require or encourage open access to publications based on research they funded is increasing. [20] The number of peer-reviewed, open-access journals is increasing. Some prominent commercial journals have started offering authors the option of paying to have their published work available through open access (Gass and Doyle 2005). The number of universities creating institutional repositories to provide open access to their scholarly assets is further evidence of the spread of open-access initiatives. Though there is much debate about who will pay for open access, consensus regarding the benefits makes it unlikely that the movement will halt any time soon (Davis et al. 2004; Gass and Doyle 2005).

Users clearly prefer the ease and convenience of surface Web access to information. Just as clearly, current copyright laws and licensing practices interfere with meeting their needs and expectations. Most students and faculty (50 percent to 90 percent) perceive a significant gap between their high-priority needs and the service their library is providing (LibQual+™ 2002, 2003). Despite the burgeoning success of the open-access movement, a tremendous amount of work remains to be done. To date, the open-access movement has focused on scholarly journals, but libraries contain more than journals. Creating a digital library that is comparable to an excellent traditional library requires negotiating copyright permission to digitize and to provide open access to an array of materials. Given the cost of acquiring and storing redundant library collections, it behooves libraries to explore the possibility of acquiring permission to digitize and provide open access to different kinds of materials.

What follows is a detailed look at three studies conducted by Carnegie Mellon University Libraries to acquire copyright permission to digitize and provide open access to books. The first study was conducted to determine the feasibility of acquiring copyright permission for open access to books. The second and third studies, informed by the results of the feasibility study, were conducted as components of real digitization projects. The work illuminates problems and complexities relevant to the designation of orphan books.


The Random Sample Feasibility Study

Between 1999 and 2001, the Carnegie Mellon University Libraries conducted a feasibility study to determine the likelihood of publishers granting nonexclusive permission to digitize and provide surface Web access to their copyrighted books. The primary goal of the project was to develop an understanding of the process, the time it takes, and the problems encountered. We also wanted to ascertain whether different types of publishers responded differently and whether they responded differently on the basis of the type or print status of their publications.

We consulted a statistician on campus to ensure that the random sample of books we selected from our library catalog would yield statistically valid results. The random sample contained 368 titles. We created a database to track the study. Each record in the database contained fields for capturing the bibliographic information about a title, whether it was in or out of copyright, the name and contact information of the publisher, dates for when initial and follow-up letters were sent, details about the publisher's response, and whether permission was granted or denied. Publishers were given the option of providing open access or of restricting access to Carnegie Mellon users. The database had fields to capture this information and was later amended to capture additional restrictions that publishers applied. The database also enabled coding the type of publisher, type of publication, and whether the title was in or out of print.
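The tracking record described above could be modeled along the following lines. This is only an illustrative sketch: the report does not give the actual schema, so every class, field, and value name below is an assumption chosen to mirror the fields the paragraph lists.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    """Possible outcomes of a permission request (names are hypothetical)."""
    PENDING = "pending"
    GRANTED_OPEN = "granted, open access"
    GRANTED_RESTRICTED = "granted, Carnegie Mellon users only"
    DENIED = "denied"


@dataclass
class PermissionRecord:
    # Bibliographic information and copyright status
    title: str
    publisher_name: str
    in_copyright: bool
    # Contact information and correspondence tracking
    publisher_address: Optional[str] = None  # None until an address is found
    initial_letter_sent: Optional[date] = None
    follow_up_letters_sent: List[date] = field(default_factory=list)
    response_notes: str = ""
    decision: Decision = Decision.PENDING
    # Added later in the study: restrictions beyond the two offered options
    additional_restrictions: List[str] = field(default_factory=list)
    # Codes assigned by librarians
    publisher_type: Optional[str] = None    # e.g. "commercial"
    publication_type: Optional[str] = None  # e.g. "monograph"
    in_print: Optional[bool] = None


# Hypothetical example record; the title and publisher are invented.
record = PermissionRecord(
    title="Example Title",
    publisher_name="Example Press",
    in_copyright=True,
    publisher_type="university press",
    publication_type="monograph",
    in_print=False,
)
record.initial_letter_sent = date(2000, 1, 15)
record.follow_up_letters_sent.append(date(2000, 3, 20))
```

Keeping follow-up dates and additional restrictions as lists reflects two things the study reports: publishers often needed more than one follow-up letter, and some applied several restrictions to the same title.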

The study took two years to complete because it was conducted with intermittent labor. Overall, four people worked on the project, including two visiting librarians from the U.S. Army, Lily Waters and Leigh Caskey Schenk. Waters designed the database and helped populate it with the bibliographic information about the books. Two other researchers, Tracey Connelly and Carole George, subsequently worked on the project, with George completing the preliminary data analysis (George 2001). Meanwhile, librarians coded the print status and type of publisher and publication for each title in the sample.

Of the 368 titles in the sample, 351 (95 percent) were copyright protected. Upon initial examination, 10 percent of the copyrighted titles were eliminated from the study because they were technical reports or theses that had been mistakenly cataloged as books. We also eliminated 3 percent of the books when third-party copyright ownership, for example, of charts, illustrations, or photographs, would have complicated the pursuit of copyright permission. As the study proceeded, another 8 percent of the titles were eliminated when publishers introduced complications from third-party ownership. Ultimately, 11 percent of the copyrighted titles were eliminated as too complicated to pursue. The final sample for which we were seeking copyright permission included 277 titles published by 209 publishers.
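The winnowing of the sample can be checked with a few lines of arithmetic. The sketch below assumes that the reported percentages are rounded shares of the 351 copyrighted titles and that the 11 percent "too complicated to pursue" figure is the sum of the 3 percent and 8 percent third-party eliminations; the variable names are illustrative.

```python
SAMPLE_TITLES = 368       # titles in the random sample
COPYRIGHTED_TITLES = 351  # titles found to be copyright protected

copyrighted_share = COPYRIGHTED_TITLES / SAMPLE_TITLES

# Eliminations, each expressed in the report as a share of copyrighted titles.
miscataloged = round(COPYRIGHTED_TITLES * 0.10)     # reports and theses
too_complicated = round(COPYRIGHTED_TITLES * 0.11)  # 3% + 8% third-party cases

final_sample = COPYRIGHTED_TITLES - miscataloged - too_complicated

print(f"{copyrighted_share:.0%} of the sample was copyright protected")
print(f"final sample: {final_sample} titles")
```

Under these assumptions the calculation reproduces the 95 percent figure and the final sample of 277 titles.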

Our plan was to send letters to the publishers requesting nonexclusive permission to digitize and to provide free-to-read Web access to their copyrighted books in the sample. If we received no response in a month, we would send a follow-up letter. The initial request letter and follow-up letters were somewhat different:

The initial request letter described Carnegie Mellon University Libraries' collaboration with the School of Computer Science on the Universal Library Project, which aims to digitize the cultural and intellectual history of humankind. The letter referenced the experience of the National Academies Press when it began to provide open access to its books (open access did not decrease sales) and emphasized digitization as a way for our libraries to address the "urgent need for more space to store physical volumes." The letter asked publishers to tell us who owned the copyright to their titles if they no longer did or if they did not own the copyright to a work in its entirety. It also explained that this was a research project and provided a brief overview of what we expected to learn.

The follow-up letter referenced the date of the initial request letter, summarized its contents, and further explained that we were working from a random sample of books in our collection. It ended with the provocative statement: "If we do not receive a response from you within 60 days of mailing this letter, we will assume that you have granted permission to digitize the book and offer it free to read by anyone on the Internet." Though we had no intention of digitizing books without permission, we included this statement to elicit a response. To our surprise, only one publisher commented on this approach. [21]

We included a contract with both letters. The contract offered options for publishers to deny permission or to grant permission either for open access or for access restricted to Carnegie Mellon users.

The first lesson learned was that identifying and locating copyright holders is time-consuming and often unsuccessful. Publishers move, merge, or go out of business, or copyright reverts to the author. Resources used to locate addresses included Global Books in Print, Literary Market Place, and Internet search engines. We failed to find addresses for 7 percent of the publishers. We sent an initial copyright-permission request letter to each publisher that we could locate. Sometimes we sent initial request letters for the same title to different publishers because the first copyright holder contacted no longer owned the rights and responded with a referral, typically without an address, which started the arduous process of locating the copyright owner all over again. Many letters were returned marked "Address (or Addressee) unknown."

If the initial letter appeared to have been successfully delivered but we got no response, we sent a follow-up letter. More than 60 percent of the publishers contacted required a second or third letter. The average length of time to receive a response from a publisher was 101 days from the date of the initial letter for a response of "Permission granted," and 124 days for a response of "Permission denied." The time to respond was probably affected by our use of intermittent labor, which caused delays in sending follow-up letters. We had planned that follow-up letters would be sent one month after the initial request letter, but two months or more often passed between sending the initial and follow-up letters.

We sent a total of 524 letters: 278 initial request letters and 246 follow-up letters. The number of letters was unnecessarily high because we sent separate letters for each title, rather than sending one letter per publisher that bundled all their titles into one request.


Overall Results

Ultimately, 21 percent of the publishers, accounting for 19 percent of the titles in the sample, could not be located. Half of the publishers of books in the final sample responded to our request letters, and more than a fourth of them granted permission, enabling us to digitize and provide Web access to about a fourth of the copyrighted books in the sample (figure 1).

Fig. 1. Analysis of the final random sample of 209 publishers and 277 titles

The preceding analysis of the full sample of publishers and titles sheds light on the difficulty of locating publishers, soliciting a response, and securing copyright permission to digitize and provide Web access to books. However, it skews the success rate in the sense that it measures the success of permissions granted in a context that includes publishers that were never contacted.

Another way of viewing the data is to look only at the publishers we located and the titles in the final sample to which they held the copyright. Looking only at these publishers and titles, more than a third of the publishers did not respond to our letters and more than a third of them granted permission. The permissions granted enabled us to digitize and provide Web access to less than a third of the books in the sample issued by the publishers we contacted (figure 2).

Fig. 2. Analysis of the publishers successfully located

By the time we were analyzing the data from the feasibility study, we had started seeking copyright permission to digitize and provide Web access to books in the Posner Memorial Collection and had revised our process to try to increase the response and success rates. (The Posner study is described later in this report.) We were beginning to believe that increasing the response rate would require one set of strategies and that increasing the success rate among those that did respond would require another. With this in mind and for the purpose of future comparisons, we analyzed the publisher responses in the feasibility study. Looking only at the publishers that responded and the titles to which they held copyright, more than half of the publishers granted permission for almost half of the titles (figure 3).

Fig. 3. Analysis of completed negotiations


Analysis of Restrictions

The copyright permission request letter offered an option to restrict access to the Carnegie Mellon community, but many publishers mandated other restrictions. Overall, 68 percent of the publishers that granted permission applied some kind of restriction. U.S. publishers were slightly more likely to apply restrictions than foreign publishers were. The most common restriction related to access. Access to more than half of the titles for which permission was granted was restricted to Carnegie Mellon users. Publishers also applied the restrictions or stipulations listed below. The data are based on the number of titles to which the restriction applied, rather than the number of publishers that applied the restriction, because publishers of multiple titles in the sample sometimes applied different restrictions to different titles.

The analyses that follow are based on the number of titles, rather than the number of publishers, in the final sample because publishers with multiple books in the sample sometimes granted permission for some, but not all, of their titles. The response rate is based on the number of titles with copyright owned by publishers we successfully contacted. The success rate is based on the number of titles with copyright owned by publishers that responded.
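The two rates defined above differ only in their denominators, which a short sketch makes concrete. The outcome categories and function names are assumptions made for illustration, and the ten-title list is invented data, not figures from the study.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """Per-title outcome categories (hypothetical labels)."""
    NOT_LOCATED = "publisher not located"
    NO_RESPONSE = "publisher contacted, no response"
    DENIED = "permission denied"
    GRANTED = "permission granted"


def response_rate(outcomes):
    """Titles with a response / titles owned by publishers we contacted."""
    counts = Counter(outcomes)
    responded = counts[Outcome.GRANTED] + counts[Outcome.DENIED]
    contacted = responded + counts[Outcome.NO_RESPONSE]
    return responded / contacted if contacted else 0.0


def success_rate(outcomes):
    """Titles granted / titles owned by publishers that responded."""
    counts = Counter(outcomes)
    responded = counts[Outcome.GRANTED] + counts[Outcome.DENIED]
    return counts[Outcome.GRANTED] / responded if responded else 0.0


# Invented example: 10 titles, 2 of them with publishers never located.
titles = (
    [Outcome.NOT_LOCATED] * 2
    + [Outcome.NO_RESPONSE] * 3
    + [Outcome.DENIED] * 2
    + [Outcome.GRANTED] * 3
)

print(f"response rate: {response_rate(titles):.1%}")  # 5 of 8 contacted
print(f"success rate:  {success_rate(titles):.1%}")   # 3 of 5 responses
```

In this invented example the response rate is 62.5 percent and the success rate is 60.0 percent. Titles whose publishers were never located count toward neither rate, which is why these rates run higher than the shares computed against the full sample.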


Analysis of Foreign and Domestic Publications

Most of the books in the final sample were published in the United States. Foreign publishers were twice as difficult to locate as U.S. publishers. If we located them, the response rates for foreign and domestic publishers were roughly the same. The foreign publishers were more likely to grant permission than U.S. publishers were (figure 4).

Fig. 4. Analysis of foreign and domestic titles


Analysis by Publisher Type

The response and success rates varied across different types of publishers (figure 5). Although museums and galleries published very little of the content in our sample, they were easy to locate and always responded and granted permission. University presses and scholarly associations also published little of the content in the sample and were relatively easy to locate. University presses were far more likely to respond, but much less likely to grant permission, than scholarly associations were. Most of the books in the sample were published by commercial publishers. They were the most difficult to locate, least likely to respond, and least likely to grant permission. Scholarly associations were slightly more likely to respond than commercial publishers, and university presses were slightly more likely to grant permission than commercial publishers.

Fig. 5. Analysis by publisher type


Analysis by Publication Type

The response and success rates also varied with different types of publications (figure 6). Most of the sample content was traditional monographs. Monograph publishers were somewhat difficult to locate. Though likely to respond, they were not very likely to grant permission. Publishers of the few series in the sample were the most difficult to locate, the most likely to respond, and the least likely to grant permission. The few publishers of exhibit catalogs were likely to respond and always granted permission. Publishers of the few conference proceedings were the easiest to locate and the least likely to respond; more than half granted permission.

Fig. 6. Analysis by publication type


Analysis by Print Status and Publication Date

Most of the books in the sample were out of print (figure 7). Publishers of out-of-print books were more difficult to locate, less likely to respond, and more likely to grant permission than were publishers of books that were still in print.

Fig. 7. Analysis by print status

Figure 8 shows the distribution of titles in the sample by publication date and print status. Most of the titles were published relatively recently; only one-third were published before 1970. All the titles published before 1940 and almost all the titles published 1940 to 1960 are out of print. Books in print outnumber books out of print in the sample only in the decade 1990 to 2000.

Fig. 8. Analysis of print status by publication date (number of titles)

Figure 9 shows the results of our efforts to secure copyright permission by publication date. Because the number of titles in the sample published during each decade varied significantly, the data must be interpreted cautiously. The results suggest that the age of the work did affect the results, but not always in ways we expected.

With rare exceptions, the older the work, the more difficult it was to locate the publisher. We could not find the publishers of most of the books published between 1920 and 1930 and of almost half of the books published between 1940 and 1950. Publishers of more than a third of the books published from 1950 to 1960 and 1960 to 1970 could not be found. By contrast, few of the publishers of books published 1980 or later could not be found.

When we could locate the publisher, there did not appear to be a correlation between the date of publication and the response rate. We received no response regarding 30 percent to 40 percent of the 1930-1940, 1970-1980, and 1980-1990 samples, and no response regarding 20 percent to 30 percent of 1940-1950, 1950-1960, and 1960-1970 titles.

Although permission was sometimes denied for older titles and granted for more recently published titles, the overall trend was as expected: The more recent the date of publication, the more likely that permission would be denied. Permission was denied for more than half of the titles in the sample published between 1990 and 2000, accounting for 35 percent of the total permissions denied in the study. Only 17 percent of the titles in the sample were published between 1990 and 2000.

Permission was granted for 20 percent to 30 percent of the titles in the sample published in the 1930s, 1940s, 1950s, 1960s, 1970s, and 1980s. However, with the exception of one decade, the percentage of total titles in the sample published in a given decade was roughly equivalent to that decade's percentage of the total permissions granted in the study. For example, books published between 1960 and 1970 constituted 15 percent of the sample and 14 percent of the total permissions granted. The exception was the decade 1980-1990. Titles published between 1980 and 1990 made up 30 percent of the titles in the sample, but accounted for 37 percent of the total permissions granted in the study. This suggests that 1980-1990 might be a good decade for acquiring copyright permission to digitize and provide open access to books.
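One way to read the decade comparisons above is as a ratio between a decade's share of an outcome and its share of the sample. The sketch below uses only the percentages quoted in the text; the ratio itself and the function name are an illustrative framing, not a measure used in the study.

```python
def representation_ratio(share_of_outcome, share_of_sample):
    """Ratio above 1 means the decade is over-represented in the outcome."""
    return share_of_outcome / share_of_sample


# (share of outcome, share of sample), as quoted in the text
figures = {
    "1960-1970, permissions granted": (0.14, 0.15),
    "1980-1990, permissions granted": (0.37, 0.30),
    "1990-2000, permissions denied": (0.35, 0.17),
}

for label, (outcome_share, sample_share) in figures.items():
    ratio = representation_ratio(outcome_share, sample_share)
    print(f"{label}: {ratio:.2f}")
```

The ratios come out at roughly 0.93, 1.23, and 2.06: titles from 1960-1970 were granted about in proportion to their presence in the sample, titles from 1980-1990 were modestly over-represented among permissions granted, and titles from 1990-2000 were about twice as common among denials as in the sample.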

Fig. 9. Analysis of results by publication date (number of titles)


Analysis of Transaction Costs

Focused on outcomes, we neglected to track transaction costs in the feasibility study. However, we suspect that the cost per title was high, in part because of the intermittent labor and consequent learning curves. A crude, retrospective speculation about the transaction cost, based on the cost of paper and postage for the letters and a very conservative estimate of labor costs ($13,000) for Connelly and George, two of the researchers who worked on the project, [22] is roughly $200 per title for which permission was granted. [23] The speculative cost would be significantly higher if it included my time and the cost of Internet connectivity and database creation.
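The retrospective estimate amounts to dividing total outlay by the number of titles for which permission was granted. The sketch below shows the shape of that calculation. Only the $13,000 labor estimate comes from the text; the materials cost and the count of titles granted are hypothetical placeholders, because the paragraph above does not state them.

```python
def cost_per_granted_title(labor_cost, materials_cost, titles_granted):
    """Total outlay divided by the number of titles with permission granted."""
    if titles_granted <= 0:
        raise ValueError("titles_granted must be positive")
    return (labor_cost + materials_cost) / titles_granted


LABOR_COST = 13_000   # conservative labor estimate quoted in the text
MATERIALS_COST = 500  # hypothetical placeholder for paper and postage
TITLES_GRANTED = 68   # hypothetical placeholder, not a figure from the study

estimate = cost_per_granted_title(LABOR_COST, MATERIALS_COST, TITLES_GRANTED)
print(f"about ${estimate:,.0f} per title for which permission was granted")
```

Because the denominator counts only successful requests, the labor spent on publishers that could not be located, did not respond, or denied permission is folded into the cost of each title that was cleared.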


Conclusions and Lessons Learned

The random sample feasibility study revealed that it is indeed possible to secure permission to digitize and provide open access to books, but the work is tedious and often comes to naught. We learned that even determining the copyright status of a book can be difficult and time-consuming. When we conducted the study, we had a fledgling understanding of U.S. copyright law, but knew very little about foreign copyright law. When in doubt, we assumed that a work was copyright protected and sought permission. In the course of the study, we mistakenly requested permission for four titles that were no longer copyright protected. One publisher denied permission to digitize and provide Web access to three of these titles. Whether this means that the publisher did not know the copyright status of the books, or whether they believed their permission was required regardless of the copyright status of the books is unknown. The feasibility study also demonstrated that identifying and locating current copyright owners, particularly of older books, is a difficult, time-consuming, hit-or-miss, sometimes futile process. We agreed that future studies would track the transaction costs.


The Fine and Rare Book Study

In 2001, the University Libraries at Carnegie Mellon received funding from Henry Posner, Jr., and his wife Helen Posner to digitize and provide Web access to the Posner Memorial Collection of fine and rare books and associated archival material. The collection includes landmark titles of the history of Western science, beautifully produced books on decorative arts, and fine sets of literature. Henry Posner, Sr., formed the collection between 1924 and 1973, starting with literature and decorative arts and, after 1950, focusing on the history of science. [24] The funding provided by the Posners was to purchase a high-quality color scanner designed for handling fine and rare books and to pay the scanner operator.

We knew that the collection contained some copyrighted titles and therefore that the project entailed acquiring copyright permission. The Posner project, which took place between 2001 and 2004, became our second copyright-permission study. The library catalog records for each title in the collection were exported and loaded into a database to track the copyright-permission work. Additional database records were created for copyrighted catalogs and newsletters among the archival material. The database and initial request process were identical to those used in the feasibility study. The request letter offered the option to restrict access to the Carnegie Mellon community. A contract, prepared in consultation with university legal counsel, was included with the letter.

Work began in summer 2001 with intermittent labor. The library staff member assigned to the project could dedicate little time to the work, did not consult the copyright-renewal records to determine the copyright status of titles published in the United States from 1923 through 1963, and, as the workers in the feasibility study did, reported having difficulty locating publishers' addresses. As of September 2002, only 75 initial letters and no follow-up letters had been sent. Only a third of the publishers contacted had responded. Of these, 25 percent had granted permission with some kind of restriction or stipulation.

At this point we made several decisions. First, we calculated that at the current rate it would take us four-and-a-half years to complete the copyright-permission work on the Posner titles. We wanted to finish the permission work by the time the books had been digitized, i.e., by the end of 2003. We concluded that we needed to recruit more labor. Second, if a publisher had multiple titles of interest, we decided to list all the titles in a single letter rather than to send one letter per publication. We also decided to call publishers that had not responded to our initial letter rather than to send a second letter. We hoped thereby to increase our success by engaging the publishers in conversation, answering their questions, and addressing their concerns. Follow-up contact was to be initiated several weeks after the initial request letter was sent.

In May 2003, Erin Rhodes was hired as a part-time temporary employee dedicated to the Posner project copyright-permission work. Her employment was extended to full-time in September 2003, and the bulk of the permissions work was completed by November of that year. Nevertheless, we were still locating estates and finalizing negotiations for Posner titles through 2004.

The only way to definitively determine the copyright status of a book published between 1923 and 1963 (the period during which copyright renewal had to be formally registered) is to have the U.S. Copyright Office conduct a title search. As an experiment, we asked the Copyright Office to conduct a title search for seven titles. The office immediately charged us $150 and estimated that it would take four to six weeks to conduct the searches. We received the response 15 weeks later. The office found only one of the seven titles. Given the number of titles in the Posner Memorial Collection published between 1923 and 1963, we estimated that it would cost $6,000 to $7,000 to have the Copyright Office conduct title searches. The cost would be closer to $8,000 if we included the titles with no date of publication. We decided that our time and financial resources were better spent consulting the copyright-renewal records and seeking copyright permission when the copyright status of a work was not clear.
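The scale of the title-search estimate can be explored with simple arithmetic. The sketch below assumes, purely for illustration, that the fee scales linearly with the number of titles at the rate implied by the seven-title experiment; the text does not say how the fee was calculated, and the function names are hypothetical.

```python
QUOTED_FEE = 150      # dollars charged for the experimental search
TITLES_SEARCHED = 7   # titles covered by that search

# Assumption: a flat per-title rate implied by the experiment.
fee_per_title = QUOTED_FEE / TITLES_SEARCHED


def projected_search_cost(n_titles):
    """Projected fee for n_titles under the linear-scaling assumption."""
    return n_titles * fee_per_title


def titles_covered(budget):
    """Number of titles a given budget would cover at the implied rate."""
    return budget / fee_per_title


print(f"implied fee per title: ${fee_per_title:.2f}")
for budget in (6_000, 7_000):
    print(f"${budget:,} would cover about {titles_covered(budget):.0f} titles")
```

Under this assumption the implied rate is a little over $21 per title, so the $6,000 to $7,000 estimate would correspond to roughly 280 to 330 titles. That range is an inference from the linear model, not a count reported in the study.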

Rhodes consulted the copyright-renewal records for books and serials published in the United States between 1923 and 1963 and coded the records in the database accordingly. As the work progressed, the coded copyright status for items sometimes changed as we learned more about foreign and domestic copyright law. [25] In August 2003, we consulted Carnegie Mellon legal counsel to help us determine the copyright status of foreign publications, but it quickly became apparent that the complexity of international copyright law was impeding the project. [26] We eventually abandoned efforts to determine the copyright status of many of the foreign books and chose to assume that they were still in copyright and to request permission to digitize them. Later, we consulted university legal counsel about the copyright status of the archival materials associated with the books in the Posner collection. Legal counsel said that we did need permission to digitize and provide Web access to book catalogs, newsletters, broadsides, newspapers, the text of speeches, and correspondence from the book collector or his secretary to book dealers. However, upon examination of sample correspondence from book dealers to the collector, counsel advised us that we did not need permission to digitize and provide Web access to this material because the letters were compilations of facts about the books. The Posner family granted permission to digitize and provide access to personal correspondence from the collector, Henry Posner, Sr., and the work-for-hire correspondence prepared by his secretary. By November 2003, we were still unable to locate some of the publishers of book catalogs, so Rhodes began examining the title pages of book catalogs published in the United States, applying the laws about books published without a copyright notice when notices were required, to determine whether the catalogs were in the public domain. [27]

Determining copyright status is one step. Determining copyright ownership is another. Locating the copyright owner is yet another. The three do not necessarily go hand in hand. The publisher or creator cited on the title page of a book is the beginning point for a journey that often resembles traversing a maze. For U.S. works published between 1923 and 1963, renewal records must be consulted to determine the copyright status. According to the U.S. Copyright Office, the claimant in a copyright-renewal record is the copyright holder at the time of renewal, but not necessarily the current copyright owner. Similar ambiguity applies to the title page of more-recent publications that do not require copyright renewal: The name that appears there might not be the current copyright owner. There is neither a definitive source to identify current copyright holders nor a definitive source for locating those holders once they have been identified. According to copyright attorney Michael Shamos, "If a work is in copyright and the copyright is assigned to a new owner, an assignment document needs to be filed with the Copyright Office. Otherwise, the new owner will not be able to prove his ownership and will not be able to sue anyone for infringement." When asked about publishers we could not locate, he responded, "It is possible that the publishers went defunct and either abandoned their copyrights (not expressly, but by default) or conveyed copyright back to the authors, or sold the copyrights to satisfy creditors in bankruptcy" (e-mail from Michael Shamos to Denise Troll Covey, March 7, 2003). We agreed that the cost of having the Copyright Office conduct a search for each title was prohibitive and that we would consult the Copyright Office renewal records and use our own devices to determine copyright status and try to identify and locate copyright owners. 
We also agreed, in consultation with university legal counsel, that if we could not locate the copyright owner, we would assume permission was denied and not digitize and provide Web access to the books.

Not counting correspondence or ephemeral material in the archival folders, the Posner Memorial Collection contains 1,106 volumes or cataloged items. We determined that 26 percent (284) were still in copyright or were to be treated as if they were. [28] By the conclusion of the study, we determined that these 284 copyrighted works were owned by 104 different copyright holders.

As in the feasibility study, identifying and locating the copyright holders were arduous tasks. There were many publishers that we could not locate using the resources used to find publishers in the feasibility study. An administrative assistant and several librarians were recruited to assist with locating publishers. Again, many letters were returned marked "Address unknown." Letters to foreign publishers were sometimes returned marked simply "Gone away." Publishers often responded by referring us to another publisher, sometimes a foreign publisher, [29] the author, or the author's estate. The referring publisher seldom provided an address.

To locate authors, we began consulting the Authors Registry, [30] Writers, Artists, and Their Copyright Holders (WATCH) File, [31] the Society of Authors in London, [32] and the Authors Licensing and Collecting Society. We had some success consulting these sources, but were still looking for addresses for 13 authors or estates in 2004. Rhodes became quite the detective, making several phone calls to libraries, book dealers, and university professors to discover contact information for the author or estate in question. She also began examining the books themselves, looking for clues. In one case, she discovered that the author had been a professor at City College in New York, so she called a librarian at City College. The librarian helped her locate the author's daughter, who provided her mother's address. The mother, the current copyright holder, granted permission to digitize and provide Web access to the title in the Posner Memorial Collection.

In the course of the Posner study, we encountered many third-party copyright owners. Unlike in the feasibility study, we could not eliminate these books from the project. If copyright was held by a publisher, we did not pursue third-party copyright owners. However, when copyright reverted from the publisher to the author, we attempted to contact all the authors and contributors cited in the bibliographic record for the work. For example, the bibliographic record for The Journal of Christopher Columbus indicates that the work was translated by one person and revised and annotated by another. Yet another person provided the appendix. Often we were unable to locate all of the third parties.

If a request letter appeared to have been successfully delivered, we conducted a follow-up call or sent an e-mail message a few weeks after the letter was sent. Nevertheless, we frequently sent multiple letters to the same publishers because they had lost or misplaced our letter by the time we spoke to them on the telephone or contacted them by e-mail. In many cases, we also sent multiple letters when the copyright to a title had transferred to another publisher or to the author and we had difficulty locating them. Subsequent letters were frequently sent as attachments in e-mail. By the end of the project, we had sent 174 initial request letters and made 159 follow-up attempts in e-mail or by telephone.

In the discussion that follows, the term publisher, unless otherwise distinguished from authors and estates, refers to a unique copyright holder of content in the Posner collection. The term title refers to an item identified in a record in the database created to track the copyright permission work for the Posner project. [33] Because of the way in which the database was constructed, distinguishing titles from volumes and parts would have required manually counting every data point in this report and would have significantly hampered the data analyses. Twelve percent of the copyrighted titles in the collection were multivolume or multipart works. In the analyses below, when not distinguishing titles from volumes or parts made a significant difference in the results, the instance is noted.


Overall Results

As of November 2004, we were still unable to locate almost a third of the publishers, which meant that we had no opportunity to even try to acquire permission for 13 percent of the copyrighted titles in the Posner Memorial Collection. Almost two-thirds of the publishers responded to our request letter, e-mail, or telephone calls. Almost half of them granted permission to digitize and provide Web access to their works, [34] accounting for most of the copyrighted titles in the collection (see figure 10). More than twice as many publishers granted permission as denied permission.

In the context of the Posner study, permission denied meant that the publisher either responded "no" to our request or was considered to have denied permission according to the three-strikes rule. We established the three-strikes rule in September 2003, in consultation with university legal counsel and the dean of University Libraries, as a way to bring closure to a negotiation if the publisher failed to respond to our initial request letter and two follow-up attempts. For example, three strikes could consist of an initial request letter that was not returned to us marked "address or addressee unknown" and two telephone messages or two e-mail messages that were successfully delivered with no response. According to Carnegie Mellon legal counsel, inability to locate a publisher or lack of response from a publisher, despite due diligence, did not permit us to treat these cases as permission granted. Only two publishers were considered to have denied permission under the three-strikes rule. The few publishers in figure 10 indicated as "No response" are authors and estates that we located in 2004, but that had not yet received two follow-up contacts from us when the data were analyzed for this report.
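The three-strikes rule can be expressed as a small decision procedure. The sketch below is one reading of the rule as described above, not the project's actual workflow; the class, field, and status names are assumptions. The important property is that silence or an undeliverable letter never results in permission granted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    GRANTED = "permission granted"
    DENIED = "permission denied"
    DENIED_THREE_STRIKES = "treated as denied (three strikes)"
    NO_RESPONSE = "no response yet"
    NOT_LOCATED = "publisher not located"


@dataclass
class Negotiation:
    initial_letter_delivered: bool  # False if returned "address unknown"
    follow_ups_delivered: int = 0   # delivered calls or e-mail messages
    reply: Optional[str] = None     # "yes", "no", or None for no reply


def classify(n: Negotiation) -> Status:
    """Classify a negotiation under the three-strikes rule."""
    if n.reply == "yes":
        return Status.GRANTED
    if n.reply == "no":
        return Status.DENIED
    if not n.initial_letter_delivered:
        return Status.NOT_LOCATED
    # Strike one is the delivered letter; strikes two and three are
    # two delivered follow-up attempts that drew no response.
    if n.follow_ups_delivered >= 2:
        return Status.DENIED_THREE_STRIKES
    return Status.NO_RESPONSE


print(classify(Negotiation(initial_letter_delivered=True, follow_ups_delivered=2)))
print(classify(Negotiation(initial_letter_delivered=True, follow_ups_delivered=1)))
print(classify(Negotiation(initial_letter_delivered=False)))
```

The second call corresponds to the "No response" cases in the study: copyright holders who had been located but had not yet received two follow-up contacts when the data were analyzed.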

Fig. 10. Summary of overall results of the Posner study

Of the permissions granted, 12 percent were for multivolume or multipart works with the volumes or parts bound separately: 13 titles had 2 volumes, 4 titles had 3 parts or volumes, and 1 had 4 volumes. Of the permissions denied, 13 percent were for multivolume works: 3 titles had 2 volumes, and 1 title had 18 volumes, a supplement, and a catalog. Of the titles for which we could not locate the publishers, 8 percent were for multivolume or multipart works: 2 titles had 3 volumes and 1 had 3 parts. None of the titles for which we received no response were multivolume or multipart works.

To better understand the outcome of our efforts, we must look strictly at the publishers we located. Of those we contacted, almost all responded and most granted permission. As shown in figure 11, the permissions granted enabled us to digitize and provide Web access to 71 percent of the copyrighted titles published by those we contacted.

Fig. 11. Analysis of the publishers successfully contacted

Looking only at the publishers with which we have completed negotiations and the titles in the Posner collection to which they hold copyright, the overall success rate was 70 percent. These publishers granted permission for 75 percent of the titles published by those that responded (figure 12).

Fig. 12. Analysis of completed negotiations


Analysis of Restrictions

As shown in table 2, publishers granting permission to digitize and provide Web access to their books in the Posner collection applied fewer restrictions than did publishers granting permission in the feasibility study.

Table 2. Comparative analysis of restrictions applied

Restriction                                         Feasibility Study   Posner Study
Restrict access to Carnegie Mellon users [35]       54%                 6%
Display full citation                               23%                 10%
Permission does not apply to third-party material   22%                 5%
License to provide access expires                   8%                  6%
No simultaneous users                               6%                  4%
Permission to scan expires                          3%                  0%

Of those publishers that stipulated that permission did not apply to components of the work with copyright owned by a third party, all of them limited the duration of the license to provide Web access to the title, and 89 percent prohibited simultaneous use. However, the duration of the license was longer in the Posner study than in the feasibility study. In all but one case, the licenses in the Posner study were six to seven years, rather than the three to four years stipulated in the feasibility study. [36] All the publishers that limited the duration of the license in the Posner study were university presses.

Only 1 percent of the publishers that granted permission in the Posner project requested a copy of their digitized books, in comparison with 15 percent in the feasibility study. One publisher made granting permission contingent on our assurance that we would terminate Web access to its four titles in the Posner collection if it gave us 60 days' notice: "A short notice period is essential to allow for the possibility of a reprint license being granted" (e-mail to Erin Rhodes, September 30, 2003). We agreed, and the publisher granted permission. In addition, one current copyright owner, the heir of the author, stipulated that he would grant permission if we would digitize and include his father's notes and updated introduction to the work. We agreed, and he granted permission.

Several publishers contacted in the Posner study inquired about royalty fees. We had decided not to pay fees in the Posner project. In one case, the original publisher still owned the copyright to the title, which was published in 1934. Though the publisher was disappointed that we would not pay a royalty, it still granted permission. In another case, the copyright to a title published in 1966 had passed to another publisher that denied permission because we would not pay a royalty fee.

The analyses that follow are based on the number of titles, rather than on the number of publishers, because publishers with multiple books in the Posner collection sometimes granted permission for some titles, but not for others. The response rate is based on the number of titles with copyright owned by publishers we successfully contacted. The success rate is based on the number of titles with copyright owned by publishers that responded. Collection content refers to the copyrighted titles in the Posner Memorial Collection.
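The two title-based rates defined above can be sketched as simple ratios. This is an illustrative reading of the definitions, not the project's own calculation: the function names are hypothetical, and the counts in the usage line are invented for illustration rather than taken from the Posner data.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage (0.0 if denominator is 0)."""
    return 100.0 * numerator / denominator if denominator else 0.0


def summarize(titles_contacted: int, titles_responded: int,
              titles_granted: int) -> dict:
    """Compute title-based response and success rates.

    titles_contacted: titles owned by publishers we successfully contacted
    titles_responded: titles owned by publishers that responded
    titles_granted:   titles for which permission was granted
    """
    return {
        "response_rate": percent(titles_responded, titles_contacted),
        "success_rate": percent(titles_granted, titles_responded),
    }


# Invented counts, for illustration only.
print(summarize(titles_contacted=200, titles_responded=180, titles_granted=135))
```

Counting titles rather than publishers avoids having to assign a single outcome to a publisher that granted permission for some titles and denied it for others.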


Analysis of Foreign and Domestic Publications

As in the feasibility study, most of the books in the Posner collection were published in the United States and foreign publishers were far more difficult to locate than U.S. publishers. However, domestic publishers were more likely to respond and more likely than foreign publishers to grant permission in the Posner study, as compared with the feasibility study (figure 13).

Fig. 13. Analysis of foreign and domestic titles


Analysis by Publisher Type

Again, the response and success rates varied across different types of publishers. We successfully located all the scholarly associations, university presses, and commercial and special publishers of collection content. Special publishers own the copyright to the largest proportion of the content, followed by university presses, and authors and estates. Scholarly associations and commercial publishers own copyright to little of the material. Copyright to 13 percent of the collection content is owned by units that we could neither identify (code by publisher type) nor locate. The response rates of the publishers that we contacted were very good (figure 14). Special publishers almost always granted permission. Scholarly associations, and authors and estates were likely to grant permission, although authors and estates were difficult to locate. More than half of the commercial publishers granted permission. University presses were the least likely to grant permission.

Fig. 14. Analysis by publisher type


Analysis by Publication Type

The response and success rates also varied with different types of publications. Most of the copyrighted content in the collection is traditional monographs; 10 percent is book catalogs. The book and catalog publishers we located were very likely to respond, and most granted permission (figure 15). Publishers of the few series and serials in the collection were more difficult to locate, but all those that we successfully contacted responded and granted permission. [37] The few miscellaneous copyrighted archival materials in the collection, for example, newsletters and newspapers, were coded as "Other" publications. The owners of these materials were relatively easy to locate, and all of them responded and granted permission.

Fig. 15. Analysis by publication type


Analysis by Print Status and Publication Date

Given the age and nature of the Posner Memorial Collection and data on print status by publication date in the feasibility study (see figure 8), we strongly suspected that most of the copyrighted content in the Posner collection is out of print. When we began coding the print status of copyrighted books in the collection, we quickly ran into snags. For example:

When Rhodes raised these questions, the dean of University Libraries provided an answer, but we were simultaneously discovering in our work on copyright permissions for the Million Book Project (described later) that publishers answer these questions differently. In light of this fact, we came to believe that an analysis of print status as defined by a librarian would be meaningless for our current purposes and chose not to complete this analysis. Details are provided later in this report.

The copyrighted titles in the Posner collection are significantly older and the distribution of titles published per decade is more even than that of the books in the random sample feasibility study. Roughly 88 percent of the Posner titles were published before 1970, compared with 35 percent of the random sample. Figure 16 shows the comparative distribution by publication date of copyrighted titles in the two studies. Figure 17 shows the results of our efforts in the Posner study to acquire copyright permission by publication date.

Fig. 16. Comparative distribution of project content by publication date (number of titles)

Fig. 17. Analysis of Posner study results by publication date (number of titles)

The extent to which the age of the work affected the results in the Posner study is unclear.

Publishers of older material in the Posner collection were not conspicuously more difficult to locate than were publishers of more-recent material. More diligence and persistence were expended on locating and following up with publishers in the Posner study than in the feasibility study; consequently, more publishers were found and more of them responded than in the feasibility study. In the Posner study, there was no striking difference in the ability to find publishers of titles published between 1920 and 1930 and titles published between 1970 and 1980: Almost one-fourth of them could not be found. Although roughly a third of the collection content was published between 1960 and 1980, 40 percent of the publishers we could not locate were publishers of titles published during these two decades.

It appears as if permission was frequently denied for titles published between 1920 and 1930. However, this is an instance when not distinguishing titles from volumes or parts in the collection skews the data. A closer examination revealed that of the 23 so-called copyrighted "titles" in the collection published in that decade and for which permission had been denied, 20 of them pertain to one actual title (in 18 numbered volumes, a catalog, and a supplement). Two of the remaining three "titles" are a two-volume work. Though there are multivolume and multipart works in the Posner collection published in subsequent decades, none exceeds four parts or volumes, and the total per decade does not dramatically skew the data. [38]

Looking at the data per decade, permission was granted for more than 60 percent of the titles published in the 1930s, 1940s, 1950s, 1960s, and 1970s. With the exception of two decades, the percentage of total copyrighted titles in the collection published in a given decade was roughly equivalent to that decade's percentage of the total permissions granted in the study. For example, books published between 1960 and 1970 constituted 20 percent of the sample and 21 percent of the total permissions granted. The exceptional decades were 1920-1930 and 1930-1940. Titles published 1930-1940 made up 25 percent of the copyrighted titles in the collection. Permission was granted for 90 percent of the titles published during this decade, accounting for 35 percent of the total permissions granted in the study. In contrast, titles published 1920-1930 constituted 13 percent of the copyrighted collection, but accounted for only 3 percent of the total permissions granted in the study (this was, however, the decade where not distinguishing titles from volumes or parts skews the data). Over a third of the total permissions granted were for titles published in 1960 or later.


Analysis of Transaction Costs

We closely monitored the labor costs of copyright-permission assistant Erin Rhodes. [39] Rhodes determined the copyright status of the materials in the Posner collection, identified and located the copyright holders, prepared the initial request letters, followed up by e-mail or by telephone, updated the database, and prepared the preliminary statistics. We also monitored the cost of paper and postage for initial request letters and long-distance telephone charges. We did not factor in the cost of Internet connectivity, database creation, consultation with university legal counsel, or administrator time. University legal counsel did not levy a fee for consultations and advice. As project administrator, I answered many questions from the copyright-permission assistant, often in consultation with the dean of University Libraries, and the more-difficult questions from publishers. [40] I also conducted the data analyses.

On the basis of the costs monitored from May through October 2003, we spent roughly $10,808 on labor (wages and benefits), $379 on long-distance phone calls, and $100 on paper and postage. The average transaction cost per copyrighted title in the Posner collection for which permission was granted was $78. The cost would be significantly higher if Rhodes's work with authors and estates in 2004, my time, and the cost of Internet connectivity and database creation were included.
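The arithmetic behind the per-title figure can be sketched as follows. The three cost components and the $78 average come from the paragraph above. The number of titles for which permission was granted is not stated here, so the sketch only derives the count implied by the rounded average, which should be treated as an approximation rather than a reported result.

```python
# Monitored costs, May through October 2003 (from the report).
labor = 10_808          # wages and benefits, in dollars
phone = 379             # long-distance calls
paper_postage = 100     # initial request letters

total_cost = labor + phone + paper_postage
print(f"Total monitored cost: ${total_cost:,}")

# Reported average transaction cost per title with permission granted.
cost_per_granted_title = 78

# Implied number of titles, an approximation because $78 is rounded.
implied_titles = total_cost / cost_per_granted_title
print(f"Implied titles with permission granted: about {implied_titles:.0f}")
```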


Conclusions and Lessons Learned

Although we located fewer of the publishers of copyrighted content in the Posner project than in the feasibility study, we greatly increased the response and success rates during the Posner study. Of the publishers that we successfully contacted in the latter study, almost all responded to our request, while only two-thirds of those that we contacted in the feasibility study responded. Of the publishers that responded in the Posner study, 75 percent granted permission, in comparison with 45 percent in the feasibility study.

We attribute the increased success in the Posner project to a more informative initial request letter, to prompt follow-up by e-mail or telephone, and to the publishers' ability to see the quality of the digitized books in the Posner collection on the Web. [41] We believe that the age and nature of the Posner Memorial Collection were also significant factors. The Posner collection contains more old books than the random sample did, which probably accounts for the greater difficulty we encountered locating publishers of the Posner works. Special publishers own the copyright to most of the titles in the Posner collection but to very few titles in the random sample feasibility study. Results from the feasibility study suggest that special publishers are more likely to grant permission than traditional publishers are. Furthermore, it is conceivable that publishers of the works in the Posner collection liked the idea of seeing high-quality digital replicas of their books in an online special collection almost a third of which is classic works published from the fifteenth through the nineteenth century.

The Posner project confirmed our belief that it is possible to secure copyright permission to digitize books and to provide open access to them on the Web. It also confirmed what we had learned in the feasibility study about how difficult and time-consuming it is to determine copyright status and to identify and locate copyright holders, particularly authors and estates. However, by dedicating personnel and adjusting our processes, we significantly reduced the cost per title for which permission was granted. Further adjustments to our workflow or refinements to our negotiation strategies could yield even greater cost savings.

The Posner study also made us aware that many publishers do not keep good records. Some do not really know what they have published. On several occasions, we had to photocopy the title page of a book and fax it to the publisher because it claimed it had not published the book. Frequently, publishers reported that they did not know whether they had the right to grant nonexclusive permission to digitize and provide open access to their books. Some responded that the author had not granted them this right, so they denied permission. Given the age of the books in the Posner collection, it is unlikely that any author explicitly granted electronic rights to the publisher, so we suspect that the publishers that granted permission assumed they had this right because it was not explicitly denied.

As expected, many publishers expressed concern about open access and lost revenue, regardless of the fact that they were not generating revenue from these older, presumably out-of-print, books. The questions they asked related to access restrictions, the quality of the digitized books, and whether the delivery system enabled users to download or print the books. [42] Negotiating with publishers was frequently confusing, even frustrating, but always enlightening. A few examples will illustrate.

We agreed that future copyright-permission studies should experiment with ways to reduce the transaction costs and should formulate and test strategies to increase the response and success rates. We also believed that, whenever possible, we should examine physical books published between 1923 and 1989 to see whether they have a copyright notice as part of our effort to determine copyright status. [43] We also knew that we needed to develop a better way to manage the data and routinely calculate statistics. Inadequate methods of analyzing the data unnecessarily delayed analyses that might have guided us to change strategies and correct course in a more timely way.


The Million Book Project Study

The Million Book Project (MBP) is funded by the National Science Foundation (NSF) and the governments of India and China. Its goal is to digitize and provide open access to 1 million books by 2007. With rare exceptions, the books for the Million Book Collection are being scanned in India and China. The MBP is part of the larger Universal Library Project, which is a partnership of Carnegie Mellon School of Computer Science and the University Libraries. The Universal Library Project directors aim to digitize the cultural and intellectual history of humankind. While the vision of the universal library is unlikely to be achieved in our lifetime, the philosophy makes the sequence in which materials are digitized inconsequential. [44]

The initial MBP collection-development meeting was held in November 2001. [45] Participants swiftly agreed that 1 million books could not be selected title by title. They also quickly agreed that garnering permission to digitize and provide open access to copyrighted books would be time-consuming and expensive. With these points in mind, the group decided that the Million Book Collection would be a collection of collections, including at least 200,000 indigenous works from partner institutions in India and China, 700,000 public domain works, and a target of 100,000 copyrighted works. Efforts to acquire permission to include copyrighted material in the collection would begin with titles cited in Books for College Libraries (BCL), a five-volume bibliography of books compiled by librarians and recommended for all academic library collections. The copyright-permission work would be considered a separate project requiring separate funding. Everyone agreed that copyright law must be strictly followed for all materials included in the collection and that letters of assurance must be secured from project partners in India and China. Memorandums of Understanding were completed in 2002. Partners in India and China would be responsible for securing permission to include copyrighted books published in India and China. [46] Carnegie Mellon would be responsible for securing permission to include copyrighted books published in the United States. [47]

Plans were to seek funding to ask publishers for permission to digitize and provide open access to the titles they published that were cited in BCL. In the meantime, eager to get started and secure copyrighted content for the Million Book Collection, one of the MBP directors, Raj Reddy, instructed the University Libraries to send letters to significant publishers of scholarly monographs, asking them to participate in the MBP by providing out-of-print books. In June 2002, letters were sent to 32 commercial publishers, 11 university presses, and 1 scholarly association selected by our head of acquisitions, Denise Novak. Because we relied on intermittent labor, little follow-up was done and little was accomplished. Only seven of the commercial publishers we contacted responded: two granted permission, three denied permission, and two explained that copyright reverted to the author when their books went out of print. Another commercial publisher was considered "Permission denied" under the three-strikes rule. The remaining 24 commercial publishers were abandoned on the basis of preliminary data from the feasibility study that indicated they were the least likely to grant permission (George 2001). The initial 11 university presses were eventually contacted again by copyright-permission assistant Rhodes when she completed the bulk of the permissions work on the Posner project and turned her attention to the MBP in November 2003. Rhodes also followed up with some of the publishers of designated titles that we had contacted in a previous project, the Thousand Book Project, which was folded into the MBP when it began. Eventually, many of these publishers were also abandoned so that we could focus our efforts on publishers of works cited in BCL.

As we requested copyright permission for designated titles in the Posner project, we realized that the transaction cost of pursuing copyright permission per title (about $78 per book) was too high to pursue on a large scale. There are roughly 50,000 titles cited in BCL. [48] Assuming, for the sake of a cursory analysis, that the cited titles were published in the United States:

The 50,000 titles cited in BCL were published by about 5,600 publishers. On the basis of the transaction costs from the Posner study, I proposed that we change to a per-publisher approach for the MBP. After discussions with the dean and associate dean, we agreed to treat BCL like an approval plan for publishers, assuming that if they had published books cited in BCL then they were among the best publishers in the country. Many libraries use publisher-based approval plans to select books for their collections. We subsequently began asking the publishers of books cited in BCL for permission to digitize all of their out-of-print, in-copyright books to facilitate collection development for the MBP and to reduce the cost of acquiring copyright permission. Treating BCL like an approval plan for publishers substantially reduced the transaction cost by obviating the need to check copyright-renewal records for cited titles, simplifying letter preparation, and reducing the cost of paper and postage. Consider the effort required to prepare letters containing lists of designated titles: about 950 titles cited in BCL were published by Harvard University Press; 356 titles were published by Indiana University Press. [49] Furthermore, using a per-publisher rather than a per-title approach meant that each letter could potentially secure permission to include more titles in the Million Book Collection than just those cited in BCL. This was already apparent from two of the publishers that we had initially contacted in June 2002. The National Academies Press, with only 26 titles cited in BCL, had granted permission for about 3,400 titles published through 1994. Rand McNally, with two titles cited in BCL, had granted permission for all of its out-of-print, in-copyright books except atlases, roughly 900 titles in all. We calculated that if only 10 percent of the 5,600 publishers with works cited in BCL granted permission to digitize 500 books each, the result would be 280,000 copyrighted works for the Million Book Collection.
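The projection above is straightforward arithmetic, sketched here with the figures given in the text. The final calculation, which multiplies the Posner per-title cost by the number of BCL titles, is an extrapolation added for comparison and assumes the $78 average would hold at scale. It is not a figure reported in this section.

```python
# Figures from the report.
bcl_titles = 50_000            # titles cited in BCL (approximate)
bcl_publishers = 5_600         # publishers of those titles (approximate)
participation_percent = 10     # assumed share of publishers granting permission
books_per_publisher = 500      # assumed books granted per participating publisher

# Per-publisher projection.
participating = bcl_publishers * participation_percent // 100
projected_works = participating * books_per_publisher
print(f"{participating} publishers x {books_per_publisher} books = {projected_works:,} works")

# Extrapolated cost of a per-title approach (assumption: $78 per title
# from the Posner study applies unchanged to every BCL title).
posner_cost_per_title = 78
per_title_total = bcl_titles * posner_cost_per_title
print(f"Per-title approach at ${posner_cost_per_title} each: ${per_title_total:,}")
```

The projected 280,000 works would exceed the MBP target of 100,000 copyrighted works, which illustrates why the per-publisher approach looked attractive.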

In August 2003, MBP project partner University of California Libraries at Merced (UC Merced) provided funding for a full-time copyright permission assistant at Carnegie Mellon, Erin Rhodes, and a part-time copyright permission assistant at UC Merced, Sarah Sheets. [50] With dedicated labor, in November 2003 we began sending letters to publishers of books cited in BCL. Letters to publishers briefly introduced the MBP, explicitly stated adherence to copyright law, and described the copyright absurdity wherein out-of-print, in-copyright books are neither generating revenue for the copyright holder nor readily available to potential readers. The letters provided an overview of research indicating that users want to find information online, but use it in print (Friedlander 2002); that online access increases use, including use of older materials (Guthrie 2000); and that open access does not decrease revenue (Pope 1999). The letters then asked publishers for nonexclusive permission to digitize and offer free-to-read on the Web any of the following options:

The letters explained that the Million Book delivery system will have minimal functionality. They closed with an offer to give participating publishers preservation-quality copies of their digitized books and the associated OCR text file, explaining that they could use the electronic files in added-value, fee-based services that they develop or use. For example, "Buy" buttons and print-on-demand service in conjunction with the images could generate revenue for them from the sale of in-print and out-of-print books. Unlike the feasibility study and Posner project, the MBP offered no option to restrict access to the Carnegie Mellon community.

Over time, we revised the letter to include answers to common questions asked or concerns raised by the publishers. For example, we updated the letter to state that the Million Book delivery system restricts saving and printing to one page at a time, as netLibrary does. Later, we included a sentence indicating that we were seeking a partner to provide print-on-demand service for the Million Book Collection. As more and more publishers indicated that they were not inclined to participate because there was no direct financial reward, we reorganized the letter to foreground our efforts to provide print-on-demand service, highlighting that it would generate revenue for them. For a short time, we included letters of endorsement for the project. These letters articulated some of the work involved in participating, but praised the project and noted that the benefit was worth the cost. However, when new publishers we contacted commented that the work described in the letters was discouraging and a reason not to participate, we discontinued including the endorsement letters.

When Rhodes turned her attention to the MBP, we had not completed the data analyses from the feasibility study or the Posner project. We relied on the preliminary analysis of the data from the feasibility study to guide our copyright-permission work in the MBP. On the basis of the preliminary finding that university presses and scholarly associations were more likely than commercial publishers were to grant permission to digitize and provide open access to their copyrighted books (George 2001), copyright-permission work on the MBP started with university presses and scholarly associations. When we had contacted all the university presses and scholarly associations with books cited in BCL, we began sending letters to commercial publishers, but soon stopped. Funding for the MBP copyright-permission work was running out, and we decided to dedicate our efforts to closing negotiations with publishers we had already contacted.

As in the Posner project, we often sent multiple letters to the same publisher because they had lost or misplaced the initial letter by the time we spoke to them on the telephone or contacted them by e-mail. We often sent subsequent letters as attachments to an e-mail message. To expedite the process, eventually we began sending even initial request letters as enclosures in e-mail if we could find an e-mail address for the publisher. From August to December 2004, 71 percent of the letters were sent by e-mail.

In the beginning, Rhodes was conducting follow-up calls or sending follow-up e-mails two weeks after we sent the initial letters. We discovered that in almost all cases, the publisher had not had a chance to even look at the letter in that period of time. We extended the period to three weeks, with little change in the results. By May 2004, we had extended the period to four weeks.

As of January 24, 2005, we had sent 665 initial request letters and made 782 follow-up attempts, [51] either by telephone or e-mail, to reach 431 publishers. Over time, we abandoned 67 of the publishers, mostly commercial presses, because they were contacted before we had labor dedicated to the MBP permission work and too much time had passed with no response or follow-up. We had also significantly changed our request letter and strategy. The data analyses in this report are based on the 364 publishers with which we sought to close negotiations.


Tracking the Data