The Statement

Thursday, April 17, 2014


HathiTrust: An author never forgets

Illustration by Alicia Kovalcheck

By Rachel Premack, Daily Staff Reporter
Published February 18, 2013

Dean of Libraries Paul Courant tossed an academic journal on a table in his office. The earwax-colored front cover read: The National Tax Journal, March 1980.

“Read that for as long as you can before you get bored,” Courant, a silver-haired man with a tiny earring in his left ear, said, smiling.

Like most writing published in the past 100 years, the journal was printed on acid paper, which quickly deteriorates. Its pages are already yellow around the edges.

Courant said these pages will have the consistency of corn flakes in 50 years. The knowledge they hold could crumble away with them, and so could the knowledge in countless other tomes.

And that’s just one problem HathiTrust tries to eliminate. The HathiTrust Digital Library, a four-year-old initiative led by the University and involving over 60 other research libraries, seeks to digitize the record of human knowledge.

A hathi never forgets

The University was part of HathiTrust’s small founding group, according to Courant, who is also a professor of economics and public policy and a member of HathiTrust’s Board of Governors. HathiTrust is the world’s largest digitized library collection, with more than 10 million volumes, 3.7 billion pages and 8,467 tons of knowledge. Most of it was digitized by Google Books, the online giant’s book-scanning project, launched in 2004.

President Emeritus James Duderstadt — current director of the University’s Millennium Project and the Program in Science, Technology and Public Policy — has watched digitization become part of the University’s culture since his tenure as president from 1988 to 1996. Two projects in the early 1990s, with the National Science Foundation and the Andrew W. Mellon Foundation, pioneered the concept.

Engineering alum Larry Page worked with the University on the NSF project in 1994 and 1995. In 2004, Page, now the CEO of Google, approached the University, offering to digitize its collections. Google would shoulder the costs, and no books would be destroyed in the process — then a major advancement in the business of digitization.

Google Books has more than 20 million volumes, according to Duderstadt, and the company aims for 30 million. Nearly five million are from the University’s libraries.

After working with Google, the University collaborated with 26 other research universities to combine their individual digital collections in one venue, creating HathiTrust in 2008. “Hathi” is the Hindi word for elephant, a gentle giant famed for its impressive memory. Google digitized the books for the universities involved in HathiTrust, and both the libraries and Google keep a copy.

“Throughout most of human history we have rationed access to knowledge,” Duderstadt said. “But now it all comes to you. And in a world where knowledge is the ultimate power, we’ve kind of redefined how the world works.”

However, HathiTrust and Google Books have raised intricate legal questions. These organizations do not ask the permission of authors or publishers before digitizing their books, nor do they compensate either party. Paul Aiken, executive director of the Authors Guild, and others object to the reading of copyright law that allows this practice.

“There are tens of thousands of out-of-print books that are becoming available again,” Aiken said. “These are still literary works under copyright, the result of thousands of hours of hard work, and they’re entitled to copyright protection.”

Legal strife

The Authors Guild filed a federal copyright infringement suit against HathiTrust in September 2011. In October 2012, U.S. District Judge Harold Baer ruled against the Guild, which is currently preparing an appeal. Baer wrote in his ruling that the digitization was a transformative use, one that serves a new purpose rather than replacing the original books, and therefore qualified as fair use rather than infringement.

“I cannot imagine a definition of fair use that would ... terminate this invaluable contribution to the progress of science and cultivation of the arts,” Baer wrote in the decision. He described fair use as the use of copyrighted material in ways that benefit the public, such as “scholarship, teaching and research.”

HathiTrust also allows readers to view books in large print or listen to them through text-to-speech technologies, which Baer championed as an expansion of options for print-disabled readers.

In Aiken’s opinion, copyright law does allow libraries to duplicate and digitize works for preservation, but on a book-by-book basis rather than for entire collections. Aiken said nothing in the law permits a company like Google to keep a copy of the work as well.

“It’s not about going from one end of the stacks to the other,” Aiken said. “The digital copy is supposed to stay with the library.”

Law Prof. Jessica Litman, author of “Digital Copyright,” said that before digitizing, Google announced its plans, and anyone who did not want their works digitized could opt out.