The Statement

HathiTrust: An author never forgets

By Rachel Premack, Daily Staff Reporter
Published February 18, 2013

Dean of Libraries Paul Courant tossed an academic journal on a table in his office. The earwax-colored front cover read: The National Tax Journal, March 1980.

“Read that for as long as you can before you get bored,” Courant, a silver-haired man with a tiny earring in his left ear, said, smiling.

Like most writing published in the past 100 years, the journal was printed on acid paper, which quickly deteriorates. Its pages are already yellow around the edges.

Courant said these pages will have the consistency of corn flakes in 50 years. The knowledge it holds, too, could evaporate like soggy cereal — and so could countless other tomes.

And that’s just one problem HathiTrust tries to eliminate. The HathiTrust Digital Library, a four-year-old initiative led by the University and involving over 60 other research libraries, seeks to digitize the record of human knowledge.

A hathi never forgets

The University was part of HathiTrust’s small founding group, according to Courant, who is also a professor of economics and public policy and a member of HathiTrust’s Board of Governors. HathiTrust is the world’s largest digitized library collection, with more than 10 million current volumes, 3.7 billion pages and 8,467 tons of knowledge; Google Books, a project the online giant launched in 2004, digitized most of it.

President Emeritus James Duderstadt — current director of the University’s Millennium Project and the Program in Science, Technology and Public Policy — has watched digitization become part of the University’s culture since his tenure as president from 1988 to 1996. Two projects in the early 1990s, with the National Science Foundation and the Andrew W. Mellon Foundation, pioneered the concept.

Engineering alum Larry Page worked with the University on the NSF project in 1994 and 1995. In 2004, Page, now the CEO of Google, approached the University, offering to digitize its collections. Google would shoulder the costs, and no books would be destroyed in the process — then a major advancement in the business of digitization.

Google Books has more than 20 million volumes, according to Duderstadt, and they aim for 30 million. Nearly five million are from the University’s libraries.

After working with Google, the University collaborated with 26 other research universities to combine their individual digital collections in one venue, creating HathiTrust in 2008. “Hathi” is the Hindi word for elephant, a gentle giant famed for its impressive memory. Google digitized the work for universities involved in HathiTrust, and both the libraries and Google own a copy.

“Throughout most of human history we have rationed access to knowledge,” Duderstadt said. “But now it all comes to you. And in a world where knowledge is the ultimate power, we’ve kind of redefined how the world works.”

However, HathiTrust and Google Books have raised intricate legal questions. These organizations do not ask the permission of authors or publishers before digitizing their books, nor do they compensate either party. Paul Aiken, Authors Guild executive director, and others aren’t happy with this interpretation of copyright law.

“There are tens of thousands of out-of-print books that are becoming available again,” Aiken said. “These are still literary works under copyright, the result of thousands of hours of hard work, and they’re entitled to copyright protection.”

Legal strife

The Authors Guild filed a federal copyright infringement suit in September 2011 against HathiTrust. In October 2012, Federal District Court Judge Harold Baer ruled against AG, which is currently devising an appeal. Baer wrote in his ruling that digitization was a transformative act, or one that does not infringe upon copyright.

“I cannot imagine a definition of fair use that would ... terminate this invaluable contribution to the progress of science and cultivation of the arts,” Baer wrote in the decision. He defined “fair use” as copyrighted material that benefits the public in “scholarship, teaching and research.”

HathiTrust also allows readers to view books in large print or listen via text-to-voice technologies, which Baer championed as an expansion of options for print-disabled readers.

In Aiken’s opinion, copyright law does allow libraries to duplicate and digitize work for preservation reasons, but on a book-by-book basis rather than complete collections. Aiken said nothing states that a company like Google may keep a copy of the work as well.

“It’s not about going from one end of the stacks to the other,” Aiken said. “The digital copy is supposed to stay with the library.”

Law Prof. Jessica Litman — author of “Digital Copyright” — said that before digitizing, Google announced they would be undertaking the procedure and anyone who did not want their works digitized could opt out. The process of asking each author for permission would have been immense.

Litman said HathiTrust is the sort of project that copyright law should not and does not outlaw.

“It seems to me that it was a very clear fair use argument,” Litman said. “The only thing I can speculate about is that the authors felt so strongly that the existence of a digitized copy was a dignitary wrong.”

In Litman’s view, the authors misread the law. She added that organizations do not need to ask permission when digitizing a work.

Aiken argues that digitization is a duplication of work, which necessitates the copyright holder’s consent. Without this, some writers don’t profit from their work.

Edward Hasbrouck, co-chair of the Book Division of the National Writers Union, said many authors support the creation of a digital copy of their writings. But denying them the chance to give their permission, he said, is unlawful.

“If you have read many of the legal cases, Google Books and HathiTrust have tried to create an entirely false impression that authors oppose the scanning of the books and want to oppose digitization,” he said. “We very strongly endorse and support digital libraries.”

Many authors object to Google and HathiTrust bypassing them when digitizing works, a practice Hasbrouck feels denies authors and publishers their fair compensation.

“It’s profoundly disingenuous for Google to claim a benign public purpose in its efforts,” Hasbrouck said. “They are investing lots of money in this project because they can make lots of money in this purpose.”

According to Courant, however, most of the works in HathiTrust lack profit possibilities.

“Most of the books in the HathiTrust are long, long out of print,” Courant said. “Nobody’s been making any money to speak of in a long time. There really isn’t much of an income stream at risk here.”

While both Google Books and HathiTrust will take down works if asked, Hasbrouck said this is problematic. For instance, an academic author with hundreds of works would have to demand that each work be removed. It may be an endless battle if projects like Google Books go international.

The HathiTrust case is an echo of a previous class action suit filed in 2005 by the AG and the Association of American Publishers against Google Books.

The agreement reached involved a retail product of Google’s digitized works, Courant said, with 37 percent of profits going to Google and the rest to the rights holders. Under the product, readers would pay to read works online, while readers affiliated with certain research institutions could access works for free.

The U.S. Court of Appeals rejected the settlement in 2008 on the basis that too many people (rights holders who were not members of AG or AAP) were not represented, adding that a settlement ought to represent the opinions of all people who would be affected by it.

Hasbrouck said the settlement tapped into the question of whether the author or publisher holds electronic rights.

Google and AAP have since settled out of court, while Google and AG are still at odds. Google declined to comment due to ongoing litigation.

Aiken stressed that this settlement would have allowed AG to distribute their work to libraries and students in an equitable way.

“It (mass digitization) should be done by contractual agreements,” Aiken said. “It should be done so the value of the books are recognized by the owners of the works.”

Pamela Samuelson, director of the Berkeley Center for Law and Technology, said the 2012 ruling in favor of HathiTrust improves the outlook for Google’s litigation.

“The underlying issue of fair use is pretty similar, so I would think that if the HathiTrust victory is affirmed by the Second Circuit Court — good news for Google,” Samuelson said.

Who holds the rights?

Even once it’s decided whether the consent of rights holders (whoever can license a work for certain uses, often the author or publisher) is needed for digitization, two little words destroy the concordance: orphan works. These are works that have no identifiable owner because the publisher is out of business, the author is deceased or there are other complications.

Samuelson said these works might be digitized after “due diligence” has been served to find the copyright holders.

That may involve researching the author, the publisher, the relatives of the author and so on, but the legal term is subjective.

The University launched a now-suspended Orphan Works Project that aimed to make works with unknown rights holders available. Aiken pointed out that many works that the University determined to be orphaned had living authors or were still in print.

“This is interference with the authors’ commercial rights. It’s not the job of a university to make the decisions about these rights,” Aiken said.

Courant affirmed that the initial process of orphan-work identification was flawed, and the project was suspended.

“No orphan works — not one — were made available to readers in error,” Courant said. “However, it's now clear that reliable identification of orphan works is difficult and costly.”

U.S. law dictates that a work enters the public domain 70 years after the author’s death or if the rights are otherwise waived. Two-thirds of HathiTrust’s contents are not accessible for full reading for this reason, as these works retain copyright and cannot be freely distributed. In the HathiTrust, public domain works are fully viewable, a practice Aiken affirms as legal and approved by writers.

Works that still retain copyright, however, may be searched, and users can see the pages where their keyword exists.

Google Books often provides a “snippet” of text to show the keyword in the context of the search. Both providers show the book title, author’s name and other basic information about the text.

Hasbrouck said this prevents readers from using the book itself. For instance, an author may offer the book on his or her ad-supported website that readers could use, and the author would get money from ad revenue on the website.

With HathiTrust, this is money lost. Google’s “snippet” technique is even more undermining.

“Looking at the index does substitute for looking at the book itself,” Hasbrouck said. “There’s been an attempt to portray it as something that has no economic consequence for authors at all.”

Only individuals who are scanning books to develop the archive can read the copyrighted books in full.

Laine Farley, executive director of the California Digital Library, said HathiTrust is spawning new academic studies even with books that aren’t fully accessible.

“We’re already seeing evidence of new types of scholarship that can come out of access to these works,” Farley said. “If they’re not (readable), just knowing that they exist has increased the ability of our scholars as well as the public to find those materials.”

Twenty-first century enlightenment

While authors, publishers and the entities who seek to publicize their work slog through legal disputes, technology surges ahead.

“I sometimes say that, every once in a while, this University stumbles across something that changes the world,” Duderstadt said. “This is one of those world-changing things.”

Duderstadt hailed HathiTrust, along with massive open online courses, as a factor in the current “21st century Enlightenment.”

“Now if you have a cell phone, you not only have access to millions and millions of volumes of books’ knowledge, but you have access to learning capability for free,” Duderstadt said. “We’re providing access, not only to knowledge, but to learning for the world.”

Duderstadt recalled a friend who had recently relocated to China and was having his furniture moved into his new home when one of the movers approached him.

“You know, I’m taking a course in computer programming from MIT,” the mover said. “It’s pretty hard.”

Duderstadt’s friend replied, “Well, why are you moving furniture if you’re taking this course?”

“Well, five years from now I don’t wanna be a furniture mover,” he replied. “I wanna be a computer programmer.”