Some libraries close books to Google, Microsoft

Indexing by rival search engines remains an obstacle

Some libraries are choosing to pay to have their content digitized by the Open Content Alliance rather than having it scanned for free by Google or Microsoft, which refuse to allow access to the materials by rival search engines.

The Boston Library Consortium (BLC) is teaming with the Open Content Alliance (OCA) to build a library of digital materials that will be freely available via the Internet.

The BLC is composed of 19 academic and research libraries in Massachusetts, Connecticut, New Hampshire and Rhode Island. The consortium is digitizing all its content published before 1923. Content published before that date is considered to be in the public domain and is not subject to copyright restrictions.

The cost for digitizing is US$0.10 per page, and the BLC is funding the effort at a cost of US$845,000 over two years. The work is also being supplemented by the OCA, which received a US$2 million grant from the Alfred P. Sloan Foundation. Part of that grant will be used to digitize the John Adams Collection at the Boston Public Library, a member of the consortium.

The OCA was developed by the Internet Archive and search company Yahoo in early 2005 as a way to preserve a variety of content, such as digitized collections and multimedia. Yahoo doesn't have a stand-alone book-search service.

At issue is access to the digitized material. Search companies such as Google and Microsoft will scan the books for free, but want to restrict access for competitive reasons. The consortium wants its books to be available to anyone, through any search engine.

BLC Executive Director Barbara Preece said her organization selected the OCA because it kept the content search-engine neutral.

The OCA allows "you to hold onto your content and do whatever you want to do to your content, and it can be searched by any search engine whatsoever," Preece said. "OCA was the best way for us to go to keep our content open. Google pretty much decides who you can share your content with. With OCA, it doesn't matter what search engine you use to search the material. Google and Microsoft are interested in search, and the OCA is more interested in content and helping libraries handle their content the way they want to."

Google spokesman Gabriel Stricker said the company designed its Book Search to promote the sharing and use of the content the company is digitizing, where appropriate. He said for books in the public domain, Google provides full access to the material, including the ability to read a book in its entirety, download a PDF to a computer and print a work for free. He said there are restrictions for books still under copyright to ensure that copyright holders are protected.

"The libraries we work with receive copies of all the digital files that they can use to serve their students, faculties and partners," Stricker said in an e-mail. He added that libraries are also free to work with other organizations to digitize their content. Stricker did not directly respond to concerns that Google refuses to allow the material it digitizes to be available through other search engines.

Jay Girotto, group program manager for Microsoft's Live Book Search, said his company has been involved with the OCA since October 2005.

"Microsoft put in much more than US$2 million to fund the creation of a mass digitization program that could actually work," Girotto said. "We digitized about 100,000 books under the OCA principles, and we were hoping there would be other significant financial contributors." However, that didn't happen, he said.

"We saw many people in the library community willing to adopt Google's more restrictive stance around book search and sign up with Google, and we were faced with a decision about what to do," he said. "We were essentially providing most of the capital that was building out [the program], but there were really no restrictions on Google taking the output of the process -- the image file, the [optical character recognition] file and the metadata -- and simply having the same use to it that Microsoft had."

Girotto said Microsoft decided last November to put one restriction on the use of the material it was digitizing: the material couldn't be used by its commercial competitors, including Google, Yahoo and Ask.com. But Microsoft still doesn't restrict institutions from distributing copies of the books it digitizes for academic use, he said, whereas Google does.
