A handful of interesting technological elements sit at the heart of the massive, sprawling settlement that was announced Tuesday between Google and the book publishing industry, ending three years of litigation. Although the settlement has yet to be approved by the court and many details have yet to be worked out, the agreement is notable both for what it contains and for what it does not.
The most significant technological component of the proposed settlement is Article VI, which establishes a Book Rights Registry (BRR) that Google -- and any other parties that wish to -- will use to drive the business models contemplated in the settlement and to compensate rights holders. Google will spend over US $30 million (out of $125 million in total payments) to build the BRR.
The BRR will be a separate, non-profit organization that runs a database of information about works and rights holders and determines payments to be made for licensed uses. If this sounds familiar, it ought to: this is roughly the same function as rights collecting societies with online capabilities, such as CCC (Copyright Clearance Center). In fact, one could argue that this proposed settlement sets up BRR to compete with CCC -- if not now, then in the future as both entities' interests in online business models involving rights licensing expand.
For the time being, however, BRR and CCC remain separate and distinct. The BRR's function will be to administer the relatively narrow range of business models for book content that the settlement defines. These include advertising revenue shares, the ability for users to purchase rights to access the full online book, and subscription models for libraries. Consumer online access will include the ability to copy and paste up to four pages at a time and print up to 20 pages at a time; printed pages will appear with visible transactional watermarks that identify the user or device, with encrypted identifiers to protect privacy.
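The copy and print limits are simple enough to express as a check on each request. What follows is only a minimal sketch of that rule, not a description of how Google will enforce it: the `PageRequest` type and the `is_allowed` function are hypothetical names of our own, and we assume the limits apply per request to a contiguous range of pages.

```python
from dataclasses import dataclass

# Limits as quoted above; the enforcement mechanism itself is an assumption.
MAX_COPY_PAGES = 4
MAX_PRINT_PAGES = 20


@dataclass(frozen=True)
class PageRequest:
    """Hypothetical request to copy or print a contiguous page range."""
    action: str      # "copy" or "print"
    first_page: int  # inclusive
    last_page: int   # inclusive


def is_allowed(request: PageRequest) -> bool:
    """Return True if a single request stays within the per-request page limit."""
    limits = {"copy": MAX_COPY_PAGES, "print": MAX_PRINT_PAGES}
    if request.action not in limits:
        raise ValueError(f"unknown action: {request.action!r}")
    page_count = request.last_page - request.first_page + 1
    return 1 <= page_count <= limits[request.action]
```

Under these assumptions, `is_allowed(PageRequest("copy", 10, 13))` passes because it covers four pages, while the same request extended to page 14 is refused.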
The agreement contains nonbinding language that contemplates additional revenue models in the future, such as print-on-demand, per-page pricing for custom publishing (aggregation of portions of books or chapters for the educational market), PDF downloads, consumer subscription models, and a very narrow set of derivative works. In other words, it sets Google on a course to compete with Amazon.com as well as with educational content aggregators like CourseSmart.
Google, of course, gets a share of the revenue from all of these models, typically 37%. Publishers' participation in these business models is strictly "opt in," except that the settlement makes broad provisions for so-called orphan works to be included by default. The settlement also provides for limited access to the entire repository at public and school libraries throughout the country.
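To make the arithmetic concrete, here is a rough sketch of how a single sale might be divided. It assumes that whatever Google does not keep passes to rights holders, presumably by way of the BRR; the `split_revenue` function and the rounding rule are our own illustration and are not taken from the settlement text.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple

# "Typically 37%" per the article; the actual share may vary by revenue model.
GOOGLE_SHARE = Decimal("0.37")


def split_revenue(gross: Decimal,
                  google_share: Decimal = GOOGLE_SHARE) -> Tuple[Decimal, Decimal]:
    """Split gross revenue into (Google's portion, rights holders' portion).

    Assumption: Google's portion is rounded to the nearest cent and the
    remainder goes to rights holders, so the two parts always sum to gross.
    """
    cent = Decimal("0.01")
    google = (gross * google_share).quantize(cent, rounding=ROUND_HALF_UP)
    rights_holders = gross - google
    return google, rights_holders
```

On a $100.00 sale, this gives Google $37.00 and leaves $63.00 for rights holders.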
The settlement also lays out rules for displaying book content in search results, with a nominal limit of 20% of a book's content that users can view freely. This raises the question of whether the lawyers negotiating the settlement considered a finer-grained way for publishers to specify Google's rights to index content and display it in search results, à la the proposed ACAP standard. When we asked Google's attorney, Alex Macgillivray, about this at a well-timed and star-studded seminar on Internet copyright last night, he said that he considered ACAP to be a potential solution to a different problem (this is true: ACAP is primarily intended for time-critical news content) but suggested that the parties were familiar with ACAP and had used some ideas from it. He echoed Google's official position on ACAP: that the current Robots Exclusion Protocol (also known as robots.txt) is sufficient for publishers to prevent search engines from crawling their web content.
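For readers unfamiliar with the Robots Exclusion Protocol, the sketch below shows the kind of control it offers, using Python's standard `urllib.robotparser` module. The robots.txt rules, the paths, and the `example.com` URLs are all hypothetical; they are meant only to illustrate that the protocol works at the level of allowing or disallowing crawler access to URL paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher's site (paths are made up).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /books/full-text/

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `can_crawl("Googlebot", "https://example.com/books/preview/123")` is permitted, the same crawler is kept out of `/books/full-text/`, and every other crawler is excluded from the whole site. What the protocol does not express is how much of a crawled page may later be displayed, which is the sort of finer-grained permission that ACAP aims to cover.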
Another interesting part of the proposed settlement from a technological perspective is the responsibility that it confers on Google and other participating content repositories to secure access to digitized book content. A 19-page attachment to the proposed settlement outlines content security standards that must be followed, including rights metadata, image watermarking, and encryption of digitized files. The agreement requires the security standards to be revisited every two years.
In all, the proposed settlement solves the major problem that book publishers have had with the web, which we have called the discoverability paradox: the content that some would argue is most desirable because it has been professionally developed, edited, and produced is not discoverable online. This settlement should go a long way towards solving that problem, which will make life better for users while compensating rights holders. As Jonathan Zittrain of Harvard Law School said in his closing remarks at last night's panel, publishers and authors need to find ways to profit from abundance of their content, not scarcity. Their settlement with Google certainly points in that direction.