Controversy is brewing over the Automated Content Access Protocol (ACAP)
standards initiative, which is intended to automate the process of licensing
content rights to search engines like Google, Yahoo, and MSN. ACAP is
being driven by the World Association of Newspapers, European Publishers
Council, and International Publishers Association -- that is, mainly
international trade association umbrella organizations. Participants
include many leading rights licensing organizations as well as publishers, wire
services, and other content industry trade associations. ACAP held a
conference in London last Tuesday to report on its status, and although the
initiative has made significant headway in its technical work, the three major
search engines are showing few signs of interest in cooperating.
ACAP sounds like a great idea: instead of the search engines indexing
whatever they want and publishers responding through litigation -- as has
certainly been the case thus far -- why not agree on a machine-readable language
for expressing the rights that publishers want search engines to have?
Such rights would include the right to crawl (index) the content, to display
it in various forms in search results, to attribute its source appropriately (a
la Creative Commons), and even to implement certain access models such as "first
click free." Rights could be extended to content that is currently within
closed environments and possibly encrypted, and thus not available to search
engines; this would make publishers more comfortable about having their content indexed, add to the amount of content available to search engines, and ultimately make search engines more useful. On the other hand, there could also be rules that cover mandatory de-indexing of
content, such as when a newspaper article must be pulled for legal reasons or is more than 7 days old and goes into
a paid archive. We were not alone when we called for this type of scheme in late 2005.
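To make the idea of machine-readable rights concrete, here is a minimal Python sketch of how the rules described above might be modeled. It is purely illustrative: the class `UsageRule`, its field names, and the function `indexing_permitted` are our own hypothetical inventions and are not taken from any ACAP specification.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class UsageRule:
    """Hypothetical usage rule a publisher might attach to one article.

    Every field name here is an assumption made for illustration only.
    """
    may_index: bool = True                    # right to crawl (index) the content
    may_show_snippet: bool = True             # right to display it in search results
    require_attribution: bool = False         # source must be credited
    first_click_free: bool = False            # access model: first click is free
    deindex_after_days: Optional[int] = None  # e.g. 7, when content moves to a paid archive
    withdrawn: bool = False                   # pulled, e.g. for legal reasons


def indexing_permitted(rule: UsageRule, published: date, today: date) -> bool:
    """Return True if a search engine may still keep the article indexed."""
    if rule.withdrawn or not rule.may_index:
        return False
    if rule.deindex_after_days is not None:
        # Mandatory de-indexing once the article is older than the limit.
        return today - published <= timedelta(days=rule.deindex_after_days)
    return True


# A newspaper article that goes into a paid archive after 7 days.
news_rule = UsageRule(require_attribution=True, deindex_after_days=7)
published = date(2007, 1, 1)
still_fresh = indexing_permitted(news_rule, published, date(2007, 1, 8))
archived = indexing_permitted(news_rule, published, date(2007, 1, 9))
```

Under these assumptions, the article remains indexable through its seventh day and must be dropped from the index on the eighth.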
The ACAP technical working group has done a comprehensive job so far in
identifying relevant rights licensing use cases and existing technologies that
could be used, extended, or complemented to solve the problem. The latter
question has ended up focusing on the Robots Exclusion Protocol (REP, a/k/a
robots.txt), a popular convention, implemented as a text file at the root of a
website, that enables web publishers to block search engine crawlers from
specified pages. Other existing technologies, such as Robots metatags, are also
being considered. ACAP messages could be carried along with content through such standard means as RSS, NewsML, or Adobe's XMP metadata framework.
ACAP hopes to extend REP in several ways: to define a standard set of semantics
for content usage rules that cover cases such as those mentioned above, to let
REP statements specify search engine access at a finer grain (e.g., by type of
content item or by type of service that may or may not exercise the right), and
to add other extensions.
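For readers unfamiliar with REP, the following Python sketch shows roughly what it can express today, using the standard library's `urllib.robotparser` module. The robots.txt content, the crawler names, and the paths are hypothetical; the helper `may_crawl` is ours.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /archive/

User-agent: *
Disallow: /archive/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(agent: str, path: str) -> bool:
    """Ask whether the named crawler is allowed to fetch the given path."""
    return parser.can_fetch(agent, path)


checks = [
    ("ExampleNewsBot", "/news/today.html"),
    ("ExampleNewsBot", "/archive/2005/story.html"),
    ("OtherBot", "/drafts/unfinished.html"),
]
for agent, path in checks:
    print(agent, path, may_crawl(agent, path))
```

Note what the rules above can say: a given crawler may or may not fetch a given path. They say nothing about how the content may be displayed, whether attribution is required, or when an item must leave the index, which is the kind of gap the proposed extensions are meant to fill.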
The search engine companies argue that REP suffices for the intended purposes
as it is and does not need to be extended. This argument is disingenuous.
To begin with, REP is not a standard that is officially maintained by any
standards body; its semantics are not uniform; and its use is not consistent.
Beyond the protocol itself, there are legal and business issues behind the
search engines' objections to ACAP. Although ACAP does not call for
encrypting or obfuscating content -- and therefore its use would be strictly
voluntary -- the search engines view it as a way of publishers exerting control
over content rights that exceeds what copyright law specifies, such as Fair Use
in the US (or Fair Dealing in the UK, Canada, and Australia). This is a
legitimate concern. But more generally, search engines like Google are
bent on achieving control over online content supply chains, and a scheme like
this threatens that control.
The battle for control of content value chains among publishers, network
service providers, device makers, and other parties will go on forever.
Still, couching the search companies' arguments in subterfuges over technical
specs sounds like hypocrisy to us. Technologists who speak out in favor
of ambiguity and inconsistency are doing themselves as much of a disservice as
publishers who think they can rely entirely on the legal system to control
rapidly expanding technology. Technologists who pride themselves on
precision and quality ought not to look the other way when it's expedient for
business reasons.
If we have learned anything from the explosion of digital content
technologies, it is that technology will out -- and there's very little anyone
can do to stop it. Now that at least some publishers are finally starting
to embrace sensible technological solutions to online rights licensing --
instead of over-engineered and impractical standards, litigation, or lobbying --
the technology community ought to participate fully.