Process models and licensing and copyright models

From LiquidPubWiki

Introduction

The advent of Web 2.0, or the Social Web, has triggered a profound transformation of the process of knowledge production and dissemination: the creation, maintenance, control and sharing of information has become a decentralised, i.e. distributed, collaborative and peer-monitored, process involving global networks of both professionals and volunteers. Over at least the last 30 years, this has provoked a major change in the institutional framework and the social practices associated with the production and dissemination of information, and has led today to a "networked information economy" (cf. Benkler 2006) whose main rules and principles are wholly different from those of traditional economies.

Information is a nonrival resource, that is, its consumption by one person does not make it any less available for consumption by another. Note that this is not a property of new, networked means of information production: it is an exceptional feature that has characterised the market of informational and cultural goods from the outset. A new reader of Shakespeare's sonnets does not harm any previous reader or prevent more readers from accessing them. The specificity of information markets may be expressed economically in terms of marginal costs: increased production does not affect, or barely affects, the marginal costs of production. Hence, an exceptional legal apparatus for the protection of information and culture markets, in terms of property rights and copyright, has always been necessary in order to assure control and concentration of the means of production for the actors of information markets. This ensemble of legal techniques of control has different histories in the three main domains concerned with the market of ideas: science, authorship and patenting (cf. Biagioli 2009). While these three forms of protection of intellectual creativity were clearly distinct in history, today some distinctions have been blurred and we are facing a very confused legal framework, whose effects often become harmful for the actors it is supposed to protect (cf. Harnad, 2000).

For example, many studies (cf. Lerner 1999) reveal that the impact of patent-based intellectual property on innovation is fairly limited: information is both the input and the output of its own production process (the famous "on the shoulders of giants" effect), that is, new information goods or innovations build on existing information. The increase of patent protections in the last 150 years thus increases the cost for current innovators of accessing existing knowledge, hence decreasing their creative potential. Oddly, this effect also holds today for access to scientific knowledge, even though, historically, the status of the scientific author and that of the inventor were clearly distinct. Scientific authorship differs from other forms of authorship because it does not result in any property rights for the author: a scientific discovery cannot be copyrighted. Historically, the author may claim only the "priority" of his or her discovery and, once the discovery has been peer-reviewed by the community, share it in the "public domain". Today, however, the domain of research papers is not public, but regulated by copyright privileges granted to the publishing companies, with possible effects similar to those cited in the world of patenting, namely that reduced access to previous information results in reduced creativity and originality. With the advent of the Social Web, the notion of "public domain" as a space for sharing scientific knowledge has re-emerged in the discussion. Still, today the space of knowledge sharing is not a res nullius, as the public domain used to be, but an organized informational space, whose networked capacities have to be defended as "common goods" of humanity.
Hence, we are facing a transition today from a conception of free circulation of ideas in the empty space of the public domain to a more cooperative idea of sharing knowledge as a common good through a common resource whose use must be regulated, that is, the Web.

This State of the Art assesses the impact of the Social Revolution of the World Wide Web on both the production and the distribution of scientific knowledge, with a special focus on intellectual property rights. The first part provides an overview of the different innovative features and services of the Web 2.0 and gauges its potential for scientific research. The second part reviews and differentiates the notions of copyright, scientific authorship, and e-commons. The latter is defined as comprising both works that are in the public domain and works that are "copylefted", i.e. licensed by their authors to be copied, shared or modified by anyone under the same terms as those set by the license. Different copyleft licenses will be presented, among others those provided by Creative Commons. After discussing the relationship between scientific research and the e-commons, we will describe the changes in copyright and licensing practices in the scientific publishing industry. An overview of the main Open Access models concludes this State of the Art.

The whole first part of this State of the Art is based on research by Giuseppe Veltri (2008); Sections 3.1 to 3.2 are by Luc Schneider, with contributions from Gloria Origgi and Roberto Casati. Section 3.4, on copyright and licensing practices as well as business models in the scientific publishing industry, is by Diego Ponte.

The impact of the Web 2.0 on the Research Process

Introduction

Over the last few years, the take-up of social computing applications has been impressive. These digital applications are defined as those that enable interaction, collaboration and sharing between users. They include applications for blogging, podcasting, collaborative content (e.g. Wikipedia), social networking (e.g. MySpace, Facebook), multimedia sharing (e.g. Flickr, YouTube), social tagging (e.g. del.icio.us) and social gaming (e.g. Second Life) (cf. Figure 1).

Figure 1: The Conversation Prism by Brian Solis

The importance of social computing has been acknowledged by the business community, by the academic community and by public opinion at large. It is considered to be a potentially disruptive ‘Information Society’ development, in which users play an increasingly influential role in the way products and services are shaped and used. This may have important social and economic impacts on all aspects of society. There is, however, little scientific evidence on the take-up and impact of social computing applications. The objective of the first part of this State of the Art is to provide a systematic assessment of the principles and potential research uses of social computing applications.

An inventory of the Social Web and Social Computing

Social networks

Social network sites represent a fundamental layer in the complex phenomenon of the Social Web. We define social network sites (SNSs from now on) as web-based services that allow individuals to

  1. Construct a public or semi-public profile within a bounded system,
  2. Articulate a list of other users with whom they share a connection,
  3. View their list of connections and those made by others within the system.

The nature and nomenclature of these connections may vary from site to site.

While we use the term "social network site" to describe this phenomenon, the term "social networking sites" also appears in public discourse, and the two terms are often used interchangeably. We chose not to employ the term "networking" for two reasons: emphasis and scope. "Networking" emphasises relationship initiation, often between strangers. However, what makes social network sites unique is not that they allow individuals to meet strangers, but rather that they enable users to articulate and make visible their “offline” social networks, or “latent ties” (Haythornthwaite, 2005).

While SNSs have implemented a wide variety of technical features, their backbone consists of visible profiles that display an articulated list of Friends who are also users of the system. Profiles are unique pages where one can "type oneself into being" (Sundén, 2003, p. 3). After joining an SNS, an individual is asked to fill out forms containing a series of questions. The profile is generated using the answers to these questions, which typically include descriptors such as age, location, interests, and an "about me" section. Most sites also encourage users to upload a profile photo. Some sites allow users to enhance their profiles by adding multimedia content or modifying their profile's look and feel. Others, such as Facebook, allow users to add modules ("Applications") that enhance their profile.

The visibility of a profile varies by site and according to user discretion. Structural variations around visibility and access are one of the primary ways that SNSs differentiate themselves from each other.

After joining a social network site, users are prompted to identify others in the system with whom they have a relationship. Most SNSs require bi-directional confirmation for Friendship, but some do not. The term "Friends" can be misleading, because the connection does not necessarily mean friendship in the everyday vernacular sense, and the reasons people connect are varied.

The public display of connections is a crucial component of SNSs. The Friends list contains links to each Friend's profile, enabling viewers to traverse the network graph by clicking through the Friends lists. On most sites, the list of Friends is visible to anyone who is permitted to view the profile, although there are exceptions.
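The traversal of the network graph through Friends lists can be sketched as a breadth-first walk. This is only an illustration, not the implementation of any particular SNS: the `Profile` class, the `reachable_within` helper and the sample network are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical SNS profile: a name plus a publicly visible Friends list."""
    name: str
    friends: list = field(default_factory=list)


def reachable_within(profiles, start, max_clicks):
    """Breadth-first walk over Friends lists.

    Returns a dict mapping each reachable profile name to the number of
    clicks needed to get there from `start`, up to `max_clicks`.
    """
    clicks = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if clicks[current] == max_clicks:
            continue  # the viewer stops clicking at this depth
        for friend in profiles[current].friends:
            if friend not in clicks:
                clicks[friend] = clicks[current] + 1
                queue.append(friend)
    return clicks


# Made-up network in which every Friendship is confirmed in both directions.
network = {
    "ana": Profile("ana", ["ben"]),
    "ben": Profile("ben", ["ana", "cho"]),
    "cho": Profile("cho", ["ben", "dev"]),
    "dev": Profile("dev", ["cho"]),
}
result = reachable_within(network, "ana", 2)
```

In this sample, a viewer starting at "ana" reaches "ben" in one click and "cho" in two, while "dev" lies beyond the two-click limit.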

Most SNSs also provide a mechanism for users to leave messages on their Friends' profiles. This feature typically involves leaving "comments," although sites employ various labels for this feature. In addition, SNSs often have a private messaging feature similar to webmail. While both private messages and comments are popular on most of the major SNSs, they are not universally available.

Beyond profiles, Friends, comments, and private messaging, SNSs vary greatly in their features and user base. Some have photo-sharing or video-sharing capabilities; others have built-in blogging and instant messaging technology. There are mobile-specific SNSs (e.g., Dodgeball), but some web-based SNSs also support limited mobile interactions (e.g., Facebook, MySpace, and Cyworld). Many SNSs target people from specific geographical regions or linguistic groups, although this does not always determine the site's constituency. Orkut, for example, was launched in the United States with an English-only interface, but Portuguese-speaking Brazilians quickly became the dominant user group. Some sites are designed with specific ethnic, religious, sexual orientation, political, or other identity-driven categories in mind. There are even SNSs for dogs (Dogster) and cats (Catster), although their owners must manage their profiles.

While SNSs are often designed to be widely accessible, many attract homogeneous populations initially, so it is not uncommon to find groups using sites to segregate themselves by nationality, age, educational level, or other factors that typically segment society even if that was not the intention of the designers.

Currently, there are no reliable data regarding how many people use SNSs, although marketing research indicates that SNSs are growing in popularity worldwide (comScore, 2007). The rise of SNSs indicates a shift in the organization of online communities. While websites dedicated to communities of interest still exist and prosper, SNSs are primarily organized around people, not interests. Early public online communities such as Usenet and public discussion forums were structured by topics or according to topical hierarchies, but social network sites are structured as personal (or "egocentric") networks, with the individual at the center of their own community. This more accurately mirrors unmediated social structures, where "the world is composed of networks, not groups" (Wellman, 1988, p. 37). The introduction of SNS features has introduced a new organizational framework for online communities, and with it, a vibrant new research context.

Content Creation in Social Computing

The winning principle: harnessing collective intelligence

The central principle behind the success of the giants born in the Web 1.0 era who have survived to lead the Web 2.0 era appears to be this, that they have embraced the power of the web to harness collective intelligence.

As users add new content, and new sites, it is bound in to the structure of the web by other users discovering the content and linking to it. Much as synapses form in the brain, with associations becoming stronger through repetition or intensity, the web of connections grows organically as an output of the collective activity of all web users.

Yahoo!, the first great internet success story, was born as a catalog, or directory of links, an aggregation of the best work of thousands, then millions of web users. Google's breakthrough in search, which quickly made it the undisputed search market leader, was PageRank, a method of using the link structure of the web rather than just the characteristics of documents to provide better search results.
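The idea of ranking pages by link structure rather than by document content alone can be illustrated with a simplified power-iteration version of PageRank. This is a textbook-style sketch under stated assumptions, not Google's production algorithm: the damping factor of 0.85, the fixed iteration count and the four-page toy web are all chosen here for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up toy web: three pages link to "c", and "c" links back to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

Here "c" ends up with the highest rank because the most pages link to it, and "a" outranks "b" and "d" only because the highly ranked "c" links to it, regardless of what any of the pages contain.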

eBay's product is the collective activity of all its users; like the web itself, eBay grows organically in response to user activity, and the company's role is as an enabler of a context in which that user activity can happen. What's more, eBay's competitive advantage comes almost entirely from the critical mass of buyers and sellers, which makes any new entrant offering similar services significantly less attractive.

Amazon sells the same products as competitors such as Barnesandnoble.com, and they receive the same product descriptions, cover images, and editorial content from their vendors. But Amazon has made a science of user engagement. They have an order of magnitude more user reviews, invitations to participate in varied ways on virtually every page--and even more importantly, they use user activity to produce better search results. While a Barnesandnoble.com search is likely to lead with the company's own products, or sponsored results, Amazon always leads with "most popular", a real-time computation based not only on sales but other factors that Amazon insiders call the "flow" around products. With an order of magnitude more user participation, it's no surprise that Amazon's sales also outpace competitors. Now, innovative companies that pick up on this insight and perhaps extend it even further, are making their mark on the web.

Wikipedia, an online encyclopedia based on the unlikely notion that an entry can be added by any web user, and edited by any other, is a radical experiment in trust, applying Eric Raymond's dictum (originally coined in the context of open source software) that "with enough eyeballs, all bugs are shallow," to content creation. Wikipedia is already in the top 100 websites, and many think it will be in the top ten before long. This is a profound change in the dynamics of content creation!

Sites like del.icio.us and Flickr, two companies that have received a great deal of attention of late, have pioneered a concept that some people call "folksonomy" (in contrast to taxonomy), a style of collaborative categorization of sites using freely chosen keywords, often referred to as tags. Tagging allows for the kind of multiple, overlapping associations that the brain itself uses, rather than rigid categories. In the canonical example, a Flickr photo of a puppy might be tagged both "puppy" and "cute"--allowing for retrieval along natural axes generated by user activity.
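The difference between overlapping tags and rigid categories can be sketched with a small inverted index. The `TagIndex` class, its method names and the sample photos are hypothetical; they do not reflect how Flickr or del.icio.us store tags.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical folksonomy store: any item may carry any number of tags."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)
        self._tags_by_item = defaultdict(set)

    def tag(self, item, *tags):
        """Attach freely chosen keywords to an item."""
        for raw in tags:
            keyword = raw.strip().lower()
            self._items_by_tag[keyword].add(item)
            self._tags_by_item[item].add(keyword)

    def items_with(self, *tags):
        """Return the items that carry every one of the given tags."""
        sets = [self._items_by_tag.get(t.strip().lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def related_tags(self, tag):
        """Return the tags that co-occur with `tag` on at least one item."""
        keyword = tag.strip().lower()
        related = set()
        for item in self._items_by_tag.get(keyword, set()):
            related |= self._tags_by_item[item]
        related.discard(keyword)
        return related


index = TagIndex()
index.tag("photo-1", "puppy", "cute")
index.tag("photo-2", "kitten", "Cute")

cute_items = index.items_with("cute")
cute_puppies = index.items_with("puppy", "cute")
near_puppy = index.related_tags("puppy")
```

The same photo is retrievable under "puppy", under "cute", or under both at once, which a single-category taxonomy would not allow.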

Collaborative spam filtering products like Cloudmark aggregate the individual decisions of email users about what is and is not spam, outperforming systems that rely on analysis of the messages themselves. It is a truism that the greatest Internet success stories don't advertise their products. Their adoption is driven by "viral marketing"--that is, recommendations propagating directly from one user to another. You can almost make the case that if a site or product relies on advertising to get the word out, it isn't Web 2.0.

Even much of the infrastructure of the web--including the Linux, Apache, MySQL, and Perl, PHP, or Python code involved in most web servers--relies on the peer-production methods of open source, in themselves an instance of collective, net-enabled intelligence. There are more than 100,000 open source software projects listed on SourceForge.net. Anyone can add a project, anyone can download and use the code, and new projects migrate from the edges to the centre as a result of users putting them to work, an organic software adoption process relying almost entirely on viral marketing.

Main innovative features of Social Computing

Blogging

One of the most highly touted features of the Web 2.0 era is the rise of blogging. Personal home pages have been around since the early days of the web, and the personal diary and daily opinion column around much longer than that, so just what is the fuss all about?

At its most basic, a blog is just a personal home page in diary format. But as Rich Skrenta notes, the chronological organization of a blog "seems like a trivial difference, but it drives an entirely different delivery, advertising and value chain." One of the things that has made a difference is a technology called RSS. RSS is the most significant advance in the fundamental architecture of the web since early hackers realized that CGI could be used to create database-backed websites. RSS allows someone to link not just to a page, but to subscribe to it, with notification every time that page changes. Skrenta calls this "the incremental web." Others call it the "live web".

Now, of course, "dynamic websites" (i.e., database-backed sites with dynamically generated content) replaced static web pages well over ten years ago. What's dynamic about the live web are not just the pages, but the links. A link to a weblog is expected to point to a perennially changing page, with "permalinks" for any individual entry, and notification for each change. An RSS feed is thus a much stronger link than, say, a bookmark or a link to a single page.
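What subscribing to a feed amounts to, as opposed to bookmarking a page, can be sketched as follows: a reader polls the feed and reports only entries whose permalink it has not seen before. The sketch uses Python's standard `xml.etree.ElementTree` parser on an inline RSS 2.0 document; the feed content, the `example.org` URLs and the `new_entries` helper are made up for illustration, and a real reader would fetch the feed over HTTP.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed with two entries, newest first.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example weblog</title>
    <link>http://example.org/blog</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>Second post</title>
      <link>http://example.org/blog/2</link>
      <guid isPermaLink="true">http://example.org/blog/2</guid>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.org/blog/1</link>
      <guid isPermaLink="true">http://example.org/blog/1</guid>
    </item>
  </channel>
</rss>"""


def new_entries(feed_xml, seen_permalinks):
    """Return (title, permalink) pairs not seen before and remember them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        permalink = item.findtext("guid") or item.findtext("link")
        if permalink and permalink not in seen_permalinks:
            fresh.append((item.findtext("title"), permalink))
            seen_permalinks.add(permalink)
    return fresh


# The subscriber has already read the first post.
seen = {"http://example.org/blog/1"}
first_poll = new_entries(SAMPLE_FEED, seen)
second_poll = new_entries(SAMPLE_FEED, seen)
```

The first poll reports only the second post, and polling the unchanged feed again reports nothing, which is the notification-per-change behaviour that distinguishes a subscription from a bookmark.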

The number of blogs has doubled every 5-7 months for the last 3 years. Worldwide, in absolute numbers, Technorati was tracking over 50 million blogs in October 2006. The number increased to 70 million blogs in April 2007. About 120,000 new blogs are created daily, that is, about 1.4 blogs every second of every day. According to Technorati, by October 2008 this figure had gone up to more than 133 million blogs.
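The arithmetic behind these figures can be checked in a few lines. The sketch assumes steady exponential growth between the two Technorati counts and treats October 2006 to April 2007 as six months; the `doubling_time_months` helper is introduced here for illustration only.

```python
import math

# 120,000 new blogs per day, expressed per second.
new_blogs_per_day = 120_000
blogs_per_second = new_blogs_per_day / (24 * 60 * 60)


def doubling_time_months(count_start, count_end, months_elapsed):
    """Doubling time implied by two counts, assuming exponential growth."""
    growth = count_end / count_start
    return months_elapsed * math.log(2) / math.log(growth)


# 50 million blogs (October 2006) to 70 million blogs (April 2007).
implied = doubling_time_months(50e6, 70e6, 6)
```

Under these assumptions the daily figure corresponds to roughly 1.4 blogs per second, and the two counts imply a doubling time of about a year for that half-year, which is slower than the 5-7 month doubling reported for the preceding three years.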

A mapping of the distribution of blogs by language could give an indication of the relative sizes of some individual language-blogospheres. For instance, the Japanese-language blogosphere leads with 37% (up from 33% in Q3, 2006) of the posts, followed closely by the English-language blogosphere at 36% (down from 39% in Q3, 2006). There has been a slight decrease in the number of English-language posts (33% in March 2007 from 36% in October 2006). The Italian-language blogosphere has overtaken the Spanish as the 4th largest. The newcomer to the top 10 languages is Farsi, ranked 10th.

Counting blogs based on the country of origin is difficult due to the worldwide phenomenon of people using Anglo-Saxon (US and UK) blogging hosts. A study by Hurst, Siegler and Glance (2007) puts forward a comparison between the geographical location of bloggers and the language in which the blogs are written. While almost 40% of blogs are written in English (according to Technorati), some 42% of the bloggers claim a location in an English-speaking country. Likewise, 38% of the bloggers claim a Chinese location, while only 10% of the blogs are written in Chinese.

Podcasting

A podcast can mean either the content itself or the method by which the content is distributed; the latter is also termed podcasting. Podcasts are produced either by 'professional' podcasters or by 'private' podcasters (i.e. people such as bloggers and individual podcasters), and an increasing number of uses are being found for podcasts. In this research, we refer to both podcast content and method. The number of podcasts is difficult to estimate. According to IDATE research released in July 2007, the estimated number of podcasts to date is over 100,000, whereas only three years earlier there were fewer than 10,000. Statistics on the amount of podcast content and podcast feeds are made available by podcast directories worldwide. Apple iTunes, for instance (see Figure 19), counted over 82,000 podcasts in its directories in 2006 (representing a tenfold increase since 2005).

In terms of the number of podcast feeds, in the US for instance, Feedburner reported more than 40,000 podcast feeds under its management in 2006. In 2006, the creation of podcast feeds averaged 15% growth month over month. By August 2007, Feedburner was managing almost 1 million feeds from more than 500,000 bloggers, podcasters and commercial publishers, of which 128,358 were podcast feeds (as of 4 August 2007).

In 2008, the Pew Internet & American Life Project found that 19% of US Internet users had downloaded a podcast for listening at a later point in time, compared to some 7% in an earlier 2007 survey and 12% in a following survey in the second half of the same year.

Social Tagging

Tagging describes the act of adding keywords, also known as tags, to any type of digital resource. Tags serve to describe the item and enable a keyword-based classification (knowledge management). They can also be used to search for content. The types of content that can be tagged range from blogs (Technorati), books (Amazon), pictures (Flickr), podcasts (Odeo) and videos (YouTube) to tags themselves. Tags are not only metadata, but also content. Tagging also allows social groups to form around similarities of interests and points of view, hence the term social tagging. Social tagging is one of the Web 2.0 success stories, tapping into the 'wisdom of crowds', i.e. it lets users connect with others, enabling social discovery and connections. Social tagging leads the way towards a semantic web by bringing in a meaningful and personal search experience.

There has recently been a dramatic increase in the number of pictures tagged with geographical metadata (a method called geotagging or geocoding). Geotagging of photos brings a whole new level of context to images. Flickr's vision of the future of geotagging is “show me photos taken within the last 15 minutes within a kilometre of me.” Two million photos were geotagged in Flickr in 2006 (more than 1.2 million photos were geotagged on the day after the feature became available in Flickr in 2006). In 2006, Flickr users added, on average, over one million tags per week to the dataset. Flickr allows users to drag photos onto a Yahoo map and mark them with a specific worldwide location. Zooomr is another photo sharing service that provides a geotagging tool (Google maps are used instead). As of August 2007, there are 2.6 million geotagged photos in Flickr (up from 1.6 million one year ago) (Butterfield, 2007). In February 2007, Technorati was tracking over 230 million blog posts using tags or categories.

The use of tagging comes in many forms. Photo sharing sites like Flickr allow users to add labels to pictures, video-sharing sites such as YouTube allow them to tag videos, and Amazon uses tags to classify products. Google’s tagging feature is called “bookmark,” though it applies the principles of tagging. Last.fm supports user-end tagging or labelling of artists, albums, and tracks to create a site-wide folksonomy of music. Users can browse via tags, and "tag radio" lets them play music that has been tagged a certain way. The number of bloggers who are using tags is also increasing month on month. About 2.5 million blogs posted at least one tagged post in February 2007. According to Pew Internet & American Life, nearly a third of US Internet users tagged or categorized online content such as photos, news stories or blog posts in 2006 (Pew Internet, 2007). Some 19% of US Internet users watching video online have either rated an online video or posted comments after seeing a video online (Pew Internet & American Life Online Video 2007).

The Architecture of Participation

A review of the most widely acknowledged principles and dynamics that are shaping the Web 2.0 constitutes a good starting point to comprehend the nature of the innovations that social computing can bring to knowledge creation and sharing.

“Architecture is politics”

Some systems are designed to encourage participation. In his paper, The Cornucopia of the Commons, Dan Bricklin (Bricklin, 2006) noted that there are three ways to build a large database. The first, demonstrated by Yahoo!, is to pay people to do it. The second, inspired by lessons from the open source community, is to get volunteers to perform the same task. The Open Directory Project, an open source Yahoo competitor, is the result. But Napster demonstrated a third way. Because Napster set its defaults to automatically serve any music that was downloaded, every user automatically helped to build the value of the shared database. All other P2P file sharing services have followed this same approach.

One of the key lessons of the Web 2.0 era is this: Users add value. But only a small percentage of users will go to the trouble of adding value to your application via explicit means. Therefore, Web 2.0 companies set inclusive defaults for aggregating user data and building value as a side effect of ordinary use of the application. As noted above, they build systems that get better the more people use them. Mitch Kapor once noted that "architecture is politics." Participation is intrinsic to Napster, part of its fundamental architecture.

This architectural insight may also be more central to the success of open source software than the more frequently cited appeal to volunteerism. The architecture of the Internet, and the World Wide Web, as well as of open source software projects like Linux, Apache, and Perl, is such that users pursuing their own "selfish" interests build collective value as an automatic by-product. Each of these projects has a small core, well-defined extension mechanisms, and an approach that lets any well-behaved component be added by anyone, growing the outer layers of what Larry Wall, the creator of Perl, refers to as "the onion." In other words, these technologies demonstrate network effects, simply through the way that they have been designed.

These projects can be seen to have a natural architecture of participation. But as Amazon demonstrates, by consistent effort (as well as economic incentives such as the Associates program), it is possible to overlay such architecture on a system that would not normally seem to possess it.

Technical devices enabling user added value

RSS also means that the web browser is not the only means of viewing a web page. RSS is now being used to push not just notices of new blog entries, but also all kinds of data updates, including stock quotes, weather data, and photo availability. But RSS is only part of what makes a weblog different from an ordinary web page. Tom Coates remarks on the significance of the permalink, the device that turned weblogs from an ease-of-publishing phenomenon into a conversational mess of overlapping communities. For the first time it became relatively easy to gesture directly at a highly specific post on someone else's site and talk about it.

In many ways, the combination of RSS and permalinks adds many of the features of NNTP, the Network News Transfer Protocol of the Usenet, onto HTTP, the web protocol. The "blogosphere" can be thought of as a new, peer-to-peer equivalent to Usenet and bulletin boards, the conversational watering holes of the early internet. Not only can people subscribe to each other's sites, and easily link to individual comments on a page, but also, via a mechanism known as trackbacks, they can see when anyone else links to their pages, and can respond, either with reciprocal links, or by adding comments.

Interestingly, two-way links were the goal of early hypertext systems like Xanadu. Hypertext purists have celebrated trackbacks as a step towards two-way links. But note that trackbacks are not properly two-way--rather, they are really (potentially) symmetrical one-way links that create the effect of two-way links. The difference may seem subtle, but in practice it is enormous. Social networking systems like Friendster, Orkut, and LinkedIn, which require acknowledgment by the recipient in order to establish a connection, lack the same scalability as the web. As noted by Caterina Fake, co-founder of the Flickr photo sharing service, attention is only coincidentally reciprocal. (Flickr thus allows users to set watch lists--any user can subscribe to any other user's photostream via RSS. The object of attention is notified, but does not have to approve the connection.)
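The contrast between symmetrical one-way links and links that need acknowledgment can be made concrete with two small data structures. Both classes are hypothetical sketches written for this comparison; they are not modelled on the actual implementation of Flickr, Friendster, Orkut or LinkedIn.

```python
class AttentionGraph:
    """One-way links: watching someone needs no approval from the target."""

    def __init__(self):
        self.watching = {}        # user -> set of users they watch
        self.notifications = []   # (recipient, message) pairs

    def watch(self, watcher, target):
        # The link exists immediately; the target is only notified.
        self.watching.setdefault(watcher, set()).add(target)
        self.notifications.append((target, f"{watcher} is watching you"))

    def is_reciprocal(self, a, b):
        """True only if both users happen to watch each other."""
        return b in self.watching.get(a, set()) and a in self.watching.get(b, set())


class FriendshipGraph:
    """Two-way links: a connection exists only once the recipient confirms."""

    def __init__(self):
        self.pending = set()   # (sender, recipient) requests awaiting approval
        self.friends = set()   # frozenset({a, b}) pairs of confirmed Friends

    def request(self, sender, recipient):
        self.pending.add((sender, recipient))

    def confirm(self, sender, recipient):
        if (sender, recipient) in self.pending:
            self.pending.remove((sender, recipient))
            self.friends.add(frozenset((sender, recipient)))

    def connected(self, a, b):
        return frozenset((a, b)) in self.friends


attention = AttentionGraph()
attention.watch("ana", "ben")
one_way_exists = "ben" in attention.watching["ana"]
reciprocal_before = attention.is_reciprocal("ana", "ben")
attention.watch("ben", "ana")
reciprocal_after = attention.is_reciprocal("ana", "ben")

friendship = FriendshipGraph()
friendship.request("ana", "ben")
connected_before = friendship.connected("ana", "ben")
friendship.confirm("ana", "ben")
connected_after = friendship.connected("ana", "ben")
```

In the first structure a link costs one action by one user and becomes reciprocal only if the other user independently makes the same choice; in the second, every connection blocks on a second user's action.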

Blogging as a filter harnessing collective intelligence

If an essential part of Web 2.0 is harnessing collective intelligence, turning the web into a kind of global brain, the blogosphere is the equivalent of constant mental chatter in the forebrain, of conscious thought. And as a reflection of conscious thought and attention, the blogosphere has begun to have a powerful effect. First, because search engines use link structure to help predict useful pages, bloggers, as the most prolific and timely linkers, have a disproportionate role in shaping search engine results. Second, because the blogging community is so highly self-referential, bloggers paying attention to other bloggers magnifies their visibility and power. The "echo chamber" that critics decry is also an amplifier.

If it were merely an amplifier, blogging would be uninteresting. But like Wikipedia, blogging harnesses collective intelligence as a kind of filter. What James Surowiecki calls "the wisdom of crowds" comes into play, and much as PageRank produces better results than analysis of any individual document, the collective attention of the blogosphere selects for value.

While mainstream media may see individual blogs as competitors, what is really unnerving is that the competition is with the blogosphere as a whole. This is not just a competition between sites, but a competition between business models. The world of Web 2.0 is also the world of what Dan Gillmor calls "we, the media," a world in which "the former audience", not a few people in a back room, decides what's important.

Patterns of participation

In order to understand social computing adoption, there is a need to see how people approach these technologies. Social computing is used not only by the few people posting blog entries, photos on Flickr and videos on YouTube, but by a large share of Internet users in many different ways. The present research confirms that, statistically, the pattern of participation in social computing follows what has been described as a power law distribution (R. Mayfield based on http://www.orgnet.com/BuildingNetworks.pdf).

Moreover, the behaviour of "passive users" is increasingly being explored via technological means. Simply reading or using social computing content, which practically 100% of Internet users do, can leave traces that can be used (anonymously) as a way of sharing preferences and interests. The intensity of online participation then diminishes gradually, as described by the Concentric Model of Participation Intensity (CPMI): at least a third (30%-40%) of Internet users use social computing content, e.g. reading blogs, watching user-generated videos on YouTube, listening to podcasts, visiting wiki sites, or visiting/using social networking sites. Some 10% of Internet users provide feedback (posting comments on blogs and reviews), share content on Flickr or YouTube, or tag content in del.icio.us. Only around 3% of Internet users in Europe are “creators”, i.e. they create blogs or Wikipedia articles, or upload their user-generated videos on YouTube or photos on Flickr.

People also switch between activities. For example, while reading blogs, they may also visit social networking sites, contribute to Wikipedia, or upload their photos on Flickr. The latest surveys from Forrester (see Figure 10) show that the so-called 'joiners' (representing, according to Forrester, about 20% of the US adult online population and mostly comprising Generation 'Y', i.e. 18-25 year olds) engage in a variety of online activities. For example, apart from using social networking sites, 56% of them also read blogs, while 30% publish blogs.

Another important aspect of social computing is the move from an ‘in-group’ dimension of use and computing to an ‘out-group’ one. The development of web applications that are designed to expand the range of collaboration is one of the main features of social computing and Web 2.0. The move is from an ‘in-group’ dimension, based on the peers locally available, to an ‘out-group’ dimension that allows cooperation with individuals who are not immediately part of our environment.

Conclusion

Collaboration is not strictly defined in a top-down process, setting up a team and inviting individuals from already known work environments. Instead, a bottom-up process is central in many applications of the Web 2.0, in which people are ‘pulled’ towards projects or groups by common interests and aims. In this case, the structure of groups is fluid and in constant change, their size can be large, and collaboration is structured so that the raw power of big numbers can be exploited, usually by dividing large and complex tasks into small ones.

The Social Web and Research Process

Applying the potential of the Web 2.0 to the research process

A number of functionalities were identified as crucial in applying the potential of the Web 2.0 to the research process; they are the outcome of brainstorming sessions with researchers and an expert panel at the Institute Nicod between September and November 2008. These were: to edit a document individually or in a group, in real time or not; to share with selected users or with the wider community of web users (the in-group/out-group dimension), integrating with existing social network websites (CiteULike, Delicious, etc.); to evaluate a document through commenting and ranking; to allow categorization through tags and thereby the retrieval of similar documents; the capacity to upload files in different formats (Word, PDF, RTF, etc.); implementation of reputation and history of reviewers; an open and modular system of add-ons to incentivise user-generated applications and future functionalities.

An exploration of currently available tools shows that there is no all-in-one tool that has all the 'desiderata' functionalities (cf. Figure 2). Hence, the different steps of the research process can only benefit from different tools at different stages. The sets of functionalities described above are implemented in several Web 2.0 applications.

Figure 2: An Overview of Web 2.0 Services

Overview of functionalities

A first relevant separation is between brainstorming tools and collaborative writing ones. At the moment, there is no Web 2.0 tool that combines brainstorming using 'virtual boards', diagrams and other mind-mapping tools with online collaborative writing tools. Web applications such as Thinkature (http://thinkature.com), Mindomo (http://www.mindomo.com) or MindMeister (http://www.mindmeister.com) do not provide online collaborative writing tools for documents, but only for diagrams, flow charts and mind mapping. Although these tools are useful at different stages of the drafting and writing process, a complete application would allow users to go back to their 'brainstorming' steps to re-think or re-elaborate the document being written.

This leads us to the wider issue of a better integration between different writing tools such as text documents, spreadsheets, notebooks, slides, etc. Among the tools that allow a good level of integration, Zoho (http://www.zoho.com) offers a rather interesting example of the shape of things to come, but currently there is no productivity application suite available that offers such integration in full (another example is ThinkFree, http://www.thinkfree.com, but it is not free to use).

Another relevant separation of functionalities is between collaborative writing tools and Web 2.0 reference-sharing applications. For example, online applications such as Google Docs do not provide any tool for a researcher to build up a relevant personal library and to share it with other members of a team or with the web. Web 2.0 applications offering such services already exist, for example CiteULike (http://www.citeulike.org), which is explicitly for academic researchers, or LibraryThing (http://www.librarything.com), which is aimed at a more general public. CiteULike represents an interesting implementation of reference sharing, and it allows a social evaluation of academic articles that is very handy when selecting and looking for papers on a given topic. Both CiteULike and LibraryThing not only allow evaluation of papers or books, but also allow users to make recommendations and to access the libraries of other users who might have similar research or cultural interests.

The previous section points us to another important set of functionalities, namely the evaluation of a document, a draft or a book by a community of users or readers. Tools such as Scribd (http://www.scribd.com) or Docstoc (http://www.docstoc.com) are the most common examples of websites built around a database of documents uploaded and evaluated by users. A rating and award system is implemented in both tools to produce a user-generated selection and promotion of the most valid documents. In addition, texts can be commented on, reviewed, and shared on other websites through links and the embedded 'text reader' provided by both websites. Recently, the online suite of applications Zoho introduced Zoho Share (http://share.zoho.com/), which essentially replicates Scribd's functionalities in the Zoho environment but is integrated with Zoho Writer, Zoho Spreadsheet and other applications.

Of particular interest is the possibility to attribute reputation 'credits' to commentators and reviewers as incentives for a wider and regular process of social evaluation, and to track the history of commentators and reviewers. There are few Web 2.0 tools currently available for such a task; the most interesting are Intense Debate (http://www.intensedebate.com), coComment (http://www.cocomment.com) and SezWho (http://sezwho.com). These tools aim to provide a sort of unified profile and history for commentators and reviewers, and to create a cross-community reputation and rating service.

Currently, the main problem is the lack of connection between communities: reviewers and commentators have multiple identities, and their reputation is fragmented across different arenas. Tools such as IntenseDebate or SezWho are meta-reputational databases that should allow users to retain their history of commenting and reviewing on any social web site they want to interact with. Portability of reputation history and ID is likely to be a crucial issue in the development of the Social Web; see for example the OpenID initiative (http://openid.net).
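The idea of a meta-reputational database can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the community and account names, the `https://id.example/...` identifiers and the 1-5 rating scale are all hypothetical, and the code does not reflect the actual data model or API of IntenseDebate, SezWho or OpenID.

```python
from collections import defaultdict

# Hypothetical records: (community, local account, portable identity, rating).
# Every name and the 1-5 rating scale are invented for illustration only.
REVIEWS = [
    ("blog-a", "jdoe", "https://id.example/jdoe", 4),
    ("wiki-b", "john_d", "https://id.example/jdoe", 5),
    ("forum-c", "jd42", "https://id.example/jdoe", 3),
    ("blog-a", "asmith", "https://id.example/asmith", 2),
]


def merge_reputation(records):
    """Group ratings by portable identity and return per-identity summaries."""
    by_identity = defaultdict(list)
    for community, _account, identity, rating in records:
        by_identity[identity].append((community, rating))

    summary = {}
    for identity, entries in by_identity.items():
        ratings = [rating for _, rating in entries]
        summary[identity] = {
            "communities": sorted({community for community, _ in entries}),
            "reviews": len(ratings),
            "mean_rating": sum(ratings) / len(ratings),
        }
    return summary
```

Without the shared identifier, the three accounts `jdoe`, `john_d` and `jd42` would appear as three unrelated reviewers with one rating each; keyed on a portable identity, they collapse into a single history spanning three communities.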

The last desirable functionality, for which almost no tools are available, is dataset sharing. One of the few tools is Swivel (http://www.swivel.com/), a Web 2.0 website that allows dataset sharing and basic statistical manipulations that can be saved and shared with other users. The underlying idea is to let people explore data and 'play' with it so that secondary data analysis can be shared and pursued almost as a 'hobby'. The great limitation of this tool is that, although datasets from public institutions (such as the UN, OECD, etc.) are available, almost no universities are involved. It is designed to involve all community members and does not allow 'private sharing'.

By contrast, an ongoing project at Harvard University, 'Dataverse' (http://thedata.org/), is working precisely on dataset sharing. The description of the project is: 'The Project is an open-source software development community, housed at the IQSS. Via web application software, data citation standards, and statistical methods, the Dataverse Network project increases scholarly recognition and distributed control for authors, journals, archives, teachers, and others who produce or organize data; facilitates data access and analysis for researchers and students; and ensures long-term preservation whether or not the data are in the public domain'. This project developed open source client software that allows the creation of individual ‘dataverses’, self-contained virtual data archives which are served by a Dataverse Network and appear on the web sites of authors, teachers, journals, granting agencies, research centers, departments, and others. According to the developers of this project 'each dataverse presents a hierarchical organization of data sets, which might include only studies produced by the dataverse creator (such as for an author or research project), those associated with published work (such as replication data sets for journal articles), or data sets collected for a particular community (such as for a journal's replication archive, or a college class or subfield)'.

Winning options

In conclusion, there are no tools that combine all the 'desiderata' functionalities, but from the Web 2.0 applications described so far we can select a few interesting examples that might be a useful starting point:

  1. Zoho represents the most complete online collaborative writing suite of tools that includes several different functionalities from an editor as in Google Docs to a public repository such as Scribd.
  2. Thinkature is the most complete online mind-mapping tool.
  3. CiteULike is a likeable example of an online shared personal library of references.
  4. Scribd is an impressive tool for document sharing, social evaluation and dissemination.
  5. IntenseDebate is an inspiring example of a basic reputation system across different communities of the Social Web.
  6. Facebook represents an example of a social web tool in which applications are user-generated add-ons to the basic service, introducing new practices for the tool itself.
  7. The DataVerse Project and Swivel are very interesting explorations of dataset sharing and manipulation.

Open issues: Social Web and Sciences

In this section we explore one of the common critiques of the approach of social computing and the ‘wisdom of crowds’ as applied to the domain of science. The wisdom of crowds depends on the existence of crowds; however, there are three barriers to the Social Web extracting wisdom in the sciences as it does elsewhere.

The first is the lack of a crowd, or the ‘small world problem’: not only is the total number of scientists in any one field rather low in terms of Internet numbers, but it is even lower in reality because of specialization. Some research domains simply have too few members for the Social Web. For example, there are many more college students for Facebook than there are neuroscientists for a potential “Neurobook”.

The second problem is that scientific communication is quite different from normal human communication. Scientists talk to their friends, but when talking to people they don’t know, they are much more formal. They use communication to specify theories and to claim ground as theirs. Hence the problem is that people with common knowledge don’t share it with each other, simply because of social competition (and time constraints). This barrier rests on the idea that what doesn’t get shared anyway isn’t likely to get shared simply because the technology exists to share it. In other words, if a scientist is not going to share something with his/her colleagues at a conference, he/she is probably not going to share it by web means either.

The third problem is that there are no rewards for participating in these new forms of communication. The risks associated with sharing and opening up a scientist’s work to other peers before his ‘paternity’ has been acknowledged are present, but the rewards are not yet clear. The basic idea is that one reward is constituted by the contributions that will arise from the sharing process, but credit attribution, reputation and intellectual property are still unsolved issues that create a formidable obstacle to adopting Web 2.0 tools in the sciences.

In conclusion, if we consider these three problems together (the crowd is too small, communication is too formal, and no one gets rewarded), how do we overcome them to bring the Social Web’s very real benefits into the sciences? We propose here a few starting points rather than exhaustive answers to the aforementioned three problems:

  1. Increase the size of the crowd. This potential solution starts with Open Access. More people reading the source materials is simply the only possible way to go. We need to abandon the ‘Walled Garden’ approach to the content. There are people out there who can learn this, but not without access to the canon. It also requires Research Web – that is to say the re-formatting of the scholarly canon so that it’s not just legally accessible as a set of PDF files, but something that can be endlessly manipulated, searched, indexed, and more. Scientific knowledge is inherently compatible with the idea of wiki – each paper is a nodal set of relationships between linkable entities – but it needs to be reformatted first. At the moment the combination of publisher firewalls and underlying data formats is a “bottleneck point” on Social Web utility, because it keeps anyone who is not already in the Science community on the outskirts.
  2. Incentivize participation. This is a combination of both Social and Research Web. It could be as simple as having rewards for whoever creates the most bookmarks, curates the local edges of a semantic graph, or tags the most papers. It could be as simple as having a Technorati rank be considered in faculty hiring (though this is as fraught with problems as citations, if not more). It could be asking for proof of the reverberation of one’s research and ideas in any number of ways. The point is to get an environment where scientists see value in talking to each other more than they do.

These are only two starting points, meant to show that solutions to the three main barriers to adopting Web 2.0 tools in the sciences might be at hand. It is a new challenge, but the potential benefit to science in general and to the social sciences in particular is great. The aim of the next section of this paper is to illustrate this potential benefit with an example of how the adoption of social computing might change and improve a research practice in the social sciences.

The Impact of the Web 2.0 on Copyright and Licensing

Introduction

In the second part of this State of the Art, we gauge how the advent of the Social Web, in particular the ease of copying and sharing content on the Web 2.0, has transformed our conception of intellectual property as well as our practices related to the latter. This chapter is divided into three sections.

The first section provides a conceptual analysis and historical overview on the intertwined notions of copyright and what one may call, after Lessig (2001, 19; 2002, 1788) and Boyle (2003, 37), the "e-commons", which is distinct from the public domain and covers both copylefted and "free", i.e. unpropertised, resources. Since patenting is of prime importance for scientific research, this aspect of intellectual property will be also briefly discussed, together with the concept of scientific authorship.

The second section assays two recent reflections as to the need to limit or at least redefine the scope of intellectual property in the digital era for the sake of protecting the freedom of scientific research. James Boyle (2003) criticises what he calls a "second enclosure movement", a general tendency in current national and international legislations to fence off and slowly carve up the public domain, which may stifle intellectual and scientific creativity by reducing the "commons" of freely available results and data. On a different note, Stevan Harnad (2001) pleads for a distinction between two dimensions of copyright, namely protection from theft of ideas (plagiarism) and protection from theft of text (piracy), and argues that only the former is relevant for scientific authorship, which aims for impact and not for income. Both critical appraisals of the notion of intellectual property aim at the defense of a "scientific commons" in which authors may self-archive their papers, results and data for every other scientist to use and build upon.

The third section describes how the copyright and licensing models and the related business models in scientific publishing have been transformed by the Web 2.0. In particular, this section addresses the issue of how the scientific publishing industry has started to adapt to the challenges of the Internet and the Social Web by diversifying its licensing and business strategies.

Copyright and the E-Commons

Copyright

Defining copyright

Copyrights are a kind of intellectual property, the other two categories being patents and trademarks (Koepsell 2000, 43). The rationale of patent law is to protect the exclusive rights as to the exploitation or distribution of inventions, i.e. new products, devices and processes, or improvements thereof, with the explicit exclusion of ideas and methods of operation, e.g. the buttons on a radio (Koepsell 2000, 47-48). Trademark protection aims at the exclusive right to use certain product names (Koepsell 2000, 48). The scope of copyright is original expressions (Koepsell 2000, 49).

More precisely, the purpose of copyright is to grant the author of an original work exclusive rights for a limited time period with respect to the publication, distribution and adaptation of that work. After that period the work enters the public domain (Berry and McCallion 2001). However, most legislations allow for "fair" exceptions to the author's exclusive rights, giving users certain rights, such as to make copies for private use or to quote from published works, on condition that credit is given to their authors.

Copyright is intended to give authors control over and profit from their works, thereby encouraging and fostering the creation of new works and the flow of ideas and learning. This seems to be mandatory in an epoch when an increasing number of people earn their living from intellectual achievements (Tallmo 2005). Intellectual property is necessary for an author to be able to make money from his work (ibid.). Contrary to a modern misconception, copyright is essentially a right of authors and creative minds, not of publishing companies (ibid.). The problem of piracy and copyright infringements is not merely loss of income, but also distorted reproduction (ibid.).

Copyright applies to the expression of any idea or piece of information that is sufficiently original. In other words, copyright does not concern ideas or bits of information, but primarily the manner in which they are expressed (Koepsell 2000, 50). As such, a wide range of creative, intellectual, or artistic forms are covered, including newspaper articles, poems, scientific papers, academic theses, plays, novels, personal letters, but also movies, dances, musical compositions, recordings, paintings, drawings, sculptures, photographs, software, and radio and television broadcasts.

Evolution of copyright

The history of copyright starts in the 18th century, with a very rich prehistory in the 17th century (Chartier 1987). In fact, copyright law has its origin in the monopolies that appeared with the development of presses: publishers and bookbinders were organised in guilds and protected their primacy in information dissemination by keeping their manufacturing methods secret. Indeed, until the early 18th century, publishers held more rights over printed works than their authors (Koepsell 2000, 46).

The Statute of Anne (1710) in Britain can be regarded as the first copyright act; it established both the author of a work and its publisher as owners of the right to copy that work for a limited period: 14 years for new works, renewable once, and 21 years for works already in print (Koepsell 2000, 46; Berry and McCallion 2001). The Copyright Clause of the United States Constitution (1787) provided for a legislation that was much more in favour of the authors: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."

In 1886, at the instigation of the Association Littéraire et Artistique Internationale (ALAI) and its president, the French poet and novelist Victor Hugo, the Berne Convention first established a form of international recognition of copyrights. It was influenced by the French legal concept of "droit d'auteur" and attributed the exclusive ownership of a work to its author. In the 160 countries currently adhering to the Berne Convention, copyrights for creative works are generally in force automatically as soon as the works are written or recorded on some physical medium, unless the author explicitly disclaims them, or until the copyright expires and the work falls into the public domain (Geller 2003).

The regulations of the Berne Convention have been incorporated into the World Trade Organization's TRIPS agreement (1995), thus giving the Berne Convention effectively near-global application. The 1996 WIPO Copyright Treaty, which entered into force in 2002, extended copyright to computer programs (Berry and McCallion 2001) and enacted greater restrictions on the use of technology to copy works in the nations that ratified it.

At present, all member states of the EU are signatories of the Berne Convention. Furthermore, in the last decade of the 20th century, numerous steps were taken to harmonise national legislations regarding copyright. The EC directive on the legal protection of computer programs (91/250/EEC) in 1991 was the first major attempt to harmonise national copyright laws within the European Economic Community. In 1993, a common term of copyright protection, 70 years from the death of the author, was determined by Council Directive 93/98/EEC harmonizing the term of protection of copyright and certain related rights. Since then, the harmonisation of European copyright law has been advanced by a number of directives, notably Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society, Directive 2004/48/EC of the European Parliament and of the Council of 29 April 2004 on the enforcement of intellectual property rights, and Directive 2006/116/EC of the European Parliament and of the Council of 12 December 2006 on the term of protection of copyright and certain related rights. The latter confirms the 70-year term of protection already fixed by Council Directive 93/98/EEC.

Establishing copyright claims

In all countries where the Berne Convention applies, copyright is automatic, and need not be obtained through official registration with any government office. Once an idea has been reduced to tangible form, for example by securing it in a fixed medium (such as a drawing, sheet music, a photograph, a videotape, or a computer file), the copyright holder is entitled to enforce his or her exclusive rights. However, in jurisdictions where the laws provide for registration, it serves as legal evidence of a copyright claim. For example, in the USA it is mandatory to register copyrights with the United States Copyright Office before an infringement suit may be filed in court.

In some countries, e.g. in the UK, commercial services provide a registration facility where copies of work can be deposited to establish legal evidence of a copyright claim. For the same purpose, in most countries inside and outside of the European Union, there are also legal requirements to file certain published works with the respective national library, especially if an ISSN or ISBN has been requested for the latter.

E-Commons and Copyleft

Public domain vs. e-commons

Following Boyle (2003), we distinguish two domains "outside" of the area of intellectual property, namely the public domain and the e-commons. Both notions are elusive and also deceptively close to each other, so that it is necessary to devote some space to discussing and comparing them. The concept of e-commons is the more important one, as it is tied to the practice of copyleft licensing.

Public Domain

The notion of public domain stems from the French "domaine public" which made its way into international and national law through the Berne Convention (Litman 1990; Boyle 2003, 58). David Lange (1981) was the first to raise the issue of the necessity to delimit and defend the public domain. Lange (1981, 149) argues that the very imprecision of the notion of intellectual property is one of the major reasons for its "reckless expansion"; the remedy is to acknowledge a "`no-man's land' at the boundaries" of intellectual property (Lange 1981, 147). However, Lange does not provide a further clarification of the concept of public domain, nor what individual rights exist within it (Boyle 2003, 59).

Lange's article triggered a whole literature on the topic of the public domain. Lindberg and Patterson (1991), for instance, proposed to view copyright as a set of temporary and constrained privileges that feeds the public domain with works as their copyrights expire. Jessica Litman (1990, 1023) contends that the main role of the public domain is allowing copyright law to function despite the unrealistic conception of individual creativity it presupposes. She defines the public domain as a "commons that includes those aspects of copyrighted works which copyright does not protect" (Litman 1990, 968). That is, according to Litman's definition, the public domain comprises the re-usable unprotected elements in copyrighted works as well as works that are completely unprotected (Boyle 2003, 61).

Yochai Benkler's (1999) approach to the elusive notion of public domain is comparatively pragmatic: the public domain is the totality of all uses, works and aspects of works that can be identified as free by lay people without carrying out a sophisticated legal inquiry into individual facts (Benkler 1999, 361-362). According to Boyle (2003, 62), Benkler's definition is intended to raise the issue whether lay people really have reliable intuitions as to whether a certain resource is free, i.e. both uncontrolled by someone else and free of charge. Boyle (ibid.) takes a contextualist, if not sceptical, stance on this issue: the delimitation of the public domain "depends on why we care about the public domain, on what vision of freedom or creativity we think the public domain stands for, and what danger it protects against" (ibid.). A certain pluralism about the notion of public domain is the consequence (ibid.).

E-Commons

The term "commons" has come to denote "wellsprings of creation that are outside of, or different from, the world of intellectual property", as the Internet, for instance, is regarded (Boyle 2003, 62). As such, "commons" or "e-commons" and "public domain" would appear to be synonymous. But Larry Lessig (2001, 19-20; 2002, 1783, 1788) proposes a more restrictive definition: "e-commons" is the totality of works or information the uses of which are not necessarily free of charge, but are such as to be unconstrained by the permission or authorisation of somebody else, certain liability rules excepted. A similar delineation of the concept of commons is proposed by Benkler (2006, 144). The focus is on control and the freedom from the will of another (Boyle 2003, 63) rather than on absence of costs: intellectual property should not restrain innovation in the form of a monopoly (Boyle 2003, 64).

Hence being in the e-commons is compatible with being owned individually or collectively. A good example is open-source software that is available under so-called "copyleft" licenses. These are actually copyright licenses granting end-users the right to modify or copy the software or any other expression of content as long as these uses comply with the copyleft license (Boyle 2003, 64-65). We will discuss the notion of copyleft below.

Hence, the distinction between public domain and e-commons is that the first is based on the dichotomy between the domain of property and the domain of the free, while the second draws the dividing line between the domain of individual control and the domain of "distributed creation, management and enterprise" (Boyle 2003, 66). Not only is the e-commons compatible with constraints, but the successful examples of e-commons, like open source software, actually presuppose constraints, be they legal - in the form of liability rules - or based on shared values and norms and prestige networks (ibid.).

It is important to note that the e-commons is "outside" of the domain of intellectual property not in the sense that it excludes property rights, but only in the sense that it precludes that they may become an obstacle to innovation and intellectual creativity. Copyleft licenses, which are the backbone of the e-commons, actually exploit intellectual property rights in order to prevent the abuse of the very same rights, as we shall see below. Thus, the e-commons stands squarely on the ground of intellectual property. However, in a more liberal reading, which is adopted by Boyle, as we shall see below, the notion of e-commons covers both resources that are subject to intellectual property (but copylefted) and resources that are free in the sense of being part of the public domain stricto sensu. That is, the e-commons in the wider sense includes the public domain, while stretching over into the area of intellectual property.

The Cornucopia of the E-Commons

Defining Copyleft

As already mentioned in the previous section, copyleft uses copyright law to remove restrictions related to the distribution of copies and modified versions of the protected work, while requiring that these copies and modified versions preserve the same freedom as the original. As opposed to traditional copyright that locks a work up, copyleft prevents the locking up of the work and its derivative works.

Copyleft is applied to computer software, documents, music, and art. Via a copyleft licensing scheme, an author may permit everyone who receives a copy of his or her work to reproduce, adapt or distribute it, provided that the copies or adaptations are also licensed under the same copyleft scheme (Stallman 1996). Thus copyleft can be regarded as an alternative to letting a work fall wholly into the public domain, namely as a copyright licensing scheme under which an author gives up some of his/her exclusive rights as to the reproduction, distribution or adaptation of his/her work (ibid.).

Short history of copyleft

The idea of copyleft licences originated in 1975, when Dennis Allison wrote a specification for a simple version of the BASIC programming language, Tiny BASIC (Allison 1976). This specification appeared in Dr. Dobb's Journal of Tiny BASIC, which computer hobbyists used to write their own BASIC interpreters to be published in the same journal (Warren 1976).

In 1984, Richard Stallman decided to create what was to become the first real copyleft license, the Emacs General Public License. He had become irritated by the fact that the company Symbolics, which he had supplied with a public domain version of a Lisp interpreter he was working on, refused to allow him access to the changes the company had made to the original product. The Emacs General Public License later developed into the GNU General Public License (Stallman 1998).

The GNU General Public License (http://www.gnu.org/licenses/gpl.html) was designed to make sure that the source code remained open and freely available in order to foster the sharing of ideas, but it did not exclude commercial usage (Berry and McCallion 2001). The most successful GNU project was Linux, which was started in 1991 by Linus Torvalds (Stallmann 1998), followed by Wikipedia (wikipedia.org), an online collective encyclopaedia that is collaboratively maintained by millions of volunteers, and which is licensed under the GNU GPL (Berry and McCallion 2001).

These projects have inspired Lawrence Lessig and others to establish Creative Commons (http://creativecommons.org) in 2001 with the support of the Center for the Public Domain. Creative Commons is an organisation which offers a wealth of copyright licenses, ranging from public domain licenses to sampling licenses, all with the aim of encouraging creative freedom. Releasing work under a Creative Commons license is not the same as giving it away, but it licenses ‘reuse’ under the conditions defined by the licence chosen by the author (Berry and McCallion 2001). Creative Commons licenses are offered, together with the “all rights reserved” model of traditional copyright, in Knol (knol.google.com), an online knowledge resource provided by Google as an alternative to Wikipedia.

Open Source copyleft licenses

The paradigmatic copyleft licenses are the GNU General Public License (GPL) (http://www.gnu.org/licenses/gpl.html), under which derivative or linked works must also use the GPL or a compatible license, and the GNU Lesser General Public License (LGPL) (http://www.gnu.org/licenses/lgpl.html), under which direct derivatives of the work must be released under the LGPL or a compatible license, but code under any license can link to the LGPL-licensed code. A typical use case for the LGPL is a library that is incorporated into a larger work. The LGPL is now generally deprecated by its originator, the Free Software Foundation.

A less frequently used alternative from GNU is the GNU Affero General Public License (AGPL) (http://www.gnu.org/licenses/agpl.html). Similar to the GPL, this license requires source-code modifications to be published if the software itself is distributed, and also if it is used to provide a service via a network (for example, as an internet application, which runs on the host company's server and thus is not “distributed” in the terminology of the GPL). Finally, the GNU Free Documentation License (http://www.gnu.org/licenses/fdl.html) is a copyleft license designed for textbooks and manuals.

Note that not all open source licenses are copyleft licenses. Some are permissive licenses that offer many of the same freedoms as letting a work fall into the public domain. For example, the Berkeley Software Distribution (BSD) license (http://www.opensource.org/licenses/bsd-license.php) allows anyone to do whatever they wish with the code as long as they reproduce the original copyright notice.
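The differences between the licenses described above can be summarised in a small lookup table. The following Python sketch is only an illustration of the distinctions drawn in the text, not a statement of law: the class name, field names and the function `must_publish_modifications` are hypothetical, and the value given for linking under the AGPL is an assumption based on the remark that the AGPL is "similar to the GPL".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Simplified license properties, following the descriptions in the text."""
    name: str
    copyleft: bool               # derivatives must keep the same or a compatible license
    linking_requires_same: bool  # works that merely link to the code are also bound
    network_use_triggers: bool   # providing a network service counts like distribution


# Hypothetical, highly simplified table (illustration only).
LICENSES = {
    "GPL": LicenseTerms("GNU General Public License", True, True, False),
    "LGPL": LicenseTerms("GNU Lesser General Public License", True, False, False),
    # Linking behaviour for the AGPL is assumed to match the GPL.
    "AGPL": LicenseTerms("GNU Affero General Public License", True, True, True),
    "BSD": LicenseTerms("Berkeley Software Distribution license", False, False, False),
}


def must_publish_modifications(key: str, distributed: bool, network_service: bool) -> bool:
    """Return True if modified source code has to be published under this simplified model."""
    terms = LICENSES[key]
    if not terms.copyleft:
        # Permissive licenses only ask for the original copyright notice to be kept.
        return False
    return distributed or (network_service and terms.network_use_triggers)


if __name__ == "__main__":
    for key in LICENSES:
        print(key, must_publish_modifications(key, distributed=False, network_service=True))
```

Under this model, running modified software purely as a network service triggers the publication requirement only for the AGPL, which is the gap in the GPL that the AGPL is meant to close.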

Creative Commons licenses

An alternative to the GNU licenses is the suite of copyright licenses provided by Creative Commons (http://creativecommons.org/about/licenses). Creative Commons copyright licenses have been ported to over 45 international jurisdictions, and by the year 2008 about 130 million works had been licensed under a Creative Commons scheme. The current version of the Creative Commons licenses is 3.0 (http://creativecommons.org/about/history).

The generic Creative Commons licenses are designed to be jurisdiction-neutral, but to some extent are founded upon the U.S. Copyright Act. This sometimes makes it necessary to align these licenses with other national legislations. Therefore, the Creative Commons model has three layers: the human-readable Commons Deed, the lawyer-readable Legal Code, and the machine-readable Digital Code or metadata. With the support of an international network of legal experts, Creative Commons seeks to port the Legal Code to a particular jurisdiction, while the Commons Deed and Digital Code always remain the same (cf. Figure 3).

Figure 3: The three layers of the Creative Commons licensing model

Creative Commons provides the following license conditions as options to the licensor:

  1. Attribution: the licensor allows others to copy, distribute, display and perform his/her copyrighted work as well as any derivative work based on it provided credit is given in the manner requested by the licensor.
  2. Share-alike: the licensor allows others to distribute derivative works only under the same license that governs his/her copyrighted work.
  3. Noncommercial: the licensor allows others to copy, distribute, display and perform his/her copyrighted work as well as any derivative work based on it only for noncommercial purposes.
  4. No Derivative Works: the licensor allows others to copy, distribute, display and perform exclusively verbatim copies of his/her copyrighted work, but no derivative works based on the latter.

The Creative Commons licenses combine the aforementioned license conditions:

  1. Attribution (by) (http://creativecommons.org/licenses/by/3.0/legalcode);
  2. Attribution ShareAlike (by-sa) (http://creativecommons.org/licenses/by-sa/3.0/legalcode);
  3. Attribution No-Derivatives (by-nd) (http://creativecommons.org/licenses/by-nd/3.0/legalcode);
  4. Attribution Non-Commercial (by-nc) (http://creativecommons.org/licenses/by-nc/3.0/legalcode);
  5. Attribution Non-Commercial Share Alike (by-nc-sa) (http://creativecommons.org/licenses/by-nc-sa/3.0/legalcode); and
  6. Attribution Non-Commercial No Derivatives (by-nc-nd) (http://creativecommons.org/licenses/by-nc-nd/3.0/legalcode)

Attribution is the most accommodating license offered, since the user is free to do whatever he/she wants with the licensed work as long as he/she gives credit to the licensor/author, while Attribution Share Alike comes closest to open source software licenses. Attribution Non-Commercial Share Alike demands that all derivatives be used non-commercially. The most restrictive of the Creative Commons licenses is Attribution Non-Commercial No Derivatives: it is also called the "free-advertising license" inasmuch as the licensed work may be downloaded and shared, but not modified or used commercially.
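The way the four conditions combine into exactly six licenses can be made explicit with a short Python sketch. It assumes, as the list above suggests, that Attribution is part of every license, and that Share-alike and No Derivative Works exclude each other, since Share-alike governs derivative works that No Derivative Works forbids. The function and constant names are hypothetical; the URL pattern is the one used in the list above.

```python
from itertools import product

# Attribution ("by") is included in every license in the list above.
# An empty string means that the optional condition is not selected.
COMMERCIAL_OPTIONS = ("", "nc")        # commercial use allowed / noncommercial only
DERIVATIVE_OPTIONS = ("", "sa", "nd")  # unrestricted / share-alike / no derivatives


def license_code(nc: str, deriv: str) -> str:
    """Build the short license code, e.g. 'by-nc-sa', from the chosen options."""
    return "-".join(["by"] + [part for part in (nc, deriv) if part])


def legalcode_url(code: str, version: str = "3.0") -> str:
    """Return the Legal Code URL, following the pattern used in the list above."""
    return f"http://creativecommons.org/licenses/{code}/{version}/legalcode"


# 2 commercial options x 3 derivative options = 6 licenses.
ALL_CODES = [license_code(nc, d) for nc, d in product(COMMERCIAL_OPTIONS, DERIVATIVE_OPTIONS)]

if __name__ == "__main__":
    for code in ALL_CODES:
        print(code, legalcode_url(code))
```

Because Share-alike and No Derivative Works are modelled as alternatives within a single option, the sketch never produces a contradictory combination such as "by-sa-nd".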

Creative Commons also gives users the option to let their work fall into the public domain (though this option is not valid in every country outside the US), or to choose any of the GNU copyleft licenses or even the permissive BSD license.

E-Commons and Scientific Research

The Enclosure of the E-Commons

The tragedy of the commons

The patenting of the human genome (Boyle 2003, 37) and the European Database Protection Directive, which extends intellectual property rights over mere compilations of facts (Boyle 2003, 39; Boyle 2005), are in the eyes of James Boyle only two examples of what he calls the "enclosure of the intangible commons of the mind" (Yochai Benkler uses a similar expression for the same phenomenon: Benkler 2006, 146). This enclosure consists in the expansion of intellectual property into the area of uses, works or aspects of works that used to be regarded as uncopyrightable. The traditional frontiers of intellectual property rights are under attack (Boyle 2003, 38), which calls into question the old assumption that the raw materials of scientific research, i.e. ideas, data and facts, should remain in the public domain and not become proprietary (Boyle 2003, 39).

Before we proceed, it is important to note that Boyle obviously uses the term "commons" in its wider meaning, i.e. as both covering the public domain in the strict sense and stretching over into the area of intellectual property (cf. supra).

Now even if the enclosure of the e-commons in some ways parallels the state-promoted transformation of common land into private property in the 19th century (Boyle 2003, 33-34), there are also dissimilarities between the commons of the mind and its earthly counterpart. Indeed, common land is a rivalrous resource inasmuch as many individual uses of it mutually exclude each other. Herdsmen who roam the same common pasture compete with each other for its use and may eventually ruin it: since it is to the immediate benefit of an individual herdsman to add one more cow to his herd, none of them has an incentive to prevent over-grazing of the commons. A "tragedy of the commons" seems to be the outcome: rivalrous resources that are not individually owned are inevitably overexploited (Boyle 2003, 41; Lessig 2001, 22). However, such a tragedy does not occur with respect to a commons that is non-rivalrous, as the e-commons in fact is: there is no limit as to how many times an MP3 is downloaded or a poem is read on the Web (Boyle 2003, 41).

Arguments for and against the sustainability of the e-commons

Defenders of the enclosure of the e-commons therefore prefer to argue that the problem with the informational commons is that there is no incentive to create this resource in the first place. Indeed, information resources are not only non-rivalrous, but also non-excludable: one unit of such a good may satisfy an unlimited number of users at no marginal cost at all (Boyle 2003, 42). Boyle quite plausibly objects that the Internet compensates for this apparent deficiency by also reducing production and distribution costs, while enormously enlarging the market (Boyle 2003, 43). Moreover, the technologies of the Internet also facilitate quick detection of illegal copying, so that it is not obvious that copyright holders have seen their privileges diminished through the advent of the Web (ibid.).

Another argument in favour of the enclosure of the e-commons is the growing impact of information-based products in the world economy. However, one may reply that since information products are built out of parts of other information products, and thus every information item constitutes the raw material for further innovation, each additional extension of individual property into the e-commons reduces access to, and increases the cost of, each new product and innovation. Hence, the enclosure of the e-commons may do more harm than good to innovation (Boyle 2003, 43-44).

As to the question of what incentives or motivations there are for building the resources that make up the e-commons - whether it is for prestige, improving one's resumé, the satisfaction of exerting one's skills and creativity, or at least partly because of sheer altruistic virtues and values (as Benkler and Nissenbaum claim, 2006, 407) - it appears to be spurious. Indeed, in a global network with a large number of members, there will always be enough talented people who are willing to contribute to the creation and evaluation of information products, if production and distribution costs are close to zero (Boyle 2003, 45-46). This holds under one condition, however: without centralised supervision, large-scale projects have to be modular in order to allow for an efficient division of labour (ibid.). Open source development is the paradigm of distributed and non-proprietary creation, a "commons-based peer production" (Benkler and Nissenbaum 2006), but so were scientific research and the development of artistic movements long before the existence of the Internet (Boyle 2003, 47).

Distributed creation is also appropriate for capital-intensive projects, at least in the case of science, which relies more and more on data- and processing-intensive models. Lay volunteers have been successfully recruited to the task of distributed data scrutiny, as for example in NASA's "Clickworkers" experiment, which relied on volunteers for the analysis of Mars landing data (http://clickworkers.arc.nasa.gov/top). Another example of large-scale distributed information production, in the field of bioinformatics, is the open-source genomics project (www.ensembl.org) (Boyle 2003, 47-48; Bricklin 2006; Benkler and Nissenbaum 2006, 395 ff.). Thus, against the economic prejudice in favour of free market competition based on individual property, distributed creativity in an information commons is certainly viable (Boyle 2003, 51).

A false Manicheism?

The danger of the enclosure of the information commons, according to Boyle, is that "propertization is a vicious circle". He argues that in order to achieve optimum price discrimination with proprietary information goods that have no substantial marginal costs, the holders of intellectual property rights will demand an ever greater extension of the realm of individual property into the information commons (Boyle 2003, 50; Boyle 2000). However, the fundamental reason for the tendency to transform the e-commons into private property may be a cognitive bias against openness of systems and networks as well as non-proprietary creation - an aversion which may be due to the fact that our everyday experience of property is that of tangible resources, for which the "tragedy of the commons" indeed holds (Boyle 2006). Hence the necessity to adapt our conceptions of property to the non-tangible commons of the mind (ibid.).

While one may agree with Boyle's general concern about the enclosure of the e-commons and deplore the propertization of the raw material of scientific research, it is appropriate to qualify an excessively Manicheistic view of the dynamics between intellectual property and e-commons. In general, the notions of public good and private ownership are by no means mutually exclusive. A classic example of a non-excludable private good is the privately owned lighthouse in 19th century Great Britain: the service provided by a lighthouse, namely the aid to navigation through the emitted light or sound signal, cannot be reserved to a few ships (Foldvary 2003; Coase 1974). In a sense, e-commons resources are digital-era examples of possibly private goods that are non-excludable.

It is certainly true that in a first stage reducing the extent of the public domain also means pushing back the frontiers of the e-commons. However, as Boyle himself concedes, the e-commons not only stretches into the area of intellectual property, but actually presupposes intellectual property rights in a crucial respect. As we have seen, copyleft licenses constitute a pillar of the e-commons. But these are copyright licenses that neutralise the monopolistic tendencies inherent in intellectual property. While it is true that intellectual property and public domain correspond to each other like figure and background, and hence each widening of the scope of the former diminishes the extent of the latter, this is not so for the relation between intellectual property and the e-commons. Paradoxically, the extension of copyright means a potential increase of the commons of the mind, provided copyleft licensing keeps up with propertisation.

Furthermore, e-commons and intellectual property do not exclude each other in terms of their associated business models: indeed, there is (maybe anecdotal) evidence that some information goods may well be simultaneously available both in the e-commons and on the proprietary market, without any prejudice to sales in the latter (Boyle 2007). Not only academic works like Yochai Benkler's "The Wealth of Networks" (2006) and James Boyle's "The Public Domain" (2008), but also science fiction novels like "Down and Out in the Magic Kingdom" by Cory Doctorow have sold considerably well despite being available either in the public domain or under a Creative Commons license (ibid.).

The explanation of this peaceful co-existence may of course reside in the fact that paper copies and electronic copies of a text have complementary uses: PDF copies are easier to search and quote, while books are more comfortable to carry around or to keep on the bedside table (even more so than print-outs). But in the case of other media, like music, the comparative advantages of having a hard copy besides the electronic copy may be too marginal to allow for such a harmonious co-existence: the quality of the music as recorded on a CD may be higher than that of an MP3, but for anyone save aficionados of classical music, i.e. for the large majority of consumers who enjoy music as mere entertainment, it makes no difference whether they listen to a piece or song on a CD player or on an MP3 player.

Self-archiving and Open Access

Authorship vs. Copyright

A distinction which often goes unnoticed is the one between authorship and copyright (Harnad 2001). Authorship is intellectual priority or "parentship" with respect to an idea or set of ideas, while copyright is the ownership with regard to its expression. Infringement of copyright, i.e. theft of text or piracy, is at least a civil offense, while theft of authorship, or plagiarism, is morally and academically discreditable, but cannot be pursued in court. Also, authorship is inalienable: you can never lose the authorship of your own discoveries and ideas, whereas, in the case of copyright, you can decide to sell or give away the rights to your writings.

The modern conception of scientific authorship was shaped around the birth of the Royal Society and its publication series, The Philosophical Transactions, which started in 1665. The community of natural philosophers that founded the Royal Society established some standards and practices related to scientific authorship that are still in force today. For example, they decided that a scientific author cannot "own" his or her own discovery: science writing is a way of reporting facts of nature, and nature cannot be the object of copyright. The members of the Royal Society also introduced an early form of peer review: new ideas or discoveries were "informally" discussed in the meetings of the Royal Society and, upon approval by the community of peers, published in the Philosophical Transactions (Biagioli and Galison 2003).

This distinction between authorship and copyright is especially crucial for scientific literature, which is, in contrast to the majority of published works, a give-away literature: authors of research papers and books do not seek (and generally do not receive) any royalties, but impact, that is, the distribution, recognition and exploitation of their work by their peers. It is on the basis of impact that academics build their careers and hence their income (Harnad 2001, Harnad 2008).

This means that, unlike authors of non-give-away works, who earn their keep in the form of royalties, researchers are less worried about piracy than about plagiarism, i.e. the denial of authorship, since their main concern is that their ideas circulate and gain recognition among their peers. Of course, this does not entail that authors of scientific works would be delighted if their papers and books were pirated; in most cases, they still want to retain control over where their work appears, whether credited or not. However, any obstacle to accessing their works, and hence to the impact of their ideas, jeopardises their main source of income (Harnad 2001).

Self-archiving and Open Access

Based on this insight, Stevan Harnad, a prominent defender of self-archiving and open access publishing, has been one of the most vocal critics of the traditional subscription-based business model for peer-reviewed scientific journals. Subscription fees have reached a level of about 2000 Euros, which means that research institutions, and not only those in developing countries, have serious difficulty paying for their members' access to refereed journals (Harnad 2001a). In other words, subscription tolls have become access barriers and thus also impact barriers.

Now, peer review is essential for quality assessment and certification of scientific research papers and hence for the academic reputation of their authors; as such it is the only service provided by serious scientific journals in which researchers are really interested (Harnad 2001). But it has been estimated that review costs constitute only about 10% of the total subscription tolls (Harnad 2001; 2001a). The long-term solution advocated by Harnad is the spread of electronic open-access journals, where publication costs are ideally minimised and are paid by the institutions that host the authors (the so-called "gold route"), such that readers can access papers for free (Harnad 2008). We will return to this topic in the next section.

Meanwhile there is a cheap alternative: self-archiving of pre- and post-prints in institutional eprint archives (e.g. http://www.eprints.org or http://hal.archives-ouvertes.fr/), which has been practiced by physicists since 1991. Some publishers, like Springer, provide copyright transfer agreements that explicitly authorise authors to self-archive a personal copy of the refereed and published version of their paper, i.e. the so-called postprint (Harnad 2001). In case no such clause can be negotiated, there is a simple and completely legal strategy to circumvent restrictive copyright, namely self-archiving the preprint and the corrigenda separately (the so-called "Harnad-Oppenheim strategy": Harnad 2001; Oppenheim 2001). Of course, this strategy applies only provided the publishers do not request a minimal delay for self-archiving preprints of the final version!

Beyond Open Access

It cannot be denied that Open Access greatly facilitates the circulation of ideas and scientific results, but there is a real practical need to go beyond Open Access. Indeed, the latter does not solve the problem of the reckless multiplication of the scientific literature. This issue could at least partially be addressed by introducing the practice of re-use in the production of scientific texts. Yet, such practice is currently prohibited with respect to research papers because the related copyright either belongs to the publisher (in the subscription-based model) or to the author (in the open access model), with no license whatsoever for the public to re-use these documents.

Let us call "liberal copyleft license" any copyleft license that permits the creation of derivative works based on the licensed original. Liberal copyleft licensing is the conditio sine qua non for "commons-based peer production" (Benkler and Nissenbaum 2006), i.e. for the distributed creativity that is the reason why open source software development has been so successful. The license to re-use and modify, together with peer review in vast global communities, allows for a large-scale incremental optimisation of any resource in the e-commons. But while science has applied this model of optimisation to the development of ideas and theories, scientific writing is still largely based on the cooperation of small numbers of authors, if not on the romantic cliché of solitary creation.

Nonetheless, online collaborative encyclopedias such as Wikipedia show that large-scale commons-based distributed production can also be harnessed for the creation and improvement of texts. But even on a much smaller scale, re-use has been practiced under GNU public licenses in the writing of manuals (such as Oetiker, Partl, Hyna and Schlegl 2008). A similar approach is not possible in scientific writing: currently, you are not permitted to rewrite a scientific article, correcting some of its flaws (say, a gap in a proof), and publish the new derivative version under your name, even if you acknowledge the author of the original paper. Instead you have to write a completely new article that must not substantially overlap textually with the old one. Of course, you may contact the author as the holder of the copyright and negotiate to write a joint, improved article. But even in this case, there is a waste of time and resources which would be unnecessary if the original paper were made available for re-use under a liberal copyleft license.

Conversely, even under Open Access, you cannot simply publish a note for others to re-use and develop. Under a liberal copyleft license, by contrast, derivative versions would be easily traceable and you could gather credit for having sown the seeds of a series of (hopefully) high-quality papers based on your original note. Liberal copylefting of scientific articles would foster and reward the early publication of research ideas, while safeguarding priority of authorship.

Of course, such an innovative way of producing scientific articles would presuppose changes not only in licensing, but also in the review process. For example, drafts intended as seeds of more developed research papers would have to be evaluated differently from fully developed articles. We can only start to imagine what changes the application of commons-based peer production to scientific literature would both necessitate and cause.

Licensing in scientific publishing

Introduction

The issues of copyright and licensing in the scientific publishing industry can be regarded as two sides of the same coin, since they are strongly correlated concepts. While an extensive analysis of copyright and licensing was presented above, in this section we briefly review how copyright and licensing might be contextualized within the scientific publishing industry. The section is organized as follows: first, we review the configuration of the market with its main actors and dynamics; then we review the evolution of copyright and licensing issues over time. This section is based on a voluminous existing literature about the publishing industry (ECDGR, 2006; Enserink, 2007; Houghton, 2005; House of Commons, 2004; Madras, 2008; Milmo, 2006; Morgan Stanley, 2002; STM, 2008; VV.AA., 2006; Ware, 2006; Wellcome Trust, 2003).

Copyright and licensing in publishing industry

The link between copyright and licensing might be summarized as follows: the right to use a particular object (namely copyright/copyleft) might be given (namely licensed) to another actor. To make this concrete, take a real-world example from the publishing industry. In this industry the object which is usually subject to copyright is a paper or a monograph. The right to use a paper or monograph is usually an exclusive privilege of the author(s). However, the author(s) may license the right to use (i.e. read, disseminate, etc.) the paper to someone else (for instance a publisher or a user). As this simple case shows, there are several ways in which copyrighted material can be given to someone else to use. After describing the main actors, we will examine how copyright and licensing have been applied within the scientific publishing industry before and after the advent of the Internet.

Main actors in the scientific publishing industry

The current scientific publishing industry might be summarized as having three main classes of actors:

  • Authors. These are commonly researchers that produce scientific papers or monographs. Taking the production of a paper as an example, the career of authors relies mainly on their ability to publish that paper (their intellectual work) in well recognized journals. That is why the copyright policies in the scientific publishing industry are quite an important issue.
  • Publishers. These actors gather intellectual work from authors and manage it to publish books and periodicals. Most of the publishers fall in one of the following two categories. The first category includes commercial/for profit firms while the second includes non-profit organizations. This differentiation is key as in the last decades non-profit organizations have pioneered new licensing models.
  • Libraries. These actors buy books and periodicals from publishers. They are the final users.

The concrete relationships between these actors, as well as copyright and licensing issues, have changed over time. One of the main elements to influence the market has been the Internet. Indeed, the Internet has offered new ways to access scientific knowledge which have not yet been exhaustively explored and tested. To fully understand how the Internet has changed copyright and licensing policies and practices, we will now review how they worked before and after the Internet. The whole section is summarized in Table 1.

Copyright and licensing in scientific publishing before Internet

Before the wide-scale commercial adoption of the Internet, hard-copy scientific journals represented an essential channel for the diffusion of scientific knowledge.

In this period, publishers could be considered as monopolists of the market as usually the copyrights of scientific publications were transferred to them by the authors. It is important to note that while authors usually gave away the right to exploit their works, they still retained the scientific authorship (see above). This aspect is very important as authors do not improve their careers by selling their works; rather their career is based on the ability to publish scientific work with well recognized publishers. As a matter of fact, publishers were working as (1) acquirers of copyrights from authors for their intellectual work and as (2) suppliers of licensing policies for the use of copyrighted material. We will review these two main points in detail.

Copyright policy. Traditionally, authors transferred their copyright to publishers (whether for-profit or not-for-profit). In particular cases, such as for US government employees, a full copyright transfer was not possible; usually only a limited transfer was executed in these cases. In some other cases (mostly when contracting materials from companies for professional books), publishers did not obtain the copyright, because the company wanted to keep it, but only exclusive print and distribution rights (Table 1).

Licensing policy. Publishers licensed access to their copyrighted material through journal subscriptions or by selling books. Libraries paid the bill. This model has been termed “reader pay”, as the right to use the intellectual work is paid for by the readers. Usually the commercial strategy of publishers was to contract directly with universities and libraries. In particular, this strategy allowed them to differentiate the prices of journal subscriptions on the basis of each library's characteristics.

Copyright and licensing in scientific publishing after Internet

After the advent of the Internet, the monopoly of publishers as summarized above slowly started to erode. Indeed, the new opportunities offered by the Internet, as well as the social and political concerns raised by the “serials crisis”, affected both the well-established copyright and licensing policies. We will review how copyright and licensing policies have changed since the advent of the Internet (Table 1).

Copyright policy. During this period, publishers lost their monopoly on the acquisition of the author's copyright. Indeed, the sharp fall in publication and dissemination costs triggered by the Internet has allowed the birth of different ways to publish intellectual material. In some cases, this opportunity was reflected in the possibility for authors to retain the copyright of their intellectual work. This latter situation was explored by the various Open Access experiments outlined below (for a review, please see the website of the Budapest Open Access Initiative: www.soros.org/openaccess).

Licensing policy. The birth of the Internet brought considerable changes both in terms of how copyright and licensing have been understood and in terms of how these policies have been applied from a commercial point of view.

First, the commercial application of the old “reader pay” policies changed. Indeed, the almost-zero dissemination costs made possible by the Internet pushed publishers to sell bundles of journal “access rights” to libraries. This commercial strategy was called the “big deal”, meaning that libraries were offered a number of journal subscriptions together. A similar policy is the “Core + peripheral” one, which is built around a small number of core publications. Other forms of subscription policies were tested, such as the national licence. The latter is a sort of national reading license that covers all public libraries as well as education and research institutes. Other, less well-known applications are based on a pay per view (PPV) policy.

Second, initiatives such as Open Access supported the publication of scientific material while allowing authors to retain the copyright in their intellectual work. The general idea of this approach, which has gained momentum in the last decade, is that scientific knowledge should be completely free and that scientists should be granted unrestricted access to it. While open access has several dimensions (Table 2), we focus on the so-called gold and green routes, as they are the best known and most widely applied.

Table 1. Review of the copyright and licensing policies and practices before and after the Internet (Jeffery, 2008)

| | Pre-Internet | Post-Internet (users pay) | Post-Internet (open access) |
|---|---|---|---|
| Copyright | Authors sell their copyright | Authors sell their copyright | Authors do not sell their copyright |
| Licensing | Users pay to read | Users pay to read | Open access: users do not pay to read |
| Who pays | Libraries | Libraries/single users | Authors; public institutions |
| Main examples of licensing | Subscription to the hardcopy | Subscription (Big deal, National license); Pay per view (PPV) (PPV convertible to subscription, PPV bundle); Core + peripheral | Gold Route: Authors Pay, Hybrid models, Delayed OA. Green Route: Institutional archives and repositories |

Open Access Models

The Gold Route

The gold route of open access is the commercial version of the open access philosophy. It implies the author or author’s institution pays a fee to the publisher for publishing peer-reviewed research, the publisher thereafter making the material available “free” to all. No subscription to journals is necessary to read an Open Access certified journal or paper.

The Public Library of Science (MacCallum, 2007) cites the “Bethesda Statement on Open Access Publishing” when suggesting that an Open Access Publication is one that meets the following two conditions:

  • The authors and copyright holders grant to all users a free, irrevocable, worldwide, perpetual right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship, as well as the right to make small numbers of printed copies for their personal use.
  • A complete version of the work and all supplemental materials, including a copy of the permission as stated above, in a suitable standard electronic format is deposited immediately upon initial publication in at least one online repository that is supported by an academic institution, scholarly society, government agency, or other well-established organization that seeks to enable open access, unrestricted distribution, interoperability and long-term archiving.

There are three main variants of open access (OA) publishing:

  • Immediate full OA: the entire contents of the journal are made freely available immediately on publication. A well-known example is the PLoS Biology journal.
  • Hybrid and optional OA: here only part of the journal content is made immediately available. There are two distinct models:
    • The journal makes its research articles immediately available but requires a subscription to access other “value added” content such as commissioned review articles, journalism, etc.
    • The journal offers authors the option to make their article OA in an otherwise subscription-access journal in return for payment of a fee (e.g. Springer’s Open Choice or OUP’s Oxford Open schemes).
  • Delayed OA: the journal makes its contents freely available after a relatively short period, typically 6–12 months.

Note that while the economic impact of the Open Access philosophy is still small (10 per cent of the whole journals market is open access), its legal impact is still under scrutiny.

The Green Route

The “green” route to open access holds that authors should be free to self-archive their articles in public open archives. The green route can be divided into open access archives and open access repositories.

Open access archives, also called subject-based archives, typically offer open and free access to pre-print and/or post-print papers in a particular discipline or subject area. Authors can upload both pre-prints, that is, articles that have been submitted for publication but not yet accepted, and post-prints, that is, articles that have been accepted for publication and/or published. The main objective of these archives is to allow quicker and more efficient dissemination of papers, which are deposited by the authors themselves.

Open access repositories are typically institution-based: they operate in the same way as open access archives, but they are associated with an organisation, such as a university or research institute, rather than with a subject area or discipline, and they offer the same level of open and free access to that institution's work and outputs. Both archives and repositories rely upon authors and/or their institutions posting material to them (i.e. “self-archiving”). The House of Commons enquiry concluded that “institutional repositories have the potential greatly to increase the speed, reach and effectiveness of the dissemination of research findings” (House of Commons, 2004). The Wellcome Trust noted that (Wellcome Trust, 2003):

“… the existence of a central archive could transform the market. Access to all UK publications would be possible and would act as a brake on excessive pricing”. Such repositories would benefit authors, readers and institutions: authors would see their articles made available to a wider audience; readers would be able to access articles free of charge over the Internet; and institutions would benefit from having an online platform on which to display their funded research.

The green route to open access is still to be explored, and it is not yet clear how it will change copyright policies. Within the open access framework, posting an already published paper to open archives or repositories does not raise copyright issues. The situation changes for a paper published in a subscription-based journal. Here, the publisher usually acquires the copyright from the authors, who can then no longer legally disseminate their work: self-archiving must be specifically authorised by the publisher. However, as mentioned above, there appears to be another legal way of handling copyright when self-archiving post-prints, namely the "Harnad-Oppenheim strategy" of archiving the preprint and the corrigenda separately, provided that the copyright transfer for the post-print does not explicitly exclude this procedure.

Table 2. Dimensions of Open Access (Jeffery, 2008)

  • Green Route: the author can self-archive at the time of submission of the publication, whether the publication is grey literature, a peer-reviewed journal publication, a peer-reviewed conference proceedings paper or a monograph.
  • Golden Route: the author or the author's institution can pay a fee to the publisher at publication time, the publisher thereafter making the material available “free at the point of access”.
  • Preprints: articles that are pre-peer-review.
  • Postprints: articles that are post-peer-review.
  • Eprints: either preprints or postprints, but in electronic form.
  • White Literature: peer-reviewed, published articles.
  • Grey Literature: preprints or internal “know-how” material.

References

  1. Allison, D. (1976). "Design notes for TINY BASIC". SIGPLAN Notices (ACM) 11 (7): pp. 25-33.
  2. Benkler, Y. (1999). Free as Air to Common Use: First Amendment Constraints on Enclosure of the Public Domain. New York University Law Review 74: pp. 354-446.
  3. Benkler, Y. (2006). The Wealth of Networks. New Haven, London: Yale University Press.
  4. Benkler, Y. and Nissenbaum, H. (2006). Commons-based Peer Production and Virtue. The Journal of Political Philosophy 14(4): pp. 394-419.
  5. Biagioli, M. (2009) « Priority, Originality, and Novelty : Construing the New in Science, Patents, and Copyright », unpublished paper presented at the Collège de France, January 15th 2009, Paris.
  6. Biagioli, M., Galison, P. (2003) Scientific Authorship: Credit and Intellectual Property in Science, Routledge, New York.
  7. Berry, D.M, McCallion, M. (2001). Copyright and Copyleft. Eye magazine. At http://www.eyemagazine.com/opinion.php?id=117&oid=290
  8. Boyle, J. Cruel, Mean, or Lavish?: Economic Analysis, Price Discrimination and Digital Intellectual Property. Vanderbilt Law Review 53.
  9. Boyle, J. (2003). The Second Enclosure Movement and the Construction of the Public Domain. Law and Contemporary Problems 66: pp. 33-74.
    Boyle, J. (2005). Public information wants to be free. Financial Times, February 24 2005. At http://www.ft.com/cms/s/2/cd58c216-8663-11d9-8075-00000e2511c8.html
  10. Boyle, J. (2006). A closed mind about an open world. Financial Times, August 7 2006. At http://www.ft.com/cms/s/2/64167124-263d-11db-afa1-0000779e2340.html.
  11. Boyle, J. (2007). Text is free, we make our money on volume(s). Financial Times, January 22 2007. At http://www.ft.com/cms/s/b46f5a58-aa2e-11db-83b0-0000779e2340.html.
  12. Boyle, J. (2008). The Public Domain. Enclosing the Commons of the Mind. New Haven, London: Yale University Press.
  13. Biagioli, M., Galison, P. (2003). Scientific Authorship. New York: Routledge.
  14. Bricklin, D. (2006). The Cornucopia of the Commons: How to get volunteer labor. Retrieved October 30, 2008 from http://www.bricklin.com/cornucopia.htm
  15. Berkeley Software Distribution (BSD). Retrieved November 30, 2008 from http://www.opensource.org/licenses/bsd-license.php.
  16. Butterfield, S. (2007). Geotagging: One day later. At http://feeds.feedburner.com/~r/Flickrblog/~3/17553027/geotagging_one_.html.
  17. Chartier, R. (1987) Lectures et lecteurs dans la France de l'Ancien Regime, Seuil, Paris.
  18. Coase, R. (1974). The Lighthouse in Economics. Journal of Law and Economics 17: 357-376.
  19. comScore. (2007). Social networking goes global. Reston, VA. At http://www.comscore.com/press/release.asp?press=1555
  20. Council Directive 93/98/EEC harmonizing the term of protection of copyright and certain related rights. At http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31993L0098:EN:NOT.
  21. Creative Commons: History. Retrieved December 15, 2008 from http://creativecommons.org/about/history
  22. Creative Commons: Licenses. Retrieved December 15, 2008 from http://creativecommons.org/about/licenses
  23. Creative Commons: Attribution v.3.0. Retrieved December 15, 2008 from http://creativecommons.org/licenses/by/3.0/legalcode
  24. Creative Commons: Attribution-ShareAlike v.3.0. Retrieved December 15, 2008 from http://creativecommons.org/licenses/by-sa/3.0/legalcode
  25. Creative Commons: Attribution No-Derivatives v.3.0. Retrieved December 15, 2008 from http://creativecommons.org/licenses/by-nd/3.0/legalcode
  26. Creative Commons: Attribution Non-Commercial v.3.0. Retrieved December 15, 2008 from http://creativecommons.org/licenses/by-nc/3.0/legalcode
  27. Creative Commons: Attribution Non-Commercial Share Alike v.3.0. Retrieved December 15, 2008 from http://creativecommons.org/licenses/by-nc-sa/3.0/legalcode
  28. Creative Commons: Attribution Non-Commercial No Derivatives v.3.0. Retrieved December 15, 2008 from http://creativecommons.org/licenses/by-nc-nd/3.0/legalcode
  29. Directive 96/9/EC of the European Parliament and of the Council on the legal protection of databases. At http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31996L0009:EN:NOT.
  30. Directive 2001/29/EC of the European Parliament and of the Council on the harmonisation of certain aspects of copyright and related rights in the information society. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32001L0029:EN:NOT.
  31. Directive 2004/48/EC of the European Parliament and of the Council on the enforcement of intellectual property right. At http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32004L0048R(01):EN:NOT.
  32. Directive 2006/116/EC of the European Parliament and of the Council of 12 December 2006 on the term of protection of copyright and certain related rights (codified version). At http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32006L0116:EN:NOT.
  33. ECDGR (2006). Study on the economic and technical evolution of the scientific publication markets in Europe. European Commission Directorate General for Research. At http://ec.europa.eu/research/science-society/pdf/scientific-publication-study_en.pdf.
  34. Emacs General Public License. Retrieved on August 28, 2008 from http://www.free-soft.org/gpl_history/emacs_gpl.html.
  35. Enserink, M. (2007). European Union Steps Back From Open-Access Leap. Science, 315: 1065.
  36. Foldvary, F. (2003). The Lighthouse as a Private-Sector Collective Good. The Independent Institute Working Paper 46. At http://www.independent.org/publications/working_papers/article.asp?id=757.
  37. Geller, P. E. (2003). International Copyright Law and Practice. Matthew Bender.
  38. GNU General Public License (GPL). Retrieved October 17, 2008 from http://www.gnu.org/licenses/gpl.html.
  39. GNU Lesser General Public License (LGPL). Retrieved October 17, 2008 from http://www.gnu.org/licenses/lgpl.html.
  40. GNU Affero General Public License (AGPL). Retrieved October 17, 2008 from http://www.gnu.org/licenses/agpl.html.
  41. GNU Free Documentation License (FDL). Retrieved October 17, 2008 from http://www.gnu.org/licenses/fdl.html
  42. Harnad, S. (2001). Skyreading and Skywriting for Researchers: A Post-Gutenberg Anomaly and How to Resolve it. In G. Origgi (ed.), Text-e: The Future of the Text in Internet, Palgrave Macmillan. At http://www.text-e.org/conf/index.cfm?fa=texte&ConfText_IF=7
  43. Harnad, S. (2001a). The self-archiving initiative. Nature 410: pp. 1024-1025.
  44. Harnad, S. (2008). The Postgutenberg Open Access Journal. To appear in: Cope, B. & Phillips, A (Eds.) The Future of the Academic Journal. Chandos.
    Haythornthwaite, C. (2005). Social networks and Internet connectivity effects. Information, Communication, & Society, 8 (2), 125-147.
  45. Houghton, J. (2005). Digital Broadband Content: Scientific Publishing. Directorate For Science, Technology And Industry, OECD. At http://www.oecd.org/dataoecd/42/12/35393145.pdf
  46. House of Commons (2004). Scientific Publications: Free for All?. The Science and Technology Committee. At http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/39902.htm
  47. Hurst, M. , Siegler, M., Glance, N. (2007). On Estimating the geographic distribution of social media, in ICSWSM'2007.
  48. IDATE (2007). Podcasting - Development prospects and strategic implications. Retrieved September 15, 2008 from http://www.idate.fr/pages/index.php?rubrique=etude&idr=16&idp=210&idl=7.
  49. Jeffery, G. (2008). Open Access: An Introduction. (On line) ERCIM NEWS edition. Available at: http://www.ercim.org/publication/Ercim_News/enw64/jeffrey.html
  50. Koepsell, D. R. (2000). The Ontology of Cyberspace. Chicago, La Salle/Ill.: Open Court.
  51. Knol: a unit of knowledge: http://knol.google.com
  52. Lange, D. (1981). Recognizing the Public Domain. Law & Contemporary Problems 44: 147.
  53. Lerner, J. (1999) "150 years of patent office practice", NBER Working Paper, No. NBER W7477.
  54. Lessig, L. (2001). The Future of Ideas. New York: Random House.
  55. Lessig, L. (2002). The Architecture of Innovation. Duke L. J. 51: 1783.
  56. Lindberg, S. W., Patterson L. R. (1991). The Nature of Copyright: A Law of Users' Rights. University of Georgia Press.
  57. Litman, J. (1990). The Public Domain. Emory Law Journal 39.
  58. MacCallum, C. (2007). When Is Open Access Not Open Access?. PLoS Biol, 5(10): e285
  59. Madras, G. (2008). Scientific publishing: Rising cost of monopolies. Current Science, 95(2): 163.
  60. Milmo, D. (2006). Publishers watch in fear as a new world comes into view. The Guardian, April 19, 2006. At http://www.guardian.co.uk/technology/2006/apr/19/news.science1.
  61. Morgan Stanley (2002). Scientific Publishing: Knowledge is Power. Industry overview report. September 30, 2002.
  62. Oetiker, T., Partl, H., Hyna, T. and Schlegl, E. (2008). The Not So Short Introduction to LaTeX2ε. Version 4.26.
  63. Oppenheim, C. (2001). The legal and regulatory environment for electronic information. Tetbury, Glos. : Infonortics.
  64. Origgi, G. (2007). Text-e: The Future of the Text in Internet. Palgrave Macmillan.
  65. Pew Internet (2007). At http://www.pewinternet.org/pdfs/PIP_Tagging.pdf.
  66. Pew Internet & American Life Online Video 2007.
  67. Pew Internet Data Memo (2008). At http://www.pewinternet.org/pdfs/PIP_Podcast_2008_Memo.pdf
  68. Stallman, R. (1996). What is copyleft?. Free Software Foundation. At http://www.gnu.org/copyleft/copyleft.html
  69. Stallman, R. (1998). About the GNU Project. Free Software Foundation. At http://www.gnu.org/gnu/thegnuproject.html.
  70. STM (2008). An Overview of Scientific, Technical and Medical Publishing and the Value it adds to Research Outputs. Position Paper on Scientific, Technical and Medical Publishing, International Association of STM Publishers. April 2008. At http://www.stm-assoc.org/documents-statements-public-co/2008-04_Overview_of_STM_Publishing_Value_Research.pdf
  71. Sundén, J. (2003). Material Virtualities. New York: Peter Lang.
  72. United States Copyright Office. At http://www.copyright.gov/
  73. Tallmo, K.-E. (2005). The Misunderstood Idea of Copyright. Computer Sweden, September 2nd 2005. At http://www.nisus.se/archive/050902e.html
  74. Veltri, G. (2008). Social Computing and the Social Sciences. TGE Adonis Report. Paris: Institut Jean Nicod.
  75. VV.AA (2006). Assessing the impact of open access. Preliminary findings from Oxford Journals, Oxford University Press. At http://www.oxfordjournals.org/news/oa_report.pdf
  76. Ware, M. (2006). Scientific Publishing in Transition: An overview of current developments. At www.stm-assoc.org/storage/Scientific_Publishing_in_Transition_White_Paper.pdf.
  77. Warren, J. C. (1976). "Correspondence". SIGPLAN Notices (ACM) 11 (7): pp. 1–2.
  78. Wellcome Trust (2003). Costs and business models in scientific research publishing. At http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtd003184.pdf
  79. Wellman, B. (1988). Structural analysis: From method and metaphor to theory and substance. In B. Wellman & S. D. Berkowitz (Eds.), Social Structures: A Network Approach (pp. 19-61). Cambridge, UK: Cambridge University Press.
  80. Wikipedia: Copyrights (describing copyright issues related to Wikipedia). Retrieved January 10, 2009 from http://en.wikipedia.org/wiki/Wikipedia:Copyrights