Legal protection of databases


The term "database" is one that has a number of different meanings, particularly in computer science. However, in general it is taken to mean a "structured collection of data held in computer storage; especially one that incorporates software to make it accessible in a variety of ways"[1]; in particular the term is applied to data organized so that it is possible to extract all material meeting some criterion. The use of computer storage is not an absolute requirement and, as we shall see, the term can be applied to almost any large collection of information.

The position of databases in copyright law is a curious one. It is necessary to draw a distinction between the individual contents of the database and the database itself. While the contents may be protected by copyright, it is equally possible - even likely - that they may be simple facts not capable of legal protection. For example, a database might consist of the heights and positions of every mountain in a country or of the names and directors of every limited company in the UK. In each case these contents consist of simple facts that may even have been published previously. This means that, in general, the contents do not meet the originality test for copyright protection. Nevertheless a significant amount of effort may have gone into creating the database and it seems inequitable to allow others to simply copy it without compensating the original compiler - otherwise what is the incentive to compile such databases in the first place?

This essay starts by examining the protection of databases, both before and after the introduction of the Database Directive[2] and then moves to the Commission's evaluation published in 2005[3]. It assesses the various policy options proposed as part of that evaluation and their justifications and makes suggestions as to which should be adopted.

Traditional protection of databases

Traditionally there has been little or no separate protection of databases in copyright law. For example, the Berne Convention[4] - usually viewed as the basis of copyright - does not use the term at all and only mentions "collections" twice, only one of which is significant here[5]. In Article 2(5) it states that:

Collections of literary or artistic works such as encyclopaedias and anthologies which, by reason of the selection and arrangement of their contents, constitute intellectual creations shall be protected as such, without prejudice to the copyright in each of the works forming part of such collections.

It should be noted that this right is specific to "literary or artistic works". The implication of these words and of Article 2(8)[6] is that compilations are only protected if they are made up of the sort of works which already meet the originality test for copyright, as opposed to otherwise unprotected facts.

However, despite this, the UK courts have managed over the years to establish copyright in collections of data. In general this has been done on the basis of the amount of effort put into creating the collection. For example, a leading case here is that of Kelly v Morris[7]. Kelly went to a large amount of effort to create and maintain a directory of names and addresses in London. Morris then published his own directory, parts of which were admittedly copied from Kelly's, claiming in effect that the facts were in the public domain and not subject to copyright. However, the court held that:

generally, he is not entitled to take one word of the information previously published without independently working out the matter for himself, so as to arrive at the same result from the same common sources of information, and the only use that he can legitimately make of a previous publication is to verify his own calculations and results when obtained.[8]

This position was supported a few years later when Morris in turn sued the creators of another directory[9] on a similar basis, and it was held that:

no one has a right to take the results of the labour and expense incurred by another for the purposes of a rival publication, and thereby save himself the expense and labour of working out and arriving at these results by some independent road. If this was not so, there would be practically no copyright in such a work as a directory.[10]

In contrast, the results of random draws from a hat have been held not to carry copyright, either individually or in a collection, because there was "neither literary ability, nor skill, nor labour"[11] involved in their production.

The courts have continued to support this "sweat of the brow" doctrine, protecting the effort involved in compiling collections of simple facts. As late as 1994, the "Greyhound Services" case[12] held that the work and effort involved in assembling attractive greyhound races (stated as being 12 to 14 hours for each race meeting) meant the resulting race card was protected by copyright[13] but, on the other hand, the minimal work involved in calculating a dividend from a formula did not create a copyright in the specific value or, more importantly, in collections of dividend values:

If there is no copyright in the individual forecast dividends, then there cannot be copyright in the 12 forecast dividends for the 12 races. There is no skill or judgment and minimal labour in writing them down 12 times. They amount to a mere collocation to which copyright does not attach.[14]

This situation differed in other countries, of course. The USA followed the "sweat of the brow" doctrine for many years, but in 1991 the Supreme Court decided[15] that a compilation of facts could only be copyrighted if, and to the extent that, there was originality in selecting the facts to be included. In particular, a directory consisting of names in alphabetical order and numbers allocated according to a mechanical process was not protected.

The Database Directive

The Database Directive[16] was enacted to provide a Europe-wide framework for the protection of databases and other organized compilations of material. This was justified as harmonizing existing rules[17] but also because electronic databases were a growth area with an imbalance in the market[18] and "investment in modern information storage and processing systems will not take place within the Community unless a stable and uniform legal protection regime is introduced for the protection of the rights of makers of databases"[19].

The Directive defines a database as "a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means"[20]. This is a wider definition than most computer professionals would use - for example, it covers most web sites, in that each page is a separate work and the structured naming of the pages forms the systematic arrangement. It introduces two major rights relating to such databases. Firstly, where "the selection or arrangement of their contents, constitute the author's own intellectual creation"[21], the database (but not its contents) is explicitly protected by copyright. As such, it is unlawful to carry out acts such as reproducing, adapting, selling, or distributing it to the public without the permission of the author.[22] Secondly, it introduces a new "sui generis" right in relation to databases. The essential element of this right is that it is unlawful to extract or re-use a significant portion of the contents of a database, including making many insubstantial queries to extract a substantial amount, though simply consulting the database - even when not authorised - is not extraction[23]. For a database to be protected via this right there has to have been a substantial investment in the obtaining, verification or presentation of the contents.[24] The right lasts for 15 years from the creation of the database or, if it is made available to the public within that period, 15 years from when it was made available to the public.[25] One significant point to note is that any change to the contents, or even just further verification of them, that involves a "substantial new investment" is treated as creating a new database with its own term of protection; this protection applies to those contents that are unchanged as well as the new material.[26]

For many years it was believed that the sui generis right created by the Directive protected all significant databases of facts. However, the European Court of Justice held that investment in creating material to be included in a database was not covered by "obtaining", nor was checking done during the creation process "verification", and if that was the only substantial investment involved then the database was not protected by the right[27].

The 2005 evaluation

Article 16(3) of the Directive requires the Commission to generate a report every three years (starting in 2000) on its application and, in particular:

the application of the sui generis right, including Articles 8 and 9, and shall verify especially whether the application of this right has led to abuse of a dominant position or other interference with free competition which would justify appropriate measures being taken, including the establishment of non-voluntary licensing arrangements. Where necessary, it shall submit proposals for adjustment of this Directive in line with developments in the area of databases.[28]

Despite this requirement for the report to be triennial, the first actual evaluation was done in 2005 and published in December of that year[29]; the delay may have been because only three member states transposed the Directive within the deadline and it took until 2001 for the other 12 to do so[30]. The Evaluation was based on a restricted survey of 500 companies involved in the database industry within the EU, of whom 101 responded[31], the Gale Directory of Databases[32], and other rightsholder views expressed to the Commission.

The Evaluation attempted to examine the effects of the Directive in relation to its two policy objectives of harmonization and of protection of the database industry. Starting with the former, copyright protection was removed from non-original databases in those countries which followed the "sweat of the brow" doctrine (UK and Ireland) or which protected "catalogues of information" (Denmark, Finland, and Sweden) and this seems to have caused little or no issue, presumably because courts are already used to applying the test of originality as it applies elsewhere in copyright law. However, although the sui generis right is also supposed to be harmonized, this does not turn out to be the case in practice. The statement of the right uses a number of terms that it does not define, such as "substantial investment", "obtaining", and "substantial part". Both national courts and the ECJ have reached diverging opinions on these terms, leaving an uncertain position that fails to harmonize the law in any real sense. For example, in Algemeen Dagblad v Eureka[33] the court concluded that creating the list of a dozen headlines found on a newspaper's website was not a substantial investment, even though a new such list was created daily while, on the other hand, in Kidnet v Babynet[34] a different court decided that constructing a list of 251 website links was substantial investment and a copy of 239 of them infringed the sui generis right. In NVM v De Telegraaf[35] it was held that even a very few items of data might form a "substantial part" of a database because those items might be of great value to someone[36]; on that argument, there is never an "insubstantial part" and it is hard to see how this position can be compatible with the wording of the Directive.

Finally, the ECJ examined the term "obtaining" and reached the startling (to many) conclusion that it excluded any effort in creating the contents of a database and only included that involved in collating existing material[37]. The Evaluation concludes that the sui generis right is difficult to understand. Furthermore, it expects that database makers will find ways around the exclusion established by the ECJ judgement and expresses a concern that, particularly in the case of "single-source" databases, the sui generis right comes perilously close to protecting raw data rather than expression, contrary to the longstanding principles of copyright law[38].

Moving to the second policy issue - protection of the database industry - the Evaluation is even more critical in its findings. Unsurprisingly, a majority of respondents (65%) believe that the protection of databases has been strengthened by the Directive while a smaller majority (54%) feel that the ECJ judgement has reduced the number of databases protected[39]. However, when it looked at whether this additional protection has actually encouraged the European database industry the evidence ranges from ambiguous to negative. The Gale directory shows a significant growth in the number of "Western Europe" databases from 1996 to 2001 (the year in which transposition was complete) but then a fall back to 1998 levels by 2004[40], which might suggest that the effects of the Directive were to reduce the number of databases in the market. On the other hand, Gale counts all databases equally and does not indicate how much data is represented; it is also unclear whether on-line databases are included in the numbers. Thus at best these figures need to be taken with a large amount of salt. Similarly, when the "Western Europe" database sector is compared with that of North America - again, according to Gale - there was a slight improvement running up to 2001 but a significant decline in relative performance thereafter[41]; if the figures are examined in more detail, this is because the number of North American databases grew by over 28% in those three years while the number of Western European ones shrank by 24%[42].

Nevertheless, the publishing industry claims that the sui generis right is essential to their businesses, even though they fail to provide any empirical justification for this position and fail to explain how it is consistent with the buoyant US database market that lacks that protection[43]. Of course, there is an obvious vested interest in the continuation of the protection and rightsholders have a strong political voice that cannot be ignored.

There were also concerns about the right raised by others. These tend to focus on the question of exceptions: the exceptions to the sui generis right are rather more restricted than is usual in copyright law. For example, although there is a "private use" exception, it is limited to non-electronic databases[44]. The biggest single concern was expressed by academics, who fear that essential information required for scientific research and educational purposes is being locked up in databases, impeding research and reducing the resulting benefits[45].

Rather than reach a single conclusion on future policy, the Evaluation canvasses four different options, to which we now turn[46].

Option 1 - repeal the Directive

Repealing the Directive would have the basic effect of removing the harmonization that it provided and eliminating the sui generis right. The legal situation in the member states would vary, firstly as the repeal was transposed and then as each chose anew how to protect databases. For example, the UK might well revert to the "sweat of the brow" test, which would presumably include "spin-off" databases where the substantial effort was in creating the data in the first place and not in assembling them into a database[47]. On the other hand, Parliament might well decide that the ECJ decision took the correct path and exclude the effort of creation from the test. Meanwhile, other countries would probably make their own choices in these matters. The result would be significant legal uncertainty for database creators. Even if countries merely reverted to the pre-Directive situation, there would be three significantly different regimes within the EU: the UK and Ireland with "sweat of the brow", the Nordic countries with the "catalogue" rule, and the remaining countries following the "originality" test that was maintained by the Directive.

In this varied legal environment, the Evaluation suggests, database providers would turn to other means of protection. For example, contract law could be used to impose specific restrictions on use of a database, or technical measures such as access control mechanisms could be used to prevent unwanted dissemination or excessive use of the database. To some extent this is likely to happen irrespective of the legal regime in place, since providers will want to maximize their income from their database and this means preventing any use other than that explicitly paid for. The existence or not of copyright or sui generis protection is unlikely to change this behaviour or discourage it, since the provider has no incentive not to use these other mechanisms as well.

Therefore the removal of harmonization is unlikely to reduce the practical protections available to database providers but it does introduce uncertainty when other protections have not been used. Whether or not the lack of harmony has distorted the internal market in the way suggested by the Directive[48], there has been no suggestion that harmonization has been harmful nor any significant criticism of it, and therefore repeal of the whole Directive, or of this aspect of it, would be misguided at best.

The second effect of repeal is, of course, to withdraw the sui generis right. Again this would lead to the confusion engendered by the lack of harmonization; other issues related to this repeal are a subset of those discussed in the following section.

Option 2 - withdraw the sui generis right

The second option is to retain the "originality" test for copyright protection and simply repeal those parts of the Directive that implement the sui generis right. Doing this would mean that copyright protection remains harmonized, meeting the relevant objections to complete repeal. Since it is the sui generis right that has caused the most problems in interpretation and since it is the additional justification for the creation of this right - the need to protect the European database industry - that has the least evidence to support it, withdrawal of just this new right might appear to be more appropriate than complete repeal.

However, withdrawal would mean that the situation with non-original databases would still remain confused since there would be no harmonization of any protection of such databases. Arguably it would not be permitted for a member state to provide copyright protection for non-original databases, since the section of the Directive relating to copyright reads:

databases which, by reason of the selection or arrangement of their contents, constitute the author's own intellectual creation shall be protected as such by copyright. No other criteria shall be applied to determine their eligibility for that protection.[49]

The wording "no other criteria" may have been intended to indicate that all original databases are protected by copyright and none may be excluded, but it could equally have meant that originality is the only test to be used for copyright protection and therefore no other databases may be included in the scope of the protection. This latter view is supported by the preamble:

the criteria used to determine whether a database should be protected by copyright should be defined to the fact that the selection or the arrangement of the contents of the database is the author's own intellectual creation;[50]

The phrase "should be defined to the fact" is not the clearest piece of drafting ever, but "if and only if" would appear to be the most sensible interpretation. This would mean that the UK could not revert to the "sweat of the brow" test for database protection.

However, it would still be open to member states to create their own versions of the sui generis right or some other protective right that would be similar to it, or even one similar to copyright protection provided that it wasn't called "copyright". For example, the Nordic states could reintroduce their "catalogue" rule or the UK could retain the "database right" introduced during transposition of the Directive[51].

The most likely effect of withdrawing the sui generis right is that a few member states would introduce an effective replacement - though quite probably with subtle differences - while the remainder would leave non-original databases unprotected or rely on other torts to protect them. Since one of the primary justifications for the Directive was that the lack of harmonization of intellectual property law was distorting the internal market, there is no more reason to repeal this one harmonized right than there is to repeal the whole Directive. Of course, as with option 1, it is likely that database producers will turn to other methods of protection in any case and so repeal is unlikely to have a major effect in practice.

One possibility would be to not simply withdraw the right but make it explicit that users of databases are permitted to extract and re-utilize the data where it is not covered by copyright. This would avoid the issues of an unharmonized right, but it is unlikely to be acceptable to the commercial database industry and is almost certainly going to be unacceptable politically.

Option 3 - amend the sui generis right

Rather than a simple repeal of part or all of the Directive, with the attendant issues, this option involves keeping the sui generis right but changing it in some way to correct some or all of the shortcomings noted in the Evaluation. A number of possibilities are canvassed, but in every case the protection of databases remains harmonized throughout the EU.

The most obvious starting point is to address the issue considered by the ECJ: is the sui generis right intended to protect investment in creating the content of a database, in collating the information to go into the database, or only in the creation of the database structure? The judgement chose the second of these possibilities but at least some language versions of the Directive are less than clear on this[52]. While some might see a benefit in fixing this, it is also the case that the judgement appears to have settled this matter and therefore no further change is necessary. As the Evaluation points out, there is a risk that change would merely confuse matters further by introducing "yet another layer of untested legal notions"[53].

On the other hand, the judgement leaves "single source" databases in a curious position. If a person puts significant investment into creating a collection of facts which are not themselves covered by copyright and then collects them into a database which she makes available, even for a fee, she has no protection against a third party extracting those facts and republishing them. But if she sells the collection of facts to another person who puts trivial effort into organizing them, the sale price constitutes significant investment in obtaining the facts and the resulting database is protected by the sui generis right. The same is true if she simply leaves the facts in a state requiring the other person to put effort into collating them (e.g. by only making printed copies available so that they need to be re-entered into a computer). If the purpose of the new right was to encourage the creation of databases, then surely this right should be applied to all databases no matter how or by whom they are created?

Of course, there are potential difficulties with this approach. The first of these is that protecting "single source" databases may mean that the right effectively protects a set of facts rather than their expression. A person coming up with otherwise uncopyrightable material can gain protection simply by putting that material into a database; indeed, they can then use it to prevent others from making use of those facts at all. This is contrary to a basic tenet of copyright law: that it protects forms of expression and not the underlying data. While the creator of such a database may eventually be held to be abusing a dominant position, as in the Magill case[54], this is explicitly an exceptional situation[55] and requires specific tests to be passed before a court will take action, such as trade between member states being affected and a new product being suppressed. This will, in general, be a high hurdle to overcome. The original Commission proposals for the sui generis right included mandatory licences when a public body was a single source but this did not make it through to the final Directive[56]; in any case it would not have applied to private database creators. It would be possible to make mandatory licensing a quid pro quo for use of the sui generis right with single source databases, but this would no doubt lead to a separate set of disputes over whether any particular set of data were indeed "single source". While many cases will be obvious, there are likely to be others where it is in dispute whether a third party could recreate the data independently or not, or could only do so at significant cost.

There is also the question of whether many single source databases are actually deserving of protection. Many non-original databases are "spin-offs" of other activities and would have been created whether or not any protection existed. For example, the list of television programme names and times that were at the centre of the Magill case could easily be entered into a database; indeed, any listing sorted by date and time probably meets the definition in Article 1(2). Much effort and investment has gone into creating that list, but not because there was an intention to create a database that can be exploited; rather, the effort is because the list is an essential stage of the television production process and it would be needed purely for internal purposes. The same applies to the list of properties generated by a taxation authority, or the schedule of matches put together by the controlling authority of a sports league, or many other examples. The effort would have been made whether or not the resulting database can be exploited. These are known as "spin-off" databases and the producers are referred to as "secondary" in the Evaluation. There are fairly strong arguments against protecting such databases: the lack of protection will not harm the database industry, the monopoly created by protection does not result in any countervailing public benefits, and restricting access may prevent others from making good use of the data and generating benefits of their own. The Evaluation also notes[57] that case law in some member states already excludes spin-off databases from protection; the Algemeen Dagblad case is cited as an example[58]. Such exclusion does not seem to have resulted in significant harm.

All these arguments suggest that the ECJ reached the right decision (in the ethical, not interpretative, sense) by effectively excluding such databases from the sui generis right but, at the end of the day, it is for the European legislative process to reach a conclusion on this.

Another change that could be made as part of this option is to replace or clarify some of the terms used that have resulted in confusion in national courts. For example, it is not intuitively obvious why consulting an on-line database and examining the results is not "extracting" data from it yet printing out those same results for later study is. It could be made clearer that the intention was only to prohibit actions with a detrimental economic effect on the rightsholder rather than any substantial extraction[59]. Terms such as "substantial" could be properly defined, or at least better guidance could be added. In particular, the words "evaluated quantitatively" clearly imply some kind of numeric threshold, yet neither the Directive nor the ECJ decision actually provides a value or even guidance towards one. On the other hand, as already mentioned, the Evaluation expresses caution against any such editorial activity, pointing out that it could equally well make things worse.

One exception to this caution is something mentioned by the Advocate General in her opinion in the BHB case[60]: Article 7(2) defines "extraction" and "re-utilization" in terms of a "substantial part of the contents of a database", yet elsewhere (including Articles 7(5) and 8(1)) the terms are applied to "insubstantial parts". This contradiction would be simple to resolve and the ECJ seems not to have been fazed by it[61].

The Evaluation raises the question of exemptions. The exemptions applied to the sui generis right are much more restrictive than are typical in copyright law. For example, there are no exemptions for commercial research, news reporting, adaptation for the visually impaired, teaching, or library archives, and the exemption for private study does not apply to electronic databases[62]. There is no obvious reason why these traditional exemptions should not apply equally to the extraction and re-utilisation of databases, particularly since most are restricted (at least in UK law) to not unduly prejudice the commercial rights of the database provider. In particular, the scientific and academic community believe that the sui generis right was aimed at commercial databases and requires rethinking in respect of its application to scientific ones[63]. This could be done by means of additional exemptions or by distinguishing the two cases within the Directive. However, this latter is not a step to be entered into lightly.

Finally, one matter that is not addressed by the Evaluation but I consider to be worthy of concern is that of "dynamic" databases. Where a database is subject to regular updating, or even simple re-verification of its contents, the result under Article 10(3) is to create a new database that has its own sui generis "clock" separate to that of the original database[64]. Even if only a small (but significant) part of the database is updated each time, the result is to provide effectively perpetual protection. This provision makes a certain amount of sense: if one considers a street directory that is updated annually, most of the contents are unchanged from year to year yet every single entry needs to be re-validated for each issue. On the other hand, in other areas of copyright law such updating does not affect the expiry of the copyright, or at most only affects the changed items, no matter how much effort was involved in doing the updating. It would seem reasonable at least to restrict the extended time limit of the sui generis right to that material that was added to the database at each revision.

Option 4 - do nothing

The final option in the Evaluation is to leave well alone and do nothing. It has the benefit that it is simple and cheap. Change is always risky and, even if the sui generis right is not providing the benefits to the European database industry that were promised, it is not doing any particular harm. Even if the right was too wide, the ECJ decision has tamed it. Doing nothing is always an attractive option, but in this case it has disadvantages: even though the two repeal options can be rejected because they would leave a lack of harmonization, this does not mean that some kind of change is not necessary. In particular, it would appear that the sui generis right is causing difficulties for scientific research and, at the very least, the exemptions applying in this area should be examined and modified appropriately.


The Directive has harmonized the application of copyright protection to databases by limiting it to those that meet the "originality" test. This has provided certainty for those supplying databases in more than one member state. There is no reason to withdraw this benefit and therefore no justification for repeal of the whole Directive.

It has also created a new sui generis right to prevent others from extracting or re-utilizing substantial portions of a database where there was substantial investment in obtaining, verifying, or presenting its contents. This was intended to provide protection for those parts of the European database industry that fell outside the "originality" test and was clearly based on the UK's "sweat of the brow" doctrine. This right appeared to come perilously close to protecting raw facts, particularly when these have a single source; however, ECJ rulings have distinguished the investment in creating data from that in obtaining it.

The protection of this right does not appear to have generated significant benefits. It has not been replicated outside the EU; even other Common Law jurisdictions have abandoned the "sweat of the brow" doctrine. I would therefore recommend that the right be removed. This cannot be done by simple repeal - this would leave the potential of a tangle of different protections in the different member states - but instead users should be explicitly granted the rights of extraction and re-utilization currently reserved to the creator.

If this approach is unacceptable, there are still areas of the sui generis right that need to be amended. These include the provision of exemptions to match the rest of copyright law and, in particular, any modifications necessary to prevent the right having a prejudicial effect on scientific and medical research. They also include resolving the drafting inconsistency between Article 7(2) and Articles 7(5) and 8(1) over "insubstantial parts". But I would not recommend attempting to change definitions - this is more likely to do harm than good - and, for the reasons expressed previously, would not attempt to overrule the ECJ decision that effectively excluded "single source" databases from protection.


Annex 1 - calculations relating to the number of databases

(See footnote 42 and the referring text.)

Year  W.European  Market share (%)  Estimated market size
      databases   W.Eur.   N.Amer.  Global                N.America
                                    (minimum)  (maximum)  (minimum)  (maximum)
1992  1838        26       68       6936       7208       4682       4937
1993  1938        26       68       7313       7600       4936       5206
1994  1923        24       68       7849       8183       5298       5605
1995  1931        24       69       7882       8217       5399       5711
1996  2052        22       69       9120       9544       6247       6633
1997  2696        28       64       9460       9804       6007       6323
1998  3092        29       63       10481      10849      6551       6889
1999  3262        30       64       10695      11058      6791       7132
2000  3546        30       63       11626      12020      7266       7633
2001  4085        34       60       11841      12194      7045       7377
2002  3941        33       62       11764      12126      7235       7579
2003  3820        29       65       12949      13404      8352       8779
2004  3095        24       72       12633      13170      9032       9548

Column 2 is taken from figure 5 of the Evaluation and columns 3 and 4 from figure 7. The estimated global market size is derived by finding the numbers for which the figure in column 2 is the indicated percentage in column 3, plus or minus ½%. The North American market is then derived by combining these figures with those in column 4, again plus or minus ½%. For example, looking at the year 2000, the Western European figure of 3546 is 29½% of 12020 and 30½% of 11626, meaning the global market lies somewhere between these two numbers. Then the North American market must be between 62½% of 11626 (i.e. 7266) and 63½% of 12020 (i.e. 7633).
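The derivation just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names market_bounds and percent_change are hypothetical and are not taken from the Evaluation. It also assumes that rounding is applied only at the end, rather than to the intermediate global figures.

```python
def market_bounds(w_eur_count, w_eur_share_pct, n_amer_share_pct):
    """Estimate global and North American database counts.

    The published shares are whole percentages, so each is treated as
    lying within half a percentage point of the printed value.
    Returns (global_min, global_max, n_amer_min, n_amer_max).
    """
    half = 0.5
    # A larger Western European share implies a smaller global total.
    global_min = w_eur_count / ((w_eur_share_pct + half) / 100)
    global_max = w_eur_count / ((w_eur_share_pct - half) / 100)
    # Combine the global bounds with the North American share bounds.
    n_amer_min = global_min * (n_amer_share_pct - half) / 100
    n_amer_max = global_max * (n_amer_share_pct + half) / 100
    # Assumption: round to whole databases only at the very end.
    return tuple(round(v) for v in (global_min, global_max, n_amer_min, n_amer_max))


def percent_change(old, new):
    """Percentage change from old to new (negative means a fall)."""
    return (new - old) / old * 100


# Year 2000 row: 3546 Western European databases, 30% and 63% shares.
print(market_bounds(3546, 30, 63))
# Western European count, peak year 2001 compared with 2004.
print(round(percent_change(4085, 3095), 1))
```

For the year 2000 inputs this gives the same four figures as the worked example above, and the second call gives a fall of roughly 24% in the Western European count between 2001 and 2004.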

Compared with the Western European peak year of 2001, the percentage changes in 2004 are:

