Free our Data?

Last night I was at the Free Our Data? discussion at the University of Manchester, running as part of the ESRC Festival of Social Science 2007. This was interesting, not least because I have been thinking about this debate purely in terms of geographical data, yet other types of data bring other issues and concerns. The question mark is important, as it represents the crux of the evening’s debate: should public sector data be available for free, or freely available?
The session was recorded and webcast by the University, so I won’t try to summarise the entire debate, but the panelists raised many thought-provoking issues.

Charles Arthur, Technology Editor for the Guardian, fervently believes that digital data should be freely available at the point of use, as the costs of dissemination, reproduction etc. are virtually nil. He argued that some costs should be borne by increased taxes, but that the benefits to the economy created by free access to the data would outweigh them. I’m not sure I agree with the first point, that data dissemination doesn’t cost anything, but I can certainly see that at our small-fry level we could charge less for the work we do, and do a better job, if we weren’t incurring an overhead for the cost of licensing data.

Jill Matheson, Director of Census, Demographic and Regional Statistics at the Office for National Statistics, had three basic and eminently sensible points: that the value of data is in its use; that protecting confidentiality is paramount; and that there’s no such thing as free data, only hard decisions as to who pays for it. Jill then went on to say that the more people use data, the better the quality assurance process. So, if we want more people to use the data, then let’s make it free. However, I’m not suggesting a “Statistipedia” approach, as that kind of editorial model would not be appropriate! Jill argued against Charles’ assertion that data can be disseminated at no cost, saying that it is making data accessible that costs, rather than making it available. On the subject of confidentiality, I initially wondered if that was a bit of obfuscation, as once statistical data is anonymised, confidentiality is obviously not an issue. I then wondered, however, how I would feel if my street was classed badly on the basis of some demographic analysis. That would be personal to me, but at what stage or scale does the data become safe: at the level of a zone, a town or city, a county?

Duncan MacNiven, the Registrar General for Scotland, highlighted the way that they are making a lot of information freely available north of the border. However, I was intrigued by his argument for charging for some types of data but not others. He argued that demographic and census data should be (and is) free at the point of use, subsidised by the state, but that genealogical data should not be. Why, he asked, should Scottish tax payers subsidise people in Australia looking for information on their family history?

I find this difficult to agree with. As a tax payer in England I subsidise a lot of things that I am not interested in, such as healthcare for smokers. As a tax payer in the North of England, I am subsidising the cost of hosting the Olympics in 2012 in London. As an archaeologist, I have seen jobs cancelled since “we” won the Olympic bid, because developments have been curtailed or stopped and the money channeled into the Olympic development. Furthermore, once the data is out there in the digital realm, it’s available to anyone, no matter what country they are in.

Duncan Shiel, Head of Strategy at Ordnance Survey, was always going to be on the least popular side in this debate, but he did make a good point: that having good quality data is more important than having free data. I agree with this completely, and sometimes worry that too much focus on cost takes away from sensible debate and leads to poor quality solutions or ways around the problem. Duncan’s next point, though, was something I would have liked to challenge, had there been time. He said that the private sector should add value to public sector data. Lovely. So the private sector should pay to use the data, but should then give something enhanced back to the public sector for free? I don’t think so!

Finally, Peter Elias, Professor of Labour Economy at the University of Warwick and a Strategic Advisor to the ESRC, said that in an ideal world publicly funded work for the public good, carried out at a publicly funded institution such as a university and producing results that are going to be publicly available, should be able to use data freely. That seems sensible, but is not always economically viable.

I went away thinking that commercial archaeological units, in the UK at least, are in a difficult position. We are not academic, in the sense of being part of an educational institution, although some of us are educational charities. However, the work that we do comes from a piece of planning law, PPG16, which requires that the majority of development in this country has some level of archaeological assessment undertaken. We have a duty to preserve the archaeology, preferably in situ but by record if necessary, and to retain its context for future generations. Good units will create an academically rigorous report on their findings, often subject to peer review in a reputable journal. And yet we have to pay through the nose to use public data, unless we happen to work for English Heritage, or can persuade the developers that they should lend us their data for the duration of the project.

End thoughts: it’s a very complicated and rich debate, and perhaps geographical data is the easiest to resolve, as it doesn’t have confidentiality issues. Both sides are quite entrenched, but at least the discussion is happening.

3 Comments so far

  1. Dave Smith on March 16th, 2007

    I have always been a proponent of free access to data, but have learned since that there are a number of instances where access to data can pose significant problems. Some of these are genuinely security-related (certainly the archaeology community has concerns about sites which might be at risk of looting or being otherwise compromised), some concern confidential business information, and some may relate to pending litigation, enforcement or cost recovery. In some instances, the problem may be potential misuse of data. For example, if a third party stands up a site using governmental data, how well are we assured that the data is complete, that it is refreshed in a timely fashion, and so on?

    I think the trend for the future will be better served with free live data services published out by governmental agencies, as opposed to free data as flat files ready for abuse and misuse. Archival data should also be made available – this could certainly be downloadable files – clearly documented and associated with proper metadata denoting its scope and timeframe.

    I do also agree that in most instances much of the cost of developing the dataset has already been incurred by the agencies and funded via taxpayer investment. However, many agencies are budget-strapped, and sustaining or scaling a service to support external demand is often difficult. The question of cost is a tricky one.

    Many great points, I enjoy your blog!

  2. [...] an interesting writeup from an archaeologist’s point of view. [...]

  3. admin on March 29th, 2007

    Thanks for the great comment, Dave. I agree with what you’re saying, that data needs to be reliable, or at least we need some measure of its reliability. This does relate to my concerns in my next post about OpenStreetMap being suggested as superior to the Ordnance Survey.

    Also, that’s why we (at Oxford Archaeology) decided to release our data as web mapping services. We can then ensure that whoever uses it gets up-to-date information. We are trying to deal with the issue of opening up our sites to the risk from looters, partly by removing the actual grid references from the datasets before they make it as far as the server, and possibly by obfuscating the positions slightly, but this is all a work in progress.
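
    As a rough illustration of that idea only: the sketch below is in Python, and the field names (`grid_ref`, `easting`, `northing`) and the 100 m cell size are hypothetical choices made for the example, not a description of the actual Oxford Archaeology workflow. It drops the textual grid reference and snaps each position to the centre of a coarse grid cell before the record is published.

```python
import math


def coarsen(easting, northing, cell=100.0):
    """Snap a coordinate to the centre of its grid cell (cell size in metres)."""
    e = math.floor(easting / cell) * cell + cell / 2
    n = math.floor(northing / cell) * cell + cell / 2
    return e, n


def prepare_for_publication(record, cell=100.0):
    """Return a copy of the record with the precise location removed.

    The 'grid_ref', 'easting' and 'northing' keys are assumed names for
    this example; a real dataset will use its own schema.
    """
    sensitive = ("grid_ref", "easting", "northing")
    public = {k: v for k, v in record.items() if k not in sensitive}
    public["easting"], public["northing"] = coarsen(
        record["easting"], record["northing"], cell
    )
    return public


# A made-up record, purely for illustration.
site = {
    "name": "Example site",
    "grid_ref": "SP 51234 06789",
    "easting": 451234.0,
    "northing": 206789.0,
}
public_site = prepare_for_publication(site)
```

    Snapping to a grid is only one way to blur a position. It is deterministic, so the same site always appears in the same place, but the exact location can still be narrowed down to a single cell, so the cell size would need to suit how sensitive the site is.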
