The next few years will be a crucial time for those recording, managing and using biodiversity data.
Changes in information technology, shrinking budgets and new recording methodologies are having a big impact on data management, so now is the time to take stock and discuss the implications for the systems we will use in the future.
Firstly though, let’s decide what we actually mean by data management. The phrase “data management” is perhaps one of the less exciting uses of the English language. For many, it conjures images of tedious computer programs, masses of complex relational tables and tortuously long queries that end up failing after hours or even days of processing. Hardly a thrill a minute – the antonym of “sexy”.
Despite this, there are those in the world of biodiversity recording for whom the management of data is a very important task and who genuinely care about it. For them, a correct decision on what system to use or how records should be formatted can make a huge difference to how easily and efficiently biodiversity data can be shared and accessed. For this reason, data management is actually quite a hot topic, and often a controversial one, especially when people consider how much it can cost and then try to work out what the optimal level of investment is. One thing that I think is often overlooked, though, is what the term “data management” actually means.
Here are a few figures for consideration. A species record can theoretically come from anyone, but it was estimated in 1995 by John Burnett, Charles Copp and Paul Harding that there were at least 40 000 active recorders. In a personal comment on social media, Richard Comont suggested that over 13 000 people have submitted at least one record to iRecord. Whilst these are very different figures, they both suggest that biodiversity database managers need to be able to deal with records arriving from tens of thousands of different people. When it comes to linking records to a location name, the possibilities are effectively unlimited (people could, and often do, describe the same localities in their own idiosyncratic ways). In one recording area (Cheshire) there are at least 19 000 accepted location names, as well as many more synonyms. In terms of biodiversity itself, how many things can actually be recorded? I put this question to Chris Raper, who curates the UK Species Inventory at the Natural History Museum. Whilst it is estimated that there are around 70 000 species recorded in the UK, people can record (and often have to record) at different taxonomic levels, resulting in the theoretical potential for around 110 000 taxa to be recorded. And these are just the taxa that have been recorded in the UK so far.
These figures give an indication of how complex the biodiversity data environment can be. The variables described above have been combined and recombined over 114 million times to create the species records available on the NBN Gateway so far, but the number of records being created and shared is accelerating, so the figure of 114 million will increase rapidly. At a local level, or for species interest groups, the numbers are smaller: there are fewer taxa to worry about, and possibly fewer spatial references. However, even taking this into account, managing one of these databases is still a complex task, and the databases are getting larger and larger.
And this is just for species records. There are also databases of sites and habitats that need to be managed.
What is the required activity?
With all this in mind, I’d like to suggest that the definition of data management for the biodiversity data sector should go something along the lines of “the activity required to make sense of a highly complex data environment and to produce useful databases”. But what is the required activity?
I discussed this with two database managers (Teresa Frost, and Sue Timms of Leicestershire and Rutland Environmental Records Centre) who advised me that the initial tasks of data management will be carried out by the recorders themselves, who may be checking or double-checking a taxonomic determination or grid reference before using and sharing their records. The second person likely to play a role in data management is the person receiving records on behalf of an organisation such as a scheme, society or LERC (local environmental records centre). Their task will also be to conduct validation and verification checks before the records join other records in a local or national database. This is an invaluable role, as it is very important to ensure that as many inaccuracies as possible are corrected before records join a database and are reused. The vast majority of recording is done with pen and paper, and a malformed grid reference or mistranscribed species name can occur at any time, no matter how experienced the recorder is, especially when considering that, as Teresa says, taxonomy is “a live science” – i.e. taxonomic names can change quickly and often.
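As an illustration of the kind of validation check described above, here is a minimal Python sketch that tests whether a grid reference is well formed. It is not taken from any particular recording system. It assumes British National Grid references written as two letters (the letter I is not used) followed by an even number of digits, and the function name `is_well_formed_grid_ref` is made up for this example. It checks the format only: it does not confirm that the letters name a real 100 km square, and it does not handle Irish grid references or tetrad notation.

```python
import re

# Assumption: a British National Grid reference is two letters (never "I")
# followed by 2, 4, 6, 8 or 10 digits, e.g. "SJ6582" or "SJ 658 824".
# This is a format check only, not a check that the square exists.
GRID_REF_PATTERN = re.compile(r"[A-HJ-Z]{2}([0-9]{2}){1,5}")


def is_well_formed_grid_ref(text: str) -> bool:
    """Return True if the text looks like a well-formed grid reference."""
    # Remove any spaces the recorder typed and normalise to upper case.
    compact = "".join(text.split()).upper()
    return GRID_REF_PATTERN.fullmatch(compact) is not None


if __name__ == "__main__":
    for candidate in ["SJ6582", "sj 658 824", "SJ658", "6582SJ"]:
        print(candidate, is_well_formed_grid_ref(candidate))
```

A check like this only catches malformed references, such as one with an odd number of digits. Whether a well-formed reference actually points to the right place is still a question for the recorder or the verifier.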
However, this is only part of the picture. In order to resolve any questions raised by records, a database manager will have to act as a point of contact for recorders, as well as other experts, particularly for matters of verification. This is the more human-facing “soft” side of data management and is just as important as the “hard” computer tasks.
So in summary, a very complex data environment means that biodiversity database managers are tasked with ensuring data can be used effectively. Key to this is both engaging with the records (i.e. through the computer) and engaging with the recorders and the people needed to provide support to recorders. In this respect data managers provide an interface, both between people and systems, and between people and each other.
Will it always be like this?
According to the small sample of database managers consulted for this article, it is starting to change. Online recording, for example, is having an effect. One of the successes of the range of online recording systems now available is that some of the data management tasks described above can be done up front, such as ensuring the use of valid spatial references and correctly spelled species names. There is also the added plus of engaging a greater number of recorders. However, this can make it harder to guarantee the provenance of records, and so whilst work on some of the initial data checks is reduced, there is more work to do to ensure that species have been determined accurately and to match the right verification expertise with the right recorders. Of course, there is also the very important and labour-intensive task of training and supporting recorders on online recording systems, which some people would call a data management task.
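To give a flavour of how a species name might be checked up front, here is a minimal Python sketch that compares what a recorder typed against a list of accepted names and suggests the closest match. It does not describe how any particular online recording system works. The four-name checklist and the function name `suggest_name` are invented for illustration, and the similarity cutoff of 0.8 is an arbitrary assumption. A real system would draw on a full taxonomic dictionary and would also need to handle synonyms and name changes.

```python
from difflib import get_close_matches

# Hypothetical mini-checklist, for illustration only.
CHECKLIST = [
    "Bombus terrestris",
    "Bombus lapidarius",
    "Vanessa atalanta",
    "Erithacus rubecula",
]


def suggest_name(entered, checklist=CHECKLIST, cutoff=0.8):
    """Return (name, status), where status is 'exact', 'suggested' or 'unknown'."""
    # Collapse stray whitespace and compare without regard to case.
    cleaned = " ".join(entered.split()).lower()
    lookup = {name.lower(): name for name in checklist}
    if cleaned in lookup:
        return lookup[cleaned], "exact"
    # Fall back to the single closest name above the similarity cutoff.
    matches = get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    if matches:
        return lookup[matches[0]], "suggested"
    return None, "unknown"


if __name__ == "__main__":
    print(suggest_name("Bombus terestris"))
    print(suggest_name("vanessa  atalanta"))
```

Note that a suggestion is only a prompt for the recorder to confirm. A correctly spelled name can still be the wrong determination, which is why the verification work described above does not go away.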
The cornerstone of successful data management (other than the people doing it) is the database system used. For local environmental records centres, the choice of system is overwhelmingly Recorder 6. Only three of them don’t use it. Recorder 6 has been designed, supported and promoted for the purpose of hosting large biodiversity databases and there is little doubt that this has been achieved to a large extent. An ALERC estimate puts the number of records that have passed through LERC Recorder implementations at around the 90 million mark, and there must be many more on the databases of local and national schemes and societies.
Recorder is not universally praised, however, and even some of the people who have been using it for years still have their gripes. Those who do praise it point to its flexibility, as well as its ability to link records to sources, track changes to records and ensure provenance.
The future of Recorder
The future of Recorder is uncertain though, and this is going to affect tens of millions of records within the LERC community alone. On 20th May 2014, it was announced on the NBN Forum that the JNCC would cease its support for Recorder “once the online environment contains core Recorder 6 functionality”. In August 2015, I asked for an update from Steve Wilkinson of JNCC, who said “I personally would like to see this [the MoA between JNCC, NE and NRW that supports Recorder] continue – as it really simplifies the data collation task here let alone the effectiveness of investment elsewhere (e.g. LRCs). How we see it moving in time is towards greater on-line data capture which will change the collation task significantly (and make it better for everyone). But right now I can’t say anything about where it is going to move to nor how fast.” So the message is that public sector support for Recorder 6 will possibly continue for the time being, but with a move online in the future (Steve’s comments were made at a time of financial uncertainty, hence the lack of confidence in the exact direction or speed of travel).
ALERC’s response to the 2014 announcement was to launch a consultation with its members and their partners to find out more about their attitudes towards data management. The results suggested that people are, on the whole, interested in a move towards online-oriented databases, in line with the current position of the Recorder steering group. In fact, at least one pioneering national scheme organiser is already looking at teaming up with their LERC to see if, together, they can implement Indicia to help manage the scheme’s data online, and other online solutions are being looked into by other LERCs too. One of the main drivers for this is the increasing difficulty many local authority hosted LERCs have in getting the support they need from their own IT departments for esoteric software such as Recorder.
Is there an answer?
Returning to the original question of where we are going, is there an answer? Yes, it looks like we are going online, but exactly when and how is still up for grabs. I therefore think it would be great if organisations (such as NBN Secretariat and JNCC) published a timetable by which they think this can happen, and started to plan a route. I would also like to see individuals use the forums provided (eNews, NBN Forum, NFBR News etc.) to discuss their needs and help shape the technology of the future. Let’s all also post news of anyone who has successfully changed their data management systems so that experiences can be shared. I know I will.