Open access data comes to Natural History Museum, London

Written by Kyle Copas, GBIF

The 80 million specimens in the Natural History Museum, London, comprise one of the world’s most important scientific collections, a repository uniquely capable of helping answer key questions about the past, present and future of life on Earth. With the initial release of a new data portal (http://data.nhm.ac.uk) and the datasets associated with a massive five-year digitisation effort, researchers the world over are gaining greater access to these historic materials.

In February, NHM began publishing a dataset containing more than 2.5 million specimen records through GBIF.org, the open access data resource developed and maintained by the Global Biodiversity Information Facility (GBIF). In recent months, members of the GBIF Secretariat have provided NHM with behind-the-scenes assistance on implementing open-access standards, so we’re delighted to see their data become universally and freely available.

Upon publication, NHM’s specimen collection dataset became the newest of the 495 datasets already published by UK-based institutions on GBIF.org. Equally important is the dataset’s integration with complementary information about biodiversity from collections and observations around the world, and within the UK as well. To put this latest development in perspective, it’s worth remembering that, save for some notable exceptions like NHM and the Royal Botanic Gardens at Kew and Edinburgh, the great majority of the more than 49 million individual occurrence records published from the UK are hosted and maintained by the National Biodiversity Network (NBN).

Still, as is the case with any historic museum, NHM’s collections offer a grand cabinet of curiosities, ranging from those with particular historical or scientific significance to the wonderfully unusual, and sometimes both. Take, for instance, the white-headed saw-wing (Psalidoprocne albiceps). First described in 1864 on the basis of the holotype specimen held at NHM, this swallow is one of 8,000 type specimens in NHM’s bird collections (the world’s largest bird type collection) and the only one to be preserved in spirits.

Some portions of the newly digitised NHM collection remain largely unresearched, so broader access may lead to unexpected insights. Among these are more than 10,000 marine invertebrate specimens from the ‘Discovery Collections’. Gathered in the course of various twentieth-century expeditions, notably voyages in the waters of the southern Atlantic and Antarctic, these specimens could provide baseline information for research into ocean acidification.

Collections holders and data publishers may join biological scientists in benefiting from the ongoing collaboration between NHM and GBIF. Using a series of data quality checks built into the GBIF system, NHM developers have designed an integrated workflow that displays the ‘GBIF quality indicators’ for each occurrence record on the NHM data portal, as in this example (which relates to the image shown above of Papilio lycidas).

This simple tool, already popular with NHM curators, enables data managers and even users to flag records with problems quickly and easily, directing them to collections staff who can then address issues and improve the records. Currently created as a custom extension (source code available here), the tool may be extended further by its developers as they consider refining and releasing a stand-alone version that other publishers can apply to their own datasets.
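As a rough illustration of the idea, and not a description of NHM’s actual extension, the sketch below assumes that each occurrence record arrives as a dictionary with a `key` and an `issues` list, in the style of the records returned by the GBIF occurrence API. The flag names follow GBIF’s occurrence issue vocabulary, but the choice of which flags count as serious, the sample records, and the function names `quality_indicators` and `summarise` are all assumptions made for this example.

```python
from collections import Counter

# Assumption: the subset of GBIF issue flags that should send a record
# to collections staff. This selection is illustrative only.
SERIOUS_ISSUES = {
    "ZERO_COORDINATE",
    "COUNTRY_COORDINATE_MISMATCH",
    "RECORDED_DATE_INVALID",
    "TAXON_MATCH_NONE",
}


def quality_indicators(record):
    """Build a small quality summary for one occurrence record."""
    # A record may lack the "issues" field entirely; treat that as no issues.
    issues = record.get("issues") or []
    serious = sorted(issue for issue in issues if issue in SERIOUS_ISSUES)
    return {
        "key": record.get("key"),
        "issue_count": len(issues),
        "serious": serious,
        "needs_review": bool(serious),
    }


def summarise(records):
    """Count how often each issue flag appears across a set of records."""
    return Counter(
        issue for record in records for issue in (record.get("issues") or [])
    )


# Hypothetical records, invented for this example.
sample_records = [
    {"key": 1, "issues": ["COORDINATE_ROUNDED", "ZERO_COORDINATE"]},
    {"key": 2, "issues": []},
    {"key": 3},
]

if __name__ == "__main__":
    for record in sample_records:
        print(quality_indicators(record))
    print(summarise(sample_records))
```

A portal page could show the `issue_count` next to each record and route anything with `needs_review` set to the relevant curator, while the per-flag totals from `summarise` would give data managers a quick view of the most common problems in a dataset.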

All these exciting developments show solid returns on the investments that NHM and other UK contributors have made to the global infrastructure developed with and through GBIF, and we’re eager to watch all of it continue to grow as NHM pursues its plan to digitise 20 million specimens, one quarter of its collections, during the next five years.
