Storing your research data: who cares? How to do it? Why? A case study using Australian vegetation data


Successful maintenance and accessibility of ecological data enable comprehension of the nature and causes of ecosystem change and make informed action possible. These days this is accepted. But the reality can be different. While collecting data in the field or from an experiment, you store them on your computer, on a server in your lab, or possibly ‘in the cloud’. When you have published your results, maybe you put a CSV file in a repository at the request of a journal (to cover the data discussed), or maybe not, and you move on to the next thing. No doubt not all of the data you worked hard to collect get published, but does that mean they are not valuable? If you accessed some pre-existing data, did you find what you needed? Did you give up on some data sources because the data were not digital? Did that mean you missed some valuable information? And even if you think you have stored your data well, what happens to them over time? Interruptions to custodianship, outdated media, lost knowledge, and the continuous evolution of nomenclature make conservation of environmental data challenging.

I shall use a case study to illustrate some of the key assumptions often made, the pitfalls of data storage and custodianship, and the risks of technological arrogance. This case study is of the ‘rescue’ of Australian vegetation plot data collated in the 1980s from as far back as the 1880s, their re-rescue in the 2010s, and their publication in an open-access repository (or two). The compiled data form an extremely valuable national collection that demanded publishing, or so I and my colleagues thought. I hope that the lessons learnt as we did this work will trigger a sober review of the value of such data, and of the importance of suitable and timely archiving, so that the initial unique investment in collection enables multiple re-use in perpetuity. The fundamental question remains: do we care enough?


Alison Specht, an honorary fellow of SEES, has a deep interest and expertise in recording and preserving environmental observations across time and space. While doing her PhD and later her postdoc at this university, she was employed part-time in a project to help collate some archival data with the then Professor of Botany, Prof. Ray Specht. This, together with exposure to large infrastructure projects such as the International Biological Program (IBP), started her life-long interest in the value of recording and preserving environmental data for future decision-making. After a career as a research academic, she became one of the first appointees to the Terrestrial Ecosystem Research Network, and more recently was Director of the Centre for the Synthesis and Analysis of Biodiversity (CESAB) in France. In these positions she was able to expand her capabilities in facilitating trans-disciplinary, convergent research between scientists, policy-makers and managers to improve environmental outcomes, and in improving data management and the preservation of archival data for effective long-term monitoring.

She established and is a core partner of the International Synthesis Consortium, is on the advisory committee of the Canadian Synthesis Centre, CIEE/ICEE, and assisted the establishment of the new synthesis centre in Brazil, SINBIOSE. She is a member of several Research Data Alliance working groups. She has been a member of the DataONE Usability and Assessment Working Group since its inception in 2010 and is currently a co-leader of a Belmont Forum project, PARSEC, one of three Science-driven e-Infrastructure Innovation projects dedicated to bridging the gaps between environmental scientists and data scientists.