Aug 10, 2015

Losing data from the National Centre for e-Social Science (NCeSS) portal

Submitted by Andy Turner

Edited by Inna Kouper

The National Centre for e-Social Science (NCeSS) was a UK based program established around 2004 to stimulate the development of digital tools and services for social scientists. In around 2008 it adopted the use of Sakai as a system for communicating, developing information, storing and managing access to data. NCeSS was configured with a “hub” at the University of Manchester and a network of research nodes across the UK (see the Digital Social Research page for the list of nodes, many now archived).

Andy Turner, a researcher from the University of Leeds, worked on a project to develop demographic models for geographical simulation system. The project, abbreviated as MoSeS (Modelling and Simulation for e-Social Science), was one of the first phase research nodes of NCeSS. Some information is available on Andy’s page, but many links from there are now unavailable. Andy explains why:

“In 2011 the NCeSS Sakai Portal went off-line following a server failure and because there were no more resources for replacing the server. All the data was stored in a database on a National Grid Service server which for some reason had a catastrophic failure. All that remained for me to salvage were some backup database dumps, which fortunately also contained the portal front end configuration which enabled me with the help of my local IT team to get a database reader set up and a version of the NCeSS Sakai Portal working almost, but not quite as it had been. This was good enough to get some data out, but my local IT were not willing to make the system accessible again for security reasons. As a consequence of the problems some detailed social simulation model run results were lost. These would take a lot of time and effort to reproduce as they were generated on a fairly massive computer, which we got access to thanks to the UK-CERN collaboration GridPP and my collaborator Tom Doherty from the University of Glasgow. The work with Tom was undertaken as part of the Jisc funded project NeISS (a project to establish a National e-Infrastructure for Social Simulation), which was by the time of the server failure supporting the NCeSS Sakai Portal.

In theory, sufficient metadata has been stored from the simulation runs so that the results can be readily produced, but this is unlikely to transpire as the results were really only academically interesting as their inherent uncertainties were too great to make them of practical use. Anyway, I have given up on all that for now. I have moved on, but at the time it was rather painful seeing what probably amounted to almost three years of my effort turn to nothing. I may still get something more out of it in the long run because of the learning involved in this process. Explaining what happened to my academic superiors who desperately wanted research outputs was hard. One day I may return to research that pushes the boundaries of what we can and can’t do, but I know that is risky as failure is not tolerated well in academia."

Reflecting on the importance of preservation and curation of data, Andy writes:

“Preservation and curation are not easy. Sustaining research effort that may one day generate useful data and software is also not easy, especially when the goal is aspirational and probably quite a long way off and the steps are necessarily baby steps to begin with. In NCeSS, issues of sustainability were discussed from early on for each NCeSS research project and for the organisation itself. Documentation about this from 2008 was stored on the portal and so is now also inaccessible…

The soft learning experiences of failure and how these relate to sustainability and the importance of promoting collaboration, re-use and enrichment in the research process are key, but where can these be written up in the academic literature? The blog might seem ephemeral, but these days they can be captured by a directed Internet Archive WayBack Machine and preserved for the future.”

Our data stories blog is one of the places where such discussions can be recorded, and while we don’t have a solid sustainability plan, we do keep external backup copies of the stories. If you have stories similar to Andy’s to share, use our form, send it directly to datastories@dcc.ac.uk or register on the website and become a contributor.