Mar 30, 2015

Dedoose crash and data loss

Dedoose is a web application that supports qualitative and mixed methods research that relies on text, images, audio, video, spreadsheets, and so on. It was developed at the University of California, Los Angeles (UCLA) with support from the William T. Grant Foundation. Web accessibility coupled with cloud storage and processing is among the key features of Dedoose, in keeping with its “Anytime, Anywhere, Any Internet” motto. Researchers store data on Dedoose servers and can access it from anywhere and on any platform.

On May 6, 2014, the Dedoose platform crashed. The cascading system failure coincided with a full database encryption and backup and resulted in the corruption of the entire storage system. The team wrote on its blog that “data added to Dedoose up to mid-April will be recovered and restored. … we are not optimistic we will be able to recover data added to the system for roughly the 2 – 3 week window preceding the failure”. It is not clear how many researchers were affected or how much data was lost, but some researchers did lose data stored on Dedoose. The comments below, from “Hazards of the Cloud: Data-Storage Service’s Crash Sets Back Researchers”, illustrate the issue:
“... I lost about 20 hours of work, which isn't the end of the world, but hurts when you are trying to finish a PhD and work full time. The reason why people don't have back ups is because the back up isn't necessarily useful. The file that you work on in the program is essentially an annotated document (or audio/video file) that you select chunks as excerpts and then apply codes to, so that later you can analyze the corpus of documents for themes. The export from Dedoose is simply an excel file of the excerpts you've made, so it helps to have as a reference, but you wouldn't be able to work from it the way you can work from a word file that you've backed up.”

“... Many of us DID back up... however, I don't think you understand that backing up coded video and or audio files in Dedoose does not back up the project as you would view it within Dedoose (online)... only as a spreadsheet... You CAN, however, fully back up an NVIVO project or any file on your hard drive as an EXACT duplicate (not the case with Dedoose). ... I am completely dependent on them and their promise that they backup nightly and protect our data so well that we don't have to worry about it.”

“Allow me to add only that the fact that Dedoose apparently outputs only a spreadsheet evidences that these platforms, for all their bells and whistles, are databases. It is important, IMHO, that researchers become adapt at building their databases from the ground up, and only after doing so use any CAQDA. This doesn't (always) mean learning mySQL, Phyton, or other programing languages. It does mean knowing your way around Excell (or other spreadsheet app) and how to structure your data so that it can be moved into and out of platforms such as Dedoose.”

“Now I feel kind of empowered by my "keep the data in the hard drive, backup to the cloud, and once a semester, to an external hard drive" regimen.”

The crash raises some interesting questions about cloud versus local storage, backup options, and the responsibilities of clients and vendors. How can we back up data in the cloud if some of the processing results (visualizations, annotations, etc.) are not exportable? How many copies are enough? What does the client (user) need to check before signing up for cloud services?