Jul 30, 2012

Digital science ecosystem

From the GRDI2020 final roadmap report, “Global Scientific Data Infrastructures: The Big Data Challenges”:

Data: any digitally encoded information, including data from instruments and simulations; results from previous research; material produced by publishing, broadcasting and entertainment; digitized representations of diverse collections of objects, e.g. of museums’ curated objects.

Research Data Infrastructures: managed networked environments (services and tools) that support the whole research cycle and the movement of data and information across domains and agencies.

The report uses an ecosystem metaphor to conceptualize the universe of science and its processes. A digital science ecosystem is composed of:

  • Digital Data Libraries that are designed to ensure the long-term stewardship and provision of quality-assessed data and data services.
  • Digital Data Archives that consist of older data that is still important and necessary for future reference, as well as data that must be retained for regulatory compliance.
  • Digital Research Libraries as collections of electronic documents.
  • Communities of Research as communities organized around disciplines, methodologies, model systems, project types, research topics, technologies, theories, etc.

While I can see how the ecosystem metaphor could help in conceptualizing the universe of science, I don’t think it is developed enough here. The whole report is structured around tools and infrastructure, understood rather narrowly. Yet it seems to me that the biggest roadblocks lie in the domain of human interaction: all those issues of social hierarchy and capital built into our social institutions.

Paul Edwards (one of the authors of another reading that seemed more sophisticated to me) touches on this in his book “A Vast Machine”, about the infrastructure surrounding weather forecasting and climate change. He describes how the countless efforts of various social actors drove both the creation and the inversion of that infrastructure by constantly questioning data, models, and forecasts. Here is a long quote from the book’s concluding chapter that shows its emphasis on people and on the making of data-knowledge infrastructure:

“Beyond the obvious partisan motives for stoking controversy, beyond disinformation and the (very real) ‘war on science,’ these debates regenerate for a more fundamental reason. In climate science you are stuck with the data you already have: numbers collected decades or even centuries ago. The men and women who gathered those numbers are gone forever. Their memories are dust. Yet you want to learn new things from what they left behind, and you want the maximum possible precision. You face not only data friction (the struggle to assemble records scattered across the world) but also metadata friction (the labor of recovering data’s context of creation, restoring the memory of how those numbers were made). The climate knowledge infrastructure never disappears from view, because it functions by infrastructural inversion: continual self-interrogation, examining and reexamining its own past. The black box of climate history is never closed. Scientists are always opening it up again, rummaging around in there to find out more about how old numbers were made. New metadata beget new data models; those data models, in turn, generate new pictures of the past.” (P. N. Edwards, “A Vast Machine”, p. 432)

Why should we trust climate science and its infrastructures? Because of a “vast machine” built by a large community of researchers who constantly try to invert it. So in order to understand, develop and advance data-intensive environments, we shouldn’t treat social forces as external. They are part, if not the foundation, of the data universe. I would therefore propose giving equal emphasis to tools (storage, transfer and sharing tools) and to social arrangements (individuals, institutions, political contexts, events, and so on) as elements of the ecosystem.