Thoughts based on readings about infrastructure, especially “Understanding Infrastructure: Dynamics, Tensions, and Design”, a great report by P. Edwards, S. Jackson, G. Bowker, and C. Knobel.
Development of (cyber)infrastructures is not merely a technical or engineering issue. To ensure success, we need to be aware of the historical context and socio-political issues, as well as the messiness of everyday practices.
Historical (dis)continuities underlie many infrastructural projects. Cyberinfrastructures and data science / curation problems did not appear out of nowhere in the 20th century. They have historical precursors, such as:
- information-gathering activities by the state (statistics as the science of the state) and the development of the sciences as an accumulation of records
- the development of technologies and organizational practices to sort, sift and store information
Questions of ownership, management, control, and access are always present in infrastructural developments. With regard to data, years of private ownership have led to many idiosyncratic practices and formats, which, along with an absence of metadata, prevent other scientists from understanding and using the data.
A good quote: “The consequence is that much “shared” data remains useless to others; the effort required for one group to understand another’s output, apply quality controls, and reformat it to fit a different purpose often exceeds that of generating a similar data set from scratch.” (p. 19 of the report)
Cyberinfrastructure development means system building. Successful system-builder teams are made up of technical “wizards,” who envision and create the system; a “maestro,” who orchestrates the organizational, financial, and marketing aspects of the system; and a “champion,” who stimulates interest in the project, promotes it, and generates adoption. During infrastructural growth, users and user communities can also become critical to success or failure.
The design-level perspective differs from the perspective on the ground. The former can be neat and organized, while the latter can be disorderly and require a lot of work. Finding ways to translate between these two perspectives and to incorporate lessons learned from “below” into design from “above” is a challenge and a crucial element of success.
A great quote: “It is also possible that a tech-centered approach to the challenge of data sharing inclines us toward failure from the beginning, because it leaves untouched underlying questions of incentives, organization, and culture that have in fact always structured the nature and viability of distributed scientific work.” (p. 32 of the report)
Additional reading – my other post about general issues in data curation.