Jan 5, 2015

Data politics and the dark sides of data

A somewhat lengthy post about the dark sides of data asks whether data, with its vast volumes and ubiquitous collection mechanisms, helps to "tell the truth to power", i.e., to change the world for the better. Will picking up the traces and revealing wrongdoing fix the world? Most likely not, because it's not clear whether we will care or act on what the data shows. Here is a great quote:

... lawyers cannot fix human rights abuses, scientists cannot fix global warming, whistle-blowers cannot fix secret services and activists cannot fix politics, and nobody really knows how global finances work - regardless of the data they have at hand...

The framework for using data to counter power and address problems needs to be revised. It's not about the quantity or even the quality of data, but about using whatever little we know to address not only our understanding (i.e., our rational capacities), but also our feelings and beliefs. Here is what gets in the way (the dark sides that everyone should think about):

  • corporate data infrastructures and their cultures, which create an illusion of free and neutral services (i.e., services that have no monetary and no political cost)
  • non-transparency of most digital data and lack of control over it, which prevents us from copying, deleting, or processing our own data
  • de-politicization of digital data, i.e., framing data as fuel for innovation and services rather than as grounds for moral, ethical, and political decisions

The post is rather pessimistic, but changes do not happen at once, so we should probably keep trying.

Nov 11, 2014

4th RDA Plenary - Breakout session on engagement

Below is a summary from a breakout session on engagement that I co-chaired with Andrew Maffei at the Research Data Alliance 4th plenary in Amsterdam, the Netherlands (Monday, September 22, 2014).

Introduction / Overview

The session had about 25 people in attendance.

I provided an overview of the group and its activities. The group receives strong interest and support at plenaries, but interest drops between them.
Activities to date include working on the model to connect technically oriented groups and domain interest groups (Domain Interest Group Form and Function model, or DIG-FF), a summer internship project, and participation in the RDA/US advisory committee.

DIG-FF Model: we need to observe inter-group interactions and support the form and function of these groups where we can. It may be too early to propose a model. Rather, we can focus on small but practical things that can facilitate inter-group communication (e.g., creating information-collection instruments, disseminating information, etc.).

Objectives for P4:
  • Present the summer internship project
  • Modify the case statement (create a charter)
  • Attend breakout meetings of the domain-specific groups and collect information about their work and outcomes
  • Find opportunities to work on the amplification and adoption theme promoted by the RDA/US within the group and through collaborations
RDA/US summer internship project
The project was done by the RDA/US intern Rene Patnode from the University of California, San Diego under the mentorship of the EIG chairs. Rene interviewed 16 chairs of the domain interest groups (DIGs) by phone and email. The goal of the project was to understand the barriers researchers face to data sharing.
Observations and findings:
  • Information systems professionals, rather than researchers, are strongly represented in RDA
  • Responses were consistent with the literature about barriers: sharing is extra work, user interfaces are poor, no fit with current research culture, no funding for data, lack of good data sources
  • To remove those barriers, we might try to make data sharing enjoyable and social (e.g., more interaction between researchers, etc.).
  • Gamification (e.g., adding points, badges, etc.) is one possible approach. Citizen science is another mechanism for data collection and sharing.
  • IT solutions need to better mirror the workflows researchers currently use
  • Suggestions for RDA role: make processes of RDA engagement clear and transparent, support cross-pollination, take a political stance by lobbying, encourage better technical development
Many interesting points and questions were raised during the discussion. Below are some of them:
  • Collaborative virtual research environments are one way to improve communication between researchers and to create incentives.
  • Do funders' requirements for data management plans, and the implementation of those plans, actually improve the outcomes of data stewardship and sharing?
  • Data needs to be useful to someone else in order to create “an appetite” for removing burdens
  • Cultural change usually means that you have to address ALL the stakeholders. Hence the idea for RDA to take a more political role.
  • Grant budgets need to support data management plans, since implementing them requires resources.
  • Knowledge Exchange (http://www.knowledge-exchange.info/) is an organization that has interests similar to this group.
  • Fun in sharing is good, but what are the other reasons for sharing? We might want to ask the question “What would you like?” and work on that. Dig into the benefits and show scientists in various areas how sharing data can be of benefit.
  • We have talked a lot about domains. Another, orthogonal axis is to look at organizations. Can we get universities, institutions, and membership organizations to declare values around data sharing?
  • Cultural differences in data sharing are often ignored. For example, there are different approaches to privacy and consent.
Next steps for the group

  • Develop a form to collect stories about benefits and pains of data management / sharing 
  • Start collecting stories
  • Identify and reach out to champions of data sharing
  • Design an ISHARE t-shirt 
  • Long term: build practical tools for engagement, pay attention to our own data practices, share the data from RDA, advocate for a better RDA website, think about focusing on organizations instead of (or in addition to) domains, collaborate with the “Digital Practices in History and Ethnography” group on studying RDA as an organization

Engagement in RDA is very important; we need to keep going!

More about our group here: RDA Engagement Interest Group

Sep 23, 2014

Summer school on synthetic biology

During the week of September 15-19, 2014 I participated in the summer school on societal implications of synthetic biology. Organized by Kristin Hagen and Margret Engelhard from the European Academy of Technology and Innovation Assessment and by Georg Toepfer from the Center for Literary and Cultural Research Berlin, it was held in Berlin, Germany, at the Center for Literary and Cultural Research.

Participants came from different countries: Austria, Italy, Germany, the Netherlands, Canada, and the United States. Their backgrounds were similarly diverse: biology, chemistry, philosophy, sociology, political science, and communications. The main goal of the school was to have an interdisciplinary discussion about synthetic biology as an emerging area of science and its implications for society. Participants wrote papers and presented them at the school. Additionally, several experts from various fields gave talks. Below is a short summary of what we talked about:

The meanings and metaphors of life. Synthetic biology inevitably raises questions related to our understandings of life. On one hand, there is no universal definition of life, and both philosophers and scientists continue to ponder whether it is even possible to come up with such a definition. On the other hand, there may be no need for such a definition, because a) we have an intuitive understanding of what life is and adapt this understanding as it evolves, and b) limited definitions work for specific purposes, such as understanding how to create an artificial cell or arguing against the scientific possibility of creating life from scratch. Metaphors that we use to answer the grand questions of life or to promote scientific advancements in synthetic biology bring together the domains of nature, artificiality, control, and aesthetics. Those metaphors are not “innocent”, as they open some opportunities and close others.

Synthetic biology (SB) as a field. Synthetic biology is not a homogeneous discipline; it is a fusion of approaches that draw on synthetic chemistry, genetic engineering, and bioinformatics. Metabolic pathway engineering, which makes it possible to use bacteria and other microorganisms to produce chemicals, plays an important role in SB breakthroughs. Chemical synthesis of DNA, which allows synthesized DNA to be inserted into an existing organism, is another important area of synthetic biology. The presentations that explained various types and flavors of synthetic biology talked about cells, pathways, chassis, microbes, reproduction, and evolution; they were colorful and full of exciting possibilities. We talked a lot about the promises of synthetic biology, but I don’t think that science necessarily needs promises to justify its existence. As someone pointed out, science is a quest for knowledge and should be interesting and exciting as such. I’m not sure science is a pure quest for knowledge, considering the convergences between science, technology, and industry. Nevertheless, I completely agree that it is exciting to learn about the world even if it's not clear whether this knowledge has applications.

Forms of communication and public dialog. Previous debates, such as those over mad cow disease and GMOs, and the negative public reactions they provoked demonstrate the importance of transparency in the public communication of science. Early public engagement is seen as a way to improve understanding and acceptance of technology. However, the goal is not simply to promote public understanding and acceptance of technoscience, but rather to let the voices of the public contribute to decision-making and regulatory frameworks. Many forms of public engagement, including polls, surveys, citizen panels, and public discussions, have been promoted in the EU. The results seem to indicate that even though few people have heard of synthetic biology, many see continuities with previous scientific advancements and technologies and are willing to consider both its positive and negative aspects.

Even from the short overview above it is obvious that there is a great diversity in the issues surrounding synthetic biology and approaches to their evaluation. Can they be integrated or synthesized? My own suggestion is to take a problem- rather than a debate-oriented approach and look for solutions to specific problems, while avoiding taking things for granted. Everyone has their interests and values and even the best intentions may result in bad outcomes. To use M. Foucault’s approach, we need to examine the order of things and the complex arrangements of what’s visible and hidden and what or who is included and excluded.

It was a week of stimulating discussions. The atmosphere was very friendly and collegial, and the disagreements were often phrased as humorous, slightly sarcastic remarks over dinner or drinks. My take-away from this summer school is that interdisciplinary dialog is possible, necessary, and fruitful. It works provided that we have ample time to interact and go beyond formalities (i.e., beyond formal presentations and opinion polls). The school has ended, but the work continues. We will revise our papers based on collective feedback, and they will become chapters in a forthcoming book.


May 9, 2014

Big data report from the White House

Another big data review, this time from the White House: "Big Data: Seizing Opportunities, Preserving Values" (pdf). The report explains what big data is (large, diverse, complex, longitudinal, and distributed; making unexpected discoveries possible; and creating an asymmetry of power between those who hold the data and those who intentionally or inadvertently supply it) and describes the implications of big data for the public and private sectors. In addition to many well-known and lesser-known examples of how big data can be good or bad, the report provides initial thoughts on recommendations for big data governance. It divides its policy framework into four overlapping core areas:

1. Big data and citizens - improve public services while preventing the government from accruing unlimited power by using increased surveillance, algorithmic profiling, and metadata tracking.

2. Big data and consumers - reduce the cost of commercial services and personalize them while mitigating security breaches, risks of discrimination based on consumer profiles, and the lack of consumer awareness and data transparency.

3. Big data and discrimination - do less harm and prevent discriminatory uses of identification and re-identification techniques.

4. Big data and privacy - get used to less privacy while reconsidering the notice and consent framework.

In its concluding section, the report makes the following recommendations:

  • Advance the Consumer Privacy Bill of Rights.
  • Pass National Data Breach Legislation.
  • Extend Privacy Protections to non-U.S. Persons.
  • Ensure Data Collected on Students in School is Used for Educational Purposes.
  • Expand Technical Expertise to Stop Discrimination.
  • Amend the Electronic Communications Privacy Act.

It's a thorough report and definitely worth a read, but, like the big data review by my colleagues and me (pre-print), it's just the beginning of studying the implications and governance of big data.

May 2, 2014

Summary of drivers and barriers in data sharing

Nice summary of the drivers, barriers, and enablers that determine stakeholder engagement based on expert interviews in Dallmeier-Tiessen et al., 2014, Enabling Sharing and Reuse of Scientific Data (restricted access).

Drivers and benefits

  • Societal benefits - economic/commercial benefits; continued education; inspiring the young; allowing the exploitation of the cognitive surplus in society; better quality decision making in government and commerce; citizens being able to hold governments to account.
  • Academic benefits - the integrity of science; increased public understanding of science.
  • Research benefits - validation of scientific results by other scientists; recognition of their contribution; reuse of data in meta-studies to find hidden effects/trends; testing new theories against past data; doing new science not considered when data was collected without repeating the experiment; easing discovery of data by searching/mining across large datasets with benefits of scale; easing discovery and understanding of data across disciplines to promote interdisciplinary studies; combining with other data (new or archived) in the light of new ideas.
  • Organizational benefits - publication of high quality data and citation of data enhance organizational profile; preserved data linked to published articles adds value to the product; data preservation is more business; reputation of institution as “data holder with expert support” is increased; combining data from multiple sources helps to make policy decisions; reuse of data instead of new data collection reduces time and cost to new research results; use of data for teaching purposes.
  • Individual contributor benefits - preserving data for the contributor to access later (sharing with your future self); peer visibility and increased respect achieved through publications and citation; increased research funding; increased control of organizational resources when more established in their careers; the socio-economic impact of their research (e.g., spin-out companies, patent licenses, inspiring legislation); status, promotion, and pay increases with career advancement; status-conferring awards and honors.

Barriers and enablers are related to:

  • Individual contributor incentives
  • Availability of a sustainable preservation infrastructure
  • Trustworthiness of the data, data usability, pre-archive activities
  • Data discovery
  • Academic defensiveness
  • Finance
  • Subject anonymity and personal data confidentiality
  • Legislation/regulation