Feb 20, 2015

Repository features to motivate more data sharing

One of the challenges of creating data stewardship infrastructure is engaging users and prioritizing and meeting their needs, particularly the needs of long-tail science research. "What would motivate researchers to make their data available?" is a question we continuously grapple with. A recent study, "Potential contributor perspectives on desirable characteristics of an online data environment for spatially referenced data," published in First Monday, asked a very similar question in the context of geographic data. The researchers hypothesized that potential contributors of small-scale, local spatial data would be more willing to share their data if a repository included a simple, clear licensing mechanism, a simple process for attaching descriptions to the data, and a simple post-publication peer evaluation/commenting mechanism.

The paper draws on 10 qualitative interviews and 110 responses to an online questionnaire. The qualitative interview responses were mixed; they don't seem to reveal any patterns or unusual concerns. Some of the quantitative results were also mixed, but some provide good numbers to support the hypotheses:

  • 90% of respondents said attribution (licensing) is important
    • 62% think that non-commercial attribution is important
    • 54% think that restricting re-use is important, i.e., others may use the data but not modify it in any way
  • 93% said ability to attach keywords or other descriptions to data is important
  • 78% said that commenting capability is important
  • 85% said that stability and long-term maintenance of the repositories matters


This research, subject to the caveats listed below, suggests that it would be desirable from the perspective of potential contributors of data to provide infrastructure capability that would:

  • allow users to attach conditions to the use of their data;
  • provide basic information that could be translated into standards-based metadata; and,
  • receive comments and feedback from users.

Feb 18, 2015

Research Data Alliance/US Call for Fellows

I'm a co-PI on a project that provides a great opportunity for early career researchers and professionals to engage with the Research Data Alliance, help improve data practices, and make data management and data sharing easier and more transparent. Below are the details from the call for fellows:

The Research Data Alliance (RDA) invites applications for its newly redesigned fellowship program. The program’s goal is to engage early career researchers in the US in the Research Data Alliance (RDA), a dynamic and young global organization that seeks to eliminate the technical and social barriers to research data sharing.

The successful Fellow will engage in the RDA through a 12-18 month project under the guidance of a mentor from the RDA community. The project is carried out within the context of an RDA Working Group (WG), Interest Group (IG), or Coordination Group (i.e., Technical Advisory Board), and is expected to have mutual benefit to both Fellow and the group’s goals.

Fellows receive a stipend and travel support and must be currently employed or appointed at a US institution.

Fellows have a chance to work on real-world challenges of high importance to RDA, for instance:
  • Engage with social sciences experts to study the human and organizational barriers to technology sharing
  • Apply a WG product to a need in the Fellow’s discipline
  • Develop plan and disseminate RDA research data sharing practices
  • Develop and test adoption strategies
  • Study and recommend strategies to facilitate adoption of outputs from WGs into the broader RDA membership and other organizations
  • Engage with potential adopting organizations and study their practices and needs
  • Develop outreach materials to disseminate information about RDA and its products
  • Adapt and transfer outputs from WGs into the broader RDA membership and other organizations

The program involves one or two summer internships and travel to RDA plenaries during the duration of the fellowship (international and domestic travel). Fellows will receive a $5000 stipend for each summer of the fellowship. Fellows will be paired with a mentor from the RDA community.

Through the RDA Data Share program, fellows will participate in a cohort building orientation workshop offering training in RDA and data sciences. This workshop is held at the beginning of the fellowship. RDA Data Share program coordinators will work with Fellows and mentors to clarify roles and responsibilities at the start of the fellowship.

Criteria for selection: The Fellows engaging in the RDA Data Share program are sought from a variety of backgrounds: communications, social, natural and physical sciences, business, informatics, and computer science. The RDA Data Share program will look for a T-shaped skill set, where early signs of cross discipline competency are combined with evidence of teamwork and communication skills, and a deep competency in one discipline.

Additional criteria include: interest in and commitment to data sharing and open access; demonstrated ability to work in teams and within a limited time framework; and benefit to the applicant’s career trajectory.

Eligibility: Graduate students and postdoctoral researchers at institutions of higher education in the United States, and early career researchers at U.S.-based research institutions who graduated with a relevant master’s or PhD and are no more than three years beyond receipt of their degree. Applicants from traditionally underserved populations are strongly encouraged to apply.

To apply: Interested candidates are invited to submit their resume/curriculum vitae and a 300-500 word statement that briefly describes their education, interests in data issues, and career goals to datashare-inquiry-l@list.indiana.edu. Candidates are encouraged to browse the RDA website https://rd-alliance.org/ and pages of interest and working groups to identify relevant topics and mutual interests.

Important dates:
April 16, 2015 – Fellowship applications are due
May 1, 2015 – Award notifications
June 18-19, 2015 – Fellowship begins with the orientation workshop in Bloomington, IN

RDA Data Share, funded by the Alfred P. Sloan Foundation under award G-2014-13746, engages students and early career researchers in the Research Data Alliance. This engagement builds on foundational infrastructure funded by the National Science Foundation grant # ACI-1349002.

Feb 11, 2015

Institutional analysis of data practices

A short summary of a paper published in JASIST recently: Mayernik, M. S. (2015), Research data and metadata curation as institutional issues. J Assn Inf Sci Tec. doi:10.1002/asi.23425.

The paper begins by noticing a mismatch between the findings of two studies on the data practices in climate science. One of them (a report commissioned by the UK Research Information Network RIN) described the level of data sharing in climate science as low and the other (the book by Edwards "A vast machine...") argued that data sharing was a strong and common norm in climate science. Which one is true? Or, could it be that both studies are correct and climate science includes both the high and the low data sharing levels?

Data practices are institutionalized within a number of social systems, including formal organizations (such as universities and research centers), rules and sanctions (such as funding agency requirements and professional guidelines), and the norms of modern Western science, so the case study analysis in this paper is grounded in the institutional framework that has five characteristics: (a) norms and symbols, (b) intermediaries, (c) routines, (d) standards, and (e) material objects. Norms are largely associated with the norms of science (Merton and later work), symbols are logos and other visible signs of collective identity, but also terminological choices and metaphors. Intermediaries are individuals or collectives who connect resources and facilitate relationships. Routines are frequently repeated patterns of action and interaction, for example, meal or socializing routines. Standards are rules and specifications that define how information can be presented, organized, and transferred. Material objects are ... material objects.

The case studies are comparisons between data practices at the Center for Embedded Networked Sensing (CENS) and the Long Term Ecological Research (LTER) network and between the University Corporation for Atmospheric Research (UCAR) and the National Center for Atmospheric Research (NCAR).

Although there are some interesting observations in these case studies, it seemed that the first, conceptual part of the paper was stronger than the second. The five characteristics of the institutional framework were applied rather narrowly, without revealing many interconnections and directionality. For example, the standards section focuses on metadata standards and their choice. Are there any other standards relevant to data practices? How does the choice of standards affect norms and what is the role of intermediaries in establishing routines and other aspects of data practices? Another much more important question is: Once we describe the variability of data practices within and across disciplines, what's next? What exactly is the role of each institutional carrier in data practices?

Jan 5, 2015

Data politics and the dark sides of data

A somewhat lengthy post about the dark sides of data discusses whether data, with its vast amounts and ubiquitous collection mechanisms, helps to "tell the truth to power", i.e., to change the world for the better. Will picking up the traces and revealing wrongdoing fix the world? Most likely not, because it's not clear whether we will care or do anything because of data. Here is a great quote:

... lawyers cannot fix human rights abuses, scientists cannot fix global warming, whistle-blowers cannot fix secret services and activists cannot fix politics, and nobody really knows how global finances work - regardless of the data they have at hand...

The framework of countering power and problems with data needs to be revised. It's not about quantity or even quality of data, but about using whatever little we know to address not only our understandings (i.e., our rational capacities), but also our feelings and beliefs. Here is what gets in the way (those dark sides that everyone should think about):

  • corporate infrastructure for data and its cultures that creates an illusion of free and neutral services (i.e., services that have no monetary and no political cost)
  • non-transparency of most digital data and lack of control over it, which prevents us from copying, deleting, or processing our own data
  • de-politicizing of digital data or constructing data as fuel for innovation and services rather than a ground for moral, ethical, and political decisions

The post is rather pessimistic, but changes do not happen at once, so we should probably keep trying.

Nov 11, 2014

4th RDA Plenary - Breakout session on engagement

Below is a summary from a breakout session on engagement that I co-chaired with Andrew Maffei at the Research Data Alliance 4th plenary in Amsterdam, the Netherlands (Monday, September 22, 2014).

Introduction / Overview

The session had about 25 people in attendance.

I provided an overview of the group and its activities. The group receives strong interest and support at plenaries, but interest drops in between them.
Activities to date include working on the model to connect technically oriented groups and domain interest groups (Domain Interest Group Form and Function model, or DIG-FF), a summer internship project, and participation in the RDA/US advisory committee.

DIG-FF Model: we need to observe inter-group interactions and support the form and function of these groups as best we can. It may be too early to propose a model. Rather, we can focus on small but practical things that can facilitate inter-group communication (e.g., creating information-collection instruments, disseminating information, etc.).

Objectives for P4:
  • Present the summer internship project
  • Modify the case statement (create a charter)
  • Attend breakout meetings of the domain-specific groups and collect information about their work and outcomes
  • Find opportunities to work on the amplification and adoption theme promoted by the RDA/US within the group and through collaborations

RDA/US summer internship project

The project was done by the RDA/US intern Rene Patnode from the University of California San Diego under the mentorship of EIG chairs. Rene interviewed 16 chairs of the domain interest groups (DIGs) over the phone and email. The goal of the project was to understand the barriers to data sharing for researchers.

Observations and findings:
  • There is a significant representation of information systems professionals rather than researchers in RDA
  • Responses were consistent with the literature about barriers: sharing is extra work, user interfaces are poor, no fit with current research culture, no funding for data, lack of good data sources
  • To remove those barriers we might try to make data sharing enjoyable and social (e.g., more interaction between researchers, etc.).
  • Gamification (e.g., adding points, badges, etc.) is one possible approach. Citizen science is another mechanism for data collection and sharing.
  • IT solutions need to better mirror the workflow that is currently in use
  • Suggestions for RDA role: make processes of RDA engagement clear and transparent, support cross-pollination, take a political stance by lobbying, encourage better technical development

Many interesting points and questions were raised during the discussion. Below are some of them:
  • Collaborative virtual research environments are one way to improve communication and incentives.
  • Does the funders' requirement for data management plans, and its implementation, actually improve the outcomes of data stewardship and sharing?
  • Data needs to be useful for someone else to create “an appetite” for removing burdens
  • Cultural change usually means that you have to address ALL the stakeholders. Hence the idea for RDA to take a more political role.
  • Grant budgets need to support data management plans, which need resources.
  • Knowledge Exchange (http://www.knowledge-exchange.info/) is an organization that has interests similar to this group.
  • Fun in sharing is good, but what are the other reasons for sharing? We might want to ask the question “What would you like?” and work on that. Dig into the benefits and show scientists in various areas how sharing data can be of benefit.
  • We have talked a lot about domains. Another orthogonal axis is to look at organizations. Can you get universities, institutions, and membership organizations to declare values around data sharing?
  • Cultural differences in data sharing are often ignored. For example, there are different approaches to privacy and consent.

Next steps for the group

  • Develop a form to collect stories about benefits and pains of data management / sharing
  • Start collecting stories
  • Identify and reach out to champions of data sharing
  • Design an ISHARE t-shirt
  • Long term: build practical tools for engagement, pay attention to our own data practices, share the data from RDA, advocate for better RDA website, think about focusing on organizations instead of (or in addition to) domains, collaborate with the “Digital Practices in History and Ethnography” group on studying RDA as an organization

Engagement in RDA is very important; we need to keep going!

More about our group here: RDA Engagement Interest Group