DIKW: Data, Information, Knowledge, Wisdom: October 2013

Oct 24, 2013

Faculty engagement for librarians and curators

The Council on Library and Information Resources (CLIR) postdoctoral fellowships in data curation encourage connections between library, technology and research. CLIR fellows are hosted by a variety of institutions and have widely varying skills and responsibilities. Every month they (we) get together to discuss current issues and challenges related to digital curation. The most recent session focused on faculty engagement and featured two guests: Gabrielle Dean, the curator of literary rare books and manuscripts at Johns Hopkins University, and Kelly Miller, the director of teaching and learning services and head of the college library at UCLA. Two current CLIR fellows, John Kratz and Bridget Whearty, led the session.

The notes below are my attempt to synthesize many useful pieces of advice and information shared during that session.

Engagement, in the context of librarians and curators working with faculty, is a relatively new term. Previously, the word "outreach" was used more often. "Outreach" suggests a particular method or approach, e.g., providing guidelines or distributing best practices. "Engagement" suggests participation, shared goals and shared activities. Like any other type of engagement, faculty engagement is rather difficult. So here are some tips:

Set small goals and gradually extend your network, because engagement is an incremental activity that involves trust, relationship building and a lot of trial and error.
Be positive (or even nice and cheerful if you can). Your positive attitude toward your own and others’ work will spread to others and invite them to be more open and interactive.
Be modest. Not many people may be interested in the library and its services, so automatically assuming that they are valuable to others may backfire.
Be curious. Engagement is an opportunity to learn about many interesting things and people. Don’t be afraid to ask questions or even to be naive sometimes; it will pay off in more knowledge and more connections.
Always say "yes" (within reasonable limits). This may be important at the beginning of engagement initiatives. Willingness to do the work communicates good will and stimulates interest. Get involved in possibly tangential projects, attend informal gatherings, create connections and then follow up and maintain connections via phone calls, emails, newsletters, etc.
Learn "the language". Engaging other audiences sometimes means getting into conversations without adequate background knowledge or expertise. Learning about faculty research in advance can help with terminology and with asking intelligent questions.
Talk, don't just listen. Listening is useful, but it’s also important to talk. Conversing (i.e., listening, asking questions, and encouraging others to ask questions) helps to identify common ground or issues to address.
Look for creative opportunities for engagement, e.g., shared learning, teaching, activist and interest groups, and so on. Engagement doesn't have to be limited to faculty. Engaging other interested groups, including undergraduate students, local schools, or private organizations can be useful and quite rewarding.
Seek effective ways of gathering information. Some faculty may not respond to emails or phone calls, graduate students may be more responsive to certain requests, and surveys tend to be ineffective because of low response rates. Sometimes it depends on the institution and the nature of the curated content. Think through priorities, audiences, and context in order to get the best results.
Mistakes happen. False starts and even failures in faculty engagement are common for everyone. Rather than dwelling on mistakes, try to learn from them and do better next time.
Avoid political entanglements and personal battles. These things happen in many if not all organizations. To maintain a good working environment, try to stay positive, focus on the goals, look for opportunities to be creative and don’t take matters personally.

I wish there were more literature (both formal and informal) on this topic that could answer questions and help in practice. For example, what should someone know before starting a job that involves faculty engagement? Are there certain skills that might be helpful? What is the nature of the relationship between curators and faculty: is it a peer-to-peer or a nurse-doctor relationship? Can or should it be changed? What are the best ways to gather information about your target audiences and their needs? And so on.

To stimulate a discussion on this topic and, perhaps, encourage more writing, an engagement interest group has been created within the Research Data Alliance (RDA). The group has a narrower focus, because it emphasizes the engagement of researchers and other stakeholders in research data sharing and re-use. Nevertheless, it may be a good platform for continuing a conversation on engagement and building a knowledge base/wiki.

Oct 10, 2013

Research Data Alliance (RDA) plenary thoughts

My notes from the Research Data Alliance (RDA) plenary are posted on the Digital Library Federation site: http://www.diglib.org/archives/5142/

Oct 2, 2013

About research objects

Notes from the article by Bechhofer, Buchan, De Roure, Missier, Ainsworth et al., "Why linked data is not enough", Future Generation Computer Systems, 2011.

Scientific research is increasingly digital and collaborative, so a new framework is needed to facilitate the reuse and exchange of digital knowledge. Simply publishing data fails both to reflect the research methodology and to respect the rights and reputation of the researcher.

The concept of Research Objects (ROs) as semantically rich aggregations of resources can serve as the cornerstone of such a framework. ROs would include research questions, hypotheses, abstracts, organisational context (e.g., ethical and governance approvals, investigators, etc.), study design, methods (workflows, scripts, services, software packages, etc.), data, results, and answers (e.g., publications, slides, DOIs). The authors argue that this approach is better than linked data, but later they acknowledge that linked data works fine; it just needs to be revised and extended.
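To make the idea of an RO as an aggregation a bit more concrete, here is a minimal Python sketch. It assumes an RO can be modeled as an identifier plus a list of resources, each tagged with one of the component kinds listed above. The class names, the `Role` values, the `missing_roles` helper and the example URIs are my own hypothetical choices, not anything defined in the paper.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Hypothetical tags for the kinds of components the paper lists for an RO."""
    QUESTION = "research question"
    HYPOTHESIS = "hypothesis"
    CONTEXT = "organisational context"
    DESIGN = "study design"
    METHOD = "method"    # workflows, scripts, services, software packages
    DATA = "data"
    RESULT = "result"
    ANSWER = "answer"    # publications, slides, DOIs


@dataclass(frozen=True)
class Resource:
    """A single aggregated resource: where it lives and what part it plays."""
    uri: str
    role: Role


@dataclass
class ResearchObject:
    """An RO sketched as an identifier plus a typed list of resources."""
    identifier: str
    resources: list[Resource] = field(default_factory=list)

    def add(self, uri: str, role: Role) -> None:
        self.resources.append(Resource(uri, role))

    def by_role(self, role: Role) -> list[str]:
        """Return the URIs of all resources playing the given role, in insertion order."""
        return [r.uri for r in self.resources if r.role is role]

    def missing_roles(self) -> set[Role]:
        """Return the component kinds that are not yet represented in this RO."""
        present = {r.role for r in self.resources}
        return set(Role) - present


ro = ResearchObject("ro:example-study")
ro.add("https://example.org/workflow.t2flow", Role.METHOD)
ro.add("https://example.org/data/input.csv", Role.DATA)
print(sorted(r.value for r in ro.missing_roles()))
# ['answer', 'hypothesis', 'organisational context', 'research question', 'result', 'study design']
```

The point of typing each resource is that the aggregation can then answer questions a plain folder of files cannot, such as "which methods produced these results?" or "what is still missing before this study could be re-enacted?".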

Important assumptions in the paper:

ROs work well in the context of e-Laboratories, i.e., environments that are mostly based on automated management systems and the execution of in silico experiments.
Reproducible research is ultimately possible in any domain and always desirable.
All elements of scientific research can be made explicit and encoded in a machine-readable way, if not now, then in the future.

Terms that refer to different ways of reusability:

Reusable - reuse as a whole or single entity.
Repurposeable - reuse as parts, e.g., taking an RO and substituting alternative services or data for those used in the study.
Repeatable - repeat the study, perhaps years later.
Reproducible - reproduce or replicate a result (start with the same inputs and methods and see if a prior result can be confirmed).
Replayable - automated studies can be replayed rather than executed again.
Referenceable - citations for ROs.
Revealable - audit the steps performed in the research in order to be convinced of the validity of results.
Respectful - credit and attribution.
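The difference between reproducing and replaying is easy to blur, so here is a toy Python sketch, assuming a study is a linear pipeline whose steps are logged with their inputs and outputs. `reproduce` re-executes the same methods starting from the same initial input and checks whether each recorded result is confirmed; `replay` only walks through the recorded trace without executing anything. All names here (`Step`, `run_and_record`, `reproduce`, `replay`) are hypothetical and do not come from the paper or from any of the systems it describes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """One logged step of a study: the method used and what went in and came out."""
    name: str
    func: Callable[[Any], Any]
    recorded_input: Any
    recorded_output: Any


def run_and_record(pipeline: list[tuple[str, Callable[[Any], Any]]], data: Any) -> list[Step]:
    """Execute the pipeline once, logging each step's input and output."""
    log = []
    for name, func in pipeline:
        output = func(data)
        log.append(Step(name, func, data, output))
        data = output
    return log


def reproduce(log: list[Step]) -> bool:
    """Re-execute the methods from the same initial input; True if every recorded result is confirmed."""
    if not log:
        return True
    data = log[0].recorded_input
    for step in log:
        data = step.func(data)
        if data != step.recorded_output:
            return False
    return True


def replay(log: list[Step]) -> list[tuple[str, Any]]:
    """Walk through the recorded trace without executing any method."""
    return [(step.name, step.recorded_output) for step in log]


pipeline = [
    ("clean", lambda xs: [x for x in xs if x is not None]),
    ("mean", lambda xs: sum(xs) / len(xs)),
]
log = run_and_record(pipeline, [2.0, None, 4.0])
print(reproduce(log))  # True
print(replay(log))     # [('clean', [2.0, 4.0]), ('mean', 3.0)]
```

Replaying is cheap and works even when a service used in the original study has disappeared, but it confirms nothing; reproducing costs a full re-execution and fails as soon as any method behaves differently the second time.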

The authors describe several environments that try to implement the approach of aggregating resources into ROs.

myExperiment Virtual Research Environment relies on the notion of "packs", collections of items that can be shared as a single entity.
Systems Biology of Microorganisms (SysMO) project has a web platform, SysMO-DB, and a catalog, SysmoSEEK. It relies on JERM (Just Enough Results Model), which is based on the ISA (Investigation/Study/Assay) format. Another approach to support ROs within the systems biology community is SBRML (Systems Biology Results Markup Language). Most of the experiments in this domain are wet-lab experiments, so traceability and referenceability are more relevant than repeatability and replayability.
MethodBox is part of the Obesity e-Lab and allows researchers to "shop for variables" from studies related to obesity in the UK. The paper doesn't describe what method is used to support RO aggregations.

Packs in myExperiment are the most advanced implementation of the idea of ROs and, ironically, they are based on linked data: "Work in myExperiment makes use of the OAI-ORE vocabulary and model in order to deliver ROs in a Linked Data friendly way" (p. 10).

OAI-ORE defines standards for the description and exchange of aggregations of Web resources. It is agnostic to relationship types, so it needs to be extended. The authors propose two extensions: the Research Objects Upper Model (ROUM) and the Research Object Domain Schemas (RODS). ROUM provides a basic vocabulary to describe general properties of ROs, such as basic lifecycle states. RODS provide domain-specific vocabularies. Few details are provided about either extension.
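To illustrate what "agnostic to relationship types" means in practice, here is a minimal, standard-library-only Python sketch that writes an ORE aggregation as N-Triples. `ore:ResourceMap`, `ore:describes`, `ore:Aggregation` and `ore:aggregates` are terms from the OAI-ORE vocabulary; on its own, ORE only states that a resource is aggregated, not what role it plays. Since the paper gives few details about ROUM and RODS, the `role` and `lifecycleState` predicates, the `https://example.org/ro-ext#` namespace and the `"live"` state value are hypothetical stand-ins for that kind of extension vocabulary.

```python
# OAI-ORE vocabulary namespace and rdf:type.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical extension namespace standing in for ROUM/RODS-style terms;
# the paper does not publish URIs for these vocabularies.
EXT = "https://example.org/ro-ext#"

Triple = tuple[str, str, str]


def iri(value: str) -> str:
    """Format a URI as an N-Triples IRI reference."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def ore_aggregation(rem_uri: str, agg_uri: str, parts: dict[str, str], state: str) -> list[Triple]:
    """Describe an aggregation with ORE, then add extension triples for roles and state."""
    triples: list[Triple] = [
        (iri(rem_uri), iri(RDF_TYPE), iri(ORE + "ResourceMap")),
        (iri(rem_uri), iri(ORE + "describes"), iri(agg_uri)),
        (iri(agg_uri), iri(RDF_TYPE), iri(ORE + "Aggregation")),
    ]
    for part_uri, role in parts.items():
        # Plain ORE: the resource belongs to the aggregation, nothing more.
        triples.append((iri(agg_uri), iri(ORE + "aggregates"), iri(part_uri)))
        # Hypothetical extension: say what kind of research component it is.
        triples.append((iri(part_uri), iri(EXT + "role"), literal(role)))
    # Hypothetical extension: a lifecycle state for the RO as a whole.
    triples.append((iri(agg_uri), iri(EXT + "lifecycleState"), literal(state)))
    return triples


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize triples one per line in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = ore_aggregation(
    "https://example.org/ro/1.rdf",
    "https://example.org/ro/1",
    {
        "https://example.org/ro/1/workflow": "method",
        "https://example.org/ro/1/results.csv": "result",
    },
    state="live",
)
print(to_ntriples(triples))
```

A real implementation would more likely use an RDF library than hand-built strings; plain strings just keep the sketch self-contained and make it obvious which triples come from ORE and which from the extension.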

Rather than showing that linked data is not enough, the paper seems to argue that current implementations of linked data for packaging scientific results need to be revised to explicitly include the structure of aggregations. The purpose of articulating structure in a machine-readable way is to create an environment where every component of research (including hypotheses, methods, data and results) can be re-enacted. A more obvious and important conclusion from the discussion about ROs is that a) we need to keep encouraging the exchange and sharing of research in more transparent ways, and b) there is still a shortage of platforms for doing that. myExperiment is a nice example, but it is still domain- and platform-specific.

The approach described in this paper is quite forward-looking. It is a call for rather radical changes in scientific practices. I wonder how many labs have automated experiment management environments where all datasets, workflows, scripts and results can be connected and reconstructed without much back-channeling. Another question is how much effort it takes to create ROs in a way that would make science fully "re-enactable". We probably won't be able to do that with legacy data.
