Jun 14, 2016

Mapping scientific fields, domains and specialties

I'm embarking on a new project that focuses on mapping research fields and studying the evolution of certain concepts and research communities. I have a certain field in mind that I'd like to investigate, but first I need to learn more about scientometrics and the mapping of research domains. This is the first in a series of notes from my readings: a review chapter in the Annual Review of Information Science & Technology (ARIST) titled "Mapping Research Specialties".

The chapter defines a research specialty as a self-organizing network of researchers who tend to study the same research topics, attend the same conferences, publish in the same journals, and also read and cite each other’s research papers.

Other definitions of research specialties:

  • Kuhn (1970) - communities of one hundred members, sometimes less 
  • Price (1986) - an “invisible college” of approximately 100 “core” scientists, monitoring the work of individuals who are rivals and peers by reading about 100 papers for every one published
  • Lievrouw (1990) - a set of informal communication relations among scholars or researchers who share a specific common interest or goal 
  • Small (1980) - consensual structure of concepts in a field, employed through its citation and co-citation network 
  • Rogers, Dearing, and Bregman (1993) - a family tree in which earlier studies influence later studies
Using the term "specialties" rather than "invisible college" avoids the assumption that the researchers are in frequent informal communication.

research specialty model
Fig. 6.2 from "Mapping research specialties"
A research specialty is therefore an interconnected group of researchers that has its own knowledge base, with its own concepts, paradigms and validation standards, and that uses particular channels of formal and informal communication.

Studies of research specialties are connected to the key questions raised by Chubin in his 1976 review of the field "The Conceptualization of Scientific Specialties":
  1. What are the social and intellectual properties of a specialty? 
  2. How do specialties grow, stabilize, and decline? 
  3. What are the temporal and spatial dimensions of a specialty? 
  4. How do specialties vary in size, scope, and life expectancy? 
  5. What are the institutional arrangements that support specialties? 
  6. What impact does funding have on the kind and volume of research produced in a specialty?
  7. What kinds of communication relations sustain research activities in a specialty? 
The following approaches are used in the studies of research specialties:

  1. The sociological approach (seems to be much more developed than others): science as an institution (Merton); science as a system of beliefs (Bloor, Barnes, Collins); science as culture (Latour, Woolgar, Knorr-Cetina); science as collaboration and competition (Whitley, Gibbons); science as boundary making and demarcation (Gieryn)
  2. Bibliographic or bibliometric: relevance (topics, novelty, availability, etc.); citations and co-citations; author co-citations; co-word analysis
  3. Communicative approach: knowledge diffusion through informal channels and discourses and rhetoric in science 
  4. Cognitive approach: paradigm shift (Kuhn) and branching of ideas (Mulkay)

Mapping research specialties helps to find the structure and dynamics of a research specialty and can include:

  1. A map of the network of researchers and research teams involved with the specialty.
  2. A map of the base knowledge supporting research in the specialty.
  3. A map of current research topics in the specialty.
A map of a specialty is a representation of the structure and interconnection of known elements of the specialty, which includes research topics, teams, concepts, authorities, archival journals, research institutions, and technical vocabularies. Mapping techniques often include bibliometric methods, such as reference co-citation analysis, bibliographic coupling analysis, co-authorship analysis, author co-citation analysis, co-word analysis, paper to paper citation analysis, journal to journal citation analysis, and journal co-citation analysis.
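To make three of these bibliometric methods more concrete for myself, here is a minimal sketch in Python. It is not taken from the chapter: the toy corpus, the paper and author names, and the function names are all made up for illustration, and real studies work on citation-index exports and usually normalize the raw counts. The sketch only shows what each technique counts: co-citation counts pairs of references cited together, bibliographic coupling counts references shared by two citing papers, and co-authorship counts pairs of authors appearing on the same paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus (not real data): paper id -> authors and cited references.
PAPERS = {
    "P1": {"authors": ["Ann", "Bob"], "refs": ["R1", "R2", "R3"]},
    "P2": {"authors": ["Bob", "Cy"], "refs": ["R2", "R3"]},
    "P3": {"authors": ["Ann", "Bob"], "refs": ["R3", "R4"]},
}


def cocitation_counts(papers):
    """Count how many papers cite each pair of references together."""
    counts = Counter()
    for paper in papers.values():
        for pair in combinations(sorted(set(paper["refs"])), 2):
            counts[pair] += 1
    return counts


def coupling_strength(papers):
    """Count the references shared by each pair of citing papers."""
    strength = {}
    for id_a, id_b in combinations(sorted(papers), 2):
        shared = set(papers[id_a]["refs"]) & set(papers[id_b]["refs"])
        if shared:
            strength[(id_a, id_b)] = len(shared)
    return strength


def coauthorship_counts(papers):
    """Count how many papers each pair of authors wrote together."""
    counts = Counter()
    for paper in papers.values():
        for pair in combinations(sorted(set(paper["authors"])), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    print("co-citation:", dict(cocitation_counts(PAPERS)))
    print("bibliographic coupling:", coupling_strength(PAPERS))
    print("co-authorship:", dict(coauthorship_counts(PAPERS)))
```

In this toy corpus R2 and R3 are co-cited twice, P1 and P2 are coupled through two shared references, and Ann and Bob have co-authored two papers. Each of the three count tables can then be treated as a weighted network and clustered or drawn as a map.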

Other goals of mapping include:

  • Mapping the social network of researchers - identify and characterize researchers and teams of researchers and their sponsoring institutions in terms of productivity, impact of research results, weak ties, levels of participation and collaborations. 
  • Mapping the base knowledge in the specialty - concepts, theories, methods, controversies
  • Mapping the topical structure 
  • Mapping the relations - researchers, concepts, and topics 
  • Mapping changes - shifts in base knowledge and topics, new subtopics, productive researchers, changes in funding
Techniques of mapping can include surveys of subject matter experts, bibliometric techniques (see above), web content analysis, and analysis of formal literature (most developed and frequently done).

The conclusion is not very optimistic though:
The problem of mapping specialties is complex and poorly defined. A number of techniques have been developed and applied. Each of these techniques reveals some separate aspect of the specialty. For example, co-authorship analysis uncovers the social structure of collaboration and research teams in the specialty, co-citation analysis uncovers structure of base knowledge in the specialty, and bibliographic coupling analysis reveals research subtopics. In and of themselves, these analytic techniques are inadequate as tools to map the whole research specialty: the social structure of researchers, the base knowledge they use, and the research topics they study. ... the metaphor of the blind men and the elephant is appropriate, as each analytic technique reveals the specialty in some limited aspect.

What is the solution for examining a specialty as a whole? Combine as many existing techniques as possible or develop some new techniques?

Jun 8, 2016

Cyberinfrastructure studies overview

In their introduction to the special issue on sociotechnical studies of cyberinfrastructure (CI) and e-research, Ribes and Lee identify current themes and methodologies of CI studies (Computer Supported Cooperative Work (CSCW), 2010, Volume 19, Issue 3, pp 231-244, doi: 10.1007/s10606-010-9120-0).

Cyberinfrastructure (CI) is one of the current terms for the technologies that support scientific activities such as collaboration, data sharing and dissemination of findings. CI features that distinguish it from other CSCW work include: community wide and cross-disciplinary scope, computational orientation, and end-to-end (data-to-knowledge-to-user) integration.

Themes in CI studies:

  1. Relationality. What is supporting the work of another and who is sustaining those relationships?
  2. Integration of heterogeneity. CI involves computer specialists, data and information managers, domain scientists, and so on, but also non-human actors such as sensors and databases.
  3. Sustainability. What makes CI a long-term resource?
  4. Standardization. Ways to achieve integration on the technical and human levels.
  5. Scale. How to plan for change and growth in the number of collaborators, the quantity of data, and the geographical reach.
  6. The distribution between human work and technological delegation. 

Methods include historical, ethnographic, documentary, and interview-based approaches that focus on the following:

  • Investigations of ongoing planning, development and deployment efforts 
  • Activities of maintenance, upgrade and breakdown
  • Adoption of certain expressions of scientific activity and changes in their use
  • Adoption of new technological artifacts

Units of analysis can be a project or CI as a whole (focus on national policies and funding incentives). The introduction concludes by calling for more studies:

The stories of cyberinfrastructure are revealed by looking across multiple levels of granularity, various facets of social life, and diverse technological actors. Much remains to be studied in the areas of supporting domain specific practice, data sharing and curating, and infrastructural organizings. This is an exciting time for CI studies. Research is occurring in new and unexpected places, drawing on and bringing together the traditions of CSCW, information science, organizational studies, and science and technology studies. This cross-pollination, as exemplified by the papers in this issue, seems to be not only fruitful, but also very necessary.

Jun 6, 2016

The Net Data directory

The Berkman Center for Internet & Society announced the launch of the Net Data Directory - a free, publicly available database of data about the Internet that covers topics such as cyber-security, civil and human rights, social media and many more. The directory currently contains about 150 data source records and includes many types of sources, including website rankings, opinion surveys, maps of activities and so on.

The press release says that records are maintained by researchers at the Berkman Center, which means that keeping the directory current, relevant and error-free will be a challenge. As the number of sources grows, it will also become harder to navigate the directory by searching and browsing alone, without more sophisticated tools for filtering, recommendation, and visualization.

Apr 18, 2016

Dataset on Parkinson's disease

In March 2016 Sage Bionetworks released a dataset that captures the everyday experiences of over 9,500 people with Parkinson's disease (press release). The data, described in the data paper "The mPower study, Parkinson disease mobile data collected using ResearchKit", were collected via the mPower iPhone app, where participants were presented with tasks (referred to as ‘memory’, ‘tapping’, ‘voice’, and ‘walking’ activities) and asked to fill out surveys.

Not everybody agreed to share their data broadly with the research community. Out of 14,684 verified participants, 9,520 (65%) agreed to share broadly; the rest were split between withdrawing from the study and agreeing to share narrowly with the study team only:

Study cohort description
Figure 1: mPower study cohort description. From http://www.nature.com/articles/sdata201611#methods

To provide proper safeguards and to balance sharing and privacy, the research team established a data governance structure. Access is granted to qualified researchers who agree to specific conditions for use, including the following:

  • participants cannot be re-identified
  • the data may not be redistributed
  • findings need to be published in open access venues
  • both participants and research team need to be acknowledged as data contributors
This effort is another example of the newly forming data sharing culture. It uses Synapse, which seems to make sharing easier from both technical and policy perspectives.

Apr 10, 2016

Big data analytics overview

The paper "Beyond the hype: Big data concepts, methods, and analytics" (2015, International Journal of Information Management, Vol. 35, No. 2, pp. 137–144) reviews definitions and analytics techniques of big data and discusses some future developments. The article begins with a chart showing an explosion of publications in the ProQuest database, which is quite similar to the chart in our JASIST publication "Big data, bigger dilemmas". Both charts show that 2013 was the year when the term "big data" gained popularity:
"Beyond the hype ..."
"Big data, bigger dilemmas..."
The paper cites Diebold's paper "A personal perspective on the origin(s) and development of “big data”: The phenomenon, the term, and the discipline" to describe the origin of the term "big data":
"... the term “big data” … probably originated in lunch-table conversations at Silicon Graphics Inc. (SGI) in the mid-1990s, in which John Mashey figured prominently".
After summarizing aspects of big data that were discussed many times elsewhere (volume, velocity, variety, veracity, etc.), the article provides a useful summary of the types of analytics that are common in big data research:
  1. Text analytics
    • Information extraction
      • Entity recognition
      • Relation extraction
    • Text summarization
      • Extractive (location and frequency of text units)
      • Abstractive (semantic information)
    • Question answering
  2. Audio (speech) analytics
    • Transcript-based approach (large-vocabulary continuous speech recognition, LVCSR)
    • Phonetic-based approach
  3. Video analytics
  4. Social media analytics
    • Content-based analytics
    • Structure-based analytics
      • Community detection
      • Social influence analysis
      • Link prediction
  5. Predictive analytics
In conclusion, the paper argues for new techniques that would address issues such as the irrelevance of statistical significance, heterogeneity, and computational efficiency in big data.