DIKW: Data, Information, Knowledge, Wisdom

Sep 17, 2019

Field interviewers and survey selection bias

Notes from: Stephanie Eckman, Achim Koch, Interviewer Involvement in Sample Selection Shapes the Relationship Between Response Rates and Data Quality, Public Opinion Quarterly, Volume 83, Issue 2, Summer 2019, Pages 313–337, https://doi.org/10.1093/poq/nfz012

The paper examines the role of interviewers in sample selection and its impact on data quality, especially in relation to response rates. The authors use the European Social Survey, a face-to-face survey of behavior and opinions in many European countries. Selection methods vary from countries' registries of individuals (no interviewer involvement), to household and address registries (where a sampled entry may cover more than one housing unit, and the interviewer selects a housing unit and a person to interview), to random walks, where the interviewer selects every k-th unit and conducts interviews.

The total survey error framework associates the decrease in data quality with the following sources of error:
- undercoverage (some persons have no chance to be selected)
- nonresponse (not all selected persons participated)
- sampling error
- measurement error (the response given does not match the true value)

This study focused on undercoverage and nonresponse as selection bias and used an external measure (comparison with another, larger survey in the EU) and an internal measure (deviation from a 50% female sample) of this bias.
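The internal measure can be illustrated with a small calculation. This is a minimal sketch of the idea as summarized in these notes (deviation of the female share from 50%), not the paper's exact procedure, which may be computed on a specific subsample. The function name and the counts are hypothetical.

```python
def internal_bias(n_female: int, n_total: int) -> float:
    """Absolute deviation of the female share from the 50% benchmark."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return abs(n_female / n_total - 0.5)


# Hypothetical counts for illustration only (not taken from the paper).
bias = internal_bias(n_female=540, n_total=1000)
print(f"{bias:.3f}")
```

A larger value means the realized sample drifts further from the expected gender split, which the study treats as a sign of selection bias.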

The results suggest that when interviewers are not involved in sample selection, response rates are unrelated to selection bias. However, when interviewers are involved in sample selection, response rates are higher, but higher rates are associated with more selection bias. The paper concludes:

The most important issue for researchers who rely on survey data is how we can prevent manipulation of selection by interviewers. We recommend using sampling methods that minimize interviewer selection, as far as possible. Improved training and supervision of interviewers could also reduce interference in the selection process. If interviewers did not feel pressured to achieve high response rates, they might allow the selection process to be fully random and selection bias would be smaller.

Mar 29, 2018

Cybersecurity Curricular Guideline

A report released by the Task Force on Cybersecurity Education provides a comprehensive framework and guidelines for cybersecurity post-secondary education (pdf). According to the presentation of one of the task force co-chairs, Diana Burley, it was a huge effort with many consultations, travel, and experts involved. And it went through the endorsement process with four major computing organizations: ACM, IEEE, Association for Information Systems Special Interest Group on Security (AIS SIGSEC) and International Federation for Information Processing Technical Committee on Information Security Education (IFIP). The resulting report can hopefully help to define cybersecurity as a discipline, describe proficiency needed for cybersecurity experts, and connect academic programs with industry needs. Ultimately, bringing some common understanding and standardization into cybersecurity education should improve the education and help fill a shortage of security professionals.

In terms of definition, cybersecurity involves the creation, operation, analysis, and testing of secure computer systems. The report assumes that while it is an interdisciplinary area that includes law, policy, human factors, ethics, and risk management, it is fundamentally a computing-based discipline. One of the challenges in developing curricular guidelines was to accommodate the large variability of cybersecurity programs: depending on the department or program in which they are created, content and emphasis can differ significantly. So the guidelines are designed to allow some flexibility through the notion of a disciplinary lens. A program should be based on a solid computer science foundation, with input from computer and software engineering and information systems and technologies, and should include cross-cutting concepts such as confidentiality, integrity, risk, and systems thinking.

The report shows a serious effort to be comprehensive and yet flexible. It includes eight knowledge areas: data, software, components and connections, system, human, organization, and society. Each area has several comprising units along with described essentials and learning outcomes. There is some overlap between areas and units, which again, helps to accommodate the variety of existing education efforts. Below is a summary that provides a quick overview of some areas:

It is nice to see that ethics is a significant and explicit component of the curriculum. While it doesn't remove the challenge of educating technical professionals on ethics and human behavior, it certainly provides space for discussions. More information about the guideline and the task force is at http://cybered.acm.org/

Feb 14, 2018

Chomsky and Foucault on human nature and power

Notes from a televised debate between N. Chomsky and M. Foucault in 1971 (video and transcript).

Chomsky begins with examples from linguistics to illustrate the notion of "innate structures". Children are successful in learning the language because they can use "innate language" or "instinctive knowledge" to transform limited data they get exposed to into organized knowledge. This instinctive knowledge, which allows children to build complex knowledge structures from partial data, is a fundamental constituent of human nature. Such a constituent (a collection of innate organizing principles) must be available in other domains, such as human cognition, behavior, and interaction. This is what Chomsky refers to as human nature.

Foucault mistrusts the notion of human nature - it is one of the concepts that while not being strictly scientific, has the ability to "designate, delimit and situate" certain types of discourses. For Chomsky it is ok to start with the concept of human nature as somewhat mystical (similar to gravitational forces or other scientific concepts) and later explain it through physical components (e.g., neural networks). Chomsky describes his approach as looking at the earlier stages of scientific thinking (great thinkers, more specifically) and understanding how they were able to arrive at concepts and ideas not available to anybody before.

Foucault makes a distinction between individual attribution of a discovery and collective production of knowledge, which can be referred to as "tradition", "mentality", or "modes". The former has been highly valued, while the latter is usually negativized. Another distinction is between knowledge as human activity and truth. The latter may be hidden from humans, but it will be unveiled. Attribution and relation to truth are interconnected. Throughout history we see examples of how the subject of truth (the individual revealing it) has to overcome myths and common thought, he has to "discover". What if this close relation of subject to truth is an effect of knowledge? What if truth is a complex non-individual formation? Can we replace individuals in the production of knowledge?

This position highlights a difference between Chomsky's and Foucault's approach to creativity. According to Foucault, Chomsky had to introduce the speaking subject into linguistics because language has been commonly studied as a system with a collective value. In language we have a few rules and elements and an unknown system of totalities that can be brought to light by individuals. In the history of knowledge, it's similar, but one has to overcome the dominance of individual creativity to show that there are rules and elements that can be transformed without explicitly passing through an individual.

Throughout the debate both scholars touch on many concepts from science and politics. Some of them are described below to highlight their differences:

Concept	Chomsky	Foucault
Domain (Focus)	Language	Knowledge
Human nature	Comprised of innate structures that allow for learning and arriving at complex knowledge based on partial information	A historical construct that can organize knowledge, but also can delimit how we see human behavior
Creativity	A common human act of thinking about a new situation, describing it and acting in it	An individualistic act that has been emphasized throughout history without looking at general communal rules that are behind it
Freedom	Limited number of rules with infinite possibilities of application	"Grille" of many determinisms that affects how we arrive at knowledge and understanding
Ideal model of society	A federated, decentralised system of free associations, incorporating economic as well as other social institutions	No such model can be proposed, it is more important to expose the power that controls society, especially institutions such as education and medicine that appear neutral

Somewhere in the middle of the debate, Chomsky also tried to bridge their differences:

CHOMSKY: ... That is, I think that an act of scientific creation depends on two facts: one, some intrinsic property of the mind, another, some set of social and intellectual conditions that exist. And it is not a question, as I see it, of which of these we should study; rather we will understand scientific discovery, and similarly any other kind of discovery, when we know what these factors are and can therefore explain how they interact in a particular fashion.

While Foucault didn't completely agree with that, the two kept building on each other's ideas:

FOUCAULT: ... ultimately we understand each other very well on these theoretical problems. On the other hand, when we discussed the problem of human nature and political problems, then differences arose between us. And contrary to what you think, you can’t prevent me from believing that these notions of human nature, of justice, of the realisation of the essence of human beings, are all notions and concepts which have been formed within our civilisation, within our type of knowledge and our form of philosophy, and that as a result form part of our class system; and one can’t, however regrettable it may be, put forward these notions to describe or justify a fight which should - and shall in principle – overthrow the very fundaments of our society. This is an extrapolation for which I can’t find the historical justification.

Apr 21, 2017

March for Science - to march or not to march

Apparently, there is a big controversy around the March for Science (M4S), which will take place this Saturday, April 22, 2017, in DC and in many other cities around the US.

The main stated goal of the march is to support publicly funded and publicly communicated science as a pillar of human freedom and prosperity. I was set on going because it seems that nowadays science needs support, and because regardless of whether you believe in such a thing as objective truth-seeking (I have my doubts), scientists can and should be political in defending their institutions and their role in public life. But mostly I was set on going because we need to resist anti-intellectualism and assaults on reason. With my own reasons more or less clear, I didn't pay much attention to the discussion around the march. And I bought a t-shirt, even though merchandising around protest movements seems out of place. Perhaps that is because this march is not a protest or a social justice movement.

Many people feel strongly that the march is wrong. That they were excluded from planning and organizing. Most importantly, that the march marginalizes non-white, non-male scientists and disregards diversity. That it is a microcosm of liberal racism and that march organizers pushed out those who argued for inclusiveness and intersectionality. The controversy is scattered across mass and social media, but to summarize: one side (organizers) is complicit in making the march a watered-down, non-political "celebration of science". The other side (#MarginSci-ers) perceives the march as a social justice movement and wants the message of diversity (which applies to any context of American life) to be reinforced through this movement as well. An interesting analysis of the march diversity discourse shows how organizers shifted their position with regard to diversity, thereby conforming to existing stereotypes and dominant discourse:

Unfortunately, through various miscommunications, including from the co-chairs and other key members of the MfS committee, the MfS audience has been primed to reinforce the established discourse about science. It took the better part of two months of constant lobbying and external pressure from minority scientists for the MfS organisers to finally reverse their stance. The fourth diversity statement finally states that science is political. At the same time, more recent media interviews that position diversity as a “distraction” undermine this stance.

In a sense, controversy is good. It highlights gaps in a movement and could potentially help to develop a robust program and action plan. But what is this movement? Upon reading the history of its organization, the march seems more like a top-down attempt to organize and contain than a grassroots protest and demand for change. It's being done professionally, with attempts to control the message and the goals. Is "celebration" enough to ensure change? Do I need to celebrate science or to improve the mutual relationship between science and society? Are we mobilizing only because we want public funding and therefore need to "educate" the public and policy-makers?

There is a high probability that with the goals of celebration, connections, understanding and outreach, M4S will follow the #Occupy and Women's March movements: much enthusiasm and no action due to the lack of a clear vision and strategies for change. A strong movement should have strong demands, which can then translate into specific legislation and policies. For example:

Equal pay and opportunities in science and research
Strong science education across all states
Protections for whistle-blowers and government scientists from political repression
No marketization of science and education
Exposing and dismantling the military-industrial-scientific complex

Feb 21, 2017

U.S. House, Indiana District 9 General Election 2010-2016 Visualization

This is my first attempt to create a choropleth map - a map that visualizes measurements by shading geographic regions. I used election results data from in.gov: the general elections for the US House representative from Indiana Congressional District 9 in 2010, 2012, 2014, and 2016. The maps below show the percentage of voters in each county who voted for the Democratic Party candidates (Baron P Hill in 2010, Shelli Yoder in 2012 and 2016, and William Bailey in 2014).

The process was tedious, but straightforward:

Find data and get it into appropriate format (some manual copying from PDF was needed)
Calculate statistics needed for mapping (here percent voting for Democrats within county)
Get geographic (shapefile) data
Combine stats and geographic data
Generate choropleth map
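The statistics step above can be sketched in a few lines. The maps themselves were made in R with tmap; the following is only a Python stand-in for the "percent voting for Democrats within county" calculation, plus a simple binning of percentages into shading classes. County names, vote counts, party codes, function names and class breaks are all hypothetical placeholders, not the real results.

```python
from collections import defaultdict

# Hypothetical rows: (county, party, votes). Placeholder values, not real results.
ROWS = [
    ("County A", "D", 600), ("County A", "R", 400),
    ("County B", "D", 300), ("County B", "R", 650), ("County B", "L", 50),
]


def democratic_share(rows):
    """Return {county: percent of all votes cast for party 'D'}."""
    total = defaultdict(int)
    dem = defaultdict(int)
    for county, party, votes in rows:
        total[county] += votes
        if party == "D":
            dem[county] += votes
    return {c: 100.0 * dem[c] / total[c] for c in total if total[c] > 0}


def shade_class(percent, breaks=(20, 40, 60, 80)):
    """Map a percentage to a class index 0..len(breaks) used for shading."""
    return sum(percent >= b for b in breaks)


shares = democratic_share(ROWS)
for county, pct in sorted(shares.items()):
    print(county, round(pct, 1), shade_class(pct))
```

The resulting per-county table is what gets joined to the shapefile data by county name before the map is drawn.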

This tutorial on creating maps with R and this vignette about the tmap package were very helpful.

A quick analysis of the District 9 US House elections over time shows that some counties (e.g., Monroe County) vote strongly for Democrats, while support in others (e.g., Morgan and Orange counties) is much weaker. In 2012, though, Orange, Washington, Harrison and some other counties suddenly had nearly half of their voters voting for Democrats. The turnout in 2012 was higher than in 2010, but it was comparable to 2016, when Shelli Yoder was the candidate again. 2012 was also the year when only two candidates ran, so maybe we need to look at other candidates and how they take votes. More data and more analysis are needed.