Why does selection bias matter?

Whether you have just started in data science or have been in the field for a while, you have almost certainly heard of “selection bias”. So what is selection bias, and why does it matter?

In technical terms, one definition reads:

Selection bias is defined as a nonrandom imbalance among treatment groups of the distribution of factors capable of influencing the end points—that is, of subexperimental factors (including prognostic factors).

From: Handbook of Pharmacogenomics and Stratified Medicine, 2014

The purpose of most scientific studies, or research problems, is to find out how effective something is at solving a problem. To do that, researchers usually compare two groups with similar characteristics whose only difference is that one group is exposed to the intervention and the other is not. They then measure the difference in outcomes between the two groups.

The key assumption is that the two selected groups are “similar”. Sometimes, however, the groups differ from the start, and the characteristics that set them apart may play an important role in determining the outcome. When that happens, the measured difference reflects those pre-existing differences as well as the effect of the intervention itself. This is what is called selection bias.
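One common way to make this precise uses potential-outcomes notation, which is introduced here for illustration and does not come from the sources quoted in this post. Let $Y_{1i}$ be person $i$'s outcome with the intervention, $Y_{0i}$ the outcome without it, and $D_i = 1$ if person $i$ actually received the intervention. The observed difference in average outcomes then splits into two terms:

```latex
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed difference}}
= \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{effect on those treated}}
+ \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}
```

The second term is zero only when the two groups would have had the same average outcome without the intervention, which is exactly the "similar groups" assumption described above.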

For example, suppose you want to study the impact of hospitalization on health. If you simply compare the health outcomes of people who were hospitalized with those of people who were not, your comparison suffers from selection bias. People who are hospitalized are generally sicker than the population as a whole; that is why they were hospitalized in the first place. So even if hospital care helps, the hospitalized group can still look less healthy. In this case, selection bias arises because people “self-select” into the group that receives the intervention.
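To see how self-selection can even flip the sign of a comparison, here is a small Python simulation. It is a sketch under made-up assumptions, not real data: each person has a hypothetical "sickness" score, only people with a score above 1.0 are hospitalized, sicker people end up with worse health, and hospitalization itself improves health by a true effect of +1.0. The function name `simulate_hospital_study` and all parameter values are illustrative.

```python
import random
import statistics


def simulate_hospital_study(n=100_000, true_effect=1.0, seed=0):
    """Naive hospitalized-vs-not comparison under a hypothetical self-selection model."""
    rng = random.Random(seed)
    hospitalized, not_hospitalized = [], []
    for _ in range(n):
        sickness = rng.gauss(0.0, 1.0)                   # hypothetical baseline sickness (the confounder)
        goes_to_hospital = sickness > 1.0                # self-selection: only the sickest (~16%) are admitted
        health = -2.0 * sickness + rng.gauss(0.0, 1.0)   # sicker people end up less healthy
        if goes_to_hospital:
            health += true_effect                        # the real benefit of hospital care
            hospitalized.append(health)
        else:
            not_hospitalized.append(health)
    # Naive estimate: difference in mean health between the two self-selected groups
    return statistics.mean(hospitalized) - statistics.mean(not_hospitalized)


naive_estimate = simulate_hospital_study()
print(f"true effect: +1.00, naive difference: {naive_estimate:+.2f}")
```

Under these assumptions the naive difference comes out at roughly -2.6: hospitalization looks harmful even though the simulated effect is beneficial, because the hospitalized group started out much sicker.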

Here is a more concrete, easy-to-understand explanation:

Selection bias is a kind of error that occurs when the researcher decides who is going to be studied. It is usually associated with research where the selection of participants isn’t random (i.e. with observational studies such as cohort, case-control and cross-sectional studies).

From: Institute for Work & Health (2014)

Selection bias leads to misleading conclusions about the effectiveness of a program or intervention. Because of self-selection, factors other than the program (a.k.a. confounding factors) also affect the outcome, and their influence gets mixed up with the effect of the program.

It is usually a bold claim to say that selection bias can be eliminated completely. A typical way to minimize it is to run a randomized controlled trial, in which participants are randomly assigned to a treatment group (the group that receives the intervention) or a control group (the group that does not). Because assignment is random, the two groups should be similar on average, both in the characteristics we can measure and in those we cannot. However, selection bias can still occur in randomized controlled trials, for example when participants drop out of the two groups at different rates. It is therefore important for researchers to acknowledge this risk and examine their study design to minimize selection bias.
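A small Python simulation can illustrate why random assignment helps. It is a sketch under hypothetical assumptions: a hidden "sickness" score drives health, but a coin flip, not the participant, decides who receives the intervention, whose true effect is +1.0. The function `simulate_randomized_trial` and its parameters are illustrative and not taken from any source.

```python
import random
import statistics


def simulate_randomized_trial(n=100_000, true_effect=1.0, seed=0):
    """Estimate a treatment effect when assignment is a coin flip (hypothetical model)."""
    rng = random.Random(seed)
    treatment, control = [], []
    for _ in range(n):
        sickness = rng.gauss(0.0, 1.0)                   # hidden confounder: sicker people are less healthy
        assigned_to_treatment = rng.random() < 0.5       # randomization: assignment ignores sickness
        health = -2.0 * sickness + rng.gauss(0.0, 1.0)
        if assigned_to_treatment:
            health += true_effect                        # the effect we want to recover
            treatment.append(health)
        else:
            control.append(health)
    # Difference in mean outcomes between treatment and control groups
    return statistics.mean(treatment) - statistics.mean(control)


estimate = simulate_randomized_trial()
print(f"true effect: +1.00, estimated effect: {estimate:+.2f}")
```

Because the coin flip is independent of sickness, both groups end up with about the same average sickness, so the difference in means lands close to the true effect of +1.0. The simulation does not model dropout or any of the other ways selection bias can re-enter a real trial.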
