🤓 [DS Daily] Causal Inference
"Statistics is like a bikini: what they reveal is suggestive, but what they conceal is vital." - Aaron Levenstein
🤔 What is "Causal Inference"?
Explain it like I'm a CEO:
Causal inference is the study of how changes in one variable (the cause) affect another variable (the effect). It's like figuring out why things happen, and how they're connected. Imagine you're trying to understand why sales of your company's products are down. Is it because of a new competitor, changes in consumer preferences, or a poor advertising campaign? Causal inference can help you figure out what's causing the decrease in sales so you can make informed decisions to turn things around.
Why do I care about causal inference?
Your company's decisions impact a lot of people and a lot of resources. Without understanding the cause and effect relationship, you might make decisions that actually make the problem worse or miss out on opportunities to make improvements. By using causal inference, you can make informed decisions that have a positive impact on your business.
How can I apply causal inference?
There are many methods to do causal inference, ranging from simple observational studies to more complex experiments. A simple example of causal inference in business might be to run a survey to find out why customers are leaving and use that information to improve customer retention. Another example might be to run an A/B test to determine if a new feature will increase sales.
🤓 For the experts
Three principles to remember and master:
Counterfactuals: A counterfactual is a hypothetical scenario in which the cause had been different, letting us ask what the effect would have been. For example, what would sales have been if we had run a different advertising campaign?
Randomization: Randomization helps us determine causality by randomly assigning causes to different groups and observing the effects. This is why running online A/B tests is so vital.
Covariate Balancing: Covariate balancing refers to ensuring that the groups being compared are similar in terms of other factors that could be affecting the outcome, such as age or income.
A key question to ask in causal inference is "what would have happened if we had done something differently?".
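The randomization and covariate-balancing ideas above can be sketched in a small simulation. The variable names and the data-generating process below are made up for illustration (the true treatment effect is set to 2.0); the point is that random assignment balances a covariate and makes a simple difference in means an unbiased effect estimate:

```python
# Illustrative simulation: randomization balances covariates and
# recovers an assumed true treatment effect of 2.0.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
age = rng.normal(40, 10, n)      # a covariate we want balanced
treat = rng.integers(0, 2, n)    # randomized assignment: a coin flip
# outcome depends on age and on treatment (true effect = 2.0)
y = 0.5 * age + 2.0 * treat + rng.normal(0, 1, n)

# Covariate balance: standardized mean difference in age between groups
# should be near zero under randomization
smd = (age[treat == 1].mean() - age[treat == 0].mean()) / age.std()
print(f"standardized mean difference in age: {smd:.3f}")

# Because assignment is random, a simple difference in means is unbiased
ate = y[treat == 1].mean() - y[treat == 0].mean()
print(f"estimated treatment effect: {ate:.2f}")
```

If `treat` instead depended on `age`, the same difference in means would mix the treatment effect with the age effect, which is exactly the confounding problem discussed below.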
How do I do causal inference with observational data?
Causal inference with observational data is challenging, because the relationships between variables are not easily identifiable and are often confounded by other factors. However, several methods can help control for confounding and support causal inferences from observational data, including:
Matching: This method involves finding pairs of individuals with similar characteristics, and comparing outcomes between the pairs to estimate the effect of a treatment or exposure. This can be done through exact matching, where individuals are matched on all relevant covariates, or through propensity score matching, where individuals are matched on the estimated probability of receiving a treatment or exposure.
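Here is a minimal propensity score matching sketch on simulated data. The data-generating process and the true effect of 1.5 are assumptions made for illustration:

```python
# Propensity score matching sketch on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(0, 1, n)                          # a confounder
treat = rng.binomial(1, 1 / (1 + np.exp(-x)))    # treatment depends on x
y = 3.0 * x + 1.5 * treat + rng.normal(0, 1, n)  # true effect = 1.5

# 1. Estimate propensity scores from the confounder
ps = LogisticRegression().fit(x.reshape(-1, 1), treat)
ps = ps.predict_proba(x.reshape(-1, 1))[:, 1]

# 2. Match each treated unit to the control with the closest propensity score
treated_idx = np.where(treat == 1)[0]
control_idx = np.where(treat == 0)[0]
dist = np.abs(ps[treated_idx][:, None] - ps[control_idx][None, :])
matches = control_idx[dist.argmin(axis=1)]

# 3. The average outcome difference across matched pairs estimates the effect
att = (y[treated_idx] - y[matches]).mean()
print(f"matched estimate: {att:.2f} (true effect 1.5)")
```

A naive difference in means on the same data would be badly biased, because treated units have systematically higher `x`.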
Instrumental variables: This method uses a variable (the instrument) that is associated with the exposure of interest but affects the outcome only through that exposure, not through any other direct or indirect pathway. The causal effect of the exposure on the outcome can then be estimated by using the instrument to predict the exposure and regressing the outcome on the predicted exposure, a procedure known as two-stage least squares.
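A two-stage least squares sketch on simulated data, where z is assumed to be a valid instrument (it affects x, and affects y only through x) and u is an unobserved confounder; all names and coefficients are illustrative:

```python
# Two-stage least squares (2SLS) sketch on simulated data.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
z = rng.normal(0, 1, n)  # instrument
u = rng.normal(0, 1, n)  # unobserved confounder of x and y
x = 0.8 * z + u + rng.normal(0, 1, n)
y = 2.0 * x + 3.0 * u + rng.normal(0, 1, n)  # true causal effect of x = 2.0

# Naive OLS of y on x is biased upward by the confounder u
X = np.column_stack([np.ones(n), x])
naive = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Stage 1: predict x from the instrument
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
# Stage 2: regress y on the predicted exposure
Xh = np.column_stack([np.ones(n), x_hat])
iv = np.linalg.lstsq(Xh, y, rcond=None)[0][1]

print(f"naive OLS: {naive:.2f}, 2SLS: {iv:.2f} (true effect 2.0)")
```

The instrument works because `z` is correlated with `x` but independent of `u`, so the predicted exposure carries none of the confounding.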
Regression adjustment: This method involves adjusting for covariates in a regression model that could be confounding the relationship between the exposure and outcome. This can be done through multiple regression, where multiple covariates are adjusted for simultaneously, or through regression stratification, where the regression is performed within subgroups defined by the covariates.
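A small regression adjustment sketch on simulated data, with income as an assumed confounder and a true effect of 1.0 (both assumptions for illustration):

```python
# Regression adjustment sketch: including the confounder as a covariate
# removes the bias a naive comparison would have.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
income = rng.normal(50, 10, n)  # confounder
# higher income makes treatment more likely
treat = (income + rng.normal(0, 10, n) > 50).astype(float)
y = 0.3 * income + 1.0 * treat + rng.normal(0, 1, n)  # true effect = 1.0

# Naive difference in means mixes the treatment effect with income
naive = y[treat == 1].mean() - y[treat == 0].mean()

# Multiple regression: y ~ intercept + treat + income
X = np.column_stack([np.ones(n), treat, income])
adjusted = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"naive: {naive:.2f}, regression-adjusted: {adjusted:.2f} (true 1.0)")
```

Note that this only works for confounders that were actually measured; an unobserved confounder would still bias the adjusted estimate, which motivates the sensitivity analysis below.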
Sensitivity analysis: This method involves exploring the robustness of causal estimates to different assumptions about the relationships between variables, by conducting multiple analyses with different assumptions and comparing the results. This can help identify potential sources of bias or confounding and suggest ways to improve the causal inferences.
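One crude way to sketch a sensitivity analysis is to ask how strong an unmeasured confounder would have to be to explain away an observed effect, using the simple omitted-variable bias formula for a linear model. All numbers below are hypothetical:

```python
# Sensitivity analysis sketch: vary the assumed strength of an unmeasured
# binary confounder and see how the adjusted estimate changes.
# Simple linear-model bias formula: bias = gamma * (p1 - p0), where gamma
# is the confounder's effect on the outcome and p1/p0 are its assumed
# prevalences in the treated and control groups.
observed_effect = 1.2  # hypothetical estimate from an observational study
p1, p0 = 0.6, 0.3      # assumed confounder prevalence, treated vs control

for gamma in [0.0, 1.0, 2.0, 4.0]:
    adjusted = observed_effect - gamma * (p1 - p0)
    print(f"confounder effect {gamma:.1f} -> adjusted estimate {adjusted:.2f}")
```

Here a confounder would need an outcome effect of about 4.0 (given the assumed prevalence gap) to drive the estimate to zero; if that strength is implausible for the problem at hand, the conclusion is more robust.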
It is important to note that while these methods can help control for confounding and make causal inferences from observational data, they are not foolproof and the results should be interpreted with caution.
📖 A bit of history
A key figure in the history of causal inference is Judea Pearl. He is known for his work on causal modeling and causal inference algorithms, including the use of Bayesian networks to represent causal relationships, and he introduced a new formal framework for describing cause and effect: structural causal models and the do-calculus.
🐼 Data Science all the Things
There are several open source packages that can help you with causal inference. Here are some options in Python and R.
Python
The causallib (⭐️531) package provides a variety of causal effect estimation methods that plug in scikit-learn-style machine learning models.
!pip install causallib
# Estimate treatment effects with inverse probability weighting (IPW)
from sklearn.linear_model import LogisticRegression
from causallib.estimation import IPW
from causallib.datasets import load_nhefs

data = load_nhefs()  # example observational dataset shipped with causallib
ipw = IPW(learner=LogisticRegression(max_iter=1000))
ipw.fit(data.X, data.a)
outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
print(ipw.estimate_effect(outcomes[1], outcomes[0]))
R
The ggdag package helps you visualize causal relationships as directed acyclic graphs (DAGs).
install.packages("ggdag")
library(ggdag)
# Define a simple DAG: z confounds the effect of x on y
dag <- dagify(y ~ x + z, x ~ z)
ggdag(dag)
🧠 Drop your Knowledge
What's one thing you've learned about Causal Inference in your experience? Drop your insights in the comments below 👇!
What did you think of this article?