Causal Inference - My Learning Journey
Instructor: Parijat Dube
Term: Summer 2024
About This Repository
This repository documents my learning journey through DS-UA 201: Causal Inference - a course that taught me how to think critically about cause and effect in data science. Each assignment represents a step in building my understanding of causal inference methods, from basic concepts to advanced techniques.
My Learning Journey
Assignment 1: "The Power of Natural Experiments"
Learning to identify causal relationships in real-world data
What I learned: This assignment introduced me to the concept of natural experiments using leader assassination data. I discovered that when an event is as good as random, comparing the cases it affected with the cases it did not can reveal a causal effect, provided the comparison is analyzed carefully.
Key skills developed:
- Identifying natural experiments in observational data
- Calculating the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT)
- Understanding the importance of randomization checks
- Using statistical tests to validate assumptions
Why this matters: Natural experiments are everywhere in the real world - from policy changes to unexpected events. Learning to spot and analyze them gives us powerful tools to understand causality without expensive randomized trials.
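As an illustration of the ATE calculation and the randomization check listed above, here is a minimal sketch. It is written in Python for illustration rather than the R used in the course, the data are made up (not the assassination data), and `difference_in_means` is a hypothetical helper name. It assumes treatment is as good as randomly assigned, in which case the difference in means estimates the ATE and the same comparison applied to a pre-treatment covariate serves as a simple balance check.

```python
from statistics import mean


def difference_in_means(values, treated):
    """Mean of `values` among treated units minus the mean among control units."""
    treated_vals = [v for v, t in zip(values, treated) if t == 1]
    control_vals = [v for v, t in zip(values, treated) if t == 0]
    if not treated_vals or not control_vals:
        raise ValueError("need at least one treated and one control unit")
    return mean(treated_vals) - mean(control_vals)


# Made-up toy data: 1 = treated, 0 = control.
treated = [1, 1, 1, 0, 0, 0]
outcome = [5.0, 7.0, 6.0, 3.0, 4.0, 5.0]
pre_treatment_covariate = [2.0, 3.0, 4.0, 3.0, 2.0, 4.0]

# Estimated ATE under as-if random assignment.
ate_hat = difference_in_means(outcome, treated)

# Balance check: a gap far from zero would cast doubt on as-if randomness.
balance_gap = difference_in_means(pre_treatment_covariate, treated)
```

A large `balance_gap` does not prove the design is invalid, but it is a signal to look harder at how treatment came about.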
Assignment 2: "Randomization: The Gold Standard"
Mastering experimental design and randomization inference
What I learned: I explored how randomization creates the foundation for causal inference, working with data from a gay marriage canvassing experiment and from election fraud detection.
Key skills developed:
- Designing and analyzing randomized experiments
- Using the Neyman estimator for treatment effects
- Conducting randomization tests
- Understanding block randomization and its importance
Why this matters: Randomization is the backbone of causal inference. When done right, it balances both observed and unobserved confounders across groups in expectation, which gives us confidence that our results reflect true causal relationships rather than spurious correlations.
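The Neyman estimator and the randomization test from the skills list can be sketched as follows. This is a Python illustration under stated assumptions, not the course's R code: the data are made up, the function names are hypothetical, and the test assumes complete randomization (a fixed number of treated units) and approximates the randomization distribution by random reshuffling instead of enumerating every assignment.

```python
import random
from statistics import mean, variance


def neyman_estimate(y_treated, y_control):
    """Difference in means with the conservative Neyman standard error."""
    estimate = mean(y_treated) - mean(y_control)
    # Sum of the sample variances, each divided by its group size.
    var_hat = (variance(y_treated) / len(y_treated)
               + variance(y_control) / len(y_control))
    return estimate, var_hat ** 0.5


def randomization_p_value(outcomes, treated, n_draws=5000, seed=0):
    """Two-sided p-value under the sharp null of no effect for any unit."""
    rng = random.Random(seed)

    def diff(assignment):
        t = [y for y, a in zip(outcomes, assignment) if a == 1]
        c = [y for y, a in zip(outcomes, assignment) if a == 0]
        return mean(t) - mean(c)

    observed = abs(diff(treated))
    shuffled = list(treated)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)  # keeps the number of treated units fixed
        if abs(diff(shuffled)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_draws


# Made-up toy data.
y_treated = [5.0, 7.0, 6.0, 8.0]
y_control = [3.0, 4.0, 5.0, 4.0]
estimate, std_error = neyman_estimate(y_treated, y_control)
p_value = randomization_p_value(y_treated + y_control, [1] * 4 + [0] * 4)
```

For a block-randomized design, the reshuffling would have to happen within blocks so that the simulated assignments match how treatment was actually assigned.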
Assignment 3: "Heterogeneous Treatment Effects"
Discovering that one size doesn't fit all in causal analysis
What I learned: I moved beyond average effects to understand how treatments affect different groups differently, using GOTV (Get Out The Vote) data to explore conditional average treatment effects.
Key skills developed:
- Estimating Conditional Average Treatment Effects (CATE)
- Using stratification to identify effect heterogeneity
- Creating and interpreting Directed Acyclic Graphs (DAGs)
- Understanding when and why treatment effects vary
Why this matters: Real-world interventions rarely affect everyone the same way. Understanding heterogeneous effects helps policymakers target interventions more effectively and researchers design better studies.
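To make stratified CATE estimation concrete, here is a minimal Python sketch (the course itself used R). The data are made up and are not the GOTV data, and the stratum labels and the helper `cate_by_stratum` are hypothetical. It assumes treatment was randomized, so the difference in means within each stratum estimates that stratum's conditional effect.

```python
from collections import defaultdict
from statistics import mean


def cate_by_stratum(outcomes, treated, strata):
    """Difference in means within each stratum that has both arms."""
    groups = defaultdict(lambda: {1: [], 0: []})
    for y, t, s in zip(outcomes, treated, strata):
        groups[s][t].append(y)
    return {
        s: mean(g[1]) - mean(g[0])
        for s, g in groups.items()
        if g[1] and g[0]  # skip strata missing a treated or control unit
    }


# Made-up toy data: 1 = voted, 0 = did not vote.
strata = ["young"] * 8 + ["old"] * 8
treated = [1, 1, 1, 1, 0, 0, 0, 0] * 2
voted = [1, 1, 0, 1, 0, 0, 1, 0,   # young: treated, then control
         1, 1, 1, 0, 1, 1, 0, 1]   # old: treated, then control

cates = cate_by_stratum(voted, treated, strata)
```

In this toy example the estimated effect is concentrated in one stratum, which is the kind of heterogeneity an overall average would hide.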
Assignment 4: "Survey Data and Propensity Scores"
Learning to work with observational data when experiments aren't possible
What I learned: I tackled the challenge of causal inference in survey data, comparing political efficacy across countries and analyzing conditional cash transfer programs.
Key skills developed:
- Analyzing survey data for causal relationships
- Using propensity score matching to reduce selection bias
- Implementing inverse probability weighting (IPW)
- Understanding the limitations of observational studies
Why this matters: Not everything can be randomized. Learning to work with observational data while maintaining causal rigor is essential for many real-world applications in social science and policy research.
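The inverse probability weighting idea listed above can be sketched in a few lines. This is an illustrative Python version, not the course's R code: the data are made up, the function names are hypothetical, and the propensity score is estimated as the share of treated units at each level of a single discrete covariate (a logistic regression would be the more usual choice with several covariates). The estimator shown is the normalized (Hajek) form, and it assumes every propensity lies strictly between 0 and 1.

```python
from collections import defaultdict


def estimate_propensity(treated, covariate):
    """Share of treated units at each level of a discrete covariate."""
    counts = defaultdict(lambda: [0, 0])  # level -> [n_treated, n_total]
    for t, x in zip(treated, covariate):
        counts[x][0] += t
        counts[x][1] += 1
    return [counts[x][0] / counts[x][1] for x in covariate]


def ipw_ate(outcomes, treated, propensity):
    """Normalized (Hajek) inverse probability weighting estimate of the ATE."""
    if any(p <= 0 or p >= 1 for p in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    w1 = [t / p for t, p in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - p) for t, p in zip(treated, propensity)]
    mu1 = sum(w * y for w, y in zip(w1, outcomes)) / sum(w1)
    mu0 = sum(w * y for w, y in zip(w0, outcomes)) / sum(w0)
    return mu1 - mu0


# Made-up toy data: units with x = 1 are more likely to be treated
# and also have higher outcomes, so the naive comparison is biased.
covariate = [0, 0, 0, 0, 1, 1, 1, 1]
treated = [1, 0, 0, 0, 1, 1, 1, 0]
outcome = [4.0, 2.0, 2.0, 2.0, 8.0, 8.0, 8.0, 6.0]

propensity = estimate_propensity(treated, covariate)
ate_ipw = ipw_ate(outcome, treated, propensity)
```

In this toy data the raw treated-minus-control gap is 4, while the weighted estimate is about 2, the effect built into each covariate level. The correction only works for confounders that are actually measured and included in the propensity model.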
Assignment 5: "Time and Causality"
Mastering longitudinal analysis and instrumental variables
What I learned: I explored how time affects causal relationships, learning to use difference-in-differences and instrumental variables to identify causal effects in complex, real-world scenarios.
Key skills developed:
- Implementing Difference-in-Differences (DiD) designs
- Using instrumental variables to address endogeneity
- Working with panel data and fixed effects models
- Understanding the parallel trends assumption
Why this matters: Many important questions involve changes over time and complex causal mechanisms. These methods allow us to answer questions that simpler approaches cannot handle.
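Two of the estimators named above reduce to simple arithmetic in their most basic form, sketched here in Python for illustration (the course used R). The data are made up and the function names are hypothetical. The DiD function is the two-group, two-period version and relies on parallel trends; the IV function is the Wald estimator for a binary instrument and relies on the instrument being relevant and affecting the outcome only through the treatment.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period DiD: change in treated minus change in control."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


def wald_iv_estimate(outcomes, treatment, instrument):
    """Wald estimator for a binary instrument: reduced form / first stage."""
    def gap(values):
        z1 = [v for v, z in zip(values, instrument) if z == 1]
        z0 = [v for v, z in zip(values, instrument) if z == 0]
        return mean(z1) - mean(z0)

    first_stage = gap(treatment)  # effect of the instrument on treatment uptake
    if first_stage == 0:
        raise ValueError("instrument does not shift treatment (no first stage)")
    return gap(outcomes) / first_stage


# Made-up toy data for DiD.
did = did_estimate(
    treated_pre=[10.0, 12.0], treated_post=[15.0, 17.0],
    control_pre=[8.0, 10.0], control_post=[10.0, 12.0],
)

# Made-up toy data for IV.
instrument = [1, 1, 1, 1, 0, 0, 0, 0]
treatment = [1, 1, 1, 0, 0, 0, 0, 1]
outcomes = [6.0, 6.0, 6.0, 2.0, 2.0, 2.0, 2.0, 6.0]
iv = wald_iv_estimate(outcomes, treatment, instrument)
```

With panel data and more than two periods, the DiD comparison is usually run as a regression with unit and time fixed effects rather than computed from four group means.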
Final Project: "Synthesizing Everything"
Bringing all the pieces together
What I learned: The final project challenged me to apply everything I'd learned to analyze patent data, examining how foreign inventions affect domestic innovation.
Key skills developed:
- Integrating multiple causal inference methods
- Critically evaluating research designs
- Communicating complex findings clearly
- Understanding the practical limitations of different approaches
Why this matters: Real-world problems rarely fit neatly into one methodological box. The ability to combine different approaches and think critically about research design is what separates good researchers from great ones.
Technical Skills Acquired
- R Programming: Data manipulation, visualization, and statistical analysis
- Causal Inference Estimands and Methods: ATE, ATT, CATE, DiD, IV, PSM, IPW
- Research Design: Experimental design, natural experiments, observational studies
- Statistical Analysis: Hypothesis testing, confidence intervals, randomization inference
- Data Visualization: Creating clear, informative plots and tables
- Critical Thinking: Evaluating research designs and interpreting results
Why This Course Matters
Causal inference is the foundation of evidence-based decision making, whether you are:
- A policymaker deciding which programs to fund
- A business analyst evaluating marketing campaigns
- A researcher studying social phenomena
- A data scientist building predictive models
Understanding causality helps you move beyond correlation to identify what actually works and why. This course taught me to think like a scientist: to question assumptions, design rigorous studies, and interpret results with appropriate caution.
Personal Reflection
This course transformed how I think about data. I went from seeing patterns to understanding processes, from correlation to causation. Each assignment built on the previous one, creating a comprehensive toolkit for causal analysis. The most valuable lesson wasn't just learning specific methods, but developing the mindset to approach any causal question systematically and rigorously.
This repository represents my journey from a data analyst who could describe patterns to a researcher who can identify causes. The skills I've developed here will serve me throughout my career in data science and research.
