@Brady: In Jason A. Roy's Coursera course, both the no-interference assumption and "only one way of getting treatment" are clubbed under SUTVA;
@Theviswanath57 · 4 years ago
Whereas in your "golden retriever or other dog" example, which I guess violates the "only one way of getting treatment" assumption, you're putting it under the consistency assumption
@BradyNealCausalInference · 4 years ago
@@Theviswanath57 Not entirely sure I understand your comment, but are you saying this: "SUTVA is satisfied if unit (individual) i's outcome is simply a function of unit i's treatment. Therefore, SUTVA is a combination of consistency and no interference (and also deterministic potential outcomes)." If so, that sounds right to me. That's taken from Section 2.3.5 of the course book (not everything makes it into the lecture)
@Theviswanath57 · 4 years ago
@@BradyNealCausalInference Makes sense, thanks
@sahilverma1635 · 4 years ago
Hello Brady. I have a silly doubt: what is the difference between Y(0) and Y | T = 0?
@BradyNealCausalInference · 4 years ago
Y(0) corresponds to "take a random person from the whole population and force them to take treatment 0." Y | T = 0 corresponds to "take a random person from the subpopulation that happened to take treatment 0." Some of the comments in the threads on this video might also be helpful: kzbin.info/www/bejne/m5iQk3meg7CVpLs
@michelspeiser5789 · 1 year ago
@@BradyNealCausalInference This is a very helpful formulation that I'd recommend including in the course (unless it's already there and I missed it)
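The distinction in this thread can be seen in a small simulation. This is a hypothetical sketch (the numbers and the drunk/shoes encoding are made up for illustration): a confounder makes E[Y | T=0] differ from E[Y(0)] even though the treatment does nothing.

```python
import numpy as np

# Hypothetical sketch of the Y(0) vs. Y | T=0 distinction.
# A confounder X ("drunk") raises both the chance of treatment T
# ("sleep with shoes on") and the outcome Y ("headache severity").
rng = np.random.default_rng(0)
n = 100_000
x = rng.binomial(1, 0.5, n)                      # confounder: drunk or not
t = rng.binomial(1, np.where(x == 1, 0.8, 0.1))  # drunk people wear shoes more often
y0 = 1.0 * x + rng.normal(0, 0.1, n)             # potential outcome without treatment
y1 = y0                                          # zero true treatment effect here
y = np.where(t == 1, y1, y0)                     # observed outcome (consistency)

e_y0 = y0.mean()                 # E[Y(0)]: whole population, around 0.5
e_y_given_t0 = y[t == 0].mean()  # E[Y|T=0]: mostly sober subpopulation, much lower
print(e_y0, e_y_given_t0)
```

Even though the treatment has zero effect here, E[Y | T=0] and E[Y(0)] come apart: the two quantities answer different questions.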
2 years ago
For unconfoundedness, does conditioning on X mean the following: if we fill the "went to sleep with shoes" group with ONLY DRUNK PEOPLE, and also fill the "went to sleep without shoes" group with ONLY DRUNK PEOPLE, is that a workaround for filling both groups with random people selected by a coin flip? The downside is that some data is lost, because we only care about a subset of the dataset (e.g. DRUNK = 1, ignoring all data with DRUNK = 0)?
@YashSharma-yw9er · 3 years ago
How is the two groups (shoe sleepers and non-shoe sleepers) not being comparable considered a separate reason for association not being causation? Isn't that indirectly a confounder as well?
@Fhoneysuckle · 1 year ago
Hi Brady, thanks for your awesome lecture. But I have a question about ignorability and exchangeability. In Causal Inference: What If, the joint independence of the potential outcomes from treatment under randomization is called full exchangeability: randomization makes the potential outcomes jointly independent of the treatment T, which implies, but is not implied by, (marginal) exchangeability. So why does randomization/ignorability mean joint independence rather than just marginal independence?
@Theviswanath57 · 4 years ago
On slide #40, regarding estimation: I feel it should be Sigma_i rather than Sigma_x. Currently it's
(1/n) * Sigma_x ( E[Y | T=1, X=x] - E[Y | T=0, X=x] )
I feel it should be
(1/n) * Sigma_i ( E[Y | T=1, X=x_i] - E[Y | T=0, X=x_i] )
which can be rewritten as
Sigma_x ( P(X=x) * ( E[Y | T=1, X=x] - E[Y | T=0, X=x] ) )
@BradyNealCausalInference · 4 years ago
You are absolutely right. Unfortunately, some typos might stay in the videos, even if they have been fixed in the book.
@Theviswanath57 · 4 years ago
Reason: say there are four subgroups with conditional average treatment effects 1, 0.5, 1.5, 2.5 and P(X=x) = [0.5, 0.2, 0.2, 0.1], with 100 subjects in total. With the first equation, the ATE would be (1/100) * (1 + 0.5 + 1.5 + 2.5) = (1/100) * 5.5 = 0.055. With the second equation, the ATE is 0.5*1 + 0.2*0.5 + 0.2*1.5 + 0.1*2.5 = 1.15.
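The arithmetic in this comment can be checked directly. A small sketch using the commenter's hypothetical subgroup CATEs and probabilities, showing that the per-individual sum agrees with the P(X=x)-weighted sum but not with the unweighted sum over values of x:

```python
import numpy as np

# Commenter's hypothetical numbers: 4 subgroups, 100 subjects total.
cate = np.array([1.0, 0.5, 1.5, 2.5])   # CATE for each subgroup
p_x = np.array([0.5, 0.2, 0.2, 0.1])    # P(X=x) for each subgroup
n = 100

# Wrong reading: sum over the 4 *values* of x, divided by n.
ate_wrong = cate.sum() / n               # 5.5 / 100 = 0.055

# Sum over *individuals*: each subject contributes their subgroup's CATE.
counts = (p_x * n).astype(int)           # [50, 20, 20, 10] subjects per subgroup
ate_i = np.repeat(cate, counts).mean()   # 1.15

# Equivalent weighted form: Sigma_x P(X=x) * CATE(x).
ate_weighted = (p_x * cate).sum()        # 1.15

print(ate_wrong, ate_i, ate_weighted)
```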
@sourajmishra1450 · 3 years ago
Hey Brady, thanks for the great course!! On slide 17: why does E[Y(1)|T=1] become E[Y|T=1]? And likewise E[Y(0)|T=0] = E[Y|T=0]?
@shipan5940 · 2 years ago
My understanding: because the condition is T=1, Y(T) = Y(1) = Y, the observed outcome. That's my own way of explaining it. If T could be either 1 or 0, it couldn't be simplified like this.
@rajeevbhatt7415 · 4 months ago
It's after applying the consistency assumption: because we are guaranteed that for T=t we observe Y(t), Y | T = t is sufficient.
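The two steps being discussed in this thread can be written compactly, applying ignorability first and then consistency (as in the lecture):

```latex
\begin{align*}
\mathbb{E}[Y(1)] - \mathbb{E}[Y(0)]
  &= \mathbb{E}[Y(1) \mid T = 1] - \mathbb{E}[Y(0) \mid T = 0] && \text{(ignorability)} \\
  &= \mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0] && \text{(consistency)}
\end{align*}
```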
@edisonge9311 · 4 years ago
Hi Brady, on page 18 I understand your point, but I have a question about the definition of E[Y(1)|T=0]. If we observe T=0, then what is the meaning of Y(1) here?
@BradyNealCausalInference · 4 years ago
Y(1) given that you observe T = 0 is the outcome you would have observed if you had taken T = 1. It isn't something that we can observe (usually)! I think I give the intuition for this on the potential outcomes intuition slide.
@edisonge9311 · 4 years ago
@@BradyNealCausalInference So the observation T=0 is independent of Y(1); then we can also get E[Y(1)] - E[Y(0)] = E[Y(1)|T=0] - E[Y(0)|T=1], right? But we cannot use the consistency law there; therefore, in ICI Eq. (2.3), it's E[Y(1)] - E[Y(0)] = E[Y(1)|T=1] - E[Y(0)|T=0]. Is my understanding correct?
@scotth.hawley1560 · 4 years ago
Great lecture, but starting at 20:02 I become lost: how is E[Y(1) | T=0] not a contradiction? If you do(T=1), doesn't that force T=1?
@BradyNealCausalInference · 4 years ago
Yes, but T=0 here is *conditioning* on T=0, not doing T=0. Conditioning on T=0 means "look at the people who happened to not take the treatment." Then, for those people, Y(1) means "what would have happened had they taken the treatment?"
@scotth.hawley1560 · 4 years ago
@@BradyNealCausalInference Thanks so much for taking the time to respond! This clarification helped me move forward.
@BradyNealCausalInference · 4 years ago
@@scotth.hawley1560 Glad to hear it! Thanks for bearing with me on the slow response time haha.
@Ptilu2 · 2 years ago
Hi Brady! Thank you so much for these lovely pedagogical videos! There is something I am struggling to wrap my head around, though, and I was wondering if somebody (you or some other kind soul) could help me with it. You presented ignorability as resulting from an assumption of independence between the potential outcomes Y(1), Y(0) and the treatment, leading to E[Y(1)|T=0] = E[Y(1)|T=1]. Doesn't this independence basically mean that the treatment has no causal effect on Y? Instead of removing the arrow from X to T, aren't we removing all arrows leading to T? To put my confusion another way: if the expectation of the outcome Y(1) does not change whether we give T or not, doesn't that mean T is not causal for Y? I obviously have a logic flaw somewhere, so I would be glad if someone could help me see it :)
@Ptilu2 · 2 years ago
I think I was confusing Y(1) with Y=1 here, while in fact it is Y | do(T=1). Takes some getting used to...
@kangchenghou5027 · 4 years ago
Thanks for the great lecture again! I learned a lot, and I have a few questions: 1. The fundamental problem of causal inference is that, for each individual, we only get to observe one potential outcome. The way around this is to make assumptions, thereby converting a causal estimand into a statistical estimand. So far the course deals with average treatment effects. To estimate individual treatment effects, do we need more assumptions? Will we cover that in the course? 2. For the positivity assumption: if for some covariates P(T = 1 | X = x) is very close to 0 or 1, estimation is still fine if we have access to the full distribution, but estimation from finite samples will lead to large variance. So to get a good estimate of the treatment effect, we would want P(T = 1 | X = x) not to go to the extremes, is that correct? This also reminds me of the bias-variance tradeoff: including more covariates reduces confoundedness (bias), but may lead to estimates with high variance. Does this make sense? 3. This is more of a comment: I think the lecture mentions that including more covariates is better (correct me if I am wrong). It may be worth mentioning that this is not always the case, for example X -> C
@BradyNealCausalInference · 4 years ago
1. Awesome question. Makes me think you already know the answer haha ;). To move from ATEs to ITEs, we do need to make stronger assumptions. The stronger assumptions we need to make have to do with the specific functional form and noise distribution (in addition to the causal graph). This corresponds to moving from Level 2 to Level 3 of Pearl's ladder. We will see this later in the course when we get to counterfactuals.
@BradyNealCausalInference · 4 years ago
2. You are exactly right on both counts. When we get to estimation in week 5, we will actually see that people sometimes just drop specific examples where P(T = 1 | X = x) is too close to 0 or 1. Your bit about the bias-variance tradeoff is also right (usually).
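The dropping that Brady mentions can be sketched as simple propensity trimming. The 0.05 threshold, the function name, and the toy numbers below are illustrative choices, not something prescribed by the course:

```python
import numpy as np

def trim_by_propensity(t, y, propensity, eps=0.05):
    """Drop units whose estimated propensity P(T=1|X=x) is within eps of 0 or 1,
    since those units make finite-sample ATE estimates high-variance."""
    keep = (propensity > eps) & (propensity < 1 - eps)
    return t[keep], y[keep], propensity[keep]

# Toy data: two units have extreme propensity scores (0.99 and 0.01).
t = np.array([1, 0, 1, 0, 1])
y = np.array([3.0, 1.0, 2.5, 0.5, 4.0])
e = np.array([0.5, 0.4, 0.99, 0.6, 0.01])

t2, y2, e2 = trim_by_propensity(t, y, e)
print(len(t2))  # 3 units survive trimming
```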
@BradyNealCausalInference · 4 years ago
3. Right again. I mention this in sidenote 8 of Chapter 2 in the book (www.bradyneal.com/Introduction_to_Causal_Inference-Sep1_2020-Neal.pdf). I think I meant to use weak language in the lecture (e.g. "there is a general perception that this is the case"). If I used strong language (e.g. "this is the case"), would you mind linking me to it, as I should probably correct that with an annotation.
@BradyNealCausalInference · 4 years ago
4. I do everything with PowerPoint and TikZ (since I use TikZ for the book, might as well just reuse those figures in the slides). I sometimes use Inkscape when I need more flexibility than both of those can easily provide.
@kangchenghou5027 · 4 years ago
@@BradyNealCausalInference Thanks for the detailed explanation! For 3, it could just be my perceptual bias :) You did mention this is not always the case. But just for reference, at 34:32: "for unconfoundedness, the general idea (which is not always true) is that the more covariates you condition on, the more likely you are to have satisfied unconfoundedness." For 4, may I ask how you integrate LaTeX with PowerPoint?
@tOo_matcha · 2 years ago
31:13 that split second when you see the Death Star 😂
@KyleReevesSci · 1 year ago
Was looking for this comment 😂
@charismaticaazim · 4 years ago
Brady, does the causal inference literature say anything about "knowing confounding variables are present, but not being able to know or measure what they are"? This would hint to the domain expert that something else is influencing the decision. Also, in the shoe example, since we know being drunk contributes to the outcome, it wouldn't really be a confounder if we know it, right?
@Theviswanath57 · 4 years ago
On the final estimation example:
Question 1: By controlling for age, our estimated ATE matches the actual ATE; whereas by controlling for both age & 'protein excreted in urine', our estimated ATE is just 0.85.
Question 2: What's the causal graph with both age & protein excreted in the urine? { age blood_pressure } where age is the confounding variable. Actual ATE: 1.05 & estimated ATE: 1.05 (both from the "mean of differences" & from the regression coefficient)
@BradyNealCausalInference · 4 years ago
I'm not sure I see a question in there haha. It sounds like you are describing the code. Note: some of that code is for Chapter 4, where we actually write down the causal graph, so it might not all make sense without Chapters 3 and 4.
@Theviswanath57 · 4 years ago
@@BradyNealCausalInference Cool, will wait for Chapters 3 & 4 to be covered
@RobertKwapich · 3 years ago
Great course! Any particular books or review papers you could recommend for reading in more detail?
@alialthiab7527 · 1 year ago
Have you found any?
@gwillis3323 · 3 years ago
Hey, you say that the approach at the end, where you train a regression of the form y = at + bx, only works because the treatment effect is the same for all individuals (ATE = CATE). I don't think this is correct. In fact, the paper which introduced the double machine learning approach starts off by showing that, for the case y = at + g(x), standard approaches which predict y well give biased estimators of a (although, granted, double machine learning really starts to shine when y = f(x)t + g(x)). Do you have any intuition for why the linear regression approach works so well here? Is it because the outcome variable depends linearly on both the treatment and the feature? Will it always work well in such cases? My intuition says no, that confoundedness can still mess you up. Maybe it's just a quirk of this exact dataset?
@tyflehd · 2 years ago
Hello Brady, thank you for the awesome video :) I came here to get an intuitive understanding of causality. I have a question about lecture slide 14. If the groups with T=1 and T=0 are comparable, shouldn't the person on the right be drunk if the one on the left is sober? Based on my understanding, say I am the topmost guy in both groups (T=1, T=0). How can I be included in the group 'go to sleep with shoes on' and in the group 'without shoes on' under the same condition 'drunk'? Please correct me if I am wrong. Thanks!
@rajeevbhatt7415 · 4 months ago
The same person cannot be included in both groups; it's just that the composition of the two groups is almost the same, due to randomization.
@jitingjiang7401 · 4 years ago
Hi Brady, thanks for this lecture. It is super great. I have one question about the fourth assumption for identification, i.e. consistency. To illustrate the concept, you mentioned an example with two different types of dog as multiple versions of the treatment. I am wondering, is it really a problem? I guess one can always define a specific version of treatment as T, right? Thank you!
@BradyNealCausalInference · 4 years ago
Yes, that just means being sufficiently specific about how you define the treatment.
@galaxystat · 4 years ago
Hi Brady, thanks for the great lectures! I read The Book of Why by Judea Pearl. Is there any difference between the potential outcomes framework and the counterfactual calculation in Pearl's book? I saw some comments in the book where Judea argued that the missing-value interpretation is wrong. What methodology do you recommend in practical applications? Or are they just the same?
@BradyNealCausalInference · 4 years ago
I think the two languages share a lot more than a lot of people seem to think. To me, they are simply different notations and different ways to formulate the assumptions. You should be able to understand both, so I include them both in the first month of the course. I use both, depending on the setting or who I'm talking to.
@adrianoyoshino · 3 years ago
In the consistency example, I got the point that we can't have multiple versions of treatment (like different types of drug as a treatment). But does it have to have the same outcome always? I mean, is it possible to have a case where I take a pill one day and get better, but I take a pill another day and the headache does not get better?
@rajeevbhatt7415 · 4 months ago
Violating consistency is like needing more nodes in the causal graph. For example, the dog type in the given example, along with whether the person got a dog. Similarly, if the pill's effect is different each day, a day node needs to be added to the causal graph.
@Theviswanath57 · 4 years ago
Slide #40: the naive estimate might have been obtained through the following regression equation: Y_i = alpha + beta * T_i; so alpha_hat is 5.33?
@BradyNealCausalInference · 4 years ago
Not quite. That simple regression, taking the coefficient from it, is actually what I describe for slide *41*. And in your notation, it's *beta* hat that is the ATE estimate (5.33), not alpha hat. In the notation I use in slide 41 (different from yours), it is alpha hat that is 5.33.
@Theviswanath57 · 4 years ago
@@BradyNealCausalInference Yeah, that's right; I was a little confused. Thanks
@Theviswanath57 · 4 years ago
Where can I get the data?
@BradyNealCausalInference · 4 years ago
@@Theviswanath57 See the GitHub link in Section 2.5 of the book for the data generation and estimation code.
@charismaticaazim · 4 years ago
Reporting a mistake: around 5:03 Brady said T=0 for taking the pill. It should be T=1.
@souradipchakraborty7071 · 4 years ago
Can we have a non-linear cause-and-effect relationship? In that case, how do we estimate the exact effect?
@BradyNealCausalInference · 4 years ago
Yes! You'd use the same estimator that is used in slide 40, but with a nonlinear model instead of linear regression. You can also use any of the other estimators that we discuss in week 6 of the course.
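A minimal sketch of that plug-in idea: fit a model for E[Y | T, X], then average the predicted difference f(1, x_i) - f(0, x_i) over all units. The data-generating numbers below are made up, and plain least squares stands in for whatever (possibly nonlinear) regressor you would actually use:

```python
import numpy as np

# Simulated data with a known true ATE of 2.0 (illustrative numbers only).
rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
t = rng.binomial(1, 0.5, n)
y = 2.0 * t + 3.0 * x + rng.normal(0, 0.5, n)

# Fit a model for E[Y | T, X]; here, ordinary least squares on [1, T, X].
design = np.column_stack([np.ones(n), t, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def f(t_val, x_arr):
    """Model prediction E[Y | T=t_val, X=x] (any regressor could go here)."""
    return coef[0] + coef[1] * t_val + coef[2] * x_arr

# Slide-40-style estimate: average the predicted individual differences.
ate_hat = np.mean(f(1, x) - f(0, x))
print(ate_hat)  # close to the true ATE of 2.0
```

With a linear model the predicted difference is the same for every unit (the T coefficient), but with a nonlinear regressor the same averaging recovers the ATE even when effects vary with x.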
@souradipchakraborty7071 · 4 years ago
@@BradyNealCausalInference Thanks, will definitely check the week 6 material. I asked because if there is non-linearity with respect to T, then Y_hat = alpha * T + alpha' * T^2 + alpha'' * T^3 + ... + beta * X. Which coefficient would then give us the causal effect of T on Y?
@Theviswanath57 · 4 years ago
@Brady: On slide #41, I am wondering whether the estimate should be Sigma_x ( P(X=x) * ( E[Y | T=1, X=x] - E[Y | T=0, X=x] ) )
@Theviswanath57 · 4 years ago
In your variant, essentially we are saying that P(X=x) is the same for all x; please correct me if I am wrong
@BradyNealCausalInference · 4 years ago
@@Theviswanath57 On slide 40 it is that equation you write, assuming you meant "E[Y | T=1, X=x] - E[Y | T=0, X=x]" when you wrote "E[Y/T=1, X=x] - P(Y/T=0, X=x)." However, on slide 41 we use a completely different way to estimate the ATE: linear regression, taking the coefficient of the regression. In general it is not equal to the correct equation from slide 40. It is only equal when E[Y | T=1, X=x] - E[Y | T=0, X=x] is the same for all x (i.e. the treatment effect is the same for all individuals). I don't actually include the specific equation for the estimate on slide 41, but you can get it using the closed-form solution to linear regression. You can see the exact code I used for this in Section 2.5 of the course book.
@Theviswanath57 · 4 years ago
@@BradyNealCausalInference Regarding P(Y/T=0, X=x): yes, I meant E[Y | T=0, X=x].
@Theviswanath57 · 4 years ago
Understood, re: "It is only equal when E[Y | T=1, X=x] - E[Y | T=0, X=x] is the same for all x (i.e. the treatment effect is the same for all individuals)."
@Theviswanath57 · 4 years ago
@Brady: If we have P(X=x) as part of the equation, is the ATE an unbiased estimate even if E[Y | T=1, X=x] - E[Y | T=0, X=x] is not the same for all x?
Thanks for the lecture! I have a question around kzbin.info/www/bejne/a6nCoYOboqaJrtU: is E[Y(1) - Y(0)] (here the individual subscript i is implicit) properly defined, since some data are missing?
@DailySFY · 11 months ago
@mingmingchen7154 As you have pointed out, it is a biased estimate. And Brady explains this clearly afterwards.
@chadpark9248 · 4 years ago
Thanks for the great lecture again. I have a few questions about the textbook. On page 8: "A natural quantity that comes to mind is the associational difference: ~~~~ Then, maybe E[Y(1)]-E[Y(0)] equals E[Y|T=1]-E[Y|T=0]." From these sentences, I got a little confused about what "maybe ~ equals" means.
@chadpark9248 · 4 years ago
In addition, I have a question about the description of "Consistency" on page 14. I understand Y(t) intuitively, but I don't intuitively understand "whereas Y(T) is the potential outcome for the actual value of treatment that we observe." Do you have an example?
@BradyNealCausalInference · 4 years ago
Basically, it's a train of thought that is common to go down. "Maybe E[Y(1)]-E[Y(0)] equals E[Y|T=1]-E[Y|T=0]" is the more formal way of writing "maybe causation equals association (correlation equals causation)." Of course, this thinking is often incorrect :)
@BradyNealCausalInference · 4 years ago
@@chadpark9248 For a given individual, they will observe a specific value, say t', of the random variable T. That means they will observe the potential outcome Y(t'). So the realized value t' of T gets connected to the observed outcome Y in that way (assuming consistency). Similarly, Y(T) corresponds to the potential outcome we observe once we know the realized value of the treatment random variable T. It is distinct from Y(1), Y(0), or Y(t), each of which denotes a specific potential outcome that isn't related to the random variable T at all (even though we use the same letter, in lower case, for Y(t)).
@chadpark9248 · 4 years ago
@@BradyNealCausalInference Thank you for your detailed explanation.
Amazing video. One question: the example at the end of the lecture seems like simple linear regression. Does that mean that when we run linear regression, we are doing causal inference? What is the difference between regression and causal inference here?