In the previous lecture, we saw that not all students who were randomized into the different treatment arms in Project STAR stayed in the groups to which they were allocated.
Some students left the different groups for various unknown reasons.
However, as long as reallocation happens randomly, it does not invalidate the randomized controlled trial in terms of estimating the causal effect of the treatment.
However, it is very likely that reallocation of individuals after randomization is not random.
Students allocated to the control group may be unhappy about not receiving the treatment and might subsequently try to get into the treatment group anyway.
In the case of small classes, most students allocated to small classes are probably happy to be assigned to a small class and will not try to reallocate, but in many other cases some of those allocated to treatment may subsequently try to avoid treatment if possible.
Take, for instance, training or education for the unemployed: it is often the case that many of the unemployed allocated to training or education do in fact not want it and try to evade treatment.
All this invalidates randomization and threatens our ability to estimate the causal effect of a treatment from a randomized trial.
Therefore, even if RCTs in theory enable researchers to estimate causal effects, in practice this may prove difficult.
However, surprisingly, even in the case of non-random dropout, data from an RCT still enables the estimation of causal effects.
The following slides explain how.
Denote as before by T the treatment indicator, taking the value one if the individual is actually treated (not only allocated to treatment but actually treated) and zero if not treated.
Further, define Z equal to one if allocated to the treatment group and zero otherwise.
Z is the randomization indicator.
Again, the observed outcome is the outcome either with or without treatment, depending on treatment status.
The average outcome for those offered treatment is the average outcome without treatment for those offered treatment, plus the treatment effect (the difference between the outcome with and without treatment) for those actually treated (obtained by multiplying by the treatment indicator) among those allocated to treatment.
Because allocation into treatment is randomized, the average baseline outcome for those allocated
into treatment is equal to the baseline outcome for those not allocated into treatment.
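In symbols (a reconstruction of the slide's formulas, using potential-outcome notation $Y_0$, $Y_1$, which the transcript describes only verbally), the two statements above read:

```latex
% Observed outcome: Y = Y_0 + (Y_1 - Y_0)T, so for those offered treatment:
\mathbb{E}[Y \mid Z = 1]
  = \mathbb{E}[Y_0 \mid Z = 1] + \mathbb{E}[(Y_1 - Y_0)\,T \mid Z = 1]
% Randomization of the offer implies equal average baselines:
\mathbb{E}[Y_0 \mid Z = 1] = \mathbb{E}[Y_0 \mid Z = 0]
```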
Now write the average gain if treated for those allocated to treatment (Z=1) separately
for those who choose not to receive treatment (T=0) and those who choose to receive treatment
(T=1). A fraction of those offered treatment declines
and one minus this fraction accepts. The gain for those who decline treatment is
zero, as they are not treated. The gain for those who accept treatment can
be rewritten as the gain conditional on being offered treatment (Z=1) and accepting treatment (T=1).
We can now rewrite the average gain for those offered treatment (both those accepting treatment and those declining) as a weighted average of the gain for those accepting and those declining treatment.
Further, we assume that no one can enter treatment without being offered treatment – that is
we exclude the possibility that you can sneak into treatment without being offered treatment.
Thus we only allow for dropping treatment if offered.
Hence, if you accept treatment, it must have been offered; that is, T = 1 implies Z = 1 (but not necessarily the other way around).
From this, it also follows that the average gain for those offered and accepting treatment is the same as the average gain for those who accepted treatment – so we need only condition on people accepting treatment.
We are now able to derive the main
results – the “Bloom” equation named after its inventor, Howard Bloom.
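The equation itself appears only on the slide; in the notation just introduced, it can be written as (a reconstruction):

```latex
% Bloom equation: the left side is computable from observed data,
% the right side is the average treatment effect on the treated
\frac{\mathbb{E}[Y \mid Z = 1] - \mathbb{E}[Y \mid Z = 0]}{\Pr(T = 1 \mid Z = 1)}
  = \mathbb{E}[Y_1 - Y_0 \mid T = 1]
```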
On the left hand side of the equation, we have observed entities, that is, stuff that
we can calculate from the observed data and on the right we have our object of interest,
the average treatment effect for those treated. This implies that we can calculate the average
treatment effect for those who accepted treatment even if some individuals selectively leave treatment.
Given the algebra on the previous slide, we can now prove the “Bloom” equation.
First, the average outcome for those offered
treatment can be rewritten as the average baseline for those offered treatment plus
the average gain for those who accept treatment conditional on being offered treatment.
That is equal to the baseline outcome for those not offered treatment (due to the RCT)
plus the gain for those offered and actually treated.
This is, again, equal to the average baseline for those not offered treatment plus the weighted
average gain for those offered and accepting and those declining treatment.
This is equal to the average baseline outcome for those not offered treatment plus the average
gain for those offered and accepting treatment weighted with the fraction who accept treatment
when offered treatment. Going back to the Bloom equation at the top
of the slide, we can write the average outcome for those not offered treatment as the baseline
outcome for those not offered treatment – as this group is not treated and they are thus
unaffected by the treatment but otherwise equal to those offered treatment.
Collecting everything, we can write the numerator of the left hand side of the Bloom equation
as the baseline outcome for those not offered treatment plus the weighted gain for those
offered and accepting treatment minus the average outcome for those not offered treatment.
As can be seen, everything else nicely cancels out, leaving exactly the right hand side of the Bloom equation.
This concludes the proof. So you have just seen that despite non-random
dropout from an RCT, we can still estimate the causal effect of the treatment for those
who accept treatment.
Note that this is NOT the same as the average causal effect for those offered treatment (including those who reject treatment and thus experience no effect).
We will never know what would have happened to those who declined treatment, as it is not necessarily the same as what happened to those who accepted treatment, due to selective dropout.
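The logic can be checked with a small simulation (a sketch: the normal distributions, effect sizes, and the one-sided noncompliance rule below are all invented for illustration; the Bloom estimate is the offer-group difference in mean outcomes divided by the acceptance rate, as described above):

```python
import random

random.seed(0)
n = 200_000

y_offered, y_not_offered, t_offered = [], [], []
gains_of_treated = []

for _ in range(n):
    y0 = random.gauss(10, 2)        # baseline outcome without treatment
    gain = random.gauss(1.0, 1.0)   # individual treatment effect
    accepts = gain > 0              # selective dropout: low-gain people decline
    z = random.random() < 0.5       # randomized offer of treatment
    t = z and accepts               # no one is treated without an offer
    y = y0 + (gain if t else 0.0)   # observed outcome
    if z:
        y_offered.append(y)
        t_offered.append(1.0 if t else 0.0)
        if t:
            gains_of_treated.append(gain)
    else:
        y_not_offered.append(y)

def mean(xs):
    return sum(xs) / len(xs)

# Bloom estimator: difference in mean outcomes by offer,
# divided by the fraction accepting treatment when offered
bloom = (mean(y_offered) - mean(y_not_offered)) / mean(t_offered)
true_att = mean(gains_of_treated)   # average gain among the actually treated
print(f"Bloom: {bloom:.2f}, true ATT: {true_att:.2f}")
```

Despite the strongly selective dropout (only those who gain accept), the Bloom estimate recovers the average effect on the treated.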
We now turn to something different. When persons are selected into treatment,
they are obviously aware that they are exposed to the treatment.
This by itself may affect behavior.
Therefore, while we can measure the causal effect of the treatment, the interpretation is less clear if people respond merely to being observed.
Do people change behavior because they are affected by the treatment or because they know they are being observed?
This phenomenon is known as the Hawthorne effect, named after the famous Hawthorne plant where researchers tried to manipulate productivity by changing the work environment.
However, it was later speculated that worker response was more due to being observed than
to changes in the work environment.
Thus, the change in productivity was arguably not a result of the changed work environment but of being observed by the researchers: the research did not show that the work environment affects productivity, but that being observed does, at least while being observed.
Later, other researchers have doubted the so-called Hawthorne effect and concluded that the whole research design was flawed and that the data do not allow either conclusion.
However, to illustrate the idea behind the Hawthorne effect, we look at the STAR data.
The table on the slide shows class size by treatment arm – small classes, regular classes
and regular classes with a teacher’s aide. From the table it can be seen that for classes
of size 16 to 18 students, there are a number of classes of equal size in all three treatment
arms. Thus, if it is the actual class size that
matters and not treatment type, outcomes should be the same in all three treatment arms when
actual class size is the same. If it is a Hawthorne effect, there should
be a difference across treatment arms for the same class size.
Some caution is warranted here, though.
Even if students are allocated into treatments by lottery, actual class size could be a result of selective attrition and dropout after the lottery.
If we are willing to assume that regular classes that are observed to be small are a result of negative selection (a bad teacher, for example) and that small classes at the high end of the small-class range are a result of positive selection (a good teacher), we should expect the causal difference between class types to be larger than the observed difference.
Thus the estimated difference between treatment arms for comparable class sizes is a lower bound for the true difference.
With the above caveat in mind, the regressions on this slide show the difference in math achievement in kindergarten for students in the different treatment arms.
Students in small classes are the reference group.
The regression results in the top panel show results for classes with fewer than 29 and more than 12 students, and the bottom panel shows results for classes with fewer than 19 and more than 16 students.
This is the range from the table on the previous slide, where all treatment arms have classes of comparable size.
From the regressions, we find that the effect of treatment arm is the same irrespective of whether we look at all class sizes or only at classes where class size is approximately the same across treatment arms.
Therefore, with the caveat from the previous slide that the causal effect in the lower panel is probably larger than the estimated effect, we are inclined to conclude that the effect of being in a small class is more likely a Hawthorne effect than an effect of being taught in a small class.
That is, when teachers and/or students were allocated to a small class in the STAR study, this induced them to teach or study harder, not because they were in a small class but because they were expected to perform better from being in a small class.
You should note that this example is made up for the illustrative purposes of this course, and that there is no general agreement among researchers that the effect of the STAR project was a Hawthorne effect.
Until now, we have relied upon randomization to infer the causal effect from a treatment.
The upside was that it is the design that allows the researcher to infer causality; in principle, causality is undeniable.
The downside is external validity: is the observed causal effect due to the mechanisms of the treatment, or is it a Hawthorne effect?
If a randomized controlled trial is infeasible or if we want to rule out Hawthorne effects,
there are alternative designs; one of the most notable is called the instrumental variables
method. The basics of instrumental variables will
be laid out on the following slides and then the analogy to the estimator for the randomized
controlled trial will be explained.
Say we want to estimate the return to education
by running a regression of log earnings on years of education.
We have learned that it would be dangerous to interpret the regression coefficient as the causal effect of years of education on log earnings unless we have either randomized years of education or observed the full set of confounders that affect log earnings over and above years of education.
So, in the absence of data from an RCT on
years of education, what to do? Imagine that we have available a third variable,
z, that affects education but is otherwise uncorrelated with earnings.
Think of z as a variable that, when it changes, causes changes in the level of education but has no direct effect on earnings.
One example could be an educational reform
that expands the minimum years of compulsory school.
It certainly affects years of education but it is very unlikely that it affects individual
earnings over and above education. Obviously other things than a school reform
may affect education. This is indicated by the error term u.
Also, other things than education may affect earnings.
This is indicated by the error term e. It is also very likely that e and u are correlated
as they both capture the effect of stuff, e.g. intelligence, that determines both the level of education and earnings.
We may write the figure as equations instead. One equation for the level of earnings, y,
and one equation for the level of education, x.
The instrumental variable only affects the level of education and NOT earnings.
Hence, it should not appear in the equation for earnings.
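Written out (a sketch of the slide's system; the intercepts a, c and the first-stage slope d are symbols assumed here, while b matches the transcript's notation for the effect of x on y), the two equations are:

```latex
% Outcome equation: the instrument z is excluded
y = a + b\,x + e
% First stage: z shifts the level of education
x = c + d\,z + u
% Assumptions: \operatorname{Cov}(z, e) = 0 and \operatorname{Cov}(z, u) = 0,
% while \operatorname{Cov}(e, u) \neq 0 is allowed
```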
Note the resemblance to the treatment indicator z previously.
In an RCT, z is the indicator of whether the subject was allocated to the treatment or
control group and T was the indicator of whether treatment was actually accepted.
Here T is replaced by years of education, x.
However, the algebra is the same. Because we can estimate the causal effect
on the treated using randomization into treatment, we can also estimate the causal effect of
years of education because z (e.g. the school reform) acts as a randomizer.
In order for the instrument to deliver causal effects we need it to be independent of everything
else, just as randomization is independent of everything in the case of the RCT.
Therefore, given years of education, x, the school reform, z, must not have any direct
effect on earnings, y. This in turn implies that z must be uncorrelated
with what otherwise affects both years of education as well as earnings.
At first sight, what follows does not seem to relate to how we derived the Bloom equation.
We return to this in a couple of slides. Instead, we turn to how we derived the linear
regression coefficient, now with the extension of the instrumental variable equation.
We start by working with the covariance between the dependent variable and the instrument.
Inserting the expression for the dependent variable in terms of the x variable, we can rewrite the covariance between y and z as the effect of x on y, b, times the covariance between x and z, plus the covariance between e and z.
This implies that we can write the ratio of the covariance between y and z to the covariance between x and z, using the above expression for the covariance between y and z, as b plus the ratio of the covariance between e and z to the covariance between x and z.
The numerator of this last term, the covariance between e and z, is zero by assumption.
Hence, the ratio of the covariance between y and z to the covariance between x and z is equal to b, the causal effect of x on y.
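Step by step, the covariance algebra just described is (using b for the effect of x on y, as in the transcript, and y = a + bx + e):

```latex
\operatorname{Cov}(y, z)
  = \operatorname{Cov}(a + b\,x + e,\; z)
  = b\,\operatorname{Cov}(x, z) + \operatorname{Cov}(e, z)
\frac{\operatorname{Cov}(y, z)}{\operatorname{Cov}(x, z)}
  = b + \frac{\operatorname{Cov}(e, z)}{\operatorname{Cov}(x, z)}
  = b \qquad \text{since } \operatorname{Cov}(e, z) = 0
```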
Therefore, the availability of an instrument allows us to estimate the causal effect of
x on y even when x and the error term, e, are correlated.
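A tiny simulation can illustrate this (a sketch: the data-generating process, the true effect of 0.08, and the confounder below are all invented for illustration; the IV estimate is the covariance ratio just derived):

```python
import random

random.seed(1)
n = 200_000
b_true = 0.08  # assumed true causal effect of a year of education

xs, ys, zs = [], [], []
for _ in range(n):
    ability = random.gauss(0, 1)               # unobserved confounder
    z = 1.0 if random.random() < 0.5 else 0.0  # instrument, e.g. hit by a reform
    x = 10 + z + ability + random.gauss(0, 1)  # years of education
    y = 1.0 + b_true * x + ability + random.gauss(0, 1)  # log earnings
    xs.append(x); ys.append(y); zs.append(z)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

ols = cov(ys, xs) / cov(xs, xs)  # biased: picks up the ability confounder
iv = cov(ys, zs) / cov(xs, zs)   # Cov(y,z)/Cov(x,z): consistent for b
print(f"OLS: {ols:.3f}, IV: {iv:.3f}")
```

The OLS slope is pushed well above the true effect by the confounder, while the covariance ratio lands close to it.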
As an example of instrumental variables in the case of the return to education, we use
the well-known case of quarter of birth; see e.g.
Angrist and Krueger (1991).
The idea here is that, due to the quarter of
birth, there is variation in when a person can leave compulsory school.
All pupils start compulsory school on the same date but may leave when they turn 15.
As quarter of birth varies across respondents but school start does not, quarter of birth might affect the educational level of the respondents, as some pupils are allowed to leave compulsory school sooner than others.
This is in fact the case as the top figure shows.
Using US panel data on birth cohorts from the 1930s, we find clear seasonal patterns
in the mean years of education. Thus, quarter of birth, in part, affects your
level of education. This is a graphical illustration of the covariance
between the instrument (z – quarter of birth) and the independent variable (x – years
of education). The next figure shows the covariance between
log earnings and quarter of birth. Here we also find a clear seasonal pattern.
Log earnings partly depends on your quarter of birth.
If the instrumental variable assumption is correct – that the instrumental variable only affects the dependent variable through the independent variable – the reason for the quarterly change in earnings must be an indirect effect through education.
Note that there is no empirical way of verifying the instrumental variable assumption.
It remains an assumption. But if it is true, the ratio between the data
in the two figures yields the causal effect of education on earnings.
We can derive the IV estimator in an alternative way that may be a little more intuitive.
In the first stage, we regress x – years of education, on the instrument – here quarter
of birth. For simplicity think of z as a binary dummy
variable, taking the value one if the respondent is born in the first quarter and zero otherwise.
From this we can obtain the predicted values of x given z.
The virtue of the predicted values of x using z is that they pertain only to the part of the variation in x that is common with z.
Because z is independent from the error term u, by the IV assumption, the predicted value
of x using z is also independent from u. In the second stage, we use the predicted
values of x instead of the observed values of x.
Note that the predicted values of x are also independent of e, again by the IV assumption.
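The two-stage procedure can be sketched as follows (all numbers and the data-generating process are invented for illustration; with a single binary instrument, the two-stage estimate coincides with the covariance ratio from the earlier slides, here computed as a ratio of regression slopes):

```python
import random

random.seed(2)
n = 100_000
b_true = 0.08  # assumed true effect for the simulation

zs, xs, ys = [], [], []
for _ in range(n):
    ability = random.gauss(0, 1)               # unobserved confounder
    z = 1.0 if random.random() < 0.5 else 0.0  # e.g. born in the first quarter
    x = 10 + z + ability + random.gauss(0, 1)  # years of education
    y = 1.0 + b_true * x + ability + random.gauss(0, 1)  # log earnings
    zs.append(z); xs.append(x); ys.append(y)

def slope(yv, xv):
    """OLS slope of yv on xv (with intercept)."""
    mx, my = sum(xv) / len(xv), sum(yv) / len(yv)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(xv, yv))
    return num / sum((xi - mx) ** 2 for xi in xv)

# First stage: regress x on z and form fitted values x_hat
d = slope(xs, zs)
mz, mx = sum(zs) / n, sum(xs) / n
x_hat = [mx + d * (zi - mz) for zi in zs]

# Second stage: regress y on the fitted values
tsls = slope(ys, x_hat)
ratio = slope(ys, zs) / slope(xs, zs)  # IV ratio for a binary instrument
print(f"2SLS: {tsls:.3f}, IV ratio: {ratio:.3f}")
```

Because the fitted values only carry the variation in x that comes from z, the second-stage slope is free of the confounding that biases a direct regression of y on x.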