One of the most satisfying activities in academia is engaging in debate and discussion. Only by subjecting ideas to tough scrutiny can we advance our knowledge. Thus, I am extremely pleased that Carlos Arrebola, Julia Mauricio and Hector Jimenez have reacted so quickly to my criticism of their recent paper (here) and come back with a thoughtful and forceful rebuttal. I am posting it below. You will see that there are important points of disagreement that will probably require two (or more) follow-up studies in the future. Seems like I need to brush up on my econometrics...
Creating reliable econometric models of the CJEU case law:
a response to Sanchez-Graells’ criticisms
by Carlos Arrebola, Julia
Mauricio and Hector Jimenez
In a
recent study, we used econometric methodology to quantify the degree of
influence of the Advocate General on the Court of Justice. Based
on data collected from 20 years of actions for annulment, we concluded that the
Court is 67% more likely to annul an act if the Advocate General suggests so in
her opinion. In a
post last Tuesday, Sanchez-Graells examined our paper. As
he said, our conclusion is ‘bold [...] and controversial [for its]
implications’, and as such it should be subject to ‘tough scrutiny’. We most
definitely agree on both the importance of our claim and the need to test it
rigorously. As we stated in our paper, if the conclusions are true, the role of
the Advocate General within the Court might need to be reconsidered in order to
secure judicial independence.
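The post does not say how the 67% figure is defined, and a phrase like “67% more likely” can describe either a change in probability or a change in odds. The sketch below uses purely hypothetical probabilities (not the study's estimates) to show why the distinction matters: a move from a 30% to a 50% chance of annulment is a 67% increase in probability but a 133% increase in odds, the kind of quantity an exponentiated logit coefficient expresses.

```python
# Hypothetical probabilities for illustration only; they are NOT the study's estimates.
# p_without: chance of annulment when the AG does not recommend annulment
# p_with:    chance of annulment when the AG recommends annulment


def relative_increase(p_with: float, p_without: float) -> float:
    """Percentage increase in the probability itself (risk ratio minus one)."""
    return (p_with / p_without - 1) * 100


def odds(p: float) -> float:
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1 - p)


def odds_ratio_increase(p_with: float, p_without: float) -> float:
    """Percentage increase in the odds (odds ratio minus one)."""
    return (odds(p_with) / odds(p_without) - 1) * 100


if __name__ == "__main__":
    p_without, p_with = 0.30, 0.50
    print(f"probability: +{relative_increase(p_with, p_without):.0f}%")  # probability: +67%
    print(f"odds:        +{odds_ratio_increase(p_with, p_without):.0f}%")  # odds:        +133%
```

The same pair of probabilities yields two very different headline numbers, so any claim of this kind is easiest to assess when the underlying metric is stated.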
However, Sanchez-Graells voiced several
criticisms regarding our econometric model that prevent him from accepting the
validity of our results. We greatly welcome the debate, and appreciate the
comments in his post, although we ultimately disagree. While we acknowledge
that quantitative methodology is not perfect, we argue that our results are a
reliable estimation of the influence of the Advocate General (hereinafter, “AG”)
on the Court. Even if one does not accept the specific figure of a 67% increase
in the probability of a given judicial outcome, our results at least indicate
that the influence relationship is positive, as shown by the six different
econometric models estimated in our study. In the spirit of discussion and debate of this
blog, we address Sanchez-Graells’ criticisms along with several other factors
that, in our opinion, should have been taken into account when assessing our
paper’s reliability.
1. The impossibility of using Randomised Controlled Trials
In his post, Sanchez-Graells suggests that we
were too quick to discard the possibility of testing the hypothesis of the
influence of the AG on the Court using Randomised Controlled Trials (“RCTs”).
For readers unfamiliar with them, RCTs are a type of scientific methodology used
in many areas of science to study causality. One of the main fields where RCTs
are used is medicine. In order to test the efficacy of a new drug, patients with
similar features are randomly assigned to several groups. Normally, one of those
groups would be the control group. The control group would receive a placebo,
instead of the actual drug. In this way, the researchers can easily infer
whether the health outcome is caused only by the drug. If both the group taking
the placebo and the group taking the drug had the same reaction, it would be
clear that some external factor other than the drug had caused it. If, on the
other hand, the group taking the drug and the placebo group reacted differently
(for example, in the case of an illness, if the group taking the drug was the
only one to recover), it could be said with some confidence that the drug
caused the recovery.
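The logic of random assignment can be illustrated with a small simulation. Everything in it is hypothetical (the share of frail patients, the baseline recovery rates, the size of the drug effect); the point is only that, because a coin flip decides who receives the drug, the difference in recovery rates between the two groups recovers the drug's effect even though patients differ from one another.

```python
import random


def simulate_rct(n: int, drug_effect: float, seed: int = 0) -> float:
    """Return the recovery-rate difference (drug minus placebo) in a simulated RCT.

    All numbers are hypothetical: half the patients are frail (20% baseline
    recovery), the rest recover 60% of the time, and the drug adds
    `drug_effect` to everyone's recovery probability.
    """
    rng = random.Random(seed)
    recovered = {"drug": 0, "placebo": 0}
    patients = {"drug": 0, "placebo": 0}
    for _ in range(n):
        # Patients differ in ways that affect recovery regardless of treatment.
        frail = rng.random() < 0.5
        baseline = 0.2 if frail else 0.6
        # A coin flip, not the researcher, decides who gets the drug, so frailty
        # is balanced across the two groups on average.
        group = "drug" if rng.random() < 0.5 else "placebo"
        p_recover = baseline + (drug_effect if group == "drug" else 0.0)
        patients[group] += 1
        if rng.random() < p_recover:
            recovered[group] += 1
    rate = {g: recovered[g] / patients[g] for g in patients}
    return rate["drug"] - rate["placebo"]


if __name__ == "__main__":
    # Should print a value close to the simulated effect of 0.2.
    print(f"{simulate_rct(100_000, drug_effect=0.2):.3f}")
```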
In our paper, we suggested that RCTs are not a
possibility because it would require using the Court of Justice as a
laboratory, experimenting with cases, judges and AGs. Nevertheless,
Sanchez-Graells argued that we should have considered those cases in which the
AG does not participate as our “control group”. This is a misconception about
how RCTs are designed. A vital feature in the design of RCTs is making sure
that the observations included in the sample are randomly drawn. This is
because, ideally, you would like every observation to be identical, so that the
only factor that affects it is the treatment that you are examining in the
experiment. In the case of medicine-related RCTs, you want patients with the
same characteristics, symptoms, etc., so that whatever happens after taking the
drug can only be traced back to the drug. In our study, we would need the same
case to be repeated several times, with the same legal problem to be solved by
the same judges, having access to the same amount of precedent, lawyers with
the same ability to plead cases, etc. Only then could we observe
what would happen if we took the element of the Advocate General out of the
equation. However, cases are never the same. Unlike illnesses, where patients
tend to have the same symptoms, cases are much more complex. Legal problems rarely
have the same surrounding circumstances.
So, if we followed Sanchez-Graells’ suggestion,
we would be ignoring a set of external factors that actually affect the outcome
of a case. We would be wrongly attributing it to the Advocate General’s
intervention, when actually it could be something else. That is, if we had two
cases, one with an AG’s opinion, and one without, in which the Court reached
different results, we could not say that the Advocate General caused that
different result. It could be that the case had different facts, and that is
why the Court decided differently. Or, it could well be that the judges were
presented with different arguments by the parties, and it was the lawyers, and
not the AGs, who persuaded the Court. Furthermore, Sanchez-Graells’ suggestion
is unfeasible because of a clear selection bias. As he explained, in cases
which the CJEU considers will not raise problematic legal issues, the Court
decides to proceed without an AG opinion. This means that from the very
beginning of the case the Court senses that it may have an easy or clear legal
solution. In other words, Sanchez-Graells is suggesting that we compare a
simple cold with a more complicated condition, such as cancer, and thereby
establish whether radiotherapy has any impact on health. Such a query would
produce a misleading result, because colds would have a recovery rate close to
100%, whereas the rate for cancer would be lower. However, that would not tell
us anything about the effectiveness of radiotherapy. In the same way, if a case
deals with unproblematic legal issues, the opinion of the AG will probably not
do much to affect the Court, because the Court would have reached that
conclusion by itself without any external influence. We cannot simply compare
those two scenarios without losing information. After all, the groups would
not be selected at random, so the requirements for conducting an RCT would
clearly not be met.
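The cold-and-cancer analogy can be made concrete with a short simulation, again with invented numbers: radiotherapy is given only to cancer patients and raises their recovery rate by 20 percentage points, while colds recover 98% of the time without treatment. Because treatment follows the diagnosis rather than a random draw, the naive comparison of treated and untreated patients comes out strongly negative. The same reasoning applies to comparing cases with and without an AG opinion when the Court itself selects which cases receive one.

```python
import random


def naive_vs_true(n: int, seed: int = 0) -> tuple[float, float]:
    """Compare treated and untreated patients when treatment follows the diagnosis.

    Hypothetical numbers: half the patients have cancer (30% recovery without
    treatment) and all of them receive radiotherapy, which adds 20 percentage
    points; the other half have a cold, receive nothing, and recover 98% of
    the time. Returns (naive difference in recovery rates, true effect).
    """
    rng = random.Random(seed)
    true_effect = 0.2
    treated = {"recovered": 0, "patients": 0}
    untreated = {"recovered": 0, "patients": 0}
    for _ in range(n):
        if rng.random() < 0.5:
            # Cancer: treatment is decided by the diagnosis, not by chance.
            group, p_recover = treated, 0.3 + true_effect
        else:
            # Cold: almost everyone recovers without any treatment.
            group, p_recover = untreated, 0.98
        group["patients"] += 1
        if rng.random() < p_recover:
            group["recovered"] += 1
    naive = (treated["recovered"] / treated["patients"]
             - untreated["recovered"] / untreated["patients"])
    return naive, true_effect


if __name__ == "__main__":
    naive, true_effect = naive_vs_true(100_000)
    # The naive comparison is strongly negative even though the simulated effect is +0.2.
    print(f"naive difference: {naive:.3f}, true effect: {true_effect}")
```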
For that reason, the only way to approximately
estimate causality is to use regressions, in which we can account for as many
variables as possible that may influence the Court, including the Advocate
General and variables capturing how easy or how clear a case is to solve. That
way we can estimate the magnitude of the effect of the AG variable on the Court.
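A minimal sketch of what controlling for case difficulty does. It simulates hypothetical cases in which the AG recommends annulment more often in hard cases and hard cases are annulled less often, then fits a linear probability model by ordinary least squares with and without a difficulty control. The linear probability model is a simple stand-in chosen to keep the code dependency-free; it is not necessarily one of the six models estimated in the study, and every number in the simulation is invented.

```python
import random


def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y
    by Gauss-Jordan elimination with partial pivoting (fine for a few regressors)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    # A is now diagonal in its first k columns.
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate_cases(n, seed=0):
    """Simulate hypothetical annulment actions; all probabilities are invented."""
    rng = random.Random(seed)
    X_naive, X_full, y = [], [], []
    for _ in range(n):
        hard = 1 if rng.random() < 0.5 else 0
        # Selection: the AG recommends annulment more often in hard cases.
        ag = 1 if rng.random() < (0.7 if hard else 0.3) else 0
        # Outcome: true AG effect of +0.25; hard cases are annulled less often.
        p_annul = 0.30 + 0.25 * ag - 0.15 * hard
        y.append(1 if rng.random() < p_annul else 0)
        X_naive.append([1, ag])        # constant + AG recommendation
        X_full.append([1, ag, hard])   # constant + AG recommendation + difficulty control
    return X_naive, X_full, y


if __name__ == "__main__":
    X_naive, X_full, y = simulate_cases(100_000)
    print(f"AG coefficient without control: {ols(X_naive, y)[1]:.3f}")
    print(f"AG coefficient with control:    {ols(X_full, y)[1]:.3f}")
```

Under these assumptions the coefficient without the control should come out near 0.19, understating the simulated effect, while the model with the control should land near the true value of 0.25.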
Once we establish that the most accurate measure
is a regression model accounting for variables that affect the outcome of the
Court, the difficulty arises in deciding which variables to include and how to
code them. It is in this respect that we think Sanchez-Graells raises his most
valid criticism of our study. We acknowledge that our variables are not
perfect. We will never be able to establish causality without a shadow of a
doubt. This is simply because, as we said, we will always miss variables that
affect the case that we will not be able to track, codify and insert in our
database. To take an extreme and absurd example, we will never be able
to verify whether the judge in the deliberating room had a headache and wanted
to go home soon, rushing her decision. However, the fact that we will always
miss variables does not mean that our model cannot be reliable. We still
include a number of important variables that can explain a substantial amount
of what goes on in the courtroom. There are different ways in econometrics to
determine the extent to which a model, albeit missing variables, is an accurate
depiction of reality. For our study, these measures suggest that the model is
indeed reliable. We will come back to this in a moment.
Another aspect of coding variables is, as
Sanchez-Graells comments, the risk of oversimplification. In our study, we used actions
for annulment, where the outcomes of a case can be (i) annulment, (ii) partial
annulment, (iii) dismissal of the case, or (iv) inadmissibility of the case. We
decided to simplify this variable by looking only at whether the Court decided
to annul (in any of its forms) or not. This simplification is necessary to make
the model more reliable, because in order to have a dataset capable of yielding
significant results, we need a representative sample. In our
case, we only had data for a very small number of partial annulments. Including
them as a separate variable from total annulment would have only created
“noise” in our model, making the results less significant, statistically
speaking.
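The recoding described above amounts to a simple mapping from the four procedural outcomes to a binary dependent variable. The sketch below treats total and partial annulment as 1 and dismissal and inadmissibility as 0; the enum and function names are illustrative and are not taken from the study's dataset.

```python
from enum import Enum


class Outcome(Enum):
    """The four possible outcomes of an action for annulment (illustrative names)."""
    ANNULMENT = "annulment"
    PARTIAL_ANNULMENT = "partial annulment"
    DISMISSAL = "dismissal"
    INADMISSIBILITY = "inadmissibility"


def annulled(outcome: Outcome) -> int:
    """Binary dependent variable: 1 if the act is annulled in whole or in part, else 0."""
    return 1 if outcome in (Outcome.ANNULMENT, Outcome.PARTIAL_ANNULMENT) else 0


if __name__ == "__main__":
    decisions = [Outcome.DISMISSAL, Outcome.PARTIAL_ANNULMENT,
                 Outcome.INADMISSIBILITY, Outcome.ANNULMENT]
    print([annulled(d) for d in decisions])  # [0, 1, 0, 1]
```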
Sanchez-Graells especially criticises our
grouping of dismissal and inadmissibility cases together, because he says that
dismissing a case and declaring it inadmissible are very different things.
However, that discussion in his post is unnecessary, because as he himself
notes later on, our results ‘cannot be
interpreted regarding inverse AG recommendations (ie recommendations to
inadmit/dismiss)’. Our results are only relevant for decisions to annul or
partially annul; we do not make any claim about other types of cases, which
Sanchez-Graells also criticises.
However, the fact that we decided to look at the question
in terms of what happens if the AG suggests annulling the act, rather than if
she suggests dismissing it or declaring it inadmissible, does not affect the
reliability of our results. In fact, the only thing that Sanchez-Graells is
postulating is a new hypothesis. He is saying that, in his opinion, we would
have got other results if we had constructed the model differently. That is a
point that we cannot test without fiddling for a few more weeks with our
data in the econometrics software. But, we invite people, and we ourselves may
do it in the future, to carry out other studies, with the same or different
data, to check that the results are not affected if we look at things in a
different way: for example, by looking at what happens if the AG suggests
dismissal, or if we gather data from other periods of time.
Nonetheless, the reliability of the results that we presented is a separate
issue.
So, if we have acknowledged that we are not going
to be able to include every variable, and that our data is only a sample, why
are we confident in our results? In the paper we explain it more technically,
but, basically, there are econometric measures that indicate that the model
that we have created is accurate when the estimation that we get from the model
is compared with actual data from reality. That is the reason why we know it is
a fairly reliable model.
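The post does not name the measures used in the paper. Two common ones for a binary outcome are the share of cases the model classifies correctly and McFadden's pseudo-R², which compares the model's log-likelihood with that of a model containing only a constant. The sketch below computes both from observed outcomes and predicted probabilities; the six observations are made up for illustration and the 0.5 classification threshold is an assumption.

```python
import math


def percent_correctly_predicted(y, p_hat, threshold=0.5):
    """Share of cases where the predicted class (p_hat >= threshold) matches the outcome."""
    hits = sum((p >= threshold) == bool(obs) for obs, p in zip(y, p_hat))
    return hits / len(y)


def log_likelihood(y, p_hat):
    """Bernoulli log-likelihood of observed 0/1 outcomes given predicted probabilities."""
    return sum(math.log(p) if obs else math.log(1 - p) for obs, p in zip(y, p_hat))


def mcfadden_r2(y, p_hat):
    """McFadden's pseudo-R^2: 1 - LL(model) / LL(constant-only model).

    Assumes both outcomes occur in y, so the constant-only model is well defined.
    """
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1 - log_likelihood(y, p_hat) / ll_null


if __name__ == "__main__":
    # Made-up outcomes (1 = annulled) and model-predicted probabilities.
    y = [1, 1, 0, 0, 1, 0]
    p_hat = [0.8, 0.7, 0.2, 0.55, 0.6, 0.3]
    print(f"{percent_correctly_predicted(y, p_hat):.3f}")  # 0.833
    print(f"{mcfadden_r2(y, p_hat):.3f}")  # 0.406
```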
3. Final caveat
Whilst reading Sanchez-Graells’ words, we could
not help feeling something we have felt many times before. Lawyers are more
comfortable arguing with words.
We lawyers feel somewhat threatened by this terra incognita called econometrics.
There seems to be a certain reticence to attempting to use mathematics to help
us in our enquiries. It is worth saying that we are not accusing
Sanchez-Graells of not wanting to engage with quantitative methodology. In
fact, we
know that he has used some statistics previously, and
we would not expect a “more economic approach” type of person to disregard this
evidence-based methodology.
We want to end this post with a final note about
quantitative methodology. Although judicial proceedings and legal arguments
cannot always be reduced to numbers, and other methodologies are extremely
valuable for legal research questions, quantitative analysis can help elucidate
complex legal questions. As has happened in many other social sciences before
law, statistics can become a tool at the service of legal researchers. In this
sense, it is worth reminding readers that, a few centuries ago, economics was
equally a merely discursive subject, as anyone who has read The Wealth of
Nations can attest. Now, however, economics and mathematics cannot be
separated. We would therefore encourage researchers to embrace statistics and
econometrics, and see how they can help with their enquiries. Quantitative
analysis tries to be evidence-based and objective. Therefore, anyone who
believes in the benefits of science will prefer a claim based on quantitative
methodology to a hypothesis made, to follow the words of Sanchez-Graells, on
the basis of ‘anecdotal impression’.