Football is a typical low-scoring game, and games are frequently decided by single events. These events may be extraordinary individual performances, individual errors, injuries, refereeing errors or just lucky coincidences. Moreover, in most tournaments some teams and players are in exceptional shape and have a strong influence on the outcome. One consequence is that every now and then alleged underdogs win tournaments and reputed favorites drop out as early as the group phase.
The above effects are notoriously difficult to forecast. Nevertheless, every team has its strengths and weaknesses (e.g. in defense and attack), and most results reflect the qualities of the teams. To capture both the random effects and this deterministic drift, forecasts should be given in terms of probabilities.
A series of statistical models have been proposed in the literature for the prediction of football outcomes. They can be divided into two broad categories. The first, the result-based model, directly models the probability of a game outcome (win/draw/loss), while the second, the score-based model, focusses on the match score. We follow the second approach, since the match score is important in the group phase of the championship and a model for the score also implies a model for the outcome. The model proposed in How to impress your football fan colleagues is a first approach, but it gives only the most typical outcomes.
The chances are very low (in fact almost zero) that all matches during the World Cup end with these results. There are several score-based models, and most of them involve a Poisson model. In other words, the number of goals scored by a team is assumed to follow a Poisson distribution. This distribution is determined by a single parameter, called $\lambda$, that describes the expected number of goals.
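To make the role of this single parameter concrete, here is a minimal sketch that evaluates the Poisson probabilities of a team's goal count. The rate 1.3 is purely an illustrative assumption and not a value taken from our data, and `poisson_pmf` is a helper name introduced only for this example.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a Poisson(lam) random variable equals k."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 1.3  # hypothetical expected number of goals (assumption)
for k in range(6):
    print(f"P({k} goals) = {poisson_pmf(k, lam):.3f}")
```

Even the most likely goal count has a probability well below one, which is why a forecast consisting only of the most typical scores is almost surely wrong somewhere in a whole tournament.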
The forecast of a match $A$ vs. $B$ now goes as follows:
1. We determine the expected number of goals $\lambda_A$ scored by $A$ against $B$ and the expected number of goals $\lambda_B$ scored by $B$.
2. We simulate the number of goals $G_A$ as a Poisson distribution with parameter $\lambda_A$ and the number of goals $G_B$ as a Poisson distribution with parameter $\lambda_B$.
3. We obtain $G_A : G_B$ as a forecast of the match $A$ vs. $B$.
4. We simulate the whole tournament.
Note that the result is the realization of a random variable, that is, it takes different values with certain probabilities. A single simulation gives one single result, and we lose the information on the probabilities of the possible outcomes. One way to get a weighted (or probabilistic) forecast is to simulate the tournament many times, say 100,000 times, and count how often each result occurs. This procedure is known as the Monte Carlo method.
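The Monte Carlo idea can be sketched for a single match rather than the whole tournament. The sketch below assumes two given rates (1.6 and 0.9 are invented for illustration) and draws the two goal counts independently, which is the simple scheme described in the steps above and not yet the dependent model introduced later. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_match(lam_a: float, lam_b: float, rng: random.Random):
    """One simulated score (goals of A, goals of B), drawn independently."""
    return sample_poisson(lam_a, rng), sample_poisson(lam_b, rng)

def monte_carlo(lam_a: float, lam_b: float, n_runs: int = 100_000, seed: int = 0):
    """Relative frequency of every score over n_runs simulated matches."""
    rng = random.Random(seed)
    scores = Counter(simulate_match(lam_a, lam_b, rng) for _ in range(n_runs))
    return {score: count / n_runs for score, count in scores.items()}

# Hypothetical rates, chosen only for illustration.
probs = monte_carlo(lam_a=1.6, lam_b=0.9)
win_a = sum(p for (ga, gb), p in probs.items() if ga > gb)
draw = sum(p for (ga, gb), p in probs.items() if ga == gb)
win_b = sum(p for (ga, gb), p in probs.items() if ga < gb)
print(f"A wins: {win_a:.3f}, draw: {draw:.3f}, B wins: {win_b:.3f}")
```

The same counting applies to a full tournament: each run plays every match once, and the frequencies of, say, the eventual winner are read off at the end.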
The Model
The crucial part is the modeling (step 1). We analyzed several Poisson regression models with different degrees of complexity, compared them using different kinds of quality measures, and present here just the best model. More details on this selection process can be found in our preprint.
Our model follows a dependent Poisson regression approach; in fact, it combines several Poisson regressions. These are fitted with the data described in What are typical football results?, including matches since 1 January 2010.
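For readers who have not fitted a Poisson regression before, here is a minimal sketch with a log link and a single covariate, such as an opponent's Elo rating. It uses Newton's method written out by hand so that it needs no external library; in practice a statistics package would do this. The toy data, the rescaling of the Elo ratings and the function name are assumptions made for illustration and are not our data or code.

```python
import math

def fit_poisson_regression(xs, ys, n_iter=25):
    """Fit log(mean) = a + b * x by Newton's method on the Poisson log-likelihood."""
    a, b = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        mus = [math.exp(a + b * x) for x in xs]
        # Gradient of the log-likelihood with respect to a and b.
        g_a = sum(y - mu for y, mu in zip(ys, mus))
        g_b = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Entries of the negated Hessian (the Fisher information matrix).
        h_aa = sum(mus)
        h_ab = sum(mu * x for x, mu in zip(xs, mus))
        h_bb = sum(mu * x * x for x, mu in zip(xs, mus))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 linear system explicitly.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
    return a, b

# Invented toy data: goals scored by one team against opponents of given Elo.
opponent_elo = [1600, 1700, 1750, 1800, 1900, 2000, 2050, 2100]
goals = [4, 2, 3, 1, 2, 1, 0, 1]
xs = [(elo - 1800) / 100 for elo in opponent_elo]  # rescaled for stability
a, b = fit_poisson_regression(xs, goals)
print(f"intercept = {a:.3f}, slope per 100 Elo points = {b:.3f}")
```

A negative slope means that the expected number of goals decreases as the opponent gets stronger, which is the qualitative behavior one would expect.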
The Poisson rates $\lambda_A$ and $\lambda_B$ are determined as follows:
1. We always assume that $A$ has a higher Elo score than $B$. This assumption can be justified, since the better team usually dominates the weaker team's tactics. Moreover, the number of goals the stronger team scores has an impact on the number of goals of the weaker team. For example, if team $A$ scores many goals, it is more likely that $B$ also scores one or two goals, because the defense of team $A$ loses concentration in view of the expected victory. If the stronger team $A$ scores only one goal, it is more likely that $B$ scores no goal or just one, since team $A$ focusses more on the defense and secures the victory.
2. We determine the Poisson rate $\lambda_A$. This is done in several steps.
   - We determine how many goals $A$ scores against an opponent $B$. The corresponding parameter $\mu_A$ as a function of the Elo rating $\mathrm{Elo}_B$ of the opponent $B$ is given as
     $$\log \mu_A(\mathrm{Elo}_B) = \alpha_0 + \alpha_1 \cdot \mathrm{Elo}_B, \tag{1}$$
     where $\alpha_0$ and $\alpha_1$ are obtained via a Poisson regression.
   - Teams of similar Elo scores may have different strengths in attack and defense. To take this effect into account, we model the number of goals team $B$ receives against a team of Elo score $\mathrm{Elo}_A$ using a Poisson distribution with parameter $\nu_B$. The parameter $\nu_B$ as a function of the Elo rating $\mathrm{Elo}_A$ is given as
     $$\log \nu_B(\mathrm{Elo}_A) = \beta_0 + \beta_1 \cdot \mathrm{Elo}_A, \tag{2}$$
     where the parameters $\beta_0$ and $\beta_1$ are obtained via Poisson regression.
   - Team $A$ should on average score $\mu_A(\mathrm{Elo}_B)$ goals against team $B$, but team $B$ should on average concede $\nu_B(\mathrm{Elo}_A)$ goals. As these two values rarely coincide, we model the number of goals $G_A$ as a Poisson distribution with parameter
     $$\lambda_A = \frac{\mu_A(\mathrm{Elo}_B) + \nu_B(\mathrm{Elo}_A)}{2}.$$
3. We determine the Poisson rate $\lambda_B$. The number of goals $G_B$ scored by $B$ is assumed to depend on the Elo score $\mathrm{Elo}_A$ and additionally on the outcome of $G_A$. More precisely, $G_B$ is modeled as a Poisson distribution with parameter $\lambda_B$ satisfying
   $$\log \lambda_B = \gamma_0 + \gamma_1 \cdot \mathrm{Elo}_A + \gamma_2 \cdot G_A. \tag{3}$$
   Once again, the parameters are obtained by Poisson regression. Hence, $G_B \sim \mathrm{Poisson}(\lambda_B)$.
4. The result of the match $A$ versus $B$ is simulated by realizing $G_A$ first and then realizing $G_B$ in dependence of the realization of $G_A$.
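The four steps above can be sketched in a few lines. The sketch assumes that all three rates are log-linear in their inputs and that the rate of the stronger team is the average of the attack-based and the defense-based estimate. The coefficient values are invented for illustration only; the real ones come out of the regressions on our data. All names are hypothetical.

```python
import math
import random

# Invented regression coefficients (assumption, not fitted values).
HYPOTHETICAL_COEF = {
    "alpha0": 3.0, "alpha1": -0.0015,   # goals A scores vs. opponent's Elo
    "beta0": -3.0, "beta1": 0.0017,     # goals B concedes vs. opponent's Elo
    "gamma0": 2.0, "gamma1": -0.0012, "gamma2": 0.05,  # goals B scores
}

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_nested_match(elo_a, elo_b, coef, rng):
    """Simulate one score, assuming team A has the higher Elo rating."""
    # Attack of A against an opponent with rating elo_b.
    mu_a = math.exp(coef["alpha0"] + coef["alpha1"] * elo_b)
    # Goals B concedes against an opponent with rating elo_a.
    nu_b = math.exp(coef["beta0"] + coef["beta1"] * elo_a)
    # The two estimates rarely coincide, so they are averaged (assumption).
    lam_a = (mu_a + nu_b) / 2
    goals_a = sample_poisson(lam_a, rng)
    # The rate of B depends on A's rating and on the goals A just scored.
    lam_b = math.exp(coef["gamma0"] + coef["gamma1"] * elo_a
                     + coef["gamma2"] * goals_a)
    goals_b = sample_poisson(lam_b, rng)
    return goals_a, goals_b

rng = random.Random(2018)
goals_a, goals_b = simulate_nested_match(2000, 1800, HYPOTHETICAL_COEF, rng)
print(f"A {goals_a} : {goals_b} B")
```

The order of the two draws matters: the goals of the weaker team can only be sampled once the goals of the stronger team are known, which is what makes the two counts dependent.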