Football is a typically low-scoring game, and matches are frequently decided by single events. These events may be extraordinary individual performances, individual errors, injuries, refereeing errors or just lucky coincidences. Moreover, in most tournaments some teams and players are in exceptional shape and strongly influence the outcome. One consequence is that every now and then alleged underdogs win tournaments and reputed favorites are eliminated as early as the group phase.
The above effects are notoriously difficult to forecast. Nevertheless, every team has its strengths and weaknesses (e.g. in defense and attack), and most results reflect the qualities of the teams. To capture both the random effects and the deterministic drift, forecasts should be given in terms of probabilities.
A number of statistical models have been proposed in the literature for the prediction of football outcomes. They can be divided into two broad categories. The first, the result-based model, directly models the probability of a game outcome (win/draw/loss), while the second, the score-based model, focuses on the match score. We follow the second approach, since the match score is important in the group phase of the championship and a model for the score also implies a model for the outcome. The model proposed in How to impress your football fan colleagues is a first approach, but it only gives the most typical outcomes.
The chances are very low (in fact almost zero) that all matches of the World Cup end with these results. Several score-based models exist, and most of them involve a Poisson model: the number of goals a team scores is assumed to follow a Poisson distribution. This distribution is determined by a single parameter, $\lambda$, which describes the expected number of goals.
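To make the role of the Poisson parameter concrete, here is a minimal sketch that evaluates the Poisson probabilities for a given expected number of goals. The function name `poisson_pmf` and the rate 1.5 are hypothetical choices for illustration, not values taken from the fitted model.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a Poisson variable with mean lam equals k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical rate: a team that is expected to score 1.5 goals.
lam = 1.5
for k in range(6):
    print(f"P({k} goals) = {poisson_pmf(k, lam):.3f}")
```

The probabilities over all possible goal counts sum to one, and a larger rate shifts the mass towards higher scores.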
The forecast of a match $A$ vs. $B$ then goes as follows:
1. We determine the expected number of goals $\lambda_A$ scored by $A$ against $B$ and the expected number of goals $\lambda_B$ scored by $B$ against $A$.
2. We simulate the number of goals $G_A$ of $A$ as a Poisson random variable with parameter $\lambda_A$ and the number of goals $G_B$ of $B$ as a Poisson random variable with parameter $\lambda_B$.
3. We obtain $G_A : G_B$ as a forecast of the match $A$ vs. $B$.
4. We simulate the whole tournament.
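The first three steps above can be sketched in a few lines of Python. This is only an illustration: the two rates are hypothetical placeholders rather than fitted values, and the helper `sample_poisson` uses Knuth's multiplication method because the standard library has no Poisson sampler.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_match(lam_a: float, lam_b: float, rng: random.Random) -> tuple[int, int]:
    """Simulate one score, drawing both goal counts independently."""
    return sample_poisson(lam_a, rng), sample_poisson(lam_b, rng)


# Hypothetical rates for teams A and B (step 1 would supply the real ones).
rng = random.Random(2018)
goals_a, goals_b = simulate_match(1.8, 0.9, rng)
print(f"A {goals_a}:{goals_b} B")
```

Each call produces one possible score; repeated calls give different results.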
Note that the result is the realization of a random variable, that is, it takes different values with certain probabilities. A single simulation gives a single result, and we lose the information on the probabilities of the possible outcomes. One way to get a weighted (or probabilistic) forecast is to simulate the tournament many times, say 100,000 times, and count how often each result occurs. This procedure is known as the Monte Carlo method.
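As a rough illustration of the Monte Carlo idea, the sketch below repeats the simulation of a single match many times and reports the relative frequency of each outcome. For brevity it covers one match instead of a whole tournament; the rates, the function names and the seed are assumptions made for this example.

```python
import math
import random
from collections import Counter


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def outcome_probabilities(lam_a: float, lam_b: float, n_sims: int, seed: int = 0) -> dict:
    """Estimate win/draw/loss probabilities by repeated simulation."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        goals_a = sample_poisson(lam_a, rng)
        goals_b = sample_poisson(lam_b, rng)
        if goals_a > goals_b:
            counts["A wins"] += 1
        elif goals_a == goals_b:
            counts["draw"] += 1
        else:
            counts["B wins"] += 1
    return {key: counts[key] / n_sims for key in ("A wins", "draw", "B wins")}


# Hypothetical rates; 100,000 repetitions as suggested in the text.
probs = outcome_probabilities(1.8, 0.9, n_sims=100_000)
print(probs)
```

The same counting scheme extends to a tournament: simulate all matches in each repetition and count how often each team reaches a given round.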
The crucial part is the modeling (step 1). We analyzed several Poisson regression models of different degrees of complexity, compared them using different kinds of quality measures, and present here just the best model. More details on this selection process can be found in our preprint.
The model we use follows a dependent Poisson regression approach; in fact, it combines several Poisson regressions. These are fitted to the data described in What are typical football results?, including matches since 1 January 2010.
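For readers unfamiliar with Poisson regression, the following sketch fits a log-linear model, log(mu) = a + b * x, by Newton's method using only the standard library. It is not the fitting code behind the model, which would more likely rely on a statistics package. The toy data (opponent Elo ratings and goals scored) are invented for illustration, and centering and rescaling the Elo ratings is an assumption made to keep the iteration numerically well behaved.

```python
import math


def fit_poisson_regression(x, y, n_iter=25):
    """Fit log(mu) = a + b * x by Newton's method on the Poisson log-likelihood."""
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the Fisher information (negative Hessian).
        h_aa = sum(mu)
        h_ab = sum(xi * mi for xi, mi in zip(x, mu))
        h_bb = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 linear system for the update.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Invented toy data: goals scored against opponents of increasing Elo rating.
opponent_elo = [1500, 1600, 1700, 1800, 1900, 2000]
goals = [3, 2, 2, 1, 1, 0]
x = [(elo - 1750) / 100 for elo in opponent_elo]  # centered, in units of 100 Elo points
a_hat, b_hat = fit_poisson_regression(x, goals)
print(f"intercept = {a_hat:.3f}, slope = {b_hat:.3f}")
```

A negative slope means that the expected number of goals decreases as the opponent's Elo rating increases.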
The Poisson rates $\lambda_A$ and $\lambda_B$ are determined as follows:
1. We always assume that $A$ has a higher Elo score than $B$. This assumption can be justified, since the better team usually dominates the weaker team's tactics. Moreover, the number of goals the stronger team scores has an impact on the number of goals of the weaker team. For example, if team $A$ scores many goals, it is more likely that $B$ also scores, because the defense of team $A$ lacks concentration due to the expected victory. If the stronger team $A$ scores only one goal, it is more likely that $B$ scores no goal or just one, since team $A$ focuses more on the defense and secures the victory.
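2. We determine the Poisson rate $\lambda_A$. This is done in several steps.
The number of goals team $A$ scores against a team of Elo score $\mathrm{Elo}_B$ is modeled using a Poisson distribution with parameter $\mu_A$. The parameter as a function of the Elo rating is given as
$$\log \mu_A(\mathrm{Elo}_B) = \alpha_0 + \alpha_1 \cdot \mathrm{Elo}_B,$$
where $\alpha_0$ and $\alpha_1$ are obtained via a Poisson regression.
Teams of similar Elo scores may have different strengths in attack and defense. To take this effect into account we model the number of goals team $B$ receives against a team of Elo score $\mathrm{Elo}_A$ using a Poisson distribution with parameter $\nu_B$. The parameter as a function of the Elo rating is given as
$$\log \nu_B(\mathrm{Elo}_A) = \beta_0 + \beta_1 \cdot \mathrm{Elo}_A,$$
where the parameters $\beta_0$ and $\beta_1$ are obtained via Poisson regression.
Team $A$ shall on average score $\mu_A(\mathrm{Elo}_B)$ goals against team $B$, but team $B$ shall concede $\nu_B(\mathrm{Elo}_A)$ goals. As these two values rarely coincide, we model the number of goals $G_A$ as a Poisson distribution with a parameter $\lambda_A$ that combines the two values.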
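3. We determine the Poisson rate $\lambda_B$.
The number of goals $G_B$ scored by $B$ is assumed to depend on the Elo score $\mathrm{Elo}_A$ and additionally on the outcome of $G_A$. More precisely, $G_B$ is modeled as a Poisson distribution with parameter $\lambda_B$ satisfying
$$\log \lambda_B = \gamma_0 + \gamma_1 \cdot \mathrm{Elo}_A + \gamma_2 \cdot G_A.$$
Once again, the parameters $\gamma_0$, $\gamma_1$ and $\gamma_2$ are obtained by Poisson regression. Hence,
$$\lambda_B = \exp(\gamma_0 + \gamma_1 \cdot \mathrm{Elo}_A + \gamma_2 \cdot G_A).$$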
4. The result of the match $A$ versus $B$ is simulated by first realizing $G_A$ and then realizing $G_B$ conditional on the realization of $G_A$.
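The dependent simulation described in the four steps above might look roughly as follows. Several parts of this sketch are assumptions and not taken from the fitted model: all regression coefficients are invented placeholders, the Elo ratings are divided by 1000 to keep those coefficients readable, the log-linear form of each rate is the usual Poisson regression form, and the two values for team A are combined by a plain arithmetic mean.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_nested_match(elo_a: float, elo_b: float, coef: dict, rng: random.Random):
    """Simulate G_A first, then G_B with a rate that depends on G_A."""
    if elo_a < elo_b:
        raise ValueError("team A must be the team with the higher Elo score")
    ea, eb = elo_a / 1000, elo_b / 1000  # ASSUMPTION: rescaled Elo ratings
    # Expected goals scored by A against an opponent with B's rating.
    mu_a = math.exp(coef["alpha0"] + coef["alpha1"] * eb)
    # Expected goals conceded by B against an opponent with A's rating.
    nu_b = math.exp(coef["beta0"] + coef["beta1"] * ea)
    # ASSUMPTION: combine both values by their arithmetic mean.
    lam_a = (mu_a + nu_b) / 2
    goals_a = sample_poisson(lam_a, rng)
    # The rate of B depends on A's rating and on the realized goals of A.
    lam_b = math.exp(coef["gamma0"] + coef["gamma1"] * ea + coef["gamma2"] * goals_a)
    goals_b = sample_poisson(lam_b, rng)
    return goals_a, goals_b


# Invented placeholder coefficients, not fitted values.
coef = {"alpha0": 3.0, "alpha1": -1.5, "beta0": -2.0, "beta1": 1.3,
        "gamma0": 2.0, "gamma1": -1.2, "gamma2": 0.05}
rng = random.Random(2018)
goals_a, goals_b = simulate_nested_match(2000, 1700, coef, rng)
print(f"A {goals_a}:{goals_b} B")
```

The order of the two draws matters here: the rate for team B can only be computed once the goals of team A have been realized.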