There are billions of different outcomes of the 2018 FIFA World Cup™, and it is hard to get an overview of who is going to play against whom in the different stages of the tournament. Most forecast models only give the most probable outcomes. However, if such a forecast is wrong at some stage (this will happen with a probability close to 1!), all forecasts for later stages become useless. For example, let us consider a model predicting that Germany and Brazil will both win their groups. If one of the two teams does not win its group, all forecasts for knock-out games involving groups E and F become useless. One way out of this dilemma is to forecast all different outcomes of the tournament and assign a probability to each outcome. This is done in the following Sankey diagram. Please click on the graphic to obtain an interactive JavaScript version. In this version you can move the different “sites” and hover over the “links” to see the corresponding probabilities. In this way you can “zoom in” on the parts you’re most interested in.
Football is a typical low-scoring game, and games are frequently decided by single events. These events may be extraordinary individual performances, individual errors, injuries, refereeing errors or just lucky coincidences. Moreover, in most tournaments there are teams and players that are in exceptional shape and have a strong influence on the outcome. One consequence is that every now and then alleged underdogs win tournaments and reputed favorites drop out as early as the group phase.
The above effects are notoriously difficult to forecast. Nevertheless, every team has its strengths and weaknesses (e.g. in defense and attack), and most results reflect the qualities of the teams. In order to capture both the random effects and the deterministic drift, forecasts should be given in terms of probabilities.
A series of statistical models have been proposed in the literature for the prediction of football outcomes. They can be divided into two broad categories. The first one, the result-based model, directly models the probability of a game outcome (win/draw/loss), while the second one, the score-based model, focusses on the match score. We want to follow the second approach, since the match score is important in the group phase of the championship and a model for the score also implies a model for the outcome. The model proposed in How to impress your football fan colleagues is a first approach, but it only gives the most typical outcomes.
The chances are very low (in fact almost zero) that all matches during the World Cup end with these results. There are several models for this purpose, and most of them involve a Poisson model. In other words, the number of goals scored by a team is assumed to follow a Poisson distribution. This distribution is determined by a single parameter that describes the expected number of goals.
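To make the link between a score-based model and win/draw/loss probabilities concrete, here is a minimal sketch in Python. It assumes that the two teams' goal counts are independent Poisson variables, which is a simplification and not necessarily the model used in these posts. The function names and the expected-goal values 1.6 and 1.1 are invented for illustration only.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_a, lam_b, max_goals=15):
    """Win/draw/loss probabilities for team A against team B.

    Assumption: the goals of both teams are independent Poisson variables.
    Scores above max_goals are ignored; their probability is negligible
    for realistic expected-goal values.
    """
    win = draw = loss = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, lam_a) * poisson_pmf(goals_b, lam_b)
            if goals_a > goals_b:
                win += p
            elif goals_a == goals_b:
                draw += p
            else:
                loss += p
    return win, draw, loss


if __name__ == "__main__":
    # Hypothetical expected goals, chosen only to illustrate the computation.
    win, draw, loss = outcome_probabilities(1.6, 1.1)
    print(f"win {win:.3f}, draw {draw:.3f}, loss {loss:.3f}")
```

The same table of score probabilities also gives the probability of any exact result, which is what matters for goal differences in the group phase.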
This post is dedicated to Nina. She knows why.
… with zero knowledge of football. Before and during the 2018 FIFA World Cup™, are all your colleagues, friends or even family members talking about football? Who is going to win? What a surprise that team blabla lost, what do you think? Again such a close match! You do not see any way to avoid it? Impress them using the following flow chart. There is a reasonable chance that your forecasts will outperform the subjective and pretentious estimates of your colleagues.
The above flowchart is based on the results obtained in What are typical football results? and What are typical football results II?. Using these data you can easily cook up more sophisticated models. For more details on this series of posts see Who wins the 2018 FIFA World Cup™?.
We continue What are typical football results? The notion of weaker and stronger has not been made precise. Is it true that a team that has, say, 5 Elo points more than another team is really stronger? What might be an appropriate threshold? A glance at the current Elo ranking might suggest that teams within 50 points of each other may be considered equally strong. But is this true? At which threshold do the probabilities of a win, a draw or a loss change?
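As a rough orientation for these questions, the standard Elo formula converts a rating difference into an expected score. The sketch below assumes the usual 400-point logistic scale and ignores any home-advantage adjustment; the function name is made up for this example. Note that the expected score counts a draw as half a win, so it is not the same thing as the probability of winning.

```python
def elo_expected_score(rating_diff, scale=400.0):
    """Expected score (win = 1, draw = 0.5, loss = 0) of the higher-rated team.

    rating_diff is the team's Elo rating minus the opponent's rating.
    Assumption: standard Elo logistic curve with a 400-point scale and
    no home-advantage correction.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


if __name__ == "__main__":
    for diff in (5, 50, 100, 200):
        print(diff, round(elo_expected_score(diff), 3))
```

Under this formula, a difference of 5 points gives an expected score of about 0.51 and a difference of 50 points about 0.57. Whether such differences show up in real results is exactly what the historical data has to tell us.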
In this post we continue our investigation of Who wins the 2018 FIFA World Cup™? and take a first look at historical data of FIFA football matches. These are obtained from the site www.eloratings.net using the Wayback Machine and some copy-and-paste. Unfortunately, the data set obtained in this way is not complete, and we did not obtain data on all FIFA matches in this millennium. However, we were able to retrieve all matches of the 2018 FIFA World Cup participants plus the matches of Italy, the Netherlands, and Austria. Yes, Italy and the Netherlands did not qualify, but we are still convinced that these two teams are amongst the strongest teams in the world. We added Austria to pay homage to the country where we spent a lot of quality time.
We will try to answer questions like:
- What is the most probable outcome of a game? [->]
- What is the probability of a win, a draw or a loss? [->]
- What is the probability that the stronger team wins? [->] And with what result? [->]
The answers to these questions can be found following the links after the questions. Detailed answers can be found below.
In just a few weeks the 2018 FIFA World Cup™ starts. People are already discussing passionately who is going to win and what their teams' chances are. Almost everybody has an intuition, opinion, idea, feeling or whatever about the performances of the different nations. There might be a consensus among football experts and fans on the top favorites, e.g. Brazil, Germany, Spain, but more debate on possible underdogs. However, most of these predictions rely on subjective opinions and are very hard, if not impossible, to quantify. An additional difficulty is the complexity of the tournament, with billions of different outcomes, making it very difficult to obtain accurate guesses of the probabilities of certain events.
How can we make reasonable, objective and quantitative estimates of the outcomes? For example, what is the probability that Brazil, Germany or Spain will win the cup? What are the chances that England will make it to the Round of 16? What are the chances that Brazil beats Germany in the semifinals 7:1?
In this and the following posts, we give quantitative answers to all these kinds of questions. This post starts with what we can learn by studying previous matches and tournaments. Once we have found some appropriate data, we will investigate which models are out there to model an event like the FIFA World Cup.
We took our first steps in the blogging world by analyzing the influence of social media in the 2016 presidential election in Austria. You may consult https://bpw2016blog.wordpress.com/ for several posts that are out of date now. There you will also find comments and analysis on the 2017 election of the Austrian parliament and the 2017 election of the German parliament. These blogs were written mainly for ourselves, in order to improve our understanding of the impact of social media and the power of statistical software. One aim was clearly to demonstrate what kind of information that people share is publicly available on the internet. Besides this, the feedback on our blog made it once again crystal clear: the simpler the statistics, the more successfully they can be communicated. For instance, the by far most successful posts were about the names of the likers of Angela Merkel, [->], and about the strongest fans of the candidates in the 2016 presidential election in Austria, [->].