Football under the hood


Football is a skill-based game, but the outcome of a match depends on so many factors that it can be regarded as a random variable. Betting without a strategy is therefore not a feasible way to profit from such a highly competitive game. One strategy is to place bets based on insight into the competitive scene and good knowledge of the game. An alternative is to formulate a mathematical model based on statistics. We have opted for a strategy with a mathematical foundation, and we explain our models in this section.

Three different models were created and validated: a rank-based model, an expected goals model, and a model that exploits biases in the odds.


Before we go into detail, let's take a look at how the models perform. To find out, we placed fake bets according to each model's suggestions. The result is shown in the following figure.

We can see that our expected goals model performs poorly and loses all the money within a few weeks. The Elo model does earn money this year, but in other years it does not, and its fluctuations are large compared to the Odds bias model. We therefore consider the Odds bias model the best: it has small fluctuations and has been earning money since 2013. This is the model that we are using on this web page.

The Elo model

Our Elo model is based on the Elo ranking system, widely used in both chess and football. In short, the system assigns every team a score depending on how good they are and rewards winning teams by increasing their score. If a team with a high score plays a team with a lower score, the better team, i.e. the team with the higher score, is expected to win and will thus not increase their rating by much if they win. The team with the lower score, the underdog so to say, is not expected to win and will be awarded a lot of points in the event that they do. Mathematically, this is expressed by the following equations: $$ ELO_{W,new} = ELO_{W,old} + 40 \left(1 - \frac{1}{1+10^{\frac{ELO_{L,old}-ELO_{W,old}}{400}}}\right) $$ $$ ELO_{L,new} = ELO_{W,old} + ELO_{L,old} - ELO_{W,new} $$ where $W$ stands for winning and $L$ for losing. The fraction is the winner's expected score, so a heavy favourite gains almost nothing, and the second equation keeps the sum of the two ratings unchanged.
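The update rule can be sketched in a few lines of Python. This is a minimal illustration rather than our production code: it assumes the standard Elo convention in which the winner gains $40$ times one minus its expected score, and the ratings 1600 and 1400 are made-up example values.

```python
K = 40  # update factor taken from the equations above


def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def elo_update(winner: float, loser: float, k: float = K):
    """Return the new (winner, loser) ratings after a decisive game."""
    gain = k * (1.0 - expected_score(winner, loser))
    # The loser gives up exactly what the winner gains, so the sum is conserved.
    return winner + gain, loser - gain


# Hypothetical ratings: the favourite wins, then the underdog wins.
fav_new, dog_new = elo_update(1600, 1400)
upset_new, fallen_new = elo_update(1400, 1600)

print(f"favourite gains {fav_new - 1600:.2f} points")
print(f"underdog gains  {upset_new - 1400:.2f} points")
```

The favourite gains far fewer points for an expected win than the underdog gains for an upset, which is the behaviour described above.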

If we assume that the Elo rating is a good approximation of each team's skill level within the league, then the probability ($P$) that the home team will win can be written as: $$ P_{HomeWin} = 1- \frac{1}{1+10^{\frac{ ELO_{HomeTeam}-ELO_{AwayTeam}}{ 400}}}$$ and for the away team we get $ P_{AwayWin} = 1-P_{HomeWin} $.
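As a small illustration, the win probability can be computed as follows. The sketch assumes the usual two-outcome Elo form, with the $1 +$ term in the denominator so that two equally rated teams each get a probability of 0.5; the ratings are hypothetical.

```python
def home_win_probability(elo_home: float, elo_away: float) -> float:
    """Two-outcome Elo probability that the home team wins."""
    return 1.0 - 1.0 / (1.0 + 10 ** ((elo_home - elo_away) / 400))


# Hypothetical ratings for illustration only.
p_home = home_win_probability(1550, 1450)
p_away = 1.0 - p_home

print(f"P(home win) = {p_home:.3f}, P(away win) = {p_away:.3f}")
```

Note that this basic form does not include any home advantage or draw probability; it depends only on the rating difference.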

Notice that $P_{HomeWin} + P_{AwayWin} =1$, which implies that $P_{Draw} = 0$. This problem occurs because the Elo model was developed for chess and treats the outcome as binary. It is possible to expand the model so that it can handle three outcomes, but since things get a bit messy we do not cover it here. Feel free to download our paper, which covers this matter.

Expected goals model

An expected goals model tries to predict how many goals each team will score during the match and estimates the outcome probabilities from that. Here is an overview of how our Expected goals model works; for a more complete treatment please see our paper.

Before discussing how the model works, we should motivate why it is needed. After all, we could simply take the average number of goals a team has scored over a period of time and use it as an estimate of the number of goals the team will score. If the average over the past year is used, we get a good approximation of the goals scored by a team, but the estimate is too slow to catch fast changes in team performance. A simple solution could be to take the average over a much shorter period, for example 3-5 games. The model would then adapt to trends quickly, but the accuracy would unfortunately suffer too much for it to be of any use. Clearly, a trade-off between accuracy and trend sensitivity has to be made if the mean is used to predict the outcome. An expected goals model might solve the problem.

The lack of accuracy derives from the fact that the variance of goals can be quite high if we just consider three to five games. However if a different variable that doesn’t vary as much is used to estimate the expected value, the problem could be diminished. The Expected goals model utilizes the “number of shots on goal” instead.

So how do we estimate the expected number of goals? Imagine that the football pitch is divided into a large number of boxes. The number of goal attempts from each box is then recorded for a very large set of games. For every box, the number of shots that turned into actual goals is divided by the number of shots taken from that box, which gives an estimate of the probability of scoring from that box. The result would look something like this.

Instead of looking at the number of goals made, let's look at the shots on goal made in the last 3-5 games. We note from which box each shot was taken, multiply each shot by the goal probability of its box, and sum the results. Now we have reached our goal (no pun intended) and hopefully have an expected number of goals that is both accurate and trend sensitive.
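The two steps, estimating a scoring probability per box and summing those probabilities over recent shots, can be sketched as below. The three box names and all counts are invented for illustration (a real grid would have many more boxes), and dividing by the number of games to get a per-game figure is our assumption here.

```python
def box_probabilities(shots_per_box, goals_per_box):
    """Estimate P(goal | shot from box) as goals / shots for every box."""
    return {
        box: goals_per_box.get(box, 0) / shots
        for box, shots in shots_per_box.items()
        if shots > 0
    }


def expected_goals(shot_boxes, goal_probability):
    """Sum the scoring probability of the box each shot was taken from."""
    return sum(goal_probability[box] for box in shot_boxes)


# Hypothetical historical counts per box (not real data).
historical_shots = {"six_yard": 200, "penalty_area": 1000, "outside_area": 2000}
historical_goals = {"six_yard": 70, "penalty_area": 120, "outside_area": 60}
goal_probability = box_probabilities(historical_shots, historical_goals)

# Hypothetical shots by one team over its last 3 games.
recent_shots = ["six_yard", "penalty_area", "penalty_area",
                "penalty_area", "outside_area", "outside_area"]
games = 3

xg_total = expected_goals(recent_shots, goal_probability)
xg_per_game = xg_total / games
print(f"expected goals per game: {xg_per_game:.3f}")
```

Because every shot contributes a fraction of a goal, the estimate changes smoothly from game to game instead of jumping with each lucky or unlucky finish.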

The Odds bias model

According to the Kelly criterion, the optimal fraction $f$ of one's bankroll to place on a bet is given by $$f = \frac{pb-1}{b-1}$$ where $p$ is the "true probability" of winning and $b$ is the odds received on the wager. (This formulation differs slightly from what you might be used to since we have the odds in EU format, such that $b > 1$.) Since the odds $b$ are given by the bookmakers, the only thing that remains is to find a good approximation of $p$, which we call $p^*$.
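A minimal sketch of the formula in Python follows. Returning zero when the formula goes negative (that is, not betting at all) is our assumption for the example, and the probabilities and odds are made up.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Kelly fraction for win probability p and decimal (EU) odds b > 1."""
    if b <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    f = (p * b - 1.0) / (b - 1.0)
    # Assumption: a negative fraction means the bet is unfavourable, so skip it.
    return max(0.0, f)


# Hypothetical values: a 55% chance at odds 2.0, and a 40% chance at odds 2.0.
good_bet = kelly_fraction(0.55, 2.0)
bad_bet = kelly_fraction(0.40, 2.0)

print(f"stake {good_bet:.1%} of the bankroll on the first bet")
print(f"stake {bad_bet:.1%} of the bankroll on the second bet")
```

The fraction is positive only when $pb > 1$, which is why everything hinges on how well $p^*$ approximates $p$.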

The bookmakers will attempt to set the odds such that, on average, they earn money on each game. Imagine that you are a bookmaker, and your experts and models predict that in one match, game outcome $A$ will happen with probability $p_A$. If you then offer odds $b_A$ and your clients bet a total of $M_A$ on that outcome, your expected net gain will be $$ G = M_A - p_AM_Ab_A = M_A(1-p_Ab_A).$$ This follows because you always "gain" $M_A$, but with probability $p_A$ you have to return $M_Ab_A$ to your clients. The bet will be "fair" if $p_A = \frac{1}{b_A}$, since this makes $G=0$. The bookmakers will then set $b=\frac{1}{p}-m$ for each of the three outcomes in a game, where $m$ is a small margin on which they expect to earn their money.
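As a quick check of what the margin does under this simple pricing rule, substituting $b_A = \frac{1}{p_A} - m$ into the expression for $G$ gives $$ G = M_A\left(1 - p_A\left(\frac{1}{p_A} - m\right)\right) = M_A p_A m, $$ so the bookmaker's expected gain on outcome $A$ is positive whenever $m > 0$, and it grows with both the margin and the probability of the outcome.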

In the other models we have used ranking and expected goals in order to get $p^*$, but what if we use the odds to approximate $p$? We can do this by using $p=\frac{1}{b}$ and then normalizing $p$ such that the probabilities of the three outcomes sum to 1. Directly applying the Kelly criterion to this probability will, due to the margin $m$, have the disappointing result that no bet is expected to be profitable. But before we draw any rash conclusions, let's take a closer look at what these probabilities look like.
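The normalization, and the reason why it leaves no profitable bet, can be seen in a short sketch. The three odds are invented for illustration; the point is that after normalizing, $pb$ equals one divided by the sum of the raw $\frac{1}{b}$ values, which is below 1 whenever the bookmaker has a margin.

```python
def implied_probabilities(odds):
    """Turn decimal odds into probabilities that sum to 1."""
    raw = [1.0 / b for b in odds]
    total = sum(raw)  # exceeds 1 when the bookmaker has built in a margin
    return [r / total for r in raw]


# Hypothetical odds for home win, draw, and away win.
odds = [2.10, 3.40, 3.60]
probs = implied_probabilities(odds)

# Expected return per unit staked on each outcome: p * b.
returns = [p * b for p, b in zip(probs, odds)]

for name, p, r in zip(["home", "draw", "away"], probs, returns):
    print(f"{name}: p = {p:.3f}, p*b = {r:.3f}")
```

Every outcome has the same expected return $pb < 1$, so the Kelly fraction is negative across the board.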

Let $p_H$, $p_D$, and $p_A$ be the probabilities for home win, draw, and away win derived from the odds as $\frac{1}{b}$. In the figure above we have plotted $p_H-p_A$ (home team advantage) versus $p_D$ (probability of a draw) for the games in the Premier League over the last 10 years. The blue dots represent the odds-probabilities, and the blue line is the corresponding trend line. (Think of this as $\frac{1}{b}$.) The red line is a trend line for a multinomial logistic regression of the game outcomes. (Think of this line as the true probability $p$.) Even though these lines are close to each other, they are not identical, which implies that $\frac{1}{b}$ is not a perfect estimate of $p$. However, we can measure the difference between the red and the blue line at every point, which we call $\Delta_b$. If we now form $p^*$ as $$p^* = \frac{1}{b} + \Delta_b$$ we will hopefully end up with an even better $p^*$. While the model we use on this page is slightly more complicated than this, it is this concept that lies behind it.
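To see how such a correction feeds into the stake, here is a small sketch that combines $p^* = \frac{1}{b} + \Delta_b$ with the Kelly formula. It only illustrates the concept and is not the model used on this page: the odds and the value of $\Delta_b$ are hypothetical, and in practice $\Delta_b$ would be read off from the gap between the two trend lines.

```python
def corrected_probability(b: float, delta_b: float) -> float:
    """p* = 1/b + delta_b, the odds-implied probability plus a bias correction."""
    return 1.0 / b + delta_b


def kelly_fraction(p: float, b: float) -> float:
    """Kelly fraction for decimal odds b > 1 (may be negative)."""
    return (p * b - 1.0) / (b - 1.0)


# Hypothetical example: draw odds of 3.40 and a correction of +0.02.
b = 3.40
delta_b = 0.02

p_star = corrected_probability(b, delta_b)
stake = kelly_fraction(p_star, b)

print(f"p* = {p_star:.4f}, Kelly stake = {stake:.2%} of the bankroll")
```

With this form of $p^*$ the stake simplifies to $f = \frac{\Delta_b\, b}{b-1}$, so a bet is suggested exactly when the correction is positive.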