This post continues on from the mid-season review of the Elo system and looks at my Bayesian football prediction model, Predictaball, up to and including matchday 20 of the Premier League (29th December). I’ll go over the overall predictive accuracy and compare my model to others, including bookies, expected goals (xG), and a compilation of football models.
So far, across the top 4 European leagues, there have been 696 matches, of which 379 (54%) had their outcomes correctly predicted. While accuracy is immediately interpretable, it has a number of drawbacks as an evaluation of a statistical model, chief among them that it doesn’t take the predicted probabilities into account.
A proper scoring measure for ordinal data (i.e. one that takes both the predicted probabilities and the ordering of the 3 outcomes into account) is the Ranked Probability Score (RPS), introduced to football forecasting by Constantinou & Fenton (2012). Predictaball’s current RPS across all 4 leagues is 0.193; while less directly interpretable than accuracy (essentially, lower is better), it provides more information about the forecasting ability of the model.
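As a concrete illustration, the RPS for a single match is built from the squared differences between the cumulative forecast probabilities and the cumulative observed outcome. The function below is a minimal sketch of the standard definition, not code from Predictaball itself:

```python
def rps(probs, outcome):
    """Ranked Probability Score for one match.

    probs: forecast probabilities over ordered outcomes,
           e.g. (home win, draw, away win).
    outcome: 0-based index of the observed outcome.
    Lower is better; 0 is a perfect forecast.
    """
    cum_p = cum_o = total = 0.0
    # Sum squared cumulative errors over all but the last category
    for i in range(len(probs) - 1):
        cum_p += probs[i]
        cum_o += 1.0 if i == outcome else 0.0
        total += (cum_p - cum_o) ** 2
    return total / (len(probs) - 1)
```

For example, a confident correct forecast like `rps([1.0, 0.0, 0.0], 0)` scores 0, while being maximally wrong about a home win, `rps([0.0, 0.0, 1.0], 0)`, scores 1.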
The RPS and accuracies for Predictaball are displayed in the table below, alongside those for William Hill as a comparison. It shows several trends. Firstly, there is a certain amount of variation in predictability between the leagues, which I’ve identified before (here and here): La Liga and the Bundesliga are harder to forecast than the Premier League and Serie A, both for Predictaball and for William Hill (with lower accuracies too). This highlights another interesting result: RPS and accuracy aren’t necessarily well correlated. The RPS for the Bundesliga is very similar to that of La Liga under both models, yet its accuracies are far lower.
Finally, the table shows that my model is less accurate than William Hill’s. I’d expect that to be the case, however: my model is a very simple one that only uses each team’s Elo rating (which is based solely on match results), rather than including any finer detail such as xG, injuries, and other factors. I’d fully expect William Hill to include such information in order to maintain their edge over punters who are starting to use models of their own (such as Predictaball) to inform their betting.
|League|Predictaball RPS|William Hill RPS|Predictaball accuracy|William Hill accuracy|
Comparison with 35 other models
A user on Twitter by the name of Alex B has very kindly collated the results of 35 Premier League forecasting models and keeps a running total. See his page here, where he discusses the setup, and find the current standings here. Also note his fantastic cartoonified drawing of my profile photo!
As of matchday 20, Predictaball is tenth out of 35 models with an RPS of 0.1775 (note this is different to the value above, as Alex started collecting predictions from matchday 9). I’m very happy with this result, as again I’m using a very straightforward model that is based solely on match results and doesn’t include any of the metrics that are now widely available, such as expected goals (xG). I also don’t include any player-level information. In future (as I’ll discuss at the end of this post), I’d definitely like to tweak my model to climb a few steps up the ladder.
In the Elo mid-season review I posted a table comparing a team’s current standing to their Elo ranking in order to identify over- and under-performing teams. In this post I’ll do something similar with a slightly different technique: I’ll calculate the expected points for each team by simulating the outcome of each game from Predictaball’s forecasts (using a thousand simulations). I’ll then compare these expected points to those obtained from an xG model that Simon Gleave has developed and displayed on Twitter. I believe the expected points from this model are not pre-match forecasts (prospective estimates), but rather come from retrospectively assigning 3 points per game to the team with the higher expected goals score.
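The simulation idea is straightforward: for each simulated season, sample an outcome for every match from the forecast probabilities, total up the points, and average across simulations. The sketch below is my own simplified illustration of this approach, not the actual Predictaball code; the function name and interface are assumptions:

```python
import random

def simulate_points(match_probs, n_sims=1000, seed=42):
    """Monte Carlo estimate of a team's expected points.

    match_probs: list of (win, draw, loss) probability triples,
                 one per fixture, from the team's perspective.
    Returns the mean points total over n_sims simulated seasons.
    """
    rng = random.Random(seed)
    points = [3, 1, 0]  # points for win, draw, loss
    totals = []
    for _ in range(n_sims):
        season = sum(
            points[rng.choices([0, 1, 2], weights=p)[0]]
            for p in match_probs
        )
        totals.append(season)
    return sum(totals) / n_sims
```

Comparing this expectation against a team’s actual points total is then what flags over- and under-performers.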
The resultant table is below and shows a lot of information! Where the expected points differ from the actual total by at least 3, the value is coloured either green (the team has more points than expected, i.e. is over-performing) or red (the team has fewer points than predicted, i.e. is under-performing).
NB: It is important to note that the two models here are not directly comparable: an over-performing team as highlighted by Predictaball is one that is winning games against teams that are higher rated, while an over-performing team in terms of xG is one that is winning games despite having poorer scoring opportunities during the match.
Both models identify Spurs as under-achievers: they have 9 fewer points than the xG model predicted and 4 fewer than Predictaball estimated. The interpretation is that Predictaball rates Spurs highly but they are not obtaining the expected results, while the 9-point difference with the xG model indicates that they are playing well in these games and generating good scoring opportunities, but not converting them.
Southampton, West Brom, and Crystal Palace are other teams marked down as under-achievers by both models, with Crystal Palace having a massive 13 fewer points than expected by xG. Burnley are identified as over-performing by both models, having 9 and 12 more points than the two models expected. The teams that both models have predicted most accurately are Chelsea, Watford, and, regrettably for their fans, Newcastle, all of whom had their points total correctly predicted to within 1 point.
I’ve got several ways in which I plan to update the Predictaball system in 2018. I firstly want to look at different methods of generating the outcome probabilities, which I believe could be done simply within the Elo framework. Another idea is to fit a model on the Elo ratings that directly optimises the RPS, since the current Bayesian method models a multinomial outcome and thus doesn’t take the ordering into account. Probabilistic modelling frameworks such as Edward are starting to gain traction in the machine learning community and could be used for this purpose: they combine the probabilistic computing approach of software such as BUGS, JAGS, and Stan with modern deep learning libraries such as TensorFlow and PyTorch, thereby allowing a probabilistic model with an arbitrary cost function.
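To make the "directly optimising the RPS" idea concrete, here is a toy sketch: an ordered-logit model maps an Elo rating difference to the three ordered outcome probabilities, and its parameters are fitted by finite-difference gradient descent on the mean RPS rather than the multinomial likelihood. This is only an illustration of the concept, not the Bayesian model described above; the parameterisation, the per-100-points scaling, and all names are my own assumptions:

```python
import math

def match_probs(elo_diff, theta):
    """Ordered-logit probabilities (away win, draw, home win)."""
    c1, log_gap, beta = theta
    c2 = c1 + math.exp(log_gap)       # keep cutpoints ordered: c2 > c1
    z = beta * elo_diff / 100.0       # scale Elo difference for stability
    p_le_away = 1.0 / (1.0 + math.exp(-(c1 - z)))
    p_le_draw = 1.0 / (1.0 + math.exp(-(c2 - z)))
    return [p_le_away, p_le_draw - p_le_away, 1.0 - p_le_draw]

def mean_rps(theta, data):
    """Average RPS over (elo_diff, outcome) pairs; outcome is 0/1/2."""
    total = 0.0
    for diff, outcome in data:
        p = match_probs(diff, theta)
        cp = co = score = 0.0
        for i in range(2):            # cumulative errors over first 2 categories
            cp += p[i]
            co += 1.0 if i == outcome else 0.0
            score += (cp - co) ** 2
        total += score / 2.0
    return total / len(data)

def fit(data, theta=(0.0, 0.0, 0.0), lr=0.2, steps=300, h=1e-5):
    """Finite-difference gradient descent directly on the mean RPS."""
    theta = list(theta)
    for _ in range(steps):
        grad = []
        for j in range(3):
            up, down = theta[:], theta[:]
            up[j] += h
            down[j] -= h
            grad.append((mean_rps(up, data) - mean_rps(down, data)) / (2 * h))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

A framework like Edward would replace the crude finite-difference loop with automatic differentiation, but the principle is the same: the loss being minimised is the scoring rule itself.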
I’ve been saying for years how I’d like to incorporate player-level information, but I’m now rather happy with this rating method. It is conceptually simple, provides an alternative league ranking, is easily interpretable, and doesn’t require much effort to obtain the information needed to forecast each match. Most importantly, it is rather accurate: my current model is the most accurate of the 3 incarnations I’ve used, and the Euro Club Index, which is currently doing very well in Alex’s standings, is a very similar rating method to mine.
One way in which I would like to adapt my model, however, is to predict the score. This would close the loop between my Elo rating (which incorporates margin of victory) and my forecasting model, allowing me to forecast whole seasons in advance. It would also open up access to an increased range of bets, such as both teams to score. I’ll write about this if I ever get around to developing it.
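For context, a common simple starting point for score prediction (not what Predictaball currently does) is to model each team’s goals as independent Poisson counts, with means derived from the team ratings. A scoreline probability grid then gives access to bets like both teams to score; the sketch below is a generic illustration with assumed goal means:

```python
import math

def score_probs(home_mean, away_mean, max_goals=8):
    """Grid of scoreline probabilities, assuming independent
    Poisson goal counts for each team (truncated at max_goals)."""
    def pois(lam, k):
        return math.exp(-lam) * lam ** k / math.factorial(k)
    return [[pois(home_mean, h) * pois(away_mean, a)
             for a in range(max_goals + 1)]
            for h in range(max_goals + 1)]

def btts_prob(grid):
    """P(both teams to score): both goal counts are >= 1."""
    return sum(grid[h][a]
               for h in range(1, len(grid))
               for a in range(1, len(grid)))
```

Summing regions of the same grid yields home win, draw, and away win probabilities too, which is what makes score models attractive: one fitted model serves many markets.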
And finally, I intend to update the infrastructure behind Predictaball this year, to provide a web-app with both the current ratings and the match predictions freely available. I’d then like to broaden the scope to include more leagues and more sports. Watch this space…