A second post in 2 days on mixture modelling? No awards for guessing what type of analysis I’ve been preoccupied with recently!
Today’s post provides an ugly hack to fix a bug in the R flexmix package for likelihood-based mixture modelling and provides a cautionary tale about environments. In short, I’ve encountered problems when trying to predict the cluster membership for out-of-sample data using this package, and judging from a couple of posts I found online, I’m not the only one.
I’ve been spending a lot of time over the last week getting Theano working on Windows playing with Dirichlet Processes for clustering binary data using PyMC3. While there is a great tutorial for mixtures of univariate distributions, there isn’t a lot out there for multivariate mixtures, and Bernoulli mixtures in particular.
This notebook shows an example of fitting such a model using PyMC3 and highlights the importance of careful parameterisation as well as demonstrating that variational inference can prove advantageous over standard sampling methods like NUTS for such problems.
I’ve created a website for Predictaball with team ratings and match predictions for all 4 main European leagues, at thepredictaball.com. It has each team’s current rating and plots showing the change over the course of season, along with match outcome forecasts. Various statistics are also included, such as the biggest upset, worst teams in history, as well as this season’s predictive accuracy. Previously only Premiership match predictions were made available (via Twitter) and so I’m happy that I’ve finally got this website released.
This post continues on from the mid-season review of the Elo system and looks at my Bayesian football prediction model, Predictaball, up to and including matchday 20 of the Premier League (29th December). I’ll go over the overall predictive accuracy and compare my model to others, including bookies, expected goals (xG), and a compilation of football models.
Overall accuracy So far, across the top 4 European leagues, there have been 696 matches with 379 (54%) of these outcomes being correctly predicted.
This is going to be the first of 2 posts looking at the mid-season performance of my football prediction and rating system, Predictaball. In this post I’m going to focus on the Elo rating system.
Premier league standings I’ll firstly look at how the teams in the Premiership stand, both in terms of their Elo rating and their accumulated points, as displayed in the table below, ordered by Elo. Over-performing teams, as defined by being at least 3 ranks higher in points than in Elo, are coloured in green, while under-performing teams, the opposite, are highlighted in red.
My football prediction has previously relied upon a Bayesian approach to quantify a team’s skill level, by modelling it as a random intercept in a hierarchical model of the outcome of a match. While this model performed very well (62% accuracy last season), I was never fully satisfied since this measure of skill is an average across the last ten seasons that I had data for, rather than being updated to reflect the time-varying nature of form.
The last post showed that using a fully Bayesian multi-level model of the match outcomes helped Predictaball achieve a 58% overall prediction accuracy on the four European leagues, up 8% from last season. This post will describe the betting system I used to try and profit by identifying value bets in the offered odds.
Betting system Before delving into the profit analysis I’ll firstly quickly summarise the staking model I used since I haven’t mentioned it anywhere before.
And so we come to the end of another season of football, and more importantly, Predictaball! This season has seen several large updates that I was meaning to detail these at the start of the season but life got in the way.
The predictive model is now fully Bayesian I’ve added a betting system that identifies value bets I’ve expanded it to include the 3 other main European leagues: La liga Serie A Bundesliga Rather than detailing these new aspects as well as summarising the season’s performance in one massive blog, I’ll split this into two parts.
I’ve recently expanded my hierarchical Bayesian football (aka soccer) prediction football prediction framework to predict the results of Australian Rules Football (AFL) matches. I have no personal interest in AFL, instead I got involved through an email sent to a statistics mailing list advertising a competition that’s held by Monash University in Melbourne. Sensing an opportunity to quickly adapt my soccer prediction method to AFL results and to compare my technique to others, I decided to get involved.