A second post in 2 days on mixture modelling? No awards for guessing what type of analysis I’ve been preoccupied with recently!
Today’s post provides an ugly hack to fix a bug in the R flexmix package for likelihood-based mixture modelling, along with a cautionary tale about environments. In short, I’ve encountered problems when trying to predict the cluster membership for out-of-sample data using this package, and judging from a couple of posts I found online, I’m not the only one.
I’ve been spending a lot of time over the last week getting Theano working on Windows and playing with Dirichlet Processes for clustering binary data using PyMC3. While there is a great tutorial for mixtures of univariate distributions, there isn’t a lot out there for multivariate mixtures, and Bernoulli mixtures in particular.
This notebook shows an example of fitting such a model using PyMC3 and highlights the importance of careful parameterisation as well as demonstrating that variational inference can prove advantageous over standard sampling methods like NUTS for such problems.
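As a rough illustration of what a Bernoulli mixture does with binary data, the sketch below computes the posterior cluster membership probabilities for a single observation given already-known parameters. It is plain Python rather than PyMC3, it does not fit anything, and the names `weights`, `thetas`, and `responsibilities` as well as all the numbers are made up for illustration; the notebook's actual model is not reproduced here.

```python
import math


def bernoulli_log_likelihood(x, theta):
    """Log-likelihood of a binary vector x under independent Bernoulli(theta[j])."""
    return sum(
        math.log(t) if xi == 1 else math.log(1.0 - t)
        for xi, t in zip(x, theta)
    )


def responsibilities(x, weights, thetas):
    """Posterior probability that x belongs to each mixture component."""
    # Log of (mixing weight * component likelihood) for every component.
    log_joint = [
        math.log(w) + bernoulli_log_likelihood(x, theta)
        for w, theta in zip(weights, thetas)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(log_joint)
    unnormalised = [math.exp(lj - largest) for lj in log_joint]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical two-component mixture over three binary features.
weights = [0.5, 0.5]
thetas = [
    [0.9, 0.9, 0.1],  # component 0: features 1 and 2 usually on
    [0.1, 0.1, 0.9],  # component 1: feature 3 usually on
]

resp = responsibilities([1, 1, 0], weights, thetas)
```

Here the observation `[1, 1, 0]` matches the first component's profile far better than the second's, so nearly all of the membership probability lands on component 0.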
eXpected Goals (xG) is a popular method of answering that age-old question of which team ‘deserved’ to win a match. It does this by assigning a probability of a goal being scored to every opportunity, based upon various metrics such as the distance from goal, the number of defenders nearby, and so on. By comparing a team’s actual standings with those from the output of an xG model we get a retrospective measure of how well a team is doing given their chances.
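To make the idea concrete, here is a minimal sketch of how per-shot probabilities add up to a match-level xG figure. The shot probabilities and scoreline are invented for illustration and do not come from any real xG model; a real model would estimate each probability from features such as distance and defender positions.

```python
def expected_goals(shot_probabilities):
    """Sum of per-shot scoring probabilities, i.e. the team's xG for the match."""
    return sum(shot_probabilities)


# Hypothetical scoring probabilities for every chance each team had.
home_shots = [0.05, 0.30, 0.10, 0.75]
away_shots = [0.40, 0.05]

home_xg = expected_goals(home_shots)
away_xg = expected_goals(away_shots)

# Hypothetical final score, used to compare actual goals with xG.
home_goals, away_goals = 0, 1
home_overperformance = home_goals - home_xg
away_overperformance = away_goals - away_xg
```

In this made-up match the home team created the better chances but lost, so on an xG basis they ‘deserved’ more than the result suggests.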
A new version of multistateutils has been released onto CRAN containing a few new features. I’ll give a quick overview of them here, but have a look at the vignette for more examples.
msprep2
The first is a replacement for the mstate::msprep function that converts data into the long transition-specific format required for fitting multi-state models. msprep requires the input data to be in a wide format, where each row corresponds to an individual and each possible state has a column for entry time and a status indicator.
Having become interested in football again due to the World Cup, I was thinking about Predictaball and how I never wrapped up the season with a brief review.
It’s been a big season for Predictaball, with the move to an Elo-based system, as well as the launch of a website. However, is the new match forecasting method any good?
Model accuracy
Fortunately, to help answer this question, a very generous Twitter user by the name of Alex B has been collecting weekly Premiership match predictions from around 30 models and tracking their progress.
A month ago I mentioned that I’d been using a discrete event simulation for estimating transition probabilities from parametric multi-state models. I’ve now turned this code into a general package containing resources for multi-state modelling, called multistateutils (I know, I’m very imaginative), which may be of interest to other people working with multi-state models in R. The current release is available on CRAN, while the development version is still on GitHub.
I’m very happy to announce the first ‘official’ release, version 1.0.0, of rprev, the R package for estimating disease prevalence by simulation. This is useful for epidemiologists who have registry data and want to know disease prevalence over time periods longer than those covered by the registry. I first released it almost exactly two years ago but had always intended to update it with the features in this release.
I’ve just returned from the 2018 Survival Analysis for Junior Researchers conference in Leiden, fresh with inspiration and wishing I was a PhD student again to have the luxury of time and independence to research all the ideas in my head. As in previous years it was extremely well organised, with a variety of interesting talks. I particularly enjoyed the sessions on Causal Inference and Dynamic Prediction and hope to incorporate some of what I’ve learnt into my work.
I’ve just released an R package for estimating transition probabilities from multi-state models onto GitHub, found at https://github.com/stulacy/RDES. It’s not a package with a large potential audience, so I don’t intend to release it onto CRAN. Rather, it has a highly specific role: I developed it for my own use and thought it could prove useful for someone else. Essentially, it extends the simulation functionality offered by the fantastic flexsurv package for obtaining predicted outcomes from multi-state models.
I’ve created a website for Predictaball with team ratings and match predictions for all 4 main European leagues, at thepredictaball.com. It has each team’s current rating and plots showing the change over the course of the season, along with match outcome forecasts. Various statistics are also included, such as the biggest upset, the worst teams in history, and this season’s predictive accuracy. Previously only Premiership match predictions were made available (via Twitter), so I’m happy that I’ve finally got this website released.