Recent blog articles

While I’ve been quite happy with the performance of my Predictaball football rating system, one thing that’s bothered me since its inception last summer is the reliance on hard-coded parameters. Similar to many other football rating methods, it’s an adaptation of the Elo system that was designed for chess matches by Arpad Elo in the 1950s. His aim was to devise an easily implementable system to rate competitors in a 2-person zero-sum game.

I came across a tweet from Piers Morgan this morning in which he suggested that the BBC is favouring women, since 43 out of the 53 paper reviewers on The Andrew Marr Show in 2019 were women. Unfortunately, I was a day late to this hot take; fortunately, this is because I don’t follow Piers Morgan. However, I knew that there must be more to it than a single PC-baiting statistic, and knowing that I had a ~3 hour train journey coming up this evening, I thought I’d look into it a bit more.

Back in March I rewrote thepredictaball.com from its original R Shiny implementation into a static website using the Vue JavaScript framework. I intended to write about it at the time but I’ve been busy and hadn’t made time for it until now, which is handy given that the football season has just finished! Excuse the clickbait title, but I genuinely couldn’t think of a better way of organising this post.

A second post in 2 days on mixture modelling? No awards for guessing what type of analysis I’ve been preoccupied with recently! Today’s post provides an ugly hack to fix a bug in the R flexmix package for likelihood-based mixture modelling and provides a cautionary tale about environments. In short, I’ve encountered problems when trying to predict the cluster membership for out-of-sample data using this package, and judging from a couple of posts I found online, I’m not the only one.

I’ve been spending a lot of time over the last week getting Theano working on Windows and playing with Dirichlet Processes for clustering binary data using PyMC3. While there is a great tutorial for mixtures of univariate distributions, there isn’t a lot out there for multivariate mixtures, and Bernoulli mixtures in particular. This notebook shows an example of fitting such a model using PyMC3 and highlights the importance of careful parameterisation, as well as demonstrating that variational inference can prove advantageous over standard sampling methods like NUTS for such problems.

Recent publications

This talk discussed an application of multi-state modelling to predict treatment pathways of a disease with heterogeneous disease management options, often involving multiple lines of active treatment.
Survival Analysis for Junior Researchers 2018, 2018.

Despite having notable advantages over established machine learning methods for time series analysis, reservoir computing methods, such as echo state networks (ESNs), have yet to be widely used for practical data mining applications. In this paper, we address this deficit with a case study that demonstrates how ESNs can be trained to predict disease labels when stimulated with movement data. Since there has been relatively little prior research into using ESNs for classification, we also consider a number of different approaches for realising input–output mappings. Our results show that ESNs can carry out effective classification and are competitive with existing approaches that have significantly longer training times, in addition to performing similarly to models employing conventional feature extraction strategies that require expert domain knowledge. This suggests that ESNs may prove beneficial in situations where predictive models must be trained rapidly and without the benefit of domain knowledge, for example on high-dimensional data produced by wearable medical technologies. This application area is emphasized with a case study of Parkinson’s disease patients who have been recorded by wearable sensors while performing basic movement tasks.
Artificial Intelligence in Medicine, 2018.

This work presented an interactive web application for building multi-state models of disease pathways. The app is flexible, allowing for both parametric and semi-parametric models, with transition-specific distributions. The presentation won the award for Best Presentation.
Survival Analysis for Junior Researchers 2017, 2017.

A survey of possible ways to evaluate survival models that are intended for prognostic, rather than inferential, aims. The work was demonstrated on a clinically motivated data set of Follicular Lymphoma. This presentation won the Best in Session Award.
Survival Analysis for Junior Researchers 2016, 2016.

Ensembles are groups of classifiers which cooperate in order to reach a decision. Conventionally, the members of an ensemble are trained sequentially, and typically independently, and are not brought together until the final stages of ensemble generation. In this paper, we discuss the potential benefits of training classifiers together, so that they learn to interact at an early stage of their development. As a potential mechanism for achieving this, we consider the biological concept of mutualism, whereby cooperation emerges over the course of biological evolution. We also discuss potential mechanisms for implementing this approach within an evolutionary algorithm context.
Information and Processing in Cells and Tissues, 2015.

Projects

A Multi-State Modelling web app

A web app built in Shiny to visualise Multi-State Models

Predictaball

A machine learning sports prediction bot

rprev

An R package for estimating disease prevalence from registry data