“Two Cultures” One aspect of statistical modeling which can be taken for granted by those with a bit of experience, but may not be immediately obvious to newcomers, is the difference between modeling for explanation and modeling for prediction. When you’re a newbie to modeling you may think that this only has an effect on how you interpret your results and what conclusions you’re aiming to make, but it has a far bigger impact than that, from influencing the way you form the models, to the types of learning algorithms you use, and even how you evaluate their performance.
I’ve tinkered around with Predictaball a bit recently in an effort to increase its accuracy, with the overall goal of beating Paul Merson and Lawro so that I can claim ‘human competitiveness’. I’ve mentioned in previous posts that I envisage 2 potential ways to achieve this.
Include more player data Incorporate bookies odds Adding more player data (such as a variable for each player indicating whether they are in the squad or not) would allow the model to account for situations when a player who is strongly associated with the team winning is now injured - for an example see City’s abysmal record when Kompany isn’t playing.
It’s been a while since I’ve posted anything as I’ve spent my summer in a thesis related haze, which I’m starting to come out of now so expect more frequent updates - particularly as I work my way through the backlog of ideas I’ve been meaning to write about.
I’ll start with assessing Predictaball’s performance last season. Just to summarise, this was a classification task attempting to predict the outcome (W/L/D) of every premier league match from the end of September onwards.
Receiver Operating Characteristics (ROC) are becoming increasingly commonly used in machine learning as they offer a valuable insight into how your model is performing that isn’t captured with just log-loss, facilitating diagnosis of any issues. I won’t go into much detail of what ROC actually is here, as this post is more intended to help navigate people looking for a MAUC Python implementation. If however you are looking for an overview of ROC then I’d recommend Fawcett’s tutorial here.
The majority of my work is involved with machine learning using biologically inspired techniques, focusing on classification problems. I run my algorithms on benchmark datasets to test their validity and the effect of various parameters, and then these are used in real life medical applications. Trials can take a long time to prepare, and the data collection process can be somewhat challenging. The group I’m involved with researchs Neurodegenerative Diseases, particularly Parkinson’s Disease.