software development

rdes: Discrete event simulation in R for estimating transition probabilities from a multi-state model

I’ve just released an R package on GitHub for estimating transition probabilities from multi-state models. It doesn’t have a large potential audience, so I don’t intend to release it onto CRAN. Rather, it fills a highly specific role: I developed it for my own use and thought it could prove useful to someone else. Essentially, it extends the simulation functionality offered by the fantastic flexsurv package for obtaining predicted outcomes from multi-state models.

epitab - Contingency Tables in R

I’ve just released a new package onto CRAN and while it doesn’t perform any complex calculations or fit a statistical niche, it may be one of the most useful everyday libraries I’ll write. In short, epitab provides a framework for building descriptive tables by extending contingency tables with additional functionality. I initially developed it for my work in epidemiology, as I kept coming across situations where I wanted to programmatically generate tables containing various descriptive statistics to facilitate reproducible research, but I could not find any existing software that met my requirements.

Building a 2D No Man's Sky - NASA Space Apps

I’ve never really been much of a hacker; I much prefer to think my projects through entirely and plan them out on pen and paper before starting to write any code. As such, I’ve never really had much interest in hackathons. It was with a bit of apprehension, then, that I participated in my first one over the weekend. The particular event was NASA Space Apps, where NASA provide lots of data and offer challenges related to modelling certain natural phenomena, providing data visualisation, or prototyping hardware tools that fit a particular niche.

An interactive Multi-State Modelling Shiny web app

In the last couple of months I’ve been teaching myself about multi-state survival models for use in an upcoming project. While I found the theoretical concepts relatively straightforward, I started having issues when I began implementing the models in software. There are many considerations to be made when building a multi-state model, such as:

- Converting the data into a suitable long format
- Deciding whether to use parametric or semi-parametric models
- Different subsets of the available covariates can be selected for each of the transition hazards
- In addition, covariates can be forced to have the same hazard ratio on every transition
- There’s a choice to be made between clock-forward or clock-reset (semi-Markov models) time-scales
- The Markov assumption can be further violated by including the state arrival times as part of the transition hazard; this often has theoretical justification
- The baseline hazards can be kept stratified by transition, or certain ones can be assumed to be proportional

Needless to say, actually building a model was very time consuming.
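As a rough illustration of the clock-forward versus clock-reset choice mentioned above, the sketch below assumes the data are already in long format, with one row per subject and transition at risk. The `Row` class, its field names, and `toClockReset` are hypothetical names made up for this example; they are not taken from the app or from any R package. Under clock-reset, time restarts at zero on entry to each state, so the interval becomes 0 to (stop - start).

```java
import java.util.ArrayList;
import java.util.List;

public class ClockScales {

    // Hypothetical long-format row: one subject at risk of one transition
    // between tStart and tStop, measured from study entry (clock-forward).
    public static final class Row {
        public final int id;
        public final String transition;
        public final double tStart;
        public final double tStop;
        public final boolean event;

        public Row(int id, String transition, double tStart, double tStop, boolean event) {
            this.id = id;
            this.transition = transition;
            this.tStart = tStart;
            this.tStop = tStop;
            this.event = event;
        }
    }

    // Clock-reset: express each interval as time since entering the current
    // state, so every row starts at zero and ends at the sojourn time.
    public static List<Row> toClockReset(List<Row> rows) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(new Row(r.id, r.transition, 0.0, r.tStop - r.tStart, r.event));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up example: a subject becomes ill at t = 2.5 and dies at t = 6.0.
        List<Row> forward = new ArrayList<>();
        forward.add(new Row(1, "ill->dead", 2.5, 6.0, true));

        for (Row r : toClockReset(forward)) {
            System.out.println(r.transition + ": " + r.tStart + " to " + r.tStop);
        }
    }
}
```

The clock-forward rows are left untouched and a new list is returned, so both time-scales can be kept side by side when comparing models.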

Guide to publishing R packages on CRAN

I recently gave a talk at my university’s R User group on how to publish packages to CRAN (slides here). This isn’t an easy topic to distill into a 60-minute slot, and so I had to abandon my original idea of a hands-on workshop with examples in favour of a condensed summary of the main challenges in the submission process. This mostly focused on the issue of Namespaces, since this is a rather complex topic to understand if you’re coming from a non-software engineering background, as it doesn’t come up in day-to-day statistical analysis.

An R package for estimating disease prevalence by simulation: rprev

At ECSG (Epidemiology and Cancer Statistics Group), we primarily work with myeloid and lymphoid disease registries. Resulting from our successful collaborative research project - HMRN (Haematological Malignancy Research Network) - we have access to a large observational dataset of haematological malignancies across Yorkshire. From this we can estimate various measures of interest, such as the effect of standard demographic factors (mainly age and sex) on incidence rates, any longitudinal incidence trends, in addition to numerous statistics related to survival, for example noting any clinical or demographic factors associated with a high risk level.

Creating a Peep Show quote bot

After watching Peep Show recently I realised there just isn’t enough of it in my life. The writing is a particular highlight, with brilliant characters (that are somewhat relatable) and each episode is filled with hilarious quotes. If you haven’t seen it before I highly recommend checking it out on Netflix, which has all 8 seasons. Each season is only 6 episodes long so it won’t take long to watch them all.

USB IO: Or how I learned to stop worrying and love Java NIO

This may not be very relevant to many people, but if you’re unsure of when to use the NIO classes, or are having problems interfacing with USB in Java then I hope it helps a little. I was tasked with debugging an annoying error this week, where a position sensor connected to a laptop via USB froze approximately once every 70 attempts. The IO was performed in Java, accessing the USB port as though it were a local file and reading in data byte by byte using a BufferedInputStream.
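For readers unfamiliar with the NIO classes, here is a minimal sketch of reading a block of bytes through a `FileChannel` rather than pulling them one at a time from a `BufferedInputStream`. It is only an illustration of the API and not necessarily the fix described in the post. The class name, the `readBlock` helper, and the temporary file standing in for the device are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelReadSketch {

    // Reads up to maxBytes from the given path through a FileChannel,
    // filling a ByteBuffer in blocks instead of reading byte by byte.
    public static byte[] readBlock(Path source, int maxBytes) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(maxBytes);
        try (FileChannel channel = FileChannel.open(source, StandardOpenOption.READ)) {
            // A single read() may return fewer bytes than requested, so loop
            // until the buffer is full or the end of the data is reached.
            while (buffer.hasRemaining()) {
                if (channel.read(buffer) < 0) {
                    break;
                }
            }
        }
        // Switch the buffer from writing to reading and copy out the bytes.
        buffer.flip();
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the sensor's device path here.
        Path demo = Files.createTempFile("sensor", ".bin");
        try {
            Files.write(demo, new byte[] {1, 2, 3, 4, 5});
            byte[] data = readBlock(demo, 4);
            System.out.println("Read " + data.length + " bytes");
        } finally {
            Files.delete(demo);
        }
    }
}
```

With a real device the path and the expected message size would differ, and a blocking read may still need a timeout strategy of its own.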

Premature optimisation is the root of all evil...

I’ve found this saying to be particularly true these last couple of weeks. Having passed my confirmation viva I wanted to spend a bit of time cleaning up my code that has got rather bloated with all the recent additions to it. Since I’ve never been formally taught OO programming I decided to brush up on this as well. I bought the “Gang of Four” classic Design Patterns book and decided to implement as much as I could of it in my program.