Uplift Modelling Wins Hillstrom E-Mail Analytics Challenge
This, it transpires, is an uplift problem.
Kevin published a dataset describing 64,000 etail customers, one third of whom (randomly chosen) had received one marketing email (the "Men's Email"), another (random) third of whom had received a second email (the "Women's Email"), with the final third receiving neither email. (Note that "Men's" and "Women's" describe the email, not the sex of the customer.)
The challenge was to analyse the data to work out which campaign did better, and then (the real challenge) to analyse which customers responded most positively and least positively to the emails. The first of these, of course, is a simple post-campaign reporting analysis. But the second involves (formally or informally) modelling the incremental impact of the two emails on each customer, so that the best and worst targets can be selected. That's what uplift models do.
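To make "incremental impact" concrete, here is a minimal sketch in Python of the simplest possible version of the idea: for each customer segment, subtract the response rate of the untreated control group from the response rate of the group that received an email. The records, the field names (arm, segment, responded) and the helper functions are all made up for illustration; this is not the Hillstrom data and not the method used in the paper, just the arithmetic that any uplift model is ultimately trying to estimate.

```python
from collections import defaultdict


def make(arm, segment, n, responders):
    """Build n toy customer records, the first `responders` of which responded."""
    return [
        {"arm": arm, "segment": segment, "responded": 1 if i < responders else 0}
        for i in range(n)
    ]


def response_rate(records):
    """Fraction of records with responded == 1 (0.0 for an empty list)."""
    return sum(r["responded"] for r in records) / len(records) if records else 0.0


def uplift_by_segment(records, treatment, control="none", key="segment"):
    """Per-segment uplift: treated response rate minus control response rate."""
    groups = defaultdict(lambda: {"treated": [], "control": []})
    for r in records:
        if r["arm"] == treatment:
            groups[r[key]]["treated"].append(r)
        elif r["arm"] == control:
            groups[r[key]]["control"].append(r)
    return {
        seg: response_rate(g["treated"]) - response_rate(g["control"])
        for seg, g in groups.items()
    }


# Invented toy data: two segments, one email ("mens") and a control arm ("none").
toy = (
    make("mens", "new", 100, 12) + make("none", "new", 100, 10)
    + make("mens", "lapsed", 100, 8) + make("none", "lapsed", 100, 10)
)

uplift = uplift_by_segment(toy, treatment="mens")
for segment, value in sorted(uplift.items()):
    print(f"{segment}: {value:+.3f}")
```

In this invented example the email lifts response among "new" customers but depresses it among "lapsed" ones, which is exactly the kind of best-target and worst-target distinction the challenge asked for. A real uplift model has to estimate these differences from many customer attributes at once, where sample size per segment quickly becomes a worry.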
The challenge was interesting in all sorts of ways, and in my write-up I tried to reflect the detail of the process I actually had to go through to build a set of useful models. This included some false starts, some serious worries about sample size, and some run-ins with unhelpful anti-correlations.
If you're interested, you can read all about it in the resulting submission/white paper, which won the challenge and which is now available from Stochastic Solutions at http://stochasticsolutions.com/etailPaper.html.
I was particularly pleased to make progress with this problem because, whereas the problems I've tackled from the financial services and telecoms industries have almost always yielded to the uplift modelling approach, there seem to be some characteristics of retail problems that sometimes make them harder to crack. I think I learned some useful lessons tackling this problem, and I've tried to document most of those in the paper.
Thanks to Kevin for organizing the challenge.
Labels: text etail retail uplift