‘Things I learned’ – blogging about the bad things I did while doing public analytics so that you don’t have to do them

A few weeks ago, I got a tweet from someone (which I now can’t find but will add back in if I do) asking whether making a ‘defensive score’ from various defensive statistics might work.

‘Probably not’ was my answer, given that I spent far too long trying to, fairly arbitrarily, make one.

The radars I made back then might look familiar to you and that’s because, yes, I (cringingly) swiped the idea from Ted Knutson. At the time, I was – in the public analytics world – basically the only person, to my knowledge, looking at defenders.

That’s because defensive stats are difficult – or more that attackers are easy. My current belief is that public analytics got lucky with the fact that a striker’s job is basically only to shoot, and to do that they basically just need to get in positions to shoot, and that we tend only to care about goals and the people who score them.

Anywhere else on the pitch, so much of the statistical output is dependent on the role the player’s been told to play. Sebastien Chapuis pointed this out to me long, long before I had the sense to take notice of it.

Anyway, the defensive score.

Back in the day, I tried combining stats into one score. Initially, I was weighting the stats arbitrarily; then I began to weight them by how strongly correlated they were with preventing opponents from getting shots.
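For anyone curious what correlation-based weighting might look like in practice, here is a minimal sketch. It is not the code used at the time: the function names, the choice to standardise each stat with z-scores, and the toy numbers are all assumptions made up for illustration. Each stat's weight is simply the negative of its correlation with shots conceded, so a stat that tends to go with fewer shots conceded gets a positive weight.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def z_scores(xs):
    """Standardise values so stats on different scales can be added together."""
    n = len(xs)
    mean = sum(xs) / n
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def correlation_weights(stats, shots_conceded):
    """Weight each stat by minus its correlation with shots conceded.

    A stat that goes with FEWER shots conceded ends up with a POSITIVE weight.
    """
    return {name: -pearson(values, shots_conceded) for name, values in stats.items()}


def defensive_scores(stats, weights):
    """One score per row: the weighted sum of that row's standardised stats."""
    standardised = {name: z_scores(values) for name, values in stats.items()}
    n_rows = len(next(iter(stats.values())))
    return [
        sum(weights[name] * standardised[name][i] for name in stats)
        for i in range(n_rows)
    ]


# Invented per-90 numbers for five hypothetical defenders (illustration only).
stats = {
    "tackles": [2.1, 3.4, 1.8, 2.9, 3.1],
    "interceptions": [1.5, 2.8, 1.1, 2.2, 2.6],
    "blocks": [0.9, 0.4, 1.2, 0.6, 0.5],
}
shots_conceded = [14.0, 9.0, 16.0, 11.0, 10.0]  # also invented

weights = correlation_weights(stats, shots_conceded)
scores = defensive_scores(stats, weights)
```

In this toy data, blocks come out with a negative weight, because the invented defenders who block the most are also the ones facing the most shots. That is a small taste of the wider problem: the weights describe the situations players find themselves in as much as how good they are.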

Neither conquered the problem that player role brings into it. I also briefly tried weighting actions by the position on the pitch, using a very basic Expected Goals model to say ‘well, if the opponent took a shot from here, how dangerous would that be’.
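The position-weighting idea can be sketched roughly as follows. This is an illustrative stand-in, not the model from back then: `toy_xg`, its coefficients, and the metre-based pitch coordinates (own goal centred at x = 0, y = 34 on a 105 by 68 pitch) are all assumptions, and the coefficients are invented rather than fitted to any data.

```python
from math import exp, hypot

# Assumed coordinate system: metres, with the defended goal centred at (0, 34).
OWN_GOAL_X, OWN_GOAL_Y = 0.0, 34.0


def toy_xg(x, y):
    """Very rough 'how dangerous would a shot from here be' value in (0, 1).

    A logistic curve on distance to the defended goal. The coefficients are
    made up for illustration and are NOT fitted to real shot data.
    """
    distance = hypot(x - OWN_GOAL_X, y - OWN_GOAL_Y)
    return 1.0 / (1.0 + exp(0.15 * distance - 1.0))


def location_weighted_score(actions):
    """Sum the toy danger value over the (x, y) location of each defensive action."""
    return sum(toy_xg(x, y) for x, y in actions)


# Three actions inside the penalty area versus three around the halfway line.
box_actions = [(8.0, 30.0), (11.0, 34.0), (6.0, 40.0)]
halfway_actions = [(50.0, 20.0), (52.0, 34.0), (48.0, 50.0)]

box_score = location_weighted_score(box_actions)
halfway_score = location_weighted_score(halfway_actions)
```

With this weighting, a tackle in the penalty area counts for far more than one near halfway. It still only counts actions, though, so a defender who rarely has to make any scores low regardless of where they stand.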

All variations rewarded action, and penalised players who were sheltered behind good defences (and who may well have been very good defenders themselves, e.g. John Terry at Chelsea circa 2015). But it was fun, at least.

Mistakes and lessons

So, the most glaring mistake was to arbitrarily weight the statistics, saying ‘well, I think an interception is more important than a block by this much’. Analytics is supposed to be scientific, and this was not scientific.

The second was ignoring the fact that role is a factor. Part of me still believes that there is some Holy Grail – that, if you put tackles and interceptions and whatever else into a pot and mixed them just right, you’d get the formula. But if I remember my schooling correctly, the Holy Grail is, indeed, a fictional object.

Next. I wouldn’t say that the blatant copy-catting of Ted’s radars was a mistake, although they do make me feel like the annoying wannabe-hero kid in The Incredibles. (Happily, Ted did not tell me I was stupid, I did not grow into an adult who resented him, and I did not become an evil supervillain named Syndrome.)

They held some value, I think, looking back. Even though I was kind of ignoring the impact of role on what a defender does, I was still grouping stats on the radars in a way that made some types of player apparent.

They were also, more generally, a period of exploration. Not necessarily good, or efficient, exploration, but exploration nonetheless. I’m glad I did it, even if it was a waste of a tremendous number of hours. If we measure a striker’s efficiency by their expected goal value per shot they take, I’d have been taking shots from 40 yards.

But still…

I may have said, at some point in this piece or in other conversations, “don’t do this, because it doesn’t work” – what I should really say is “I tried this; it didn’t work for me, and here are the reasons why it didn’t”.

If you want to try and combine a bunch of defensive stats into a single score to rate centre-backs, you’re free to try, and although I assume that it wouldn’t work, in some ways I encourage it – the exploration, the trying, the getting to know the data. I’m just a believer that learning from your mistakes is important, but learning from other people’s makes the process a lot less difficult.

I figured I’d put this down somewhere as I’ve taken most of it offline because it was bad and I didn’t want it to be seen as an example to follow. Hope, one way or another, it’s helped.

