The summer of 2013 in football analytics was dominated by strikers' shooting performance and their conversion percentages. Colin Trainor and Constantinos Chappas created their expected goals metric (link), 11tegen11 was the first to look at expected goals (link) and, more recently, Devin Pleuler has begun tweeting out information on expected chance quality.
Added to those smart pieces of work was information on shot quality and shot location (Colin Trainor), and all in all it was a tremendous leap forward in quantifying striker performance. The one question I have about some of this work is repeatability, or test-retest reliability, over a number of seasons.
Now, I know it's not an entirely fair question, as a lot of the data these new metrics are built on is relatively new. We don't have 4 years of shot location data or shot placement data. Still, I wanted to know something about a striker's ability to repeat a previous season's performance. I have a little historical data on strikers, so I thought I would take a look at which aspects of a striker's performance are repeatable and which are not.
These are the metrics I will be focusing on:
- Scoring% (goals/SoT)
- Shooting Accuracy% (SoT/Total Shots)
- Goals Per 90
- Assists Per 90
- Shots Per 90
- Shots On Target Per 90
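To make the definitions above concrete, here is a minimal sketch that turns one striker's raw season totals into the six metrics. The `SeasonTotals` fields and function names are hypothetical; it assumes minutes played are known and that shots and shots on target are non-zero (otherwise the two percentages are undefined).

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    """Raw season totals for one striker (hypothetical field names)."""
    minutes: float
    goals: int
    assists: int
    shots: int
    shots_on_target: int


def per90(count: float, minutes: float) -> float:
    """Scale a season count to a rate per 90 minutes played."""
    return count * 90.0 / minutes


def striker_metrics(s: SeasonTotals) -> dict[str, float]:
    """Compute the six metrics; assumes shots and shots_on_target are non-zero."""
    return {
        "scoring_pct": 100.0 * s.goals / s.shots_on_target,   # goals / SoT
        "accuracy_pct": 100.0 * s.shots_on_target / s.shots,  # SoT / total shots
        "goals_p90": per90(s.goals, s.minutes),
        "assists_p90": per90(s.assists, s.minutes),
        "shots_p90": per90(s.shots, s.minutes),
        "sot_p90": per90(s.shots_on_target, s.minutes),
    }
```

For example, a made-up season of 15 goals from 40 shots on target and 90 total shots in 2,700 minutes gives a scoring% of 37.5 and 3.0 shots per 90.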
In short, are players who post a high scoring% in year 1 likely to repeat that performance in year 2? There are some interesting results. I'll start with the metrics that have the strongest year-to-year correlation.
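The R2 figures below compare each player's year-1 value with his year-2 value. The post doesn't say which tool produced them, so this is only a sketch of the standard calculation, assuming one (year 1, year 2) pair per player: the R2 of a simple linear fit of year 2 on year 1, which with a single predictor equals the squared Pearson correlation. The function name is hypothetical.

```python
from statistics import mean


def year_to_year_r2(year1: list[float], year2: list[float]) -> float:
    """R^2 of a one-predictor linear fit of year-2 values on year-1 values.

    With a single predictor this equals the squared Pearson correlation r**2.
    Assumes both lists are paired by player and neither is constant.
    """
    if len(year1) != len(year2) or len(year1) < 2:
        raise ValueError("need two equal-length lists covering at least two players")
    m1, m2 = mean(year1), mean(year2)
    sxy = sum((x - m1) * (y - m2) for x, y in zip(year1, year2))
    sxx = sum((x - m1) ** 2 for x in year1)
    syy = sum((y - m2) ** 2 for y in year2)
    return sxy * sxy / (sxx * syy)
```

A value near 1 means year 1 predicts year 2 well; a value near 0 means it tells you almost nothing.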
Shots Per 90. R2=0.435
Shots per 90 has an R2 of 0.435, and although the correlation isn't all that impressive, it's the strongest year 1 to year 2 correlation of any of the metrics listed in the introduction. If you are going to pick out any aspect of a striker's performance that may be repeatable year-on-year, then shots per 90 should be the metric you use.
Shots On Target Per 90. R2=0.234
Shots on target per 90 is the second most repeatable aspect of a striker's performance, although the correlation between year 1 and year 2 drops sharply. SoT per 90 has an R2 of 0.234, and although that number is far from impressive, it's out of this world compared to some of the correlations you will see shortly.
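The headings report R2 while the prose also talks about "correlation". For a one-predictor linear fit of year 2 on year 1, R2 is the square of Pearson's r, so the two convert directly. A worked conversion using the headline values (only the magnitude of r is recovered; its sign comes from the slope):

```latex
\begin{aligned}
R^2 &= r^2 \;\Longrightarrow\; |r| = \sqrt{R^2} \\
\sqrt{0.435} &\approx 0.66 \quad (\text{shots per 90}) \\
\sqrt{0.234} &\approx 0.48 \quad (\text{SoT per 90}) \\
\sqrt{0.048} &\approx 0.22 \quad (\text{goals per 90})
\end{aligned}
```

Read the other way, an R2 of 0.435 means a linear fit on year-1 shots per 90 accounts for about 43.5% of the variance in year-2 shots per 90, leaving most of the variation unexplained.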
Goals Per 90. R2=0.048
Outliers have been stripped out.
This chart is simply all over the place; the year-to-year correlation is virtually non-existent. There are obviously exceptions, players who can reproduce their goals per 90 year-on-year, but those players are mighty rare, even in the Premier League.
Assists Per 90. R2=0.033
Again, I removed the outliers: players who didn't record an assist in year 1 or year 2.
This is a pretty similar chart to the goals per 90 one featured above. It's a mess; there's little repeatability compared with shots or shots on target.
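The assists exclusion can be sketched as a filter applied before computing R2. This assumes "outlier" here means a zero in either season, and that the values arrive as a mapping from player name to a (year 1, year 2) pair; the function name and input shape are hypothetical.

```python
def nonzero_pairs(pairs: dict[str, tuple[float, float]]) -> tuple[list[float], list[float]]:
    """Split paired per-90 values into year-1 and year-2 lists,
    skipping any player with a zero in either season."""
    year1: list[float] = []
    year2: list[float] = []
    for y1, y2 in pairs.values():
        if y1 > 0 and y2 > 0:
            year1.append(y1)
            year2.append(y2)
    return year1, year2
```

The two returned lists stay aligned by player, so they can go straight into a year-to-year correlation.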
Shooting Accuracy%. R2=0.0165
This is shooting accuracy%, also called SoT% (SoT/total shots). It's another scattered set of data points with virtually zero repeatability. This is the one metric that stunned me a little, for I was always under the impression that a striker had some control, some form of skill, in getting a certain percentage of his total shots on target. Obviously not.
Finally, we get to scoring%, which is goals/shots on target. I removed the extreme scoring% outliers by setting a minimum number of shots.
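A minimum-shots filter for scoring% might look like the sketch below. The post doesn't give the cutoff it used, so `MIN_SHOTS = 20` is a hypothetical placeholder, and applying the cutoff to both seasons is also an assumption. The `PlayerPair` record is illustrative.

```python
from dataclasses import dataclass


@dataclass
class PlayerPair:
    """One striker's totals in two consecutive seasons (hypothetical shape)."""
    name: str
    shots_y1: int
    sot_y1: int
    goals_y1: int
    shots_y2: int
    sot_y2: int
    goals_y2: int


MIN_SHOTS = 20  # hypothetical cutoff; the post does not state the value used


def scoring_pct_pairs(players: list[PlayerPair], min_shots: int = MIN_SHOTS) -> list[tuple[float, float]]:
    """(year-1, year-2) scoring% pairs for players above the shot cutoff in both seasons."""
    out: list[tuple[float, float]] = []
    for p in players:
        if p.shots_y1 < min_shots or p.shots_y2 < min_shots:
            continue  # too few shots: scoring% would be dominated by small-sample noise
        if p.sot_y1 == 0 or p.sot_y2 == 0:
            continue  # scoring% is undefined without shots on target
        out.append((100.0 * p.goals_y1 / p.sot_y1, 100.0 * p.goals_y2 / p.sot_y2))
    return out
```

Raising the cutoff trims more small-sample extremes, at the cost of fewer players in the comparison.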
There is zero relationship between a striker's conversion of shots on target into goals in year 1 and in year 2. Scoring% is random; it's true in football and it's true in hockey:
"Such a great graph showing how shooting percentage is a crapshoot: pic.twitter.com/RLqacdnfNm" (mc79hockey, @mc79hockey, August 18, 2013)
There isn't one metric that we use to evaluate strikers that has a particularly high level of repeatability from one year to the next. But if we are to choose any of the metrics to try and predict future performance, then shots per 90 and shots on target per 90 are clearly the two we should use.
The percentage metrics, shooting accuracy% and scoring%, are crapshoots. We know these metrics are predominantly luck-driven, and it would be folly to predict the future performance of a striker using either of them. Would controlling for the location of shots, say, in-box shots only, strengthen the correlation between year 1 and year 2 scoring%? Probably, but it'll be a long time before we have sufficient year-on-year data to test that theory.
The most repeatable aspects of a striker's performance are his shots and shots on target numbers. All the other metrics we use to evaluate strikers have very little repeatability. Unsustainable performance and regression toward the mean are two huge factors in the lack of year-on-year repeatability in scoring% and shooting accuracy.