Thursday, December 20, 2007

Why aren't there more cricket statistics? Part 5: An attempt at revising batting average

As you might remember, over a year and a half ago I started a series of posts on cricket statistics (here, here, here, and here). Before the series fell dormant, I discussed the problem with batting average. The question is pretty straightforward, really: what to do with not-outs? I don't know if anyone's ever done the statistical legwork, but I'd be willing to hazard a guess that ending up not-out is largely a function of position in the batting order. So some sort of adjustment needs to be made to accurately capture what's going on with the not-out innings. I suggested a number of possibilities, including adjusting batting average based on a batsman's typical batting position.

Ananth Narayanan of the new Cricinfo It Figures blog has recently offered up another solution. Narayanan's idea is to extend all not-out innings to their expected conclusion. He calculates this based on a batsman's recent form. So if a batsman has averaged 30 in his last ten innings, assume that he'll add 30 runs to his not-out score and consider that to be his completed innings score.
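To make the idea concrete, here is a rough sketch of how I read the extension rule. This is my interpretation, not Narayanan's actual code: the function name, the ten-innings window as a parameter, and rounding the average to a whole number of runs are all my assumptions, and the scores are made up.

```python
def extended_score(not_out_score, recent_scores, window=10):
    """Extend a not-out score by the batsman's recent innings average.

    Assumption: "recent form" is total runs divided by innings played
    over the last `window` innings, rounded to whole runs.
    """
    recent = recent_scores[-window:]
    if not recent:
        # No form data available: leave the score as it stands.
        return not_out_score
    recent_average = sum(recent) / len(recent)
    return not_out_score + round(recent_average)


# Hypothetical last ten innings totalling 300 runs (an average of 30).
last_ten = [12, 45, 30, 0, 61, 28, 33, 19, 50, 22]
print(extended_score(25, last_ten))  # 25 not out is treated as 55
```

The extended scores would then be fed into an ordinary runs-per-innings average, with every innings counted as completed.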

This is an intriguing solution, since it focuses on the not-out innings, which is where the problem with batting average arises. But there are a number of problems with it. First, is it really safe to assume, as Narayanan does, that since "Kumar Sangakkara ... has scored 984 runs in his last 10 innings at an innings average of 98.4," a 32 not-out in his next innings can be extended by 98 runs to 130? Not all innings are created equal. It might turn out to be the case that, once Sangakkara reaches the 30s, he normally goes on to score 150. The toughest runs to score, of course, are the first ones. Analyzing a player's typical score after reaching a set number of runs seems a far better approach and would incorporate the fact that well-set players can be practically impossible to dislodge.
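The "typical score after reaching a set number of runs" alternative can be sketched like this. It is only an illustration under simplifying assumptions of my own: the history contains dismissed innings only (so nothing in it is itself cut short), the function name is hypothetical, and the scores are invented.

```python
def expected_final_score(not_out_score, completed_scores):
    """Mean final score over completed innings that reached not_out_score.

    Assumption: completed_scores holds only innings that ended in a
    dismissal, so each one is a true final score.
    """
    reached = [s for s in completed_scores if s >= not_out_score]
    if not reached:
        # The batsman has never got this far before: nothing to go on.
        return float(not_out_score)
    return sum(reached) / len(reached)


# Hypothetical dismissed innings with an overall average of 58.
history = [4, 0, 150, 17, 35, 210, 8, 90, 2, 64]
print(expected_final_score(32, history))  # 109.8
```

With these invented numbers, adding the overall average to a 32 not-out would give 90, while conditioning on having already reached 32 gives about 110, because the cheap dismissals no longer drag the estimate down.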

Another problem arises with the assumption that recent form is the best predictor of future performance. This is the sort of empirical question that baseball sabermetricians excel at answering. Sadly, I completely lack the statistical chops to even take a stab at it. But plenty of research into the "hot hand" in basketball has shown that recent rates of success do no better at predicting future performance than overall rates of past success. In other words, there's no such thing as the hot hand (a recent paper argued that "feeding the hot hand" is still a good idea, since a player who has made several consecutive shots is probably a good player to begin with).

There's no guarantee that batting innings follow the same pattern, of course. But budding cricket statisticians should take note. This is a key question that, as far as I know, hasn't been answered. Does recent performance accurately predict future performance? Or is "underlying talent" (represented by overall past performance) a better predictor?
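For anyone who wants to take a first stab at that question, one simple approach is to walk through a batsman's innings in order, predict each score two ways (recent-form average versus career-to-date average), and compare the average size of the misses. The sketch below is my own rough framing, not an established method: the function name and the use of mean absolute error are assumptions, and it treats every score as a completed innings, which ducks the not-out problem entirely.

```python
def compare_predictors(scores, window=10):
    """Return (recent_form_error, career_error) as mean absolute errors.

    For each innings after the first `window`, predict the score from
    (a) the mean of the previous `window` innings and
    (b) the mean of all previous innings.
    """
    recent_errors, career_errors = [], []
    for i in range(window, len(scores)):
        recent_pred = sum(scores[i - window:i]) / window
        career_pred = sum(scores[:i]) / i
        recent_errors.append(abs(scores[i] - recent_pred))
        career_errors.append(abs(scores[i] - career_pred))
    if not recent_errors:
        raise ValueError("need more than `window` innings")
    n = len(recent_errors)
    return sum(recent_errors) / n, sum(career_errors) / n


# Tiny made-up example with a two-innings window.
recent_err, career_err = compare_predictors([10, 20, 30, 40], window=2)
print(recent_err, career_err)
```

Whichever error is smaller across a large sample of real batsmen would point to the better predictor, though a proper study would need far more care than this.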

So while Narayanan is on the right track, he makes a few assumptions that need further examination. The question remains unsolved: what to do with not-outs?