Saturday, June 03, 2006

New template

As you can probably tell, I'm fiddling with the template. The old one was, well, old. I'm stripping it down to nothing and building it up from there, so right now it's minimal. There are all sorts of problems with the individual post pages. The colors are screwy. In short, it needs a lot of work.

But I'm done for the weekend. I'm home with my family, and my sister's high school graduation party is tomorrow. So I'll get back to the blog some time this week...

Why aren't there more cricket statistics? Part 4: A re-evaluation of batting average

Batting average in cricket is calculated by dividing the total number of runs a batsman has scored by the number of innings they have completed (that is, the number of times they have been out). On the surface, this seems like an excellent measure of a batsman's ability to score runs. The failing of batting average lies in the simple fact that at the end of a side's innings, one batsman remains not out (despite the side being described as "all out"). The runs scored by the stranded batsman are added to the numerator of his batting average, but nothing is added to the denominator, which inflates his batting average.
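For concreteness, here is a minimal Python sketch of the conventional calculation. The function name and the choice to return `None` for a batsman who has never been dismissed are my own illustrative assumptions, not any official scoring rule.

```python
from typing import Optional


def batting_average(runs: int, innings: int, not_outs: int) -> Optional[float]:
    """Conventional batting average: runs divided by dismissals.

    Runs from not-out innings count toward the total, but those innings
    add nothing to the denominator.
    """
    dismissals = innings - not_outs
    if dismissals == 0:
        return None  # never dismissed: the average is undefined
    return runs / dismissals


# Same 200 runs in 10 innings; the second batsman was not out twice.
print(batting_average(runs=200, innings=10, not_outs=0))  # 20.0
print(batting_average(runs=200, innings=10, not_outs=2))  # 25.0
```

Two not-outs are enough to lift the same run tally from 20 to 25.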

An extreme example illustrates this problem. Assume that a tailender comes to the crease 99 times, scores a single run each time, and then remains not out when his partner is bowled out. The hundredth time this batsman comes to bat, he scores 1 run (as always), then foolishly tries for another, getting run out in the process. What is his batting average?

It's not 1, even though he was remarkably consistent in achieving his single. No, his average is an astounding 100, besting the greatest batsman of all time, Donald Bradman, by 0.06. This is clearly not right.
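The arithmetic behind that example can be checked in a few lines. This is just a sketch; representing each innings as a `(runs, not_out)` pair is my own choice for illustration.

```python
# Hypothetical tailender: 100 innings, exactly 1 run each time.
# Not out in the first 99, run out in the 100th.
scores = [(1, True)] * 99 + [(1, False)]  # (runs, not_out)

runs = sum(r for r, _ in scores)                             # 100
dismissals = sum(1 for _, not_out in scores if not not_out)  # 1
average = runs / dismissals
runs_per_innings = runs / len(scores)

print(average)                    # 100.0
print(runs_per_innings)           # 1.0
print(round(average - 99.94, 2))  # 0.06, the margin over Bradman's 99.94
```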

In the real world, absurdities like this don't come up. But the fact remains that lower-order batsmen who frequently remain not out at the end of an innings have inflated batting averages. Shaun Pollock, for instance, currently averages 31.95 in Test cricket. In 37 of his 147 Test innings, Pollock was not out. Pollock typically comes in at #8, with only the specialist bowlers to follow. As a result, he has a decent chance of finishing his innings not out. If Pollock came in at, say, #5, his number of innings not out would decrease. If, instead of 37 innings not out, Pollock had only 20 (with the same total of runs), his average would drop by over 4 runs, from 31.95 to about 27.7. To a certain extent, then, batting average is a function of a batsman's normal place in the batting lineup.
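Here is the Pollock arithmetic as a quick sketch. It assumes his total runs stay the same and only the not-out count changes, and it backs the run total out of the published 31.95 average, so the figures are approximate.

```python
innings = 147
average = 31.95
actual_not_outs = 37
hypothetical_not_outs = 20

# Recover total runs from the published average (approximate: the
# average is rounded to two decimal places).
runs = average * (innings - actual_not_outs)              # about 3514.5
new_average = runs / (innings - hypothetical_not_outs)    # 3514.5 / 127

print(round(runs, 1))                    # 3514.5
print(round(new_average, 2))             # 27.67
print(round(average - new_average, 2))   # 4.28
```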

Batting average, therefore, needs a correction to help account for this failing. One possibility is to disregard not-out innings entirely. This is clearly wrong, as it would fail to properly credit higher-order batsmen who achieve long innings and would effectively ignore the skill of lower-order batsmen who often remain not out. Including all innings, out and not out, in the denominator has a similar problem. Ignoring the achievement of remaining not out (and treating innings of, say, 40 and 40 not out as the same thing) is a mistake.
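To make the comparison concrete, here is a sketch of the conventional average alongside the two rejected alternatives, applied to a made-up set of three innings (40, 40 not out, and 10). The function names and the sample data are purely illustrative.

```python
def conventional(innings):
    """Runs / dismissals: not-outs add runs but nothing to the denominator."""
    runs = sum(r for r, _ in innings)
    outs = sum(1 for _, not_out in innings if not not_out)
    return runs / outs


def ignore_not_outs(innings):
    """Drop not-out innings entirely, runs and all."""
    completed = [r for r, not_out in innings if not not_out]
    return sum(completed) / len(completed)


def count_all_innings(innings):
    """Treat every innings as if it ended in a dismissal."""
    return sum(r for r, _ in innings) / len(innings)


sample = [(40, False), (40, True), (10, False)]  # (runs, not_out)
print(conventional(sample))       # 45.0
print(ignore_not_outs(sample))    # 25.0
print(count_all_innings(sample))  # 30.0
```

Ignoring not-outs throws away the unbeaten 40 altogether, while counting all innings scores it exactly like the dismissed 40.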

What is necessary, then, is some way of crediting the achievement of being not out at the end of an innings while still recognizing that being not out is due, in large part, to one's spot in the batting order. Three possible solutions come to mind:

- Increasing the denominator in batting average by some increment for each not-out innings.
- Adding an adjustment factor to batting average based on a batsman's typical (median? mean?) spot in the batting order.
- Some combination of the first two: increasing the denominator by an increment for each not-out innings, with the increment based on the not-out batsman's position in the batting lineup.
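The first option is the easiest to sketch. Assuming a single, hypothetical `increment` applied to every not-out innings, an increment of 0 reproduces the conventional average and an increment of 1 reproduces the count-every-innings version, so the increment effectively slides between the two extremes rejected above. The function name and sample innings are my own illustrative choices.

```python
def incremented_average(innings, increment):
    """Runs divided by dismissals plus `increment` per not-out innings.

    `innings` is a list of (runs, not_out) pairs; `increment` is a
    hypothetical tuning value between 0 and 1. With increment 0 and no
    dismissals at all, this divides by zero, just like the usual average.
    """
    runs = sum(r for r, _ in innings)
    denominator = sum(increment if not_out else 1.0 for _, not_out in innings)
    return runs / denominator


sample = [(40, False), (40, True), (10, False)]
for k in (0.0, 0.5, 1.0):
    print(k, incremented_average(sample, k))
# 0.0 45.0   (conventional average)
# 0.5 36.0
# 1.0 30.0   (every innings counted)
```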

All of these are plausible manipulations. I suspect that the third approach would be the most fruitful. I envision something like this: a #11 batsman who remains not out has the numerator in his batting average increased by the number of runs scored and the denominator in his batting average increased by some fraction (say 0.5). A #10 batsman who remains not out has the denominator in his batting average increased by a slightly higher fraction (say 0.55). A #1 batsman who remains not out would have his denominator increased by a fraction close to 1 (say 0.9), since he has had the time to score lots of runs. Obviously, extensive statistical analysis would be necessary to determine the value of a not-out innings from each lineup position.
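A rough sketch of that third approach follows. The weights for #1, #10 and #11 are just the placeholder values suggested above; the other positions are deliberately left out, since filling them in is exactly the statistical analysis that still needs to be done. The `position_adjusted_average` name and the `(runs, not_out, position)` tuple layout are my own illustrative choices.

```python
# Placeholder denominator increments for a not-out innings, keyed by batting
# position. Only the three example values from the text are included; real
# values would have to come from statistical analysis.
NOT_OUT_WEIGHTS = {1: 0.9, 10: 0.55, 11: 0.5}


def position_adjusted_average(innings, weights=NOT_OUT_WEIGHTS):
    """Runs divided by a position-adjusted innings count.

    Each dismissal adds 1 to the denominator; each not-out innings adds the
    weight for the position the batsman came in at. A not-out innings from
    a position missing from `weights` raises KeyError.
    """
    runs = sum(r for r, _, _ in innings)
    denominator = sum(
        weights[pos] if not_out else 1.0 for _, not_out, pos in innings
    )
    return runs / denominator


tailender = [(10, True, 11), (5, False, 11), (3, True, 11)]
opener = [(80, True, 1), (20, False, 1)]

print(position_adjusted_average(tailender))         # 9.0   (conventional: 18.0)
print(round(position_adjusted_average(opener), 2))  # 52.63 (conventional: 100.0)
```

Under these placeholder weights, the tailender's two not-outs are discounted more heavily than the opener's one, which is the behaviour described above.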

In any case, the ideal formula for batting average would yield an accurate measure of cricketers' ability to score runs, regardless of their spot in the batting lineup. An accurate formula for batting average would allow direct comparison between two players whose batting positions differ significantly.

Next up: The combination of batting average and strike rate into an overall batting statistic.