How many times have you seen it written or heard somebody say:
“Wins is a useless way to evaluate a starting pitcher”
We have heard it so often that we have started wondering why we still track Wins for pitchers.
There are certainly reasons to believe this. How many times have we seen a pitcher give up fewer than 3 runs in 8+ innings and not win the game? Or how often do we see a great pitcher on a bad team win 12 games or fewer?
But does this mean Wins is a completely useless statistic? Over time, shouldn’t a good pitcher win more games than a bad pitcher, regardless of other factors?
To answer this question, we looked at every pitcher over the last four seasons (2006-2009) with at least 600 innings pitched (150 IP/season). We then removed anybody who made more than 10% of their appearances in relief. We ended up with a list of 51 pitchers. We tallied up their wins (as a starting pitcher) in those four seasons and compared the totals to their ERA+*.

What we see is a very clear trend. As a pitcher’s ERA+ goes up (bigger values are better, 100 is average), their win total goes up. Are there exceptions? Of course. Every statistic has exceptions. But even in the face of contradictions, we still see a decent correlation (r-squared = 0.51**).
Of course, a pitcher’s win total will be affected by the number of starts they make. So, instead of wins, let’s see if ERA+ can be used to predict a pitcher’s win percentage, and vice versa.
Now we see a slightly stronger correlation (r-squared = 0.54), indicating that Wins is actually a reasonably good indicator of how good a pitcher is. Quite simply, better pitchers win more games.
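For readers who want to run this kind of check themselves, here is a minimal sketch of the r-squared calculation in Python. It is not the code used for this post. The function names `r_squared` and `win_pct` are ours, the win percentage is assumed to be wins divided by decisions (wins plus losses), and the sample numbers are made-up placeholders, not the 51 pitchers in the study.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


def win_pct(wins, losses):
    """Win percentage, ASSUMED here to be wins / (wins + losses)."""
    return wins / (wins + losses)


# Made-up placeholder values for illustration only (NOT the study data).
sample_era_plus = [85, 95, 100, 110, 125, 140]
sample_records = [(40, 52), (48, 45), (44, 46), (52, 40), (55, 38), (60, 30)]
sample_win_pct = [win_pct(w, l) for w, l in sample_records]

print(round(r_squared(sample_era_plus, sample_win_pct), 2))
```

Swapping in real ERA+ values and win-loss records for the placeholder lists would give a number comparable to the ones quoted above, assuming the same definition of win percentage.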
The problem with Wins as an evaluator of starting pitchers is not that it is a useless statistic. It is simply a matter of sample size. In a single game, a win or no win is not a good indicator. Why? Small sample size (n=1). However, ERA, for example, is a per-inning stat. So in a single game, a pitcher’s ERA will have 5-9 data points (n>>1). Over the course of a full season, stats like ERA+, FIP and tRA have a sample size of 150-220 for each pitcher.
Can we use Wins to evaluate a pitcher over the course of one season? Maybe. We are talking about 28-33 starts. That is still a small sample size considering the number of factors that are involved. But we can be relatively certain that an 18-game winner is better than a 5-game winner (with a similar number of starts). The other variables should be less of a factor in that case. However, when comparing two pitchers with a similar number of wins, those other factors (team defense, scoring, ballpark, etc.) become much more important.
So should we use Wins when voting for the All-Star teams or the Cy Young Award? Probably not. Stats like ERA+, FIP and tRA are still better measures of how good a pitcher is (although we have minor quibbles with each). However, that does not mean Wins is a useless category. Over the course of several seasons or even a career, we should be able to get a decent idea of how good a starting pitcher has been based on how often they win games.
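To put rough numbers on the sample-size argument, here is a small Python sketch. It rests on a simplifying assumption that is ours, not something measured in this post: every start is treated as an independent coin flip with the same chance `p` of a win, so the observed win rate follows a binomial model. The function name `win_rate_std_error` is hypothetical.

```python
from math import sqrt


def win_rate_std_error(p, n_starts):
    """Standard error of the observed win rate over n_starts starts.

    ASSUMES each start is an independent trial with the same win
    probability p (a simple binomial model, ignoring team effects).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_starts < 1:
        raise ValueError("n_starts must be at least 1")
    return sqrt(p * (1.0 - p) / n_starts)


# One game, roughly one season, and roughly four seasons of starts.
for n in (1, 30, 120):
    print(n, round(win_rate_std_error(0.5, n), 3))
```

Under that assumption, a pitcher with a 50% chance of winning each start has a standard error of 0.5 on his win rate after one game, about 0.091 after 30 starts, and about 0.046 after 120 starts. Real starts are not independent coin flips, so treat these figures as a rough guide to how quickly the noise shrinks rather than as an exact measurement.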
TOMORROW: We will take a look at the individual pitchers that do deviate from the trend and have either been very lucky or very unlucky. You might be surprised how small the list is.
Notes on the above post are found after the jump