With Cork doing all these excellent evaluative posts on the team, I figured it was a good time to update one of my favorite pitching statistics.  You won’t find it on Fangraphs, or anywhere else for that matter, because a user over at Draysbay named Matthan developed a way to estimate the percentage of batters that a pitcher should be striking out.  Here is the formula that he came up with:


Where CIStr% = called strike %, InZSwStr% = swinging strikes in the zone, and OZSwStr% = swinging strikes out of the zone.

You can see that the biggest weight here is for OZSwStr%, indicating that the best strike that a pitcher can get is to get a batter to whiff out of the strike zone.  The worst thing that can happen is to give up a ball in play.  For those that are curious, this regression has an Adjusted R-Squared of 91.4%, which is extremely strong.  I updated this metric over the off-season for AL starting pitchers with at least 300 expected outs, which you can find here.  You can see the AL East breakout, but you’ll probably want to click on the link to the Google document with the whole workbook.
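For anyone wondering what the "Adjusted" part of Adjusted R-Squared does, here is a minimal sketch of the standard textbook adjustment. It takes a plain R-squared and penalizes it for the number of predictors (five in this formula). The function name and the sample inputs are made up for illustration; the post does not give the plain R-squared or the number of pitcher seasons behind the 91.4% figure.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjustment of R-squared for the number of predictors."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical inputs, NOT taken from the actual regression:
# a plain R-squared of 0.92 fit on 100 pitcher seasons with 5 predictors.
example = adjusted_r_squared(0.92, n_obs=100, n_predictors=5)
print(f"Adjusted R-squared: {example:.3f}")
```

The adjusted figure is never higher than the plain one, and the gap shrinks as the sample gets larger relative to the number of predictors.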

Shut that yawn down for a sec, because I’m getting on with it, just wanted to give a little background.  In this iteration I took a look at all AL pitchers with at least 10 innings.  This does not include the Royals, because I couldn’t access the data when I was compiling everything, and it’s the Royals, so who cares?  Here’s a look at just the Rays:

Rank is where each pitcher came in out of 167, eK% is the metric we’re looking at, aK% is the actual K% (SO/PA), and Delta is the difference between the two (eK% minus aK%).  So wow, Joaquin Benoit has the highest eK% in the American League, quite the feat, but on the other side of the coin Lance Cormier has one of the lowest in the league.  The simple average of all these pitchers is 19.3%, so everyone but Ekstrom, Sonny, Niems, and Lance is above average according to this.  You can use the Delta to see which guys are over-achieving (Dan Wheeler) and which are under-achieving (Wade Davis).  I find it interesting that James Shields leads the starting pitchers, even though it’s generally thought that Garza and Price have better stuff.  For those that have missed it, and as Cork showed capably yesterday, James is having a really, really good year even if he’s not getting the pub.  How good does the Benoit signing look at this point?
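To make the Delta column concrete, here is a small sketch of the arithmetic. The sign convention (eK% minus aK%) is inferred from the Edwin Jackson and Zack Greinke figures quoted in the comments below, and the function names are just for illustration.

```python
def actual_k_pct(strikeouts: int, plate_appearances: int) -> float:
    """aK%: strikeouts as a percentage of plate appearances (SO/PA)."""
    return 100.0 * strikeouts / plate_appearances


def delta(ek_pct: float, ak_pct: float) -> float:
    """Delta: estimated K% minus actual K%, rounded to one decimal."""
    return round(ek_pct - ak_pct, 1)


# (eK%, aK%) pairs as quoted in the comment thread.
pitchers = {
    "Edwin Jackson": (21.3, 20.3),
    "Zack Greinke": (16.7, 18.6),
}

for name, (ek, ak) in pitchers.items():
    d = delta(ek, ak)
    # Positive Delta: fewer strikeouts than the components suggest.
    if d > 0:
        label = "under-achieving"
    elif d < 0:
        label = "over-achieving"
    else:
        label = "on target"
    print(f"{name}: eK% {ek:.1f}, aK% {ak:.1f}, Delta {d:+.1f} ({label})")
```

With this convention a negative Delta is the good kind: the pitcher is striking out more batters than his pitch-level results would predict.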

I’ll give a link at the end to a Google document that will have everybody since I don’t want to take up too much space here, but let’s take a look at the top and bottom of the list to see who else might be crushing it (or not).  Here’s the Top-19:

Well, this pretty much confirms that you’d expect a reliever to have higher strikeout ratios than a starter (outside of Morrow, Hughes, and Weaver).  We see Benoit again and a trio of White Sox led by the sensational Matt Thornton.  J.J. Putz has Mets fans shaking their heads at what could have been, and there are several other familiar names of very, very good pitchers that can sit ‘em down.  Interestingly, Papelbon is grossly under-performing what you would expect.  Let’s take a look at the bottom-20 to see if that also passes the smell test:

Again, we can see a lot of strike throwers that don’t get a ton of whiffs in this group, including a bunch of Orioles (wonder why their bullpen is turrible and they’re the worst in the Majors?), and there’s Lance.  This should not be read to mean that these guys are necessarily bad, as Buehrle is pretty good, as is Blackburn, but these guys are never going to get a ton of strikeouts.

Now let’s take a look at the top 30 among just the starters:

I’d be interested in any thoughts that the discerning reader might have, but I did want to add one more comment.  Jeff Niemann’s success on the mound is so absolutely fun to watch because he doesn’t seem to have the best stuff out there and he isn’t going to be a huge strikeout guy, but batters look so frustrated against him it’s comical.  Perhaps it’s the way he mixes his pitches, but he never seems to give up solid contact and always seems to get a key pop-up or double play ground ball.

Lastly, as mentioned above, here is a LINK to the Google document that has the entire list as well as the component statistics for those that are curious.



  1. Jim says:

    Good stuff, Jason. Obviously Benoit is nasty, nice to see the stats backing that up.

  2. Steve says:


    Opinions on the Rays ranking? Feel free to comment

  3. Jordi says:

    Great job, however, a few things:

    I’m not a math major, what is an Adjusted R-squared? Is that reliability? Please make things like that a little more clear.

    I would also like to see you make the estimated K% a negative number, so if a pitcher is garnering a positive on-the-field result it comes out as a positive number (i.e., Benoit is striking out 5% more than estimated – that’s a good thing).

    Is there any way to tie the Called strike stat to umpires? .9 is a little high considering most umpires’ strike zones differ by a good 5%.

    My dang work internet filter won’t let me see the whole chart, but there is only one pitcher outside of the Rays I am curious about and that is Edwin Jackson. I think he was around 5-6 Ks per 9 last year, which given his stuff is ungodly low. Everyone knows he throws 100mph but straight.

    Actually, I care about Zack Greinke, especially with Posnanski saying his stuff is slowing down this year. Soria might not be too bad to find out about either.

    Last thing: is there any starter whose Estimated K% goes up during the game? Who is getting nastier as the game goes on?

    • Jason Hanselman says:

      Adjusted R-Squared is a statistical concept that looks at how much of the variation in Y is explained by the X variables, adjusted for how many variables are in the model. In this example, 91.4% of the variation in K% can be attributed to the five variables that we are looking at here. You might want to check out this for more information:

      I’ll put together the numbers for the pitchers you mentioned. It could be an interesting look to see this on an inning-by-inning basis, but I don’t think any of the findings would be statistically significant due to the granularity of the data.

      • Jason Hanselman says:

        Here’s a look at these two:

        Pitcher          eK%     aK%     Delta
        Edwin Jackson    21.3%   20.3%    1.0%
        Zack Greinke     16.7%   18.6%   -1.9%

        With components:

        Pitcher          CIStr%   Foul%   In Play%   InZSwStr%   OZSwStr%
        Edwin Jackson    16.3%    15.7%   18.6%      3.3%        7.9%
        Zack Greinke     17.8%    19.7%   19.3%      2.3%        3.8%

        A huge thing to remember here is that Edwin pitches in the NL, which is why I didn’t lump the NL into this report. Facing pitchers and generally weaker batters will make anyone look better than they are.

