Thursday, February 18, 2010

In Defense of the Run Batted In

The RBI is the new Emmanuel Goldstein. The RBI is the new Osama bin Laden. The RBI is Lumbergh, the one thing that represents everything that is wrong with the world. Especially in the city of Boston after December 6, 2006 (the day they signed Drew and Lugo), we have heard all about how the RBI is overvalued, the RBI is an antiquated stat, the RBI is not taken into consideration by the front office, and, according to Keith Law the other day, the RBI is "totally useless."

Not true, and the anti-RBI crusade is just the latest example of sabermetricians, both professional and amateur, taking numbers too far and using them irresponsibly. It's not completely unlike the global warming activists, who have their hands on data for the first time so they will blow the data they have completely out of proportion. Now that I have the weekly How Youz Doin Baseball overt conservative comment out of the way, here's why the sabermetricians are missing the mark and are acting irresponsibly by going execution-style on the RBI.

We are always hearing about the "sample size police," and how people are discounting playoff, clutch, and early-season statistics because of the small sample size. Pat and I are both guilty of that, and the sample size police is now like the Gestapo. We are now looking at everything in the aggregate, and it's going too far. The two most glaring examples are the romanticization of OPS and the Pythagorean Win-Loss Record.

It is true that when zooming way, way out over the course of a season or especially several seasons, Pythagorean W-L and OPS are good indicators of (respectively) what the team's won-loss is going to be and what the team's run production is going to be. I'm not going to deny that at all. But despite the pleadings of guys like Billy Beane, Nate Silver, and Law, the MLB standings this year are going to be determined by the number of games a team wins. Go tell Jeremy Shockey, he'll love that. It will not be determined by run differential in the first or second power, so Pythagorean W-L is just a stat that indicates a pattern, not a stat that indicates how many games a team should win.
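For anyone who hasn't seen it written out, here is a rough sketch of the usual Pythagorean W-L formula: runs scored squared, divided by runs scored squared plus runs allowed squared. The function names and the 800/700 run totals are made up for illustration (they are not any real team's numbers), and the exponent of 2 is the classic version; other exponents are in use too.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected win total over a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical team: 800 runs scored, 700 allowed (illustrative only).
    expected = pythagorean_wins(800, 700)
    print(f"Expected wins: {expected:.1f}")  # about 91.8

    # The standings only care about actual wins. The gap between the two
    # is what people tend to label "luck."
    actual_wins = 97  # hypothetical
    print(f"Actual minus expected: {actual_wins - expected:+.1f}")
```

Note that the formula only sees season run totals. It has no idea whether those runs came in a 15-2 blowout or in a string of 3-2 games, which is exactly the point above.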

Having an underwhelming offense and/or a good closer (hello 2008 Angels) means you will outperform your Pythagorean. It's not an indication that you're "lucky." It's an indication that you don't win blowouts as often as most teams who have your record do. Deal with it.

And just because you have the second-highest OPS of all AL outfielders doesn't mean you're a good player. We said on this blog this week that the non-JD Drew players with a high OPS also--surprise!--had a lot of RBIs. As in they go up to the plate and hack, and that's why pitchers walk them. Because they're afraid of them. A high OPS is often an indicator that you're responsible for a lot of run creation. And in most cases, even in Drew's case more often than not, it tells a lot of the story.
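As a quick refresher on what goes into OPS, here is a minimal sketch using the standard definitions: on-base percentage plus slugging percentage. The stat line is invented for illustration and does not belong to any real player, and the helper names are my own. It shows how one extra walk nudges OPS upward even though, short of the bases being loaded, it doesn't drive anybody in.

```python
def on_base_pct(hits, walks, hbp, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slugging_pct(singles, doubles, triples, homers, at_bats):
    """SLG = total bases / at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


def ops(singles, doubles, triples, homers, walks, hbp, at_bats, sac_flies):
    """OPS = OBP + SLG."""
    hits = singles + doubles + triples + homers
    return (on_base_pct(hits, walks, hbp, at_bats, sac_flies)
            + slugging_pct(singles, doubles, triples, homers, at_bats))


if __name__ == "__main__":
    # Hypothetical season line (illustrative only).
    line = dict(singles=90, doubles=30, triples=2, homers=18,
                walks=60, hbp=5, at_bats=500, sac_flies=5)
    before = ops(**line)

    # Same hitter with one more walk: no at-bat charged, OBP ticks up.
    line["walks"] += 1
    after = ops(**line)

    print(f"OPS before: {before:.4f}, after one more walk: {after:.4f}")
```

A walk doesn't count as an at-bat, so slugging is untouched and on-base percentage rises. Nothing in the formula knows who was on third or who is on deck.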

OPS is an aggregate (I'll call it "macrosabermetric") stat that is useful across the entire season or longer, and those with high OPSes are oftentimes also responsible for a lot of runs being produced. But once again, it does not tell the whole story. Baseball games are being played nine innings at a time, not 1458 innings at a time. Looking at aggregate stats makes general managers execute stuff like the 2003 "closer by committee" experiment. And while theoretically it makes sense because of macro trends, it does not tell the whole story. That's why a few years after reading the book, I have a big problem with VORP, WARP, and player value calculations (like the ones in JC Bradbury's The Baseball Economist). They are based on macro stats in what can sometimes be a micro game.

Macrosabermetrics fail to tell the whole story. Macrosabermetrics will tell us that if there is a runner on third base and one out, and the batter walks, in aggregate situations, the team will score more runs. However, what if that batter is Drew? What if that on-deck hitter is Jason Varitek or Mike Lowell? Drew's walk probably decreases the chances that the run scores due to the high probability that the next batter will ground into a double play. If Drew goes out there and hacks, he could get a hit (scoring the run), fly out (scoring the run), or even possibly score the run with his 281st weak ground ball to the right side. What about a .300 hitter up with a man on second, two outs, and a guy hitting .219 on deck? Walking enhances his OPS, it enhances sabermetricians' adoration of the player, but it does not enhance the team's ability to score that run. RBI GUYS GET THAT RUN IN.

I will not deny that this school of thought is counterproductive if you're trying to construct a big inning, because going for one run instead of getting guys on base will result in one-run innings instead of six-run innings when terrible Baltimore relievers give up six walks in a row. But guess what? This Red Sox team is probably not going to put up many big innings because many of their hitters suck. Guys like Youkilis, Pedroia, and even the departed Alex Gonzalez know they need to put the game in their own hands because the other guys can't hit. Walking means the end of the inning is delayed one more batter, and you have to go around the order one more time before you can see the guy who can hit again. A guy like Youkilis, the former Greek God of Walks, is going up there and hacking. This is evident by looking at his--gasp!--RBI totals.

Especially with the Red Sox pitching staff this year. Theoretically, the staff will not give up many runs, so every one run scored counts that much more. This team might win a lot of one-run games and Pythagoras will jump out of his bathtub naked and ruin the World Series. (I'm sorry, was that Archimedes? I don't care.) But once again, you make it to the top of the standings with your wins and losses. Not your Pythagorean wins and losses. And that's why RBIs, while they're certainly not THAT useful of a stat, are important.


Anonymous said...


Excellent post. You and Pat have mentioned many times that stats are helpful when looked at together, but overemphasizing/de-emphasizing a particular stat is not productive. Clearly, we've reached that point with sabermetrics.

Speaking to Keith Law's argument about RBI, I think he's right to an extent--it does matter who bats in front of you and where you are in the order. But none of that helps JD Drew at all. Last year he batted behind Youkilis and Bay. Last time I checked, those guys were on base 40% of the time between the two of them. And yet, JD Drew only drove in 68 runs last year? Why is that? Because he's an outlier. He gets on base a lot. Occasionally he hits for pop. But as you've mentioned, with second and third and two out, he'll walk. It helps his OPS, but it doesn't help the team. That's why a guy like Derek Jeter is a hall of famer and JD Drew is a one-time all-star.

Lastly, I'd just like to mention how annoying stat-heads have become. Stats are important, but the attitude that we seem to get from guys like Keith Law and Rob Neyer is one of condescension and dismissiveness. As if people who think that baseball is about hitting the ball, catching the ball, and scoring runs know nothing about anything. When in reality, baseball is about hitting the ball, catching the ball, and scoring runs.

--the Gunn

Anonymous said...


I think a good rule on stats is: Could I talk about this stat with someone while watching a game at a bar without appearing like a giant d-bag?

If the answer is no, then take a hike.


the gm at work said...


You nailed it in your entire comment. The real problem here with the "stat-heads" is the fact that they're like the Harvard student in Good Will Hunting. They just freshly read something, but once they read it, they take it as an exact science and they just tell people with a different opinion they're wrong until they see something else on or The Baseball Cube. OPS has a correlation but not a causation of runs being scored, just as Pythagorean has a correlation with wins and losses.

They have stats, and the three RBI-lovers left in the world have stats. Neither side's stats are perfect. The RBI people might be able to admit that. The sabermetricians will not admit that.

the gm at work said...

It really is a lot like the people who believe in global warming. If they are presented with an opinion different from theirs, even if the different opinions are backed up by data, the global warming believer's data is right and the other guy's data is lying.

Patrick said...

gm -

great post. my general theory on stats, as i've gone into in depth in a few posts this off season, is that they are an extremely helpful guide, not a rule. i say extremely helpful because that's what they are. i love stats. i think they are critically important in player evaluation. they just aren't the whole thing. they need to be balanced with other things, most notably scouting analysis.

the good thing about all of this is, while stat heads may not be coming around just yet, general managers are. for a while it seemed like the pendulum had swung so far towards stats for many front offices because it was new and it had worked for some teams. but as teams learn more it seems like they are going back to a more balanced approach. and that's part of it, learning. all of this stuff is new, and nobody is going to get it right immediately. you've been very critical of theo epstein, gm, and while some of his public comments don't suggest he has totally come around, he has directly talked about the importance of a balanced approach. this is a good thing for you, me, gunn, and everyone out there that is tired of stat heads. again, it isn't that we don't like stats. in fact, i would say all three of us love them. we cite them, debate players based on them, and discuss them all the time in this very space. we just understand that they aren't the only thing.

gunn -

this is related to what i just said, but your last paragraph really resonates for me. as important as stats are or may become, they cannot uproot the most basic fundamental concepts of baseball. there are times when i feel like some people are going to generate an argument that it's not about who scores more runs to win a game. that hasn't happened, but i feel like that might be one of the only thresholds we haven't crossed. and as you said, in addition the idea that any non-stat based argument can just be dismissed by stats, or that any argument can be proven or disproved only by stats is just not on point in my opinion. we've gone way too far down that road, and as i said it's great to see so many coming back.