Serial Correlation in Baseball Pitch Sequences

Via Vinnie Bergl, here is a post that examines pitch sequences in Major League Baseball, looking for serial correlation in pitch type, i.e. fastball, changeup, curve, etc. The motivating puzzle is the familiar baseball lore that, e.g., the changeup “sets up” the fastball. If that were true, then the batter would know a fastball is coming next, and this would reduce the pitcher’s advantage. If the pitcher benefits from being unpredictable, then there should be no serial correlation. The linked post gives a cursory look at the data, which in fact shows the opposite of the conventional lore: changeups are followed by changeups.

There is a problem, however, with the simple analysis, which groups together all pitch sequences from all pitchers. Not every pitcher throws a changeup. Conditional on the first pitch being a changeup, the probability that the next pitch will be a changeup increases simply because we learn from the first pitch that we are looking at a pitcher who has a changeup in his arsenal. To correct for this, the analysis would have to be carried out at the individual level.
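The pooling problem can be seen in a small exact calculation. This is only a sketch under made-up assumptions: a hypothetical league with two kinds of pitchers, each throwing independent pitches, and consecutive pitches always coming from the same pitcher. The function name and all numbers are illustrative, not taken from the linked post.

```python
from fractions import Fraction

def pooled_changeup_rates(pitchers):
    """pitchers: list of (share_of_all_pitches, changeup_probability).

    Every pitcher throws i.i.d. pitches, so there is no serial
    correlation for any individual pitcher by construction.
    """
    # Pooled chance that an arbitrary pitch is a changeup.
    base = sum(share * p for share, p in pitchers)
    # Consecutive pitches come from the same pitcher (an assumption), so the
    # pooled chance of two changeups in a row is a weighted average of p * p.
    both = sum(share * p * p for share, p in pitchers)
    # Pooled P(next pitch is a changeup | this pitch is a changeup).
    return base, both / base

# Hypothetical league: half the pitches come from pitchers who throw a
# changeup 30% of the time, half from pitchers who never throw one.
league = [(Fraction(1, 2), Fraction(3, 10)), (Fraction(1, 2), Fraction(0))]
base_rate, rate_after_changeup = pooled_changeup_rates(league)
print(base_rate, rate_after_changeup)  # 3/20 3/10
```

In this toy league the pooled data show changeups following changeups at twice the base rate, even though no individual pitcher has any serial correlation at all.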

Should we expect serial independence? If the game were perfectly stationary, yes. But suppose that after throwing the first curveball the pitcher gets a better feel for the pitch and is temporarily better at throwing a curveball. If pitches were serially independent, then the batter would not update his beliefs about the next pitch, so the curveball would carry just as much surprise but now slightly more raw effectiveness. The pitcher would then strictly prefer the curveball and would certainly throw it again.

That’s a contradiction, so there cannot be serial independence. To find the new equilibrium we need to remember that as long as the pitcher is randomizing his pitch sequence, he must be indifferent among all pitches he throws with positive probability. So the temporary advantage of the curveball must be offset, and this is achieved by the batter looking for a curveball. That can only happen in equilibrium if the pitcher is indeed more likely to throw a curveball.
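The batter's side of this indifference condition can be sketched in a toy one-pitch, zero-sum game. Everything here is an illustrative assumption: the pitcher chooses curveball or fastball, the batter guesses one of the two, the pitcher's payoff is `a` when the batter guesses right and `b > a` when he guesses wrong, and the "better feel" adds a flat `boost` to every curveball.

```python
from fractions import Fraction

def batter_curve_guess_prob(a, b, boost):
    """Probability q that the batter looks for a curveball, chosen so the
    pitcher is indifferent between his two pitches.

    Indifference: q*(a + boost) + (1 - q)*(b + boost) == q*b + (1 - q)*a
    which solves to q = 1/2 + boost / (2*(b - a)).
    """
    return Fraction(1, 2) + Fraction(boost) / (2 * (b - a))

a, b = Fraction(1, 2), Fraction(4, 5)   # hypothetical payoffs
q_no_boost = batter_curve_guess_prob(a, b, Fraction(0))
q_boost = batter_curve_guess_prob(a, b, Fraction(1, 20))

# Check that the pitcher really is indifferent at q_boost.
curve_payoff = q_boost * (a + Fraction(1, 20)) + (1 - q_boost) * (b + Fraction(1, 20))
fastball_payoff = q_boost * b + (1 - q_boost) * a
print(q_no_boost, q_boost)  # 1/2 7/12
```

With no boost the batter guesses curveball half the time; with the boost he must look for it more often (7/12 here) to keep the pitcher willing to mix.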

Thus, positive serial correlation is to be expected. Now this ignores the batter’s temporary advantage in spotting the curveball. It may be that the surprise power of a breaking pitch is reduced when the batter gets an earlier read on the rotation. After seeing the first curveball he may know what to look for next and this may in fact make a subsequent curveball less effective, ceteris paribus. This model would then imply negative serial correlation: other pitches are temporarily more effective than the curveball so the batter should be expecting something else.

That would bring us back to the conventional account. But note that the route to “setting up the fastball” is not that the curveball makes the fastball more effective in absolute terms, but that it makes the fastball more effective in relative terms, because the curveball has become temporarily less effective.

The latter hypothesis could be tested by the following comparison. Look at curveballs that end the at-bat but not the inning. The next batter will not have had the advantage of seeing the curveball up close, but the pitcher still has the advantage of having thrown one. We should see positive serial correlation here; that is, the first pitch to the new batter should be more likely (than average) to be a curveball. If in the data we see negative correlation overall but positive correlation in this scenario, then it is evidence of the batter-experience effect.
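The comparison could be computed along the following lines. This is a minimal sketch assuming a hypothetical pitch-by-pitch record format (the `Pitch` fields and the `"CU"` code for a curveball are invented for illustration) and a chronologically ordered list of pitches; a real analysis would run it pitcher by pitcher and compare each rate with that pitcher's own average.

```python
from dataclasses import dataclass

@dataclass
class Pitch:
    game: str
    inning: int
    pitcher: str
    batter: str
    pitch_type: str  # hypothetical coding: "CU" = curveball, "FF" = fastball

def curve_rate_after_curve(pitches, curve="CU"):
    """Share of curveballs among pitches that directly follow a curveball,
    split by whether the next pitch goes to the same batter or a new one."""
    counts = {"same_batter": [0, 0], "new_batter": [0, 0]}  # [curves, total]
    for prev, nxt in zip(pitches, pitches[1:]):
        if prev.pitch_type != curve:
            continue
        # Skip pairs that straddle an inning, a game, or a pitching change.
        if (prev.game, prev.inning, prev.pitcher) != (nxt.game, nxt.inning, nxt.pitcher):
            continue
        key = "same_batter" if prev.batter == nxt.batter else "new_batter"
        counts[key][1] += 1
        counts[key][0] += int(nxt.pitch_type == curve)
    return {k: (c / n if n else None) for k, (c, n) in counts.items()}

# Tiny made-up sequence, only to show the bookkeeping.
toy = [
    Pitch("g1", 1, "P", "A", "CU"),
    Pitch("g1", 1, "P", "A", "FF"),
    Pitch("g1", 1, "P", "A", "CU"),  # ends A's at-bat, inning continues
    Pitch("g1", 1, "P", "B", "CU"),
    Pitch("g1", 1, "P", "B", "CU"),  # ends the inning
    Pitch("g1", 2, "P", "C", "FF"),
]
rates = curve_rate_after_curve(toy)
```

The `new_batter` rate isolates the pitcher's "feel" effect, since the new batter has not seen the previous curveball from the box; the `same_batter` rate mixes both effects.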

(Update: the Fangraphs blog has re-done the analysis at the individual level and it looks like the positive correlation survives. One might still worry about batter-specific fixed effects. Maybe certain batters are more vulnerable to the junk pitches and so the first junk pitch signals that we are looking at a confrontation with such a batter.)

5 comments

October 19, 2011 at 12:16 pm

David Pinto (@StatsGuru)

I believe the fastball sets up the change up, although some pitchers do pitch backwards. The idea is that you throw a batter two or three fastballs. The batter develops a pattern recognizer for the fastball. You then throw a change up, which has the same arm speed and release point as the fastball, but due to the grip, travels slower. The batter recognizes a fastball and swings early, missing. Since it takes time to unlearn the fastball recognizer, a pitcher can get away with two change ups in a row.

Pitching is all about building a pattern recognizer in the batter, then screwing with it.

October 19, 2011 at 12:56 pm

Matt

I believe there is also inertia in the timing adjustment of the swing which is independent of beliefs about the next pitch. Even when the batter knows a changeup is coming after a fastball, pitchers with a good changeup know the pitch will get hit but rely on the swing being slightly early to get a foul ball.

October 19, 2011 at 1:29 pm

Bryce

The other way in which the game state changes is that there’s now another strike or ball in the count. If a fastball results in a strike more often than a curveball, we should expect different distributions to follow each.

October 20, 2011 at 11:07 am

Axel Anderson

One can drive serial dependence in strategies by assuming stage payoffs in period t depend on realized strategies in period t-1, but one cannot conclude immediately that

If “after throwing the first curveball the pitcher gets a better feel for the pitch and is temporarily better at throwing a curveball,” then “the pitcher is indeed more likely to throw a curveball” [following a curveball].

Here’s a simple suggestive example:

Click to access CKLearning.pdf

Please excuse the use of tennis as the motivating example and the fact that these notes are very rough. This was a “proof of concept” example that I created last year and just quickly edited for public consumption for this blog. Lones Smith and I are currently working with a richer model and tennis data.

October 20, 2011 at 2:12 pm

I should have summarized the result in that simple example:

What matters is not whether throwing a curveball on pitch t-1 makes for a better curveball on pitch t, but whether this advantage is larger when the batter is prepared for a curveball or not prepared for a curveball.

In one case you get positive serial correlation, and in the other negative.

The same holds if we assume it is the batter who gets better at hitting a curveball; again, it depends on whether the extra boost is bigger when he is prepping for a curve or not.
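A toy calculation illustrates the point about prepared versus unprepared batters. This is not the model in the linked notes; it is a one-pitch, zero-sum sketch with invented numbers. The pitcher's payoff is `a` when the batter guesses the pitch and `b > a` when he does not, and a previous curveball adds `boost_prepared` to the curveball's payoff when the batter is looking for it and `boost_unprepared` when he is not.

```python
from fractions import Fraction

def pitcher_curve_prob(a, b, boost_prepared, boost_unprepared):
    """Probability p of a curveball that makes the batter indifferent
    between looking for a curveball and looking for a fastball.

    Indifference: p*(a + boost_prepared) + (1 - p)*b
               == p*(b + boost_unprepared) + (1 - p)*a
    which solves to p = (b - a) / (2*(b - a) - d), d = prepared - unprepared.
    Only meaningful while the equilibrium stays mixed (requires d < b - a).
    """
    d = Fraction(boost_prepared) - Fraction(boost_unprepared)
    return (b - a) / (2 * (b - a) - d)

a, b = Fraction(1, 2), Fraction(4, 5)   # hypothetical payoffs
flat = pitcher_curve_prob(a, b, Fraction(1, 20), Fraction(1, 20))
bigger_when_prepared = pitcher_curve_prob(a, b, Fraction(1, 10), Fraction(0))
bigger_when_unprepared = pitcher_curve_prob(a, b, Fraction(0), Fraction(1, 10))
print(flat, bigger_when_prepared, bigger_when_unprepared)  # 1/2 3/5 3/7
```

In this sketch a flat boost leaves the pitcher's mix at 1/2, the baseline in this symmetric game; a boost that is larger against a prepared batter pushes the curveball probability above 1/2 (positive serial correlation), and one that is larger against an unprepared batter pushes it below (negative).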
