Via Vinnie Bergl, here is a post which examines pitch sequences in Major League Baseball, looking for serial correlation in the pitch type, i.e. fastball, changeup, curve, etc.  The motivating puzzle is the typical baseball lore that, e.g., the changeup “sets up” the fastball.  If that were true then the batter would know he is going to face a fastball next, and this would reduce the pitcher’s advantage.  If the pitcher benefits from being unpredictable then there should be no serial correlation.  The linked post gives a cursory look at the data, which in fact shows the opposite of the conventional lore:  changeups are followed by changeups.

There is a problem, however, with the simple analysis, which groups together all pitch sequences from all pitchers.  Not every pitcher throws a changeup.  Conditional on the first pitch being a changeup, the probability increases that the next pitch will be a changeup simply because we learn from the first pitch that we are looking at a pitcher who has a changeup in his arsenal.  To correct for this the analysis would have to be carried out at the individual level.
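To see how pooling alone can manufacture this pattern, here is a minimal sketch with made-up numbers.  The two pitcher groups and the 0.3 changeup rate are assumptions for illustration, not data from the linked post.  Every pitcher in it throws each pitch independently of the previous one, yet the pooled data still shows changeups following changeups.

```python
# Hypothetical population: (share of all pitches, probability of a changeup).
# Each pitcher's pitches are serially independent by construction.
pitchers = [
    (0.5, 0.3),  # pitchers who have a changeup
    (0.5, 0.0),  # pitchers who never throw one
]

def pooled_changeup_rates(pitchers):
    """Return (P(changeup), P(changeup next | changeup now)) in pooled data."""
    p_change = sum(share * p for share, p in pitchers)
    # Within a pitcher, independence means P(two changeups in a row) = p * p.
    p_both = sum(share * p * p for share, p in pitchers)
    return p_change, p_both / p_change

base_rate, repeat_rate = pooled_changeup_rates(pitchers)
print(f"pooled P(changeup) = {base_rate:.2f}")
print(f"pooled P(changeup | previous changeup) = {repeat_rate:.2f}")
```

With these numbers the pooled changeup rate is 0.15 while the rate right after a changeup is 0.30, even though no individual pitcher has any serial correlation at all.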

Should we expect serial independence?  If the game were perfectly stationary, yes.  But suppose that after throwing the first curveball the pitcher gets a better feel for the pitch and is temporarily better at throwing a curveball.  If pitches were serially independent, then the batter would not update his beliefs about the next pitch, so the curveball would have just as much surprise but now slightly more raw effectiveness.  That would mean that the pitcher would certainly throw a curveball again.

That’s a contradiction so there cannot be serial independence.  To find the new equilibrium we need to remember that as long as the pitcher is randomizing his pitch sequence, he must be indifferent among all pitches he throws with positive probability.  So we need to offset the temporary advantage of a curveball, and this is achieved by the batter looking for a curveball.  That can only happen in equilibrium if the pitcher is indeed more likely to throw a curveball.
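One way to make the indifference argument concrete is a toy model.  Everything in it is an assumption chosen to keep the algebra simple: there are only two pitches, the batter splits his attention between them, the hit probability is `x - x**2 / 2` when he puts attention `x` on the pitch actually thrown, and the pitcher’s better feel lowers the curveball’s hit probability by a fixed `boost`.  Under that functional form the batter’s best response is to look for the curveball exactly as often as he expects it, and the sketch searches for the curveball probability that leaves the pitcher indifferent.

```python
def hit_prob(attention):
    """Assumed hit chance when the batter puts `attention` in [0, 1] on the pitch thrown."""
    return attention - attention ** 2 / 2

def batter_best_attention(p_curve):
    """Attention on the curveball that maximizes the batter's expected hit chance.

    Maximizing (1 - p) * hit_prob(1 - a) + p * hit_prob(a) over a gives a = p
    for this particular hit_prob (an assumption of the toy model).
    """
    return p_curve

def pitcher_gain_from_curve(p_curve, boost):
    """How much better the curveball is than the fastball for the pitcher."""
    a = batter_best_attention(p_curve)
    curve_hit = hit_prob(a) - boost   # better feel lowers the hit chance
    fastball_hit = hit_prob(1 - a)
    return fastball_hit - curve_hit   # positive: curveball strictly preferred

def equilibrium_curve_prob(boost, tol=1e-10):
    """Bisect for the curveball probability at which the pitcher is indifferent.

    Assumes 0 <= boost <= 0.5 so that the solution lies in [0, 1].
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pitcher_gain_from_curve(mid, boost) > 0:
            lo = mid  # curveball still better, so the batter must expect it more
        else:
            hi = mid
    return (lo + hi) / 2

print(pitcher_gain_from_curve(0.5, 0.1))  # gain if the batter did not update
print(equilibrium_curve_prob(0.1))        # curveball probability in equilibrium
```

In this sketch, with no boost the pitcher mixes evenly.  With a boost of 0.1 and an unchanged even mix, the curveball is strictly better, which is the contradiction described above; indifference is restored only when the curveball probability rises to about 0.6.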

Thus, positive serial correlation is to be expected.  Now this ignores the batter’s temporary advantage in spotting the curveball.  It may be that the surprise power of a breaking pitch is reduced when the batter gets an earlier read on the rotation.  After seeing the first curveball he may know what to look for next and this may in fact make a subsequent curveball less effective, ceteris paribus.  This model would then imply negative serial correlation:  other pitches are temporarily more effective than the curveball so the batter should be expecting something else.

That would bring us back to the conventional account.  But note that the route to “setting up the fastball” was not that the curveball makes the fastball more effective in absolute terms, but that it makes the fastball more effective in relative terms because the curveball has become temporarily less effective.

The latter hypothesis could be tested by the following comparison.  Look at curveballs that end the at bat but not the inning.  The next batter will not have had the advantage of seeing the curveball up close but the pitcher still has the advantage of having thrown one.  We should see positive serial correlation here; that is, the first pitch to the new batter should be more likely (than average) to be a curveball.  If in the data we see negative correlation overall but positive correlation in this scenario then it is evidence of the batter-experience effect.
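A rough sketch of that comparison for a single pitcher might look like the following.  The record layout is hypothetical: each pitch is a dict with a `type` code (`"CU"` for curveball is an assumed label), an `ends_at_bat` flag, and an `inning` identifier assumed to be unique per game and inning.  As a stand-in for “average”, it compares against first pitches that follow an at bat ended by some other pitch.

```python
def curveball_carryover(pitches):
    """Compare curveball rates on the first pitch to a new batter.

    `pitches` is one pitcher's pitches in chronological order.
    Returns (rate after an at-bat-ending curveball,
             rate after an at-bat-ending pitch of any other type),
    with None where there are no observations.
    """
    after_curve, after_other = [], []
    for prev, nxt in zip(pitches, pitches[1:]):
        # Keep only pairs where the batter changes but the inning continues.
        if not prev["ends_at_bat"] or prev["inning"] != nxt["inning"]:
            continue
        next_is_curve = nxt["type"] == "CU"
        if prev["type"] == "CU":
            after_curve.append(next_is_curve)
        else:
            after_other.append(next_is_curve)

    def rate(flags):
        return sum(flags) / len(flags) if flags else None

    return rate(after_curve), rate(after_other)
```

A first rate that is higher than the second, combined with negative correlation within at bats, would point toward the batter-experience effect.  Running it pitcher by pitcher also avoids the pooling problem noted earlier.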

(Update:  the Fangraphs blog has re-done the analysis at the individual level and it looks like the positive correlation survives.  One might still worry about batter-specific fixed effects.  Maybe certain batters are more vulnerable to the junk pitches and so the first junk pitch signals that we are looking at a confrontation with such a batter.)