In many situations, such reinforcement learning is an essential strategy, allowing people to optimize behavior to fit a constantly changing situation. However, the Israeli scientists discovered that it was a terrible approach in basketball, as learning and performance are “anticorrelated.” In other words, players who have just made a three-point shot are much more likely to take another one, but much less likely to make it:
What is the effect of the change in behaviour on players’ performance? Intuitively, increasing the frequency of attempting a 3pt after made 3pts and decreasing it after missed 3pts makes sense if a made/missed 3pts predicted a higher/lower 3pt percentage on the next 3pt attempt. Surprizingly [sic], our data show that the opposite is true. The 3pt percentage immediately after a made 3pt was 6% lower than after a missed 3pt. Moreover, the difference between 3pt percentages following a streak of made 3pts and a streak of missed 3pts increased with the length of the streak. These results indicate that the outcomes of consecutive 3pts are anticorrelated.
This anticorrelation works in both directions. as players who missed a previous three-pointer were more likely to score on their next attempt. A brick was a blessing in disguise.
The underlying study, showing a “failure of reinforcement learning” is here.
Suppose you just hit a 3-pointer and now you are holding the ball on the next possession. You are an experienced player (they used NBA data), so you know if you are truly on a hot streak or if that last make was just a fluke. The defense doesn’t. What the defense does know is that you just made that last 3-pointer and therefore you are more likely to be on a hot streak and hence more likely than average to make the next 3-pointer if you take it. Likewise, if you had just missed the last one, you are less likely to be on a hot streak, but again only you would know for sure. Even when you are feeling it you might still miss a few.
That means that the defense guards against the three-pointer more when you just made one than when you didn’t. Now, back to you. You are only going to shoot the three pointer again if you are really feeling it. That’s correlated with the success of your last shot, but not perfectly. Thus, the data will show the autocorrelation in your 3-point shooting.
Furthermore, when the defense is defending the three-pointer you are less likely to make it, other things equal. Since the defense is correlated with your last shot, your likelihood of making the 3-pointer is also correlated with your last shot. But inversely this time: if you made the last shot the defense is more aggressive so conditional on truly being on a hot streak and therefore taking the next shot, you are less likely to make it.
(Let me make the comparison perfectly clear: you take the next shot if you know you are hot, but the defense defends it only if you made the last shot. So conditional on taking the next shot you are more likely to make it when the defense is not guarding against it, i.e. when you missed the last one.)
You shoot more often and miss more often conditional on a previous make. Your private information about your make probability coupled with the strategic behavior of the defense removes the paradox. It’s not possible to “arbitrage” away this wedge because whether or not you are “feeling it” is exogenous.