Let’s say I want to know how many students in my class are cheating on exams.  Maybe I’d like to know who the individual cheaters are, maybe I don’t, but let’s say that the only way I can find out the number of cheaters is to ask the students themselves to report whether or not they cheated.  I have a problem: no matter how hard I try to convince them otherwise, they will assume that a confession will get them in trouble.

Since I cannot persuade them of my incentives, I need instead to convince them that it would be impossible for me to use their confession as evidence against them even if I wanted to.  But that leaves me with two requirements that contradict each other:

  1. The students tell the truth.
  2. A confession is not proof of their guilt.

So I have to abandon one of them.  That’s when I notice that I don’t really need every student to tell the truth.  Since I just want the aggregate cheating rate, I can live with false responses as long as I can use the response data to infer the underlying cheating rate.  If the students randomize whether they tell me the truth or lie, then a confession is not proof that they cheated.  And if I know the probabilities with which they tell the truth or lie, then with a large sample I can infer the aggregate cheating rate.

That’s a trick I learned about from this article.  (Glengarry glide: John Chilton.)  The article describes a survey designed to find out how many South African farmers illegally poached leopards.  The farmers were given a six-sided die and told to privately roll the die before responding to the question.  They were instructed that if the die came up a 1 they should say yes that they killed leopards.  If it came up a 6 they should say that they did not.  And if a 2-5 appears they should tell the truth.
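Here is a minimal Python sketch of the die-roll instructions as I read them from the description above.  The function name `randomized_response` and its arguments are my own, not anything from the article, and it assumes the respondent follows the instructions exactly.

```python
import random

def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Return the answer a respondent gives under the die-roll protocol.

    truth: whether the respondent really did the thing being asked about.
    rng:   a random number generator standing in for the private die.
    """
    roll = rng.randint(1, 6)  # fair six-sided die, rolled privately
    if roll == 1:
        return True           # forced "yes", regardless of the truth
    if roll == 6:
        return False          # forced "no", regardless of the truth
    return truth              # rolls 2-5: answer truthfully
```

The researcher only ever sees the returned answer, never the roll, so a single "yes" is consistent with either a guilty respondent or an unlucky 1.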

A farmer who rolls a 2-5 can safely tell the researcher that he killed leopards because his confession is indistinguishable from a case in which he rolled a 1 and was just following instructions.  It is statistical evidence against him at worst, probably not admissible in court.  And assuming the farmers followed instructions, those who killed leopards will say yes with probability 5/6 and those who did not will still say yes with probability 1/6.  In a large sample, the fraction of confessions will be a weighted average of those two numbers, and the weight on 5/6 is the fraction of farmers who killed leopards, which is exactly the aggregate statistic we want.
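To see how the inference works, write p for the true fraction of offenders.  The expected fraction of yes answers is (5/6)p + (1/6)(1 - p), and solving for p gives p = (yes fraction - 1/6) / (4/6).  The sketch below inverts that formula and checks it against a simulated survey.  The names `estimate_rate` and `simulate_yes_fraction`, the 30 percent rate and the sample size in the usage lines are all made up for illustration, and it again assumes everyone follows the instructions.

```python
import random

def estimate_rate(yes_fraction: float) -> float:
    """Invert yes_fraction = (5/6) * p + (1/6) * (1 - p) to recover p."""
    return (yes_fraction - 1 / 6) / (4 / 6)

def simulate_yes_fraction(true_rate: float, n: int, seed: int = 0) -> float:
    """Simulate n respondents and return the fraction who answer yes."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        guilty = rng.random() < true_rate  # hypothetical true status
        roll = rng.randint(1, 6)           # private die roll
        if roll == 1:
            answer = True                  # forced "yes"
        elif roll == 6:
            answer = False                 # forced "no"
        else:
            answer = guilty                # truthful answer on 2-5
        yes += answer
    return yes / n

# Usage with made-up numbers: a 30% true rate and 200,000 respondents.
observed = simulate_yes_fraction(true_rate=0.3, n=200_000)
print(estimate_rate(observed))
```

In a small sample the estimate can land below 0 or above 1, since the observed yes fraction can fall outside the range from 1/6 to 5/6 by chance.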