Bayesian statistics gives you a posterior distribution. What you do with that di...

mjw · on Aug 31, 2014

This isn't just about a difference in the choice of loss function to optimise. It's a difference in what sort of guarantees you seek about that loss function.

Bayesian analysis seeks an estimator which minimises posterior expected loss, conditioning on the data and with the expectation taken over the parameters under a particular prior.

A frequentist analysis might seek an estimator for which uniform bounds on the worst-case expected loss are available, which hold in expectation over the data, given any value of the parameters.
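For reference, the two criteria can be written in standard decision-theoretic notation. The symbols L (loss), π (prior/posterior), R (risk) and δ (decision rule) are conventional choices, not anything from the comment. Minimax is used here as one example of a frequentist worst-case criterion:

```latex
% Bayes: condition on the observed data x, average the loss over the parameters
\delta_{\mathrm{Bayes}}(x) = \arg\min_{a} \; \mathbb{E}_{\theta \sim \pi(\theta \mid x)}\big[ L(\theta, a) \big]

% Frequentist risk: fix the parameters, average the loss over the data
R(\theta, \delta) = \mathbb{E}_{X \sim p(x \mid \theta)}\big[ L(\theta, \delta(X)) \big]

% Minimax: choose the rule whose risk is smallest in the worst case over \theta
\delta_{\mathrm{minimax}} = \arg\min_{\delta} \; \sup_{\theta} \, R(\theta, \delta)
```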

Both approaches fit into a decision-theoretic framework, and there are good reasons why you might care about frequentist properties when making decisions. I agree that this isn't only about average case vs. worst case. As you point out, it's also about whether you take expectations over the data given the parameters or over the parameters given the data, and that's important too. But I think the average-case vs. worst-case aspect is an important part of what this is all about, and it gets to the heart of the trade-offs involved in choosing between these methods.

I disagree that the sampling distribution is "irrelevant for making decisions". That's quite an extreme view, and I don't think many applied Bayesian statisticians would take it. Frequentist properties are something people often validly care about when choosing a statistical procedure in an experimental-design context, i.e. before collecting the data. This is especially true if you're choosing an estimator that you intend to use many times across many experiments, even if they're not all exact replicates of each other.

keithwinstein · on Aug 31, 2014

That's a good example to demonstrate one of the major criticisms of the frequentist tools.

But there's no free lunch here. We can flip the example around and produce one that demonstrates one of the criticisms of the Bayesian tools.

"Suppose there are two villages, Frequentistburg and Bayesianville, harvesting berries grown in a field between them. There are two types of berries: edible and poisonous. Suppose 86% of berries in the field are edible and float in water, 9% are edible and sink in water, 4% are poisonous and float, and 1% are poisonous and sink. Both towns are interested in devising a decision rule where a citizen measures some property of the berry (let's say we look at whether it floats or sinks) and the procedure should let them decide whether they can eat that berry with at least 80% certainty that it is edible.

"In Bayesianville, the town leaders announce this procedure in the newspaper: 'Take the berry out of the wrapper and see if it floats in water. Given this observation, calculate the posterior probability that the berry is edible, and if that number is more than 80%, eat away. If everybody follows this procedure, on average only 20% of our town will get poisoned by their morning berry.' Is this a good decision rule for the town? Not really. In practice, the town's citizens will end up eating ALL berries. (p(edible|floats) = 86 / (86 + 4) = 95.6% and p(edible|sinks) = 9 / (9 + 1) = 90%). The town's faraway enemy subscribes to their newspaper, learns the decision rule, and exploits a vulnerability: they rearrange the berry crops on the field so that the berries closest to the Bayesianville harvesters are all the poisonous crops. The next day, 100% of the citizens will do the experiment, 100% of the citizens will conclude that they have a <= 10% chance of getting poisoned by their morning berry, 100% of the citizens will eat that berry, and 100% of the citizens will get poisoned by it.

"In Frequentistburg, the town leaders announce a different procedure in the newspaper: 'We have devised a hypothesis test to reject the hypothesis that your morning berry is poisonous. If the berry sinks, you can reject the hypothesis that it is poisonous with p = 0.2. If the berry floats, you can reject that hypothesis with p = 0.8.' A citizen whose tolerance for false positives (mistaken eating) is alpha = 20% will end up eating the berry if and only if it sinks. The Bayesianvillagers regard this behavior as bizarre: it's the floating berries that have a higher posterior probability of being edible! But because of the minimax criterion, this procedure has no similar vulnerability that can be exploited by an enemy town. It will preserve 80% of the citizenry even if all of their morning berries are somehow manipulated to be the poisonous kind. (Of course, the procedure also ends up discarding 90% of the berries.)"
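The arithmetic behind both towns' rules can be checked with exact fractions. This is a minimal sketch under some assumptions. The joint table is the one given in the story, and the function names are illustrative. The "all poisonous" case assumes the enemy can choose which kind of berry reaches the harvesters but not how a poisonous berry behaves in water (4 of every 5 still float). Note that the figures the story calls p are P(observation | poisonous). The test rejects "poisonous" on the outcome {sinks}, whose probability under that hypothesis is 0.2 = alpha.

```python
from fractions import Fraction

# Joint distribution from the story, in percent of the field.
JOINT = {
    ("edible", "floats"): 86,
    ("edible", "sinks"): 9,
    ("poisonous", "floats"): 4,
    ("poisonous", "sinks"): 1,
}
OUTCOMES = ("floats", "sinks")


def p_edible_given(obs):
    """Posterior P(edible | obs), using the field's own frequencies as the prior."""
    edible = JOINT[("edible", obs)]
    return Fraction(edible, edible + JOINT[("poisonous", obs)])


def p_obs_given_poisonous(obs):
    """P(obs | poisonous): the number the story quotes as p for each outcome."""
    total = sum(JOINT[("poisonous", o)] for o in OUTCOMES)
    return Fraction(JOINT[("poisonous", obs)], total)


def bayes_eats(obs, threshold=Fraction(8, 10)):
    """Bayesianville: eat iff the posterior probability of 'edible' exceeds 80%."""
    return p_edible_given(obs) > threshold


def freq_eats(obs, alpha=Fraction(2, 10)):
    """Frequentistburg: reject 'poisonous' (and eat) iff P(obs | poisonous) <= alpha."""
    return p_obs_given_poisonous(obs) <= alpha


def poisoned_if_all_poisonous(eats):
    """Share of citizens poisoned when every berry handed out is poisonous,
    assuming poisonous berries still float/sink in the table's 4:1 ratio."""
    return sum(p_obs_given_poisonous(o) for o in OUTCOMES if eats(o))


def share_discarded(eats):
    """Share of the whole field (both kinds) that the rule never eats."""
    thrown_away = sum(JOINT[(kind, o)] for (kind, o) in JOINT if not eats(o))
    return Fraction(thrown_away, 100)


for name, rule in (("Bayesianville", bayes_eats), ("Frequentistburg", freq_eats)):
    print(name, poisoned_if_all_poisonous(rule), share_discarded(rule))
# Bayesianville 1 0
# Frequentistburg 1/5 9/10
```

Both posteriors (43/45 for floating, 9/10 for sinking) clear the 80% bar, so Bayesianville eats everything and an all-poisonous harvest poisons everyone. Frequentistburg eats only sinkers, which caps the poisoned share at 1/5 but throws away 9/10 of the field.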

Frequentistburg sees all of Bayesianville's citizens get poisoned by a bad harvest and replies to your critique: "BOTH towns care about 'things that did not happen.' Here in Frequentistburg, we constructed our hypothesis test by caring about observations (e.g. float/sink) that did not happen. Your citizens in Bayesianville calculated their posterior by taking a weighted average over values of the parameter (e.g. edible/inedible) that did not happen."

Moved by the painful experience, the neighboring towns met for a joint summit in a neutral location and explained their desiderata to each other in terms of the common language of decision theory and then they all lived happily ever after.

(In my first link above, I show the same basic problems using a uniform prior among four options.)

jules · on Aug 31, 2014

All you've shown here is that if you optimize one loss function (average number of people dying), you may do badly on another loss function (maximum number of people dying). Or if you are completely wrong about your prior, then you may do badly too. It's a classic "garbage in, garbage out" situation. This reminds me of the Charles Babbage quote:

"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
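The point that the verdict depends on which loss you score can be made concrete by scoring the two rules from the story on three criteria at once. This is a minimal sketch. The criterion names are illustrative, and each rule is encoded as the set of outcomes on which a citizen eats. The worst case again assumes poisonous berries keep the table's 4:1 float:sink ratio.

```python
from fractions import Fraction

# Joint distribution from the story, in percent of the field.
JOINT = {
    ("edible", "floats"): 86,
    ("edible", "sinks"): 9,
    ("poisonous", "floats"): 4,
    ("poisonous", "sinks"): 1,
}
OUTCOMES = ("floats", "sinks")

# Each town's rule, summarised by the outcomes on which a citizen eats.
EATS = {
    "Bayesianville": {"floats", "sinks"},
    "Frequentistburg": {"sinks"},
}


def average_poisoned(eats):
    """Expected share poisoned when berries are drawn from the field as described."""
    return Fraction(sum(JOINT[("poisonous", o)] for o in eats), 100)


def worst_case_poisoned(eats):
    """Share poisoned if every berry is poisonous (buoyancy still 4:1 float:sink)."""
    poisonous_total = sum(JOINT[("poisonous", o)] for o in OUTCOMES)
    return Fraction(sum(JOINT[("poisonous", o)] for o in eats), poisonous_total)


def average_edible_eaten(eats):
    """Expected share of citizens who get an edible berry to eat."""
    return Fraction(sum(JOINT[("edible", o)] for o in eats), 100)


for town, eats in EATS.items():
    print(town, average_poisoned(eats), worst_case_poisoned(eats), average_edible_eaten(eats))
# Bayesianville 1/20 1 19/20
# Frequentistburg 1/100 1/5 9/100
```

Which rule looks better depends on how the three criteria are weighted, in particular on how much an eaten berry is worth relative to a poisoning.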

Furthermore, the frequentist method doesn't do well either. They just aren't eating most of the berries (and the berries they are eating are the wrong ones!). If eating berries apparently isn't worth much to you, but dying has a big negative cost, you should put that into the Bayesian loss function, and it too will be conservative about eating berries. I'm very surprised that you seriously offer, as a valid criticism of Bayesianism, a method that lets you eat the more poisonous berries simply because they are rarer! If you had given the Bayesian the correct loss function, he would simply let only 20% of the people eat berries. But that 20% would be eating the safer berries, not the more poisonous ones, of course.
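The suggestion to put the cost of dying into the Bayesian loss can be sketched as a standard expected-utility decision. All of the following are hypothetical assumptions: eating an edible berry is worth +1, eating a poisonous one costs `death_cost`, and discarding is worth 0. The posterior uses the field's frequencies from the story as the prior. The sketch does not attempt to reproduce any particular percentage from the comment.

```python
from fractions import Fraction

# Joint distribution from the story, in percent of the field.
JOINT = {
    ("edible", "floats"): 86,
    ("edible", "sinks"): 9,
    ("poisonous", "floats"): 4,
    ("poisonous", "sinks"): 1,
}
OUTCOMES = ("floats", "sinks")


def p_poisonous_given(obs):
    """Posterior P(poisonous | obs), using the field's frequencies as the prior."""
    bad = JOINT[("poisonous", obs)]
    return Fraction(bad, bad + JOINT[("edible", obs)])


def bayes_eats(obs, death_cost):
    """Eat iff eating has higher posterior expected utility than discarding.

    Hypothetical utilities: +1 for eating an edible berry, -death_cost for
    eating a poisonous one, 0 for throwing the berry away.
    """
    p_bad = p_poisonous_given(obs)
    return (1 - p_bad) * 1 - p_bad * death_cost > 0


def eaten_outcomes(death_cost):
    """Outcomes on which the Bayes rule says 'eat' for a given penalty."""
    return [obs for obs in OUTCOMES if bayes_eats(obs, death_cost)]


for cost in (1, 15, 50):
    print(cost, eaten_outcomes(cost))
# 1 ['floats', 'sinks']
# 15 ['floats']
# 50 []
```

As the penalty grows, this rule gives up the sinking berries first (from a cost of 9) and the floating ones only from a cost of 21.5. That is the opposite choice to Frequentistburg's rule, which eats only the sinkers.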

Saying that Bayesians also care about things that did not happen, because one of edible/inedible is something that did not happen, is a bad comparison. It is unknown whether the berry is edible or inedible, so it makes sense that we consider both. On the other hand, it is known that the berry is blue, so why would we care about what would have happened if it were red?

kgwgk · on Aug 31, 2014

Where do the rejection rules used in Frequentistburg come from? And what prevents the faraway enemies (from Machine Learning City?) from selecting the poisonous/sinking kind to kill the entire population of Frequentistburg, as they did in Bayesianville?
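The second question can be checked directly. Frequentistburg's guarantee rests on P(sinks | poisonous) = 0.2. It survives an enemy who chooses which kind of berry to supply, but not one who can also sort by buoyancy. This is a minimal sketch that assumes the enemy can sort berries by both properties. The two strategy names are illustrative, and the rules are encoded as the outcomes on which each town eats.

```python
from fractions import Fraction

# Rules as described in the story: outcomes on which a citizen eats.
EATS = {
    "Bayesianville": {"floats", "sinks"},
    "Frequentistburg": {"sinks"},
}

# Hypothetical enemy strategies, as counts of (edibility, buoyancy) handed out.
# 1) Select on edibility only: poisonous berries keep their 4:1 float:sink mix.
ONLY_POISONOUS = {("poisonous", "floats"): 4, ("poisonous", "sinks"): 1}
# 2) Select on edibility and buoyancy: hand out poisonous sinkers only.
POISONOUS_SINKERS = {("poisonous", "sinks"): 1}


def share_poisoned(eats, supplied):
    """Share of citizens poisoned when the enemy controls which berries are supplied."""
    total = sum(supplied.values())
    bad = sum(n for (kind, obs), n in supplied.items()
              if kind == "poisonous" and obs in eats)
    return Fraction(bad, total)


for town, eats in EATS.items():
    print(town, share_poisoned(eats, ONLY_POISONOUS), share_poisoned(eats, POISONOUS_SINKERS))
# Bayesianville 1 1
# Frequentistburg 1/5 1
```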