Make a Fair Coin from a Biased Coin (2018) (xarg.org)
95 points by leonry on March 16, 2022 | 91 comments


Von Neumann's solution is clever, but extremely inefficient in terms of how many flips you need per random bit produced.

More interesting methods produce a stream of unbiased random bits out of a stream of flips; and you can look at the asymptotic efficiency.

A simple method to run better than von Neumann's method:

Produce 1,000 coin flips. Say you produce k heads and 1,000-k tails.

Produce a table of all possible sequences of length 1,000 with k heads and 1,000-k tails.

Look up the position of your actual realized sequence in that table. The index of that position is a uniform random integer in a range from 1 to the size of the table.

Use your favourite method to extract unbiased bits from that unbiased integer.

The only requirement to make the above method work is that coin flips are i.i.d., ie independent and identically distributed.
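
For concreteness, a minimal Python sketch of the look-up step, computing the index directly instead of materialising the table (flips assumed to be given as an 'H'/'T' string):

  from math import comb

  def rank_among_fixed_weight(flips):
      """0-based index of `flips` among all sequences with the same number of
      heads, in lexicographic order with H < T. For i.i.d. flips this index is
      uniform over range(comb(n, k)), whatever the bias."""
      heads_left = flips.count('H')
      rank = 0
      for i, f in enumerate(flips):
          remaining = len(flips) - i - 1
          if f == 'T' and heads_left > 0:
              # every sequence placing an 'H' here (with this prefix) sorts earlier
              rank += comb(remaining, heads_left - 1)
          elif f == 'H':
              heads_left -= 1
      return rank

  # Toy example with n = 5 instead of 1,000:
  print(rank_among_fixed_weight('THTHT'), comb(5, 2))   # 5 10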

(In practice you probably don't want to actually construct the table, but instead compute the index directly as-if you had constructed the table.

You might also want to work with an arbitrary number of flips, instead of a fixed 1,000; or even adapt the method to an arbitrary-length stream of coin flips that you don't know in advance.

Also in practice, you don't need to store the whole sequence. What you do is keep a count of how many heads and tails you've seen so far, but feed the individual coin flips into an 'entropy pool'. Your head/tail count will inform your estimate of how many bits of entropy you have in your pool. (It basically feeds into the formula for how many possible orders of arrangement you have, similar to the fixed-size method suggested above.)

You generate your unbiased bits by draining bits from the entropy pool.

The method of entropy pools described here is pretty much how /dev/random works on Linux.)


Interestingly, your method is a generalization of Von Neumann's method to more than two coin flips. With TT and HH, the length of the list is one (i.e., no bit of information), and with TH and HT, the length is two (exactly one bit of information).


That's an interesting way to look at it, yes!

I don't claim originality here either; the method is pretty well known.


There is a generalization that allows asymptotically optimal entropy extraction from traces of Markov chains without knowledge of the transition probabilities (just as the Von Neumann trick doesn't require knowledge of the bias).

https://ieeexplore.ieee.org/document/1684974

Shameless plug of a paper on related topics: https://link.springer.com/article/10.1007/s00446-017-0311-5


I don't see how this works.

> The index of that position is a uniform random integer in a range from 1 to the size of the table. Use your favourite method to extract unbiased bits from that unbiased integer.

Why is that position unbiased and random? If I'm using a coin that produces heads every single time, then I'm always going to land on table position 0, so therefore the table index is biased.


Well, if your coin always lands heads, your table will only have a single entry. So you won't extract any bits from it.

Von Neumann's method won't extract any randomness from this coin either.

Do keep in mind that, crucially, you only build the table _after_ you figured out how many heads or tails you got in this particular instance of 1000 flips.

The randomness we extract is not in how _many_ heads or tails we got, but how those heads and tails are shuffled in your sequence.

For the math, have a look at https://news.ycombinator.com/item?id=30697559


I misinterpreted your algorithm. So you're creating a table of all permutations given the number of heads and tails you got, and then looking it up in the table.

This does feel like a generalization of Von Neumann's method from 2 tosses to 1000


Yes, you can view it like that. Basically we have 1001 tables, one for each possible number of heads.

If you are sufficiently clever, you can generalize it to an indefinite stream of tosses, and produce output as you go along. See some of the papers mentioned elsewhere in the comments.


> Produce a table of all possible sequences of length 1,000 with k heads and 1,000-k tails.

This is more efficient than Von Neumann?


This method generates approximately

  log_2(sqrt(1/(2*pi*1000*p*(1-p)))) - 1000*log_2(p^p * (1-p)^(1-p))
bits of entropy from 1000 coin flips, where "p" is the probability of flipping heads.

Neumann's method generates 1000*p*(1-p) bits of entropy from 1000 coin flips. The theoretical maximum is

  -1000*log_2(p^p * (1-p)^(1-p))
but it requires knowing "p" exactly. This method is close to the theoretical maximum. I didn't plug in numbers, or analyse further, but it's far better than Neumann's method.
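
For reference, a small Python check of these figures, with a hypothetical bias of p = 0.6 plugged in:

  from math import comb, log2

  def expected_bits(n, p):
      """Expected output, in bits, of three extractors over n i.i.d. flips."""
      q = 1.0 - p
      # Table method: with k heads you can extract log2(C(n, k)) bits,
      # so take the expectation over k ~ Binomial(n, p).
      table = sum(comb(n, k) * p**k * q**(n - k) * log2(comb(n, k))
                  for k in range(n + 1))
      von_neumann = n * p * q                      # n/2 pairs, each yields a bit w.p. 2pq
      shannon = -n * (p * log2(p) + q * log2(q))   # theoretical maximum
      return table, von_neumann, shannon

  print(expected_bits(1000, 0.6))   # roughly (965, 240, 971)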

One obvious drawback is that you have to flip the coin 1000 times to produce the first unbiased random bit, while Neumann's method starts producing bits much earlier.


> One obvious drawback is that you have to flip the coin 1000 times to produce the first unbiased random bit, while Neumann's method starts producing bits much earlier.

Definitely. Though you could fix that problem relatively easily.

I think you might even be able to run von Neumann's method first, but store the coin flips; and then once you've got enough stored, extract a few more bits from the already used flips.

Perhaps like this:

When you do two flips, you add one of three possible tokens to your list:

'double-heads', 'double-tails' or 'mixed'.

Crucially, you only store 'mixed' and not whether it was 'head-tails' or 'tails-heads' because that information was already used to produce the von-Neuman-bit.

After your list has 1000 entries, you run an algorithm a bit like what I originally described to extract bits. The complication is that the table you construct has all possibilities of shuffling a fixed number of 'double-heads', 'double-tails' or 'mixed' tokens.
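
A rough Python sketch of that bookkeeping (it only counts the leftover entropy in the token ordering; actually extracting it would mean ranking the token sequence, just like the fixed-k method above):

  from math import factorial, log2
  import random

  def von_neumann_with_leftovers(flips):
      """Emit von Neumann bits as usual, but keep one token per pair
      ('HH', 'TT' or 'mixed') whose ordering still holds entropy."""
      bits, tokens = [], []
      for a, b in zip(flips[0::2], flips[1::2]):
          if a != b:
              bits.append(1 if a == 'H' else 0)   # the usual von Neumann bit
              tokens.append('mixed')              # which order it was is already used up
          else:
              tokens.append(a + b)                # 'HH' or 'TT'
      counts = [tokens.count(t) for t in ('HH', 'TT', 'mixed')]
      arrangements = factorial(len(tokens)) // (
          factorial(counts[0]) * factorial(counts[1]) * factorial(counts[2]))
      extra_bits = log2(arrangements)             # still extractable from the token order
      return bits, extra_bits

  flips = ''.join(random.choice('HHT') for _ in range(1000))   # toy source, 2/3 heads
  bits, extra = von_neumann_with_leftovers(flips)
  print(len(bits), extra)   # immediate bits, plus bits recoverable from the leftovers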


It feels like an awkward middle ground. Like this method has a limited entropy output for the first 2000 coin flips (first 1000 double-heads/double-tails/mixed entries), and then suddenly it adds back a ton of lost entropy.

Another commenter linked to some papers for asymptotically optimal entropy generation, I wonder if there is more of a streaming method there. It feels like there has to be, even maybe after a slow start. My naive intuition is that after 1000000 coin flips you have a good idea what p is, and then you can basically do arithmetic coding from there. Of course a theoretically correct method can't do exactly this, but it might asymptotically approach it.


Oh, if you want something practical, the approach that Linux's /dev/random takes is probably the way to go.

/dev/random being unbiased relies on some assumptions about cryptography; but in practice these assumptions are at least as well-founded as our assumption that our coin flips are independent.

Look at some of the papers mentioned in other comments on this submission. There are (near) optimal streaming methods.


More efficient if coin flips are your scarce resource, less efficient if computation is scarce. But nowhere near as inefficient as it sounds - you don't actually have to materialize the table.


As with so many algorithms, he's trading compute time for memory and pre-compute time. Generating that table is costly, but once you have that table it's pretty efficient.


You don’t need to store a big table with the table method, though. As long as there’s a pattern to it, you can figure out your position algorithmically.

For example, if H=1 and T=0 then you do something like “any result above 11010 is a real heads, others are tails.”

Edit: never mind, that’s a different method than OP. (Doesn’t require exactly k heads.) Doesn’t it get the best of both worlds though?

Edit2: Double never-mind, the problem assumes you don’t know the level of bias. But I think the OP threw that out as well.


The method I described is a bit weird and round-about, exactly because it has to work around not knowing the bias.

Essentially, it generates a thousand flips, and then carefully extracts the randomness only out of how those flips are shuffled, but not how many heads there are.

However, the closer to 500/500 your distribution is, the more entropy the method I described can extract. Conversely, if you randomly hit 1000 heads and 0 tails, the method can't extract anything, and you'll have to sample again.

In contrast, a method that knows the bias and can exploit it, can extract the same number of random bits, no matter what you actually flipped. Eg in the special case of a known unbiased coin, you always get 1000 random bits for 1000 flips. Even if those flips happen to be 1000 heads.


But the point remains, your method shouldn’t need a full table cached in advance, right? You should be able to use a formula to compute its “effective index”.


Yes, definitely. I just didn't mention that, because I thought it was obvious that we agree on that.

And instead of computing the "effective index" as an abstract number and then extract bits from that, you can work out directly how to generate the unbiased bits.


Yes. Though in practice you wouldn't want to generate that specific table for 1000 flips. It's far too big.


More efficient in how many flips you consume to produce an unbiased bit.

Less efficient in terms of runtime (if you implement it as naively as I have described here).


> Produce a table of all possible sequences of length 1,000 with k heads and 1,000-k tails.

> Look up the position of your actual realized sequence in that table. The index of that position is a uniform random integer in a range from 1 to the size of the table.

I don’t see why it would be. The normal way to produce such a table is as hhh…hh, hhh…ht, hhh…th, … ttt…ht, ttt…th, ttt…tt. If heads is more likely than tails, the index is more likely to be in the lower half than the upper half.

Reading https://blog.cloudflare.com/ensuring-randomness-with-linuxs-..., I also don’t think that’s how Linux’ entropy pool works.


The parent comment was saying that when you fix the number of heads to be k in advance, the set of head positions you get is random. That sounds correct.


> Reading https://blog.cloudflare.com/ensuring-randomness-with-linuxs-..., I also don’t think that’s how Linux’ entropy pool works.

Where does the description in that article differ materially from what I described in the parenthetical remark?

(Of course, it does differ from what I described in the main text of the comment.)


Cool idea! I wonder what you mean concretely by “extract unbiased bits from unbiased integer”?

Let’s say my experiment results in an index of 42, how do you convert that integer into information about a string of unbiased coin flips?



Arithmetic coding is the industrial strength way to do this, yes!

However, you can do much simpler and still be pretty efficient (and unbiased).

Suppose your range was numbers between 1 and 49 (inclusive).

If you get a number between 1 and 32, you interpret that as 5 bits. If you get a number between 33 and 48, you interpret that as 4 bits (because you have 16 == 2^4 possibilities between 33 and 48). If you hit 49, you just drop everything and sample again.

So if you got 42, like j7ake suggested, that would be 42 - 32 = 10 = 0b1010.

You can generalize this scheme to other numbers of possibilities.
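
A literal Python transcription of that 1-to-49 example (the exact encoding within each block is arbitrary, as long as it's a bijection onto 0..2^b - 1):

  def extract_bits_49(x):
      """x is assumed uniform on 1..49 inclusive."""
      if x <= 32:
          return format(x % 32, '05b')          # 32 values -> 5 unbiased bits
      if x <= 48:
          return format((x - 32) % 16, '04b')   # 16 values -> 4 unbiased bits
      return None                               # x == 49: drop everything, sample again

  print(extract_bits_49(42))   # '1010'  (42 - 32 = 10, as in the comment above)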


Despite all the theoretical work, it's not physically possible to bias a coin to any notable degree. Unlike a die, you can't weight one of the faces; you can bend the coin, but it takes about a 90-degree bend before you can say with statistical confidence that the coin is biased. In other words, if the coin appears reasonably flat, it's fair enough that you won't notice.

https://izbicki.me/blog/how-to-create-an-unfair-coin-and-pro...

I get the idea as shorthand for a binary outcome, but the way you make a coin fair is to flip it.


How are we defining a notable degree? Even a 1% skew is tremendous - entire casino empires are built on that probability margin.


You could calculate the likelihood over a certain distribution, in your case 1%. But when you do that you find that the third coin has a higher likelihood of being a fair coin than the first coin. That one has a pretty big bend in it. The ones after have a less than 4% chance of being fair. But to the OP's point, seeing that the third coin is more likely to be fair than the flat one (and is past a 45 degree angle) I think their claim makes sense with pretty good bounds.


To be fair, a 1% bias would probably not be detected in a hundred trials. CI for Bernoulli is +/- z * sqrt(p * (1-p) / n), so for a 95% confidence interval smaller than a percentage point, we have

1.96 * sqrt(p * (1-p) / n) = 0.01

0.5^2 / n = (0.01/1.96)^2

n = (1.96/0.01 * 0.5)^2

Or about 9604. So you need roughly 10k trials to determine with 95% confidence a bias of 1%.
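
The same arithmetic in a couple of lines of Python:

  # 95% confidence interval half-width for a Bernoulli proportion: z * sqrt(p*(1-p)/n).
  # Solve for the n that shrinks it to 0.01 at the worst case, p = 0.5.
  z, half_width, p = 1.96, 0.01, 0.5
  n = (z / half_width) ** 2 * p * (1 - p)
  print(round(n))   # 9604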


Even a fair coin has a 51/49 skew in a normal flip.

https://epubs.siam.org/doi/abs/10.1137/S0036144504446436?jou...

But bending/weighting won't change that.


It depends how you flip it. If you spin a coin on the table, it seems like it should be fair, but small weight differences make a huge difference. If you try with an Australian 20c it's nowhere near 50:50, closer to 70:30 (heavy is the head that wears the crown, apparently - from memory it's the queen's head side that ends up down and the platypus comes out on top).


And what if you flip it over turf, e.g. at a large stadium during a huge televised sporting event?

The implications of all this are pretty big.


Like in this CBS news report that caught a coin landing on its edge in a soccer game? https://www.youtube.com/watch?v=Sz3VwRLsfX0

There may very well be refs who can take advantage of coin flips over time.


The theoretical possibility of generating unbiased random bits from biased ones has critical practical implications. In computers we need unbiased random bits all the time but often only have biased sources.


Definitely. And the commenter you replied to knows that.

See https://news.ycombinator.com/item?id=30696269 for how you can improve on von Neumann's technique and still stay provably unbiased in your output.


Interesting stuff!

Instead of biasing the coin, consider unrandomizing the flip.

Here’s one of many articles on Persi Diaconis trying just this:

https://www.ams.org/publicoutreach/math-history/hap7-fifty-o...


> despite all the theoretical work, it's not physically possible to bias a coin to any notable degree.

> I get the idea as shorthand for a binary outcome, but the way you make a coin fair is to flip it.

That's a case of working from assumptions that aren't necessarily true in practice.

The way to make a coin flip fair is to control as many variables as possible during the flip and landing - because that is what is done by people who bias the flip. Only one of these variables is the coin itself.

Bias sometimes exists in the apparatus around the coin. After all some historical coins have ferrous metal in them, as the simplest example.

Bias can also be introduced by the people performing the flips. For example, there are people who can flip a coin onto its edge or with an outcome biased towards heads or tails. A flip onto the coin's edge can even be done by an amateur. [1]

After all, even without bias a coin can land on its edge. See this CBS News video of a soccer ref's coin flip. [2]

[1] https://www.youtube.com/watch?v=M0I-xm7iCBU

[2] https://www.youtube.com/watch?v=Sz3VwRLsfX0


If the coin is thinner on one side, then it will fall on its edge more often? If the coin has an offset hole in the middle? A triangular coin? With an offset hole in the middle? Include thinner condition on the last three.


It also assumes that the two "coins" are biased in the same way. If the two ostensibly-50/50 random events are not equally biased, von Neumann's algorithm doesn't work.


I believe the steps said to toss the same coin twice.


Yep. The assumption is that two flips are independent and identically distributed but that's a fair one: your flips don't change the coin, it does not have memory, etc.


Yes, flips do change the coin: they change which side is "up" at the moment of release. This could change the outcome of the flip.


Exactly. The input is the issue, despite this method's claim to fix bias in the input. As you point out, this method works as a limited thought experiment but not in the real world.

To make an extreme example, if someone can cause the sequence to contain strings of HHHT, this algorithm would absolutely bias towards returning H. So, the key to overcoming this algorithm is to inject a sequence that, regardless of which modulo-2 position is A or B in the code, will tend to fall on such an input bias. The point is that people trust code that says it fixes these sorts of things, so it is often easier than not to break the code by controlling the input.

There are other methods used in the real world when ingesting random bit sequences.


Yes. Von Neumann's method is definitely not safe against adversarial input.


Correct, though this is often improperly left out of the threat model.


> Unlike a die you can't weight one of the faces

You can just glue a weight to one of the faces, at an offset.


You and your article are wrong. Place extra metal on one side near an edge and run the experiment again.

Just because you can't imagine a way to do it doesn't mean it can't be physically done. This is a lesson, kids: randos on the internet can be wrong, even with high conviction.


A similar cute trick for making a coin with arbitrary bias p out of a fair coin: go through the (possibly infinite) binary expansion of p, flipping once for each digit. If you flip tails on a 1 or heads on a 0, go to the next digit and repeat. If you flip heads on a 1 or tails on a 0, stop and take that as your flip. (And if you get to the end of the expansion, you can think of it as only 0s now, so interpret it as tails)
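
A minimal Python sketch of this trick, with random.getrandbits(1) standing in for the fair coin:

  import random

  def biased_flip(p):
      """Heads (True) with probability p, using only fair coin flips."""
      while True:
          p *= 2                                   # peel off the next binary digit of p
          digit, p = (1, p - 1.0) if p >= 1.0 else (0, p)
          flip = random.getrandbits(1)
          if flip == digit:                        # heads on a 1, or tails on a 0: done
              return bool(flip)
          if p == 0.0:                             # expansion exhausted: only 0s remain,
              return False                         # so the eventual answer is tails

  print(sum(biased_flip(0.3) for _ in range(100_000)) / 100_000)   # close to 0.3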


Intuitively, you're building a random real number in the range [0, 1) bit by bit, and returning 0 or 1 once you're sure that number is above or below threshold p.


That's basically how arithmetic coding works, isn't it?

https://en.wikipedia.org/wiki/Arithmetic_coding


My impression is that arithmetic coding is the inverse. You have N symbols with different known probabilities, and you turn it into a stream of 1/2 probability random bits.

edit: On second thought this is probably a bijection and you can call it "arithmetic coding" in either direction.


Agreed.

As far as I can tell, arithmetic coding needs both a compressor and a decompressor. Otherwise it's relatively useless in eg your fancy video file format.


This doesn't seem quite right, unless I'm misunderstanding. If your bias was 128 (10000000) by your algorithm you would get heads at least 50% of the time (heads on the first flip) and tails at least 25% of the time (heads then tails). I think instead of taking the "wrong" flip result when they don't match, you should treat "getting to the end" as heads and failing before that as tails (or vice versa).


p is a probability in range [0, 1]. So if p = 1/128, the expansion is 0.0000001.

In this case the algorithm says you return tails unless you flip 7 heads in a row (which happens with probability 1/128).


Ah thanks for the explanation. Makes a lot more sense with the fractional expansion.


But this requires you know p.


Well, if you want to go backwards, you kind of have to know p?

Though I can see an interesting problem:

You have a known unbiased coin, and a biased coin with unknown p.

Your task is to create a stream of coin flips with bias p. You want to minimize how often you have to flip the biased coin; and somehow work out a way to make use of the unbiased coin to 'stretch' your sequence of biased coin flips.

I am not sure if this is possible.


That's useful when you want random bits from a pulse noise source, such as radioactive decay. Count pulses over a period of time, and take the last bit of the count. Do this twice. 10 => 1, 01 => 0, 00, 11 => retry.

It seems to be a corollary that you cannot get random bits at a guaranteed rate that way.
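
A minimal Python sketch, with a biased Bernoulli standing in for the parity of the pulse count (the bias value is just a placeholder):

  import random

  def pulse_parity():
      """Stand-in for 'parity of the pulse count over one time window'."""
      return random.random() < 0.3

  def von_neumann_bit(source=pulse_parity):
      """10 -> 1, 01 -> 0, 00 or 11 -> retry. Note the unbounded retry loop:
      there is no guaranteed output rate, as observed above."""
      while True:
          a, b = source(), source()
          if a != b:
              return int(a)

  print([von_neumann_bit() for _ in range(16)])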


Doesn't that assume the distribution at time 1 is also the distribution at time 2, which isn't entirely true since the source is already slightly less radioactive by then?


Still works, with a slight variation: instead of taking two measurements, you can take four successive measurements [A,B,C,D] over equal periods of time, and then do the aforementioned procedure with the values A+D and B+C as input.

If the activity is decreasing at a linear rate, then A+D and B+C should have exactly the same distribution. In reality, they will be ever so slightly different because the decay is exponential rather than linear, but over short periods of time (compared to the half-life) the linear approximation is extremely good.



Coin flips aren't fair either, because gravity will change slightly between successive flips.


OK, but if you toss the coin higher, relativistic time effects come into play.


And if you're flipping near the Earth's surface, there will be aerodynamic effects as well. Which is more streamlined: The Queen's side or the Platypus' side?


That's a good point. The effect is small but nonzero.


Original Von Neumann paper, the method is described in the second paragraph:

https://mcnp.lanl.gov/pdf_files/nbs_vonneumann.pdf


Makes me think of some sort of symmetrisation.

So for a six-sided die, you roll it 6 times in a row. You discard your 6 rolls unless you get a permutation of {1,2,3,4,5,6}, and in that case pick one of the rolls consistently, eg the first or the last.
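
A small Python sketch of this, with a made-up biased die for the demo:

  import random

  def biased_die():
      """Stand-in for a biased six-sided die; the weights are made up."""
      return random.choices(range(1, 7), weights=[3, 1, 1, 1, 1, 1])[0]

  def fair_roll(roll=biased_die):
      """Keep rolling groups of six until all faces appear exactly once, then take
      the first roll. Every permutation of {1..6} has the same probability (the
      product of the six face probabilities), so the first roll is uniform.
      Assumes every face has nonzero probability, otherwise this never returns."""
      while True:
          rolls = [roll() for _ in range(6)]
          if len(set(rolls)) == 6:
              return rolls[0]

  print([fair_roll() for _ in range(5)])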


You're going to need a lot of rolls though. It'll be quicker to decide e.g. Heads is 123, tails is 456 and use Von Neumann's approach. If you already have a rough estimate of the bias you could even pick different faces for heads/tails to get p as close to 0.5 as possible before you start, again reducing the number of throws.


The point was to make the die roll unbiased, not to convert it into an unbiased coin toss.


Fair point

Though I suspect it's still quicker to convert to unbiased coin tosses and use those as the bits of a number in the desired range than to wait for a permutation of 123456.


Especially if eg 5 had an extremely low probability of showing up. Or perhaps even 0% probability.


For certain P, optimizations are possible.

For example, with P=1/3, the general method discards TT (P=1/9) and HH (P=4/9) and repeats, while keeping HT and TH (P=2/9 each). One could optimize this by only discarding TT while returning 0 for HH (P=4/9) and returning 1 for HT and TH (P=2/9 each, together also 4/9).

Generally we just want to partition the event space into three parts such that two are equally likely while the third one is used to "scratch that, just repeat". Keeping HT and TH guarantees that, but if you want to minimize retries, better solutions are possible.
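
A Python sketch of that optimized partition, reading P=1/3 as the probability of tails (as the TT/HH probabilities above imply):

  import random

  P_TAILS = 1 / 3   # assumed known exactly for this optimization

  def flip():
      return 'T' if random.random() < P_TAILS else 'H'

  def fair_bit():
      """Partition of {HH, HT, TH, TT}: HH -> 0 (prob 4/9), HT or TH -> 1 (prob 4/9),
      TT -> retry (prob 1/9). Only 1 pair in 9 is wasted, versus 5 in 9 for plain
      von Neumann at this bias."""
      while True:
          pair = flip() + flip()
          if pair == 'HH':
              return 0
          if pair in ('HT', 'TH'):
              return 1
          # 'TT': discard and flip again

  print(sum(fair_bit() for _ in range(100_000)) / 100_000)   # close to 0.5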


I have a generally-applicable approach, i.e., a coin with any number of sides and any probabilities. Here we go, in the spirit of brute force:

0. Train your wrist, 'cause you'll be flipping this a gazillion times (say just N times)

1. Enumerate all possibilities, i.e., sample space = {H, T}^N

2. Compute the probabilities of each.

3. Partition the sample space into two sets such that each has total probability (sum of probabilities) equal to 0.5.

4. Call one element of the partition heads and the other tails.

5. Want a "multi-sided-coin"? Go back to step 3 and create a finer partition.


How do you partition the sample space in step 3 without knowing the bias of the coin?

If you squint right, I described a method in https://news.ycombinator.com/item?id=30696269 for how to do just that. But I wonder whether you have a different method in mind?


See also the paper "Streaming Algorithms for Optimal Generation of Random Bits"

https://arxiv.org/pdf/1209.0730.pdf


Ha, I should have read your comment before typing out my home-grown method in https://news.ycombinator.com/item?id=30696269

Thanks for the link!


What if the coin's bias changes from flip to flip, perhaps influenced by its earlier results? Is there a solution for this?


This makes me think – is a coin with a truly random bias from flip to flip any different from a perfectly fair coin?


Depends on what you mean by 'truly random'. A distribution can be skewed, in which case the bias would itself be biased and then clearly different from a perfectly fair coin.

Assuming you mean a uniform distribution, we can calculate this. Been a while since I've done stats, so bear with me...

We can first argue from symmetry that we should expect as many heads as tails. But this doesn't mean that this is necessarily the same distribution as a perfectly fair coin (Bernoulli with p = 0.5).

Anyway, the distributions we have here are a Bernoulli distribution and a uniform distribution

X ~ {0, 1}; 0 with probability 1-q, 1 with probability q

q ~ [0, 1], uniformly

The expected value of X is

E[X] = 0 * p(0) + 1 * p(1) = p(1)

We can prove p(1) = E[q] by the law of total probability (condition on q and integrate). But from the above symmetry argument, we already know it's 0.5 anyway. Now, we need higher moments:

E[X^n] = 0^n * p(0) + 1^n * p(1) = p(1) = 0.5

The moment generating function of a probability distribution is defined as M_X(t) = E[e^(tX)]. There's a theorem in probability that two distributions with the same MGF are the same. You can see from the above that this has the same moments as a Bernoulli distribution with p = 0.5. Hence, a biased coin with a random uniform bias should be identical to a Bernoulli distribution with p = 0.5, i.e., a fair coin.

In fact, I believe the above argument works for not just uniform bias, but any random bias that is symmetric about p = 0.5.
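
A quick Monte Carlo spot-check of that conclusion (it only checks the mean and one pairwise statistic, not the full distribution):

  import random

  def flip_with_fresh_uniform_bias():
      """Each flip first draws its own bias q ~ Uniform[0, 1], then flips with it."""
      q = random.random()
      return random.random() < q

  n = 500_000
  flips = [flip_with_fresh_uniform_bias() for _ in range(n)]
  print(sum(flips) / n)                                            # close to 0.5
  print(sum(a and b for a, b in zip(flips, flips[1:])) / (n - 1))  # close to 0.25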


Minor application:

In Dungeons and Dragons, a challenge is typically resolved by taking some number that's the difference between the difficulty of the challenge and your character's skill, and comparing it to the roll of a twenty-sided die. (Higher is better for you.)

Some dungeons and dragons game masters sometimes can't decide how hard to make a challenge. So they roll one die to set a random challenge, and then proceed as normal with a second die roll.

With the same logic as in your comment above, I think you should be able to show that this 'randomized challenged' behaves exactly the same as a challenge of difficulty 10?


There isn't. No matter how clever your cleanup/filter algorithm is, you can always make a more cleverly biased "coin" that defeats it.

(Assuming the "coin" is a complex device)

That's why proving that a hardware RNG is secure is very difficult.


That depends!

Look at the following setup:

First, you pick a cleanup/filter algorithm.

Second, your adversary gets to study your algorithm and hand you a dastardly clever 'coin'. [0]

Third, your algorithm runs, but has access to a second unbiased source of randomness. [1]

Under those circumstances, your algorithm has a chance!

Basically, you use your extra source of randomness to create a secret key; then use that key to encrypt your adversary's bit-stream from their "coin". If your encryption algorithm has 'ciphertext indistinguishability under chosen-plaintext attack' https://en.wikipedia.org/wiki/Chosen-plaintext_attack this scheme should roughly work, unless your attacker has exponentially more computing power than you do.

[0] To make the challenge fair, assume that our adversary has an obligation to put at least a bit of entropy into their "coin". Eg by having an unrelated third-party pick an arbitrary bit-stream that our adversary has to transmit via their "coin". How the adversary encodes that is up to them; and they can be as wasteful as they like. They just have to be able to decode it, too.

[1] You have to use that source sparingly to make it a challenge.


There are two problems with that:

1) If you expected the filter algorithm to detect and warn on bad/malicious randomness, this doesn't work

2) More importantly, if you have a CSPRNG and a source of randomness as part of your filter, you can simply throw away the "coin" because you've just implemented your own RNG.


I don't expect (1). For (2), yes, it's essentially that; but you add some extra mechanisms around to make use of any entropy sources you can find, even if some of them are tainted.

Basically what /dev/random does in Linux: they still use the potentially tainted 'coins' despite having a CSPRNG.


> Basically what /dev/random does in Linux: they still use the potentially tainted 'coins' despite having a CSPRNG.

Yes, this is well known. Any reseeding of the CSPRNG is good if you mix it in with XOR.


Mostly, but not completely.

If your adversary knows your secret state, they can tailor their to-be-XOR-ed data to steer your state wherever they want to.

The threat model where this is realistic is when your adversary built your hardware random number generator that you use to add entropy to your pool. (And that's assuming that this piece of hardware can see the other entropy sources that go into your pool, but can not directly exfiltrate that information into the outside world.)


Without knowing what sort of influence it has, it's difficult to construct a general method. But please check my idea and see if it achieves the purpose.

First you find a gambler, any gambler and ask them to show you the lowest denomination coin they have, and get them to verify for you that it is a fair coin. Now demonstrate your weird coin to them, and when they are amazed by its properties, swap it with them for the fair coin. Done.


I would say that the second coin is unfair, but it's not "biased" in the statistics sense. It has an expected mean, and the sample mean of flips will converge to it.


I thought this was about _crypto_ coins and their inherent unfair distribution.

I guess I need to go out more.



