Dissecting Lemire's nearly divisionless random number generator (veryseriousblog.com)
188 points by fanf2 on Oct 4, 2020 | 65 comments


I'm a bit confused. Looking at the original paper (https://arxiv.org/pdf/1805.10941.pdf) this doesn't seem to be an RNG itself but rather a method to efficiently transform the output of an RNG onto an arbitrary interval while maintaining a uniform distribution. (This is at odds with the current title.) See in particular "Algorithm 5", which specifies retrieving a random integer on lines 1 and 7. In section 5, the authors specify that they use a linear congruential generator for their experiments.

If my understanding is correct, this is an important distinction because it means you can plug different RNGs into this algorithm depending on your needs (in particular the capabilities of the underlying platform).

Related question: For the ultimate in minimal state, would a CBPRNG such as Philox or Threefry (http://www.thesalmons.org/john/random123/papers/random123sc1...) be safe to use here or would the fact that it can be invoked multiple times for a single call (and thus state sequences might end up overlapping between calls) be likely to introduce subtle statistical issues?

Slightly off topic - from Lemire's paper:

> may not be applicable to specialized processors such as Graphics Processing Units (GPUs) that lack support for the computation of the full multiplication

This statement seems to be at odds with the fact that implementations of Philox are provided for both AMD and Nvidia GPUs, even though Philox relies on efficient mulhi and mullo being available. I haven't looked into it further yet, though.


Your understanding is correct. Lemire's algorithm is not an RNG, but a way to map uniformly random bits (from some other RNG) to a non-power of two[1] range 1..N (or 0..N-1) efficiently on average.

---

[1] Well, it works for power-of-two too, but you certainly wouldn't use this if you wanted a power of two range: you can just extract the required number of bits directly.
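
In case it helps, here's a minimal C sketch of the reduction as I read Algorithm 5 in the paper (rand64() is a stand-in for whatever 64-bit generator you plug in, and the names are mine):

    #include <stdint.h>

    extern uint64_t rand64(void);  // stand-in for whatever 64-bit source you plug in

    // Return a uniform value in [0, s), s > 0 (Lemire's Algorithm 5, modulo my naming)
    uint64_t bounded(uint64_t s) {
        uint64_t x = rand64();
        __uint128_t m = (__uint128_t)x * s;
        uint64_t l = (uint64_t)m;            // low 64 bits: m mod 2^64
        if (l < s) {                         // possibly-biased region: take the slow path
            uint64_t t = -s % s;             // == 2^64 % s, the one real division
            while (l < t) {                  // reject until we're out of the biased region
                x = rand64();
                m = (__uint128_t)x * s;
                l = (uint64_t)m;
            }
        }
        return (uint64_t)(m >> 64);          // high 64 bits: floor(x * s / 2^64)
    }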


The last time I saw someone talking about LCGs, they warned that some people might be tempted to use them as an RNG but they are rarely fit for that purpose.

For one, they tend to behave more like a shuffle than like true randomness. With real dice there is a 1/6 chance of rolling the same number twice in a row, and so games lacking a good RNG would truncate the output of an LCG to get reasonable-looking output.

If the seed itself comes from a PRNG and is regenerated every time, then I suppose that avoids this problem.


Just use PCG or XorShift* (in non-cryptographic contexts); they avoid almost all of the pitfalls while keeping almost all of the low-overhead benefits.

https://www.pcg-random.org/
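
For reference, a xorshift64* step is tiny. A sketch, using the shift constants and multiplier from Vigna's xorshift* paper (the state must be seeded nonzero):

    #include <stdint.h>

    // One step of xorshift64*: scramble the state, then multiply to mix the output.
    uint64_t xorshift64star(uint64_t *state) {
        uint64_t x = *state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        *state = x;
        return x * 0x2545F4914F6CDD1DULL;
    }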


I think it’s important to address the fact that there is a different answer for different contexts. There is no "just". It’s always a decision tree.

The comparison table there has some good food for thought, but it doesn’t cover the entire space. There are other CSPRNG classes not listed here, which have been used in various programming languages to generate TLS session keys, for instance.

If I’m creating a computer game for a console, then I probably don’t care too much about most of these (until someone has figured out how to cheat).


Sure, but I was specifically replying to the context where one might consider using LCGs. I can think of few cases where the performance requirements are so tight that one would consider those but PCG or XorShift* is out of the question as an alternative.

> If I’m creating a computer game for a console, then I probably don’t care too much about most of these (until someone has figured out how to cheat).

If there is any place where people care a lot about eking out the last bit of performance (not to mention predictability of pseudorandomness with specific seeds) it's game design.


Not using a CSPRNG for everything and anything sounds like premature optimization to me. I mean sure, if you need randomness in a very tight loop, make a tradeoff.


You are correct; it's a transformation from an RNG to a range. I have a version (actually 3 versions) that works with std::random here: https://github.com/KWillets/range_generator . That framework is a bit more explicit about the distinction between RNGs and distributions (e.g., uniform within a range).


Here's a simpler, but more technical, attempt at an explanation.

---

    while (true) {
Treat your initial 64 bit random value as a fixed point fraction in the range from 0 to just under 1.

        u64 fract = random64();
Convert it to a 128 bit type, still treating it as a fraction, and multiply it by another 64-bit nonzero integer, S. The result is a value between 0 and S (excluding S), with the fractional part in the lower 64 bits and the integer part in the upper 64 bits.

        u128 upto_s = (u128)fract * (u128)s;
        u64  fract_part = (u64)upto_s;
        u64  int_part   = (u64)(upto_s >> 64);
Because we multiplied by S, the possible products (and hence the possible fraction parts) are spaced S apart. This affects how many possible fractions fall under each integer part. For example, with S=3, the possible values within a segment could pack like

    . X . . X . . X . . X . . X . .
or

    X . . X . . X . . X . . X . . X
This uneven packing does NOT happen if the number of possible fractions (here 2⁶⁴) is a multiple of S. We can therefore reject some of our possible fractions so that the accepted count is a multiple of S. This can be done by rejecting the first 2⁶⁴ % s fraction values, i.e., only accepting fraction values that are at least 2⁶⁴ % s.

        // Faster early test to avoid computing the modulo:
        // since 2⁶⁴ % s < s, fract_part >= s already implies fract_part >= 2⁶⁴ % s
        if (fract_part >= s) {
            return int_part;
        }

        // 2⁶⁴ % s == (2⁶⁴ - s) % s == (-s) % s in unsigned 64-bit arithmetic
        u64 min_fract = (-s) % s;
        if (fract_part >= min_fract) {
            return int_part;
        }
    }


Great explanation. A couple of things. So in C

    -s == 2⁶⁴ - s 
? That seems very strange!

Also, there's still a modulo in there, (-s) % s; does the compiler not use a division for that line or something?


> That seems very strange!

One way of thinking about it is that in arithmetic modulo M you can add or subtract any multiple of M without changing the result.

So x and M + x and 2M + x are all the same.

So 2^64 - s is the same as 2^64 - s - 2^64 = -s.
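
You can check the identity directly with unsigned 64-bit values (a throwaway sketch; s = 1000 is just an example):

    #include <stdint.h>
    #include <stdio.h>
    #include <inttypes.h>

    int main(void) {
        uint64_t s = 1000;
        uint64_t a = -s;                  // unsigned negation wraps: 2^64 - s
        uint64_t b = UINT64_MAX - s + 1;  // 2^64 - s computed the long way
        printf("%" PRIu64 " %" PRIu64 "\n", a, b);  // prints the same number twice
        printf("%" PRIu64 "\n", a % s);             // 2^64 % 1000 = 616
        return 0;
    }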

> Also, there's still a modulo in there

Yes, but it is guarded by the earlier check, so is executed only rarely (very rarely for most moderate range values << 2^64).

IMO this is half of the key insight that makes Lemire's approach fast: the modulo can be avoided most of the time by the coarser check, and for wide (e.g., 2^64) input values this means the modulo might as well not exist, performance-wise.


But the price you pay is having input values that are a lot wider than you need, so you waste random input bits. Generating those can be costly, too.


Yes, although it doesn't necessarily have to be "a lot".

E.g., if you want random values between 1 and 1,000,000 that takes ~20 bits of entropy, but if you use 25 bits and Lemire's technique you'll be running at almost full speed because of the 5 extra bits (you fall into the slow case about 3% of the time).


That's called "Two's Complement" - here for 64 bit integers.

https://en.wikipedia.org/wiki/Two%27s_complement

Say x = (2^64 - s) in a 64 bit integer. Then (x + s) = 2^64 = 0 (as 2^64 rolls over to 0). So for the purposes of addition etc. you can say -s = 2^64 - s.


Lemire (and the article) are generally using unsigned values in their examples, so I don't think two's complement matters much here: just the even simpler "non-negative arithmetic modulo M".


I find Lemire’s original blog post way more edifying than this piece. It also makes clear that his contribution is just in the reduction part (and after reading his blog I applied it to my simulation code and got a nice speed-up).

https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-...


> The average programmer, including someone straight out of college, should be able to understand

Absolutely strongly disagree, especially given what's happened to the "average programmer" over time. The only way you can maintain (and improve) skill is by NOT continually pandering to the dumbing-down crowd.

In other words, what someone might call "more readable" can actually be "cruft that gets in the way" to someone else.

Related: https://www.linusakesson.net/programming/kernighans-lever/in...


I didn't get the context of your quote, and had to go grab the complete quote:

  > The average programmer, including someone straight out of 
  > college, should be able to understand and work on a well 
  > put-together codebase. 
I think straight out of college is definitely a little squishy -- someone just out of college may be just out of a theory-heavy CS degree, or a very product-focused CIS degree, or maybe not at all a tech-oriented degree.

But let's focus on average programmer.

If I have a solid developer, with a year or two in their field, and they can't make heads or tails of the codebase put in front of them? That says more about the codebase than it does about the dev reading it.

  > especially given what's happened to the "average 
  > programmer" over time
And what, indeed, has happened to the "average programmer" over time? From my view, more people today are making computers do new things than ever before; from "secretaries" (oof, that term) maintaining excel spreadsheets that run entire small businesses, to bootcamp graduates who can (with a little guidance) ship fully functional MVPs to prove a line of business.

If something's beyond the skillset of a newer developer that's reasonable, but the goal should be to continually make that not the case.


What happened to the "average programmer" over time? They seem to be building things that make more money now. Isn't that the main reason companies pay for software engineers?

I believe the "decline" is due to the fact that being a software engineer nowadays is more like being a sysadmin or DBA than a programmer of the past. You spend 80% of your time using frameworks and systems that you don't fully understand (or barely understand at all).


Sussman and Abelson got tired of that decline, calling it "programming by poking", and quit teaching SICP.

I call it glue and duct tape. Everything is integration these days. Mixed blessing of that reusability we were promised and the poking that goes with it.

https://web.archive.org/web/20160504164044/http://www.poster...

Discussion at the time https://news.ycombinator.com/item?id=11628080


I'm unsure if the complaint about the timing attack makes sense in the uint64 version. I know that you're supposed to use "constant time" algorithms for security, so i'd be interested to hear if my thinking on this is wrong. Does the uint3/uint6 timing attack work with uint64?

I've "greyed" bits coming off of random radio waves from the atmosphere in order to test randomness; the drivers I was using for the particular radio didn't allow on-the-fly amplifier adjustments, so if there was, say, a lightning strike within 50 miles, the front end would get overloaded and "blank" - send mostly zeros, even with incredibly fast AGC control. So entropy wasn't guaranteed unless I could guarantee there wasn't lightning within some predetermined distance of the receiver (as an aside, I have this capability now) - thus for long (96 hour) runs to gather entropy, I'd have to fiddle the output to ensure uniform distribution.

Now, with all that said, there was a side channel attack; if there was a lightning storm or other sort of high energy event near my receiver, an "attacker" could determine that the algorithm to ensure uniformity was contributing more to the bits than the atmospheric noise. Possibly.


OP here. If parameter s is big, then the branches can be taken quite often, but they can be taken for up to 2^63 of the 2^64 possible values, so it's essentially a single-bit bias that is measurable via timing (maybe, if you're very lucky). If s is very small and incongruent, then the timing channel can leak values, but that can only happen on the order of once every 2^64 attempts.

Of the two I'd be more concerned about the former; biases of that size have led to things being broken before. Technically the latter means the security is less than 2^64, which isn't very strong for cryptographic purposes. The channel is so hard to measure that I wouldn't lose sleep over either, but it is the kind of thing that will get caught in some compliance testing.


One thought on timing is that, when generating a whole batch of numbers, it makes sense to calculate the modulo exactly once in advance rather than on-demand.
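
Something like this, I suppose (a sketch; rand64() stands in for the underlying generator):

    #include <stdint.h>
    #include <stddef.h>

    extern uint64_t rand64(void);  // stand-in for the underlying 64-bit source

    // Fill out[0..count) with uniform values in [0, s), computing the modulo once up front.
    void bounded_batch(uint64_t s, uint64_t *out, size_t count) {
        uint64_t t = -s % s;  // 2^64 % s: the only division, done once per batch
        for (size_t i = 0; i < count; i++) {
            __uint128_t m;
            do {
                m = (__uint128_t)rand64() * s;
            } while ((uint64_t)m < t);   // reject products in the biased low region
            out[i] = (uint64_t)(m >> 64);
        }
    }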


I'm rather proud of Abseil's implementation: https://github.com/abseil/abseil-cpp/blob/887d0eee6bab358472...

As I recall another colleague wrote the original version (I do not remember who.) I bitbashed and optimized it, and added the comments, mainly because I wanted to make sure I understood the reasons it was right.


It's uncanny but I literally looked at this an hour ago (prior to seeing this posting on HN).

Thanks for the comments/annotation!


And here I am, after reading the article, wondering about the Lisp version that took third place. If you are reading this, dear author - please share your code, even if anonymously.


"Code readability is the most important pillar of code correctness and maintainability."

100% agree.

This is related not only to the importance of code reviews or teamwork, but directly to the way software is made, that is, organic growth over time.

Software is too complex to be designed and built in one step, at least for us mere mortals.

For this reason we need many steps, where software has to be readable, compilable and testable.


Prizes for code readability improvements are a great idea.


The code comment uses variables ‘n’ and ‘s’ without defining them. From context I gather ‘s’ is the upper bound, but what’s ‘n’? And when it says (n * s) numbers contains n values, I’m not sure what that’s supposed to mean either.


> Lemire uses variable names such as “s”, “t”, “l” which seemingly don’t correspond to much. In such a short piece of code, this actually isn’t too big an offense. I kept these terms in my own implementation, because I want to be consistent with the paper


Yes I saw, but they still need definitions. I shouldn’t have to read the paper to know what the comment is trying to explain.


> Any contiguous range of (n * s) numbers will contain exactly n values where x % s is 0, n values where x % s is 1, and so on, up to n values where x % s is (s - 1).

n is any number.


n is any arbitrary number? Oh. Also I just realized I was reading this previously as x and s being fixed values and saying “if x % s is 0, (n * s) has n values”. Hence the confusion.


Perhaps too much an aside, from the about page of the blog:

...Influencer, Oxford comma enthusiast and Thought Leader Colm MacCárthaigh.

Yikes! One assumes then that he's a Thought leader when it comes to Oxford comma enthusiasts? This is doing my head in.

"Welcome to the very serious blog of Senior Principal Engineer, Keynote presenter, Touring Musician and producer, Cryptography Person, Open Source enthusiast, Philosopher, Influencer, Oxford comma enthusiast and Thought Leader Colm MacCárthaigh."

https://veryseriousblog.com/about


It is hard to say, but I think that might be a deliberate joke.


Probably, and I was a bit slow on picking that up...


What does thought leader even mean?


This is a general random number generator, in the sense that it takes an integer N and gives you a random integer in [0, N). It's also a stateless random number generator, except for any state that the underlying [0, 2^64) generator it is built on top of keeps.

This suggests some questions that might be interesting.

1. For a given application, N is often fixed or at most has a small number of predetermined values. For what values of N can you find a hard coded [0, N) generator that is faster and still reasonably simple?

For example when N is a power of 2, then you can just call the underlying generator and then AND with the appropriate mask (possibly after shifting if the low bits of your underlying generator are not suitable).

If N is a power of 2 and not very large, and the low bits are good, you could take your underlying 64-bit number, split it by masking and shifting into multiple numbers in [0, N), return one of them and save the rest in an array. On subsequent calls return the numbers from the array until you are out.

If you were generating random bytes, say, this would cut your calls to the underlying random number generator by 87.5%.

For small N that are not powers of 2 this could still work. If N were 1000, say, you could treat the 64-bit random number as six 10-bit numbers plus 4 leftover bits. If none of those 6 numbers is in [0, 1000), loop until you get a set of 6 that does have at least one in [0, 1000). (This will happen about once every few billion calls.) Return the first one that is in [0, 1000), and store any others in an array to return on subsequent calls.
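
A sketch of that d1000 variant (rand64() is a stand-in for the underlying source; the leftover-buffer bookkeeping is mine):

    #include <stdint.h>

    extern uint64_t rand64(void);  // stand-in for the underlying 64-bit source

    // d1000 (values 0..999): slice each 64-bit draw into six 10-bit chunks,
    // keep the in-range chunks in a buffer, and hand them out one per call.
    uint32_t d1000(void) {
        static uint32_t buf[6];
        static int buffered = 0;
        while (buffered == 0) {
            uint64_t x = rand64();
            for (int i = 0; i < 6; i++) {      // 6 * 10 bits used, 4 bits discarded
                uint32_t v = (uint32_t)(x & 0x3FF);
                x >>= 10;
                if (v < 1000)
                    buf[buffered++] = v;       // rejection-sample each chunk
            }
        }
        return buf[--buffered];
    }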

2. Can you write a random number generator generator? You call it with an N, and it returns a random number generator for [0, N) that is at least as fast as the fast general generator from the article?

I've looked at some special cases, but only for small N and small random number generators. Specifically, generating a dN with a dM, and aimed at something a human could do, by having a state diagram that a human could print out and follow as they roll the dice (or flip coins). For example, here is how to roll a d5 if all you have is a d8, and how to roll a d8 if all you have is a d5 [1]. Kind of neat, and great for a human, but wasteful for a computer, and if you try anything big the number of states grows rapidly. Even just rolling a d19 with a d32 has 325 states.

[1] https://imgur.com/a/XuDprvs


In the limit, that’s arithmetic decoding (https://en.wikipedia.org/wiki/Arithmetic_coding).

An intuitively simple way (but likely not the way this is done in real life) to do this is

- treat your stream of random bits as a binary float F in [0,1)

- when you need an integer in [0,n), do:

      F = n * F     # F is in [0,n) now
      r = floor(F)  # desired integer in [0,n)
      F = F - r     # restores the invariant that F is in [0,1)
      return r

Since F is infinitely long, you have to do this lazily. You can cheat by only doing the multiplication up to the bits you need now, and filling in the rest with random bits later (that introduces an error in the multiplication, but it doesn't matter for this algorithm).


For Question 1, you can turn division by a known constant into multiplications + bitshifts using a library like `libdivide`[1]. Then, we could use the regular rejection sampling algorithm mentioned in the annotated code [2].

I'd be curious how the naive rejection sampling algorithm's performance compares to Lemire's algorithm's when both are updated to use libdivide...

[1] https://libdivide.com/

[2] https://github.com/colmmacc/s2n/blob/7ad9240c8b9ade0cc3a403a...
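
In case anyone wants to try it, here's roughly what that naive rejection sampler looks like with libdivide (assuming its C API, libdivide_u64_gen / libdivide_u64_do; rand64() is a stand-in for the underlying generator):

    #include <stdint.h>
    #include "libdivide.h"

    extern uint64_t rand64(void);  // stand-in for the underlying 64-bit source

    // Classic unbiased reduction to [0, s), with every runtime division by s
    // replaced by libdivide's precomputed multiply-and-shift.
    uint64_t bounded_libdivide(uint64_t s) {
        struct libdivide_u64_t d = libdivide_u64_gen(s);  // precompute once per s
        uint64_t min = -s;                                // 2^64 - s
        min -= libdivide_u64_do(min, &d) * s;             // min = (2^64 - s) % s = 2^64 % s
        for (;;) {
            uint64_t x = rand64();
            if (x >= min)
                return x - libdivide_u64_do(x, &d) * s;   // x % s, no hardware divide
        }
    }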


Given how scarce entropy sources tend to be, it always torqued me a bit that if I want to simulate a dice roll I had to grab a whole float worth of data from the entropy source instead of something smaller.

But we don’t generate 1-6 via division; we generate it by multiplying a random number between 0.0 and 1.0 by 6 and adding 1. I’m a bit fuzzy on this division comment. In mathematics division and multiplication are inverse operations, but in CPUs multiply has traditionally been much faster.


> Given how scarce entropy sources tend to be, it always torqued me a bit that if I want to simulate a dice roll I had to grab a whole float worth of data from the entropy source instead of something smaller.

Grabbing a float worth of data and reducing it to a dice roll is no different from getting the bits for the dice roll directly. Entropy doesn't decrease just by getting bits from an RNG, it decreases by an attacker learning what those bits are.

If you throw away bits without looking at them, an attacker doesn't get to look at them either, so entropy doesn't decrease. Otherwise, running an RNG in a loop would decrease entropy all the way to zero (meaning its values become fully predictable) no matter what you do with the output. But RNGs that are actually used have a finite state space they cycle through. Completing a full (long) cycle returns to the original state and how could entropy decrease between two identical states?

Entropy measures how hard it is for an external observer to predict the next random value. If you run an RNG for a full cycle and tell someone all the values, then they can perfectly predict the next cycle just by looking at their recording (entropy is zero). But if you don't let them know anything, they're none the wiser (entropy remains unchanged).

> we don’t generate 1-6 via division, we generate it by multiplying a random number between 0.0 and 1.0 times 6 and add a 1

You can do that, but you don't get each value with the same probability. E.g. if the floating point type can represent eight values between 0.0 and 1.0, you might think that's enough, since you only need six, but actually

  >>> [1+int((i/8.)*6) for i in range(8)]
  [1, 1, 2, 3, 4, 4, 5, 6]
1 and 4 end up twice as likely as other numbers.

Double precision floating point numbers have enough different possibilities for this effect not to matter much, but if you want to eliminate it completely, you need to detect it, reject the current value and generate a new one. That extra check may be implemented using a division, but can be optimized (thus "nearly divisionless").


“Entropy source” is a phrase that has a specific meaning.

https://en.m.wikipedia.org/wiki/Entropy_(computing)

PRNGs, even CSPRNGs, are not an entropy source. They’re an algorithm used to deal with the lack of a high throughput entropy source.

If someone is trying to do an analysis attack on /dev/random (and they aren’t nuts) then we are all in some serious trouble.


Yes, PRNGs are not entropy sources and using a PRNG does not generate new entropy. But using a PRNG also doesn't reduce the entropy it already has. You only lose entropy by transmitting information, and the entropy lost is equal to the information transmitted.

For example, if an adversary with perfect analytical ability learns the outcome of a simulated dice roll you performed using /dev/random, they gain log₂(6) bits of information and you lose the same amount of entropy, irrespective of how many bits you actually read from /dev/random.


You have made assumptions that I did not, and you're talking past me (at length). I didn't say anything about sticking a PRNG between Lemire and the entropy source.

If I'm generating an RSA key pair I'm going to pull kilobytes of data from the entropy source, not a PRNG, and that data volume will be a fixed multiple of some word size that means that every bit I take has a purpose.

If I need a non-multiple of the word length worth of entropy from the entropy source, no standard library I know of handles that. C, Java, a few others always fetch 32 or 64 bits of data and then throw most of it away, even if the system call might allow finer increments.

If I need 6 bits or 200 bits, I have to stick a CSPRNG in there or throw entropy away. If I do the latter (which people often do by default) I'm robbing bits/second of entropy from something that may really need it, like TLS or especially SSH. I've found myself in testing situations where it was faster to regenerate a VM/reflash a Pi microSD than wait for /dev/random to catch up, due to iterating on writing a script that uses scp or keygen calls. I haven't seen a production issue in a while, and never that severe (e.g., HTTPS throttling), but it's happened to somebody, I'm sure.


Link to the post's author's annotated version of the algorithm: https://github.com/colmmacc/s2n/blob/7ad9240c8b9ade0cc3a403a...


After reading the intro of the article, I wanted to see whether I could get the gist of the algorithm just from looking at the first code of it that I come upon. I thus took the first such link from the article (https://arxiv.org/pdf/1805.10941.pdf) while trying to avoid any spoilers. There was no way to unsee the comment at the top of the article, but I don't think it helped much.

I did skim the paper to find which of the listed algorithms is the one in question (Algorithm 5), which inevitably primed me a bit (primarily by having the context of the two other rejection sampling algorithms). I also looked at the pseudo-code instead of the real code, so I don't think this is a fair comparison with whatever colleagues of the author had to face, but I had no chance to make it rigorous anyway.

The algorithm in question, copied straight from the paper (and ASCII-fied):

    ALGORITHM 5: An efficient algorithm to generate unbiased random integers in an interval.
    Require: source of uniformly-distributed random integers in [0, 2^L)
    Require: target interval [0, s) with s in [0,2^L)

    x := random integer in [0, 2^L)
    m := x * s
    l := m % 2^L
    if (l < s) then // Application of the rejection method
      t := (2^L−s) mod s // 2^L mod s = (2^L − s) mod s
      while (l < t) do
        x := random integer in [0, 2^L)
        m := x * s
        l := m mod 2^L
      end while
    end if
    return m / 2^L
Looking only at the part outside the if, I felt the overall idea quite intuitive - it's essentially the equivalent of

    return floor(s * rand(0, 1))
except you're working with L bit wide fixed point numbers.

The remaining question is "when do you have to actually rejection sample". This part is not as obvious.

Thinking about the ranges of the variables involved, it becomes clear that m can take values in [0, s * 2^L) that are steps of s apart, and we want to bucket this space into segments of length 2^L. Some of these segments might contain one more of those possible m values than the others. What the `if` therefore has to do is "shorten" the segments so that each one covers the same number of possible values (at least that is my assumption for now - see below). The dead space in between is then where we have to actually do the "rejection" part of rejection sampling.

This seems to be the core of the idea - trim the fat between the segments, so that they can all end at a 2^L boundary, thereby eliminating the non-power-of-two division for figuring out which segment m fell in.

At this point I'm slightly stumped why the condition is different in the while. We compare with t instead of s, where t < s, but is this required for equiprobability or just for performance reasons?

I am left to wonder whether the results for all "not taking the if branch" rolls are equiprobable.

If they are, why not rejection-sample on the if directly, completely avoiding any non-trivial remainder calculation in the first place?

If they are not equiprobable, that means the results of taking the if branch have to exactly undo the bias of the if-not-taken cases. I assume this is the case, since the goal was to eliminate as many "hard" remainder calculations as possible. On the other hand, it would be very odd for the probabilities involved to work out just right. I mean, if rejection sampling on the if alone has bias, and you only enter the if with a certain probability, and then need to correct that bias while influenced by the conditional probability - that just seems odd.

In other words: Does this algorithm have bias? If yes, it's indeed freaky how the part within the if above compensates that.

    x := random integer in [0, 2^L)
    m := x * s
    l := m % 2^L
    if (l < s) then
      start over
    end if
    return m / 2^L
I'll investigate this more deeply (probably by actually reading the full article and paper), but only after some sleep.


Just to add the realization I had immediately after lying down: l < s will of course be true for exactly one possible m per segment. All segments are shortened by s (or, equivalently, one "hit").

This is therefore indeed just a performance optimization: The real check that is required to make all segments have the same number of possible m inside (and that is rejection-sampled around) is l < t, but we don't even need to compute t if l >= s.

The algorithm would work fine if you completely removed the `if (l < s) then` and `end if` lines, it would just compute a remainder more often. Significantly more often for small s.


I'd like to see more examples of clearly commented clever algorithms.


This is interesting from a pedantic point of view. From an engineering point of view, I'd not look very kindly on the person who checked in a random number algorithm that required 300 lines of comments to understand. There would have to be an awfully good reason to use it, and here there really isn't.


I think any algorithm (even the usual one with division) would take hundreds of lines of comments if written in this style. There are thirty lines just to explain -s % s. Myself, I'd rather see

    uint64_t t = -s % s;  // = (2^64 - s) % s = 2^64 % s


Yep, this is a lot faster to digest and still just as clear.


Plenty of algorithms require a lot of background knowledge to understand. Should we ban crypto algorithms because they require hundreds of lines of comments explaining elliptic curves? Of course not, we just assume people can read the papers.


> There would have to be an awfully good reason to use it, and here there really isn't.

Crypto primitives generally come with very good reasons why we want to use them (and even among them, understandability of at least some aspects is highly valued). The GP comment specifically argues that the trade-off isn't justified here. "Should we ban crypto algorithms" is attacking an argument they haven't made.


Division is expensive and requires calls out to library routines on chips without an integrated FPU. Kernels require fast, tractable random numbers early in boot. The resident RNG in the Unix kernel has had a lot of work done on it over the years for a very small fragment of code.

The algorithms come from mathematicians. Maybe they're doing it for the innate satisfaction of things done in the integer domain?


You'll love my approach: exactly one comment with a reference to the original paper. Maybe with another line explaining that the reference pseudocode and implementation differ in dropping the 2^64

But that approach would fail the readability challenge, I think (hope)


Random number generation powers many quantitative applications that perform expensive Monte Carlo experiments and the like. Even minor speed-ups here are very interesting IMO.


If you are willing to suffer a slight statistical bias, you can generate a floating-point value in [0,1) and multiply the result by the size of the interval.

It is not advisable to use an RNG with a known statistical bias when a random number is required.


For crypto maybe, but there are plenty of uses for RNGs with statistical biases if they have other interesting tradeoffs.

Like, all LFSRs are RNGs with distinct statistical biases.


Correct.

You wouldn't want to use an RNG with a bias in picking cards for a game of poker. Using an LFSR is also a solved problem. You could transform an LFSR using a Markov chain to remove some of the properties inherent in the LFSR. All that takes is some table lookups.


Or just don't transform it and just be ok with the bias, like how pretty much every LFSR gets used in practice.


Can the bias not be transformed out?

Naively, I'd assume that if you give me a biased RNG with a known distribution, I could just unroll the distribution back to something uniform.

This needn't even involve division per se. A simple lookup table should suffice.


Yes, as you rightly point out, the bias doesn't need to be introduced. This is a well known issue with divisions(modulo) in generating random numbers. This is literally why there is an injunction for people not to create their own crypto.

Of course, the meaning isn't that people shouldn't work on crypto, but that people shouldn't create RNGs like this thinking they are better than the prior art, when they actually regress on already solved problems.

This is the exact difference between arc4random() and arc4random_uniform().

  arc4random_uniform() is recommended over constructions like
  arc4random() % upper_bound as it avoids "modulo bias" when
  the upper bound is not a power of two.
Unfortunately, when I google for "ios random number", the very first result says

  We can use the C function rand() for this:
  int i = rand()%10+1;
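
For contrast, the unbiased approach in the style of arc4random_uniform() looks roughly like this (a sketch against a generic 32-bit source; rand32() is a stand-in):

    #include <stdint.h>

    extern uint32_t rand32(void);  // stand-in for a uniform 32-bit source

    // Uniform value in [0, upper_bound), avoiding modulo bias by rejecting
    // the low region [0, 2^32 % upper_bound) before taking the remainder.
    uint32_t uniform_below(uint32_t upper_bound) {
        if (upper_bound < 2)
            return 0;
        uint32_t min = -upper_bound % upper_bound;  // == 2^32 % upper_bound
        uint32_t x;
        do {
            x = rand32();
        } while (x < min);
        return x % upper_bound;
    }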


Curious how you would do the lookup if you have an RNG with probabilities 2/7, 2/7, 3/7 (or the same but with significantly larger denominators).


How about this? In rare circumstances, it may loop forever.

  import random

  def uneven_random():
      # return 0 with 2/7 probability, 1 with 2/7 probability, and 2 with 3/7 probability
      n = random.random()
      if n < 2.0/7: return 0
      if n < 4.0/7: return 1
      return 2

  def transformed_random():
      # return 0 or 1 with 50% probability
      N = uneven_random()
      while N == 2 : N = uneven_random()
      return N



