Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Cracking random number generators (xoroshiro128+) (lemire.me)
109 points by johndcook on Aug 22, 2017 | hide | past | favorite | 56 comments


Author's title should be "Cracking PSEUDO-random number generators" - We should all basically assume that any PRNG will be easily cracked like this and not use them for anything important to security!

Always use a cryptographic RNG for important code!


This shouldn't have been downvoted because it is exactly correct.

Cryptographic generators don't work like PCG and xoroshiro and Mersenne Twister. They're generally built by taking a cryptographically secure cipher or hash core, "keying" it with secret entropy, and running it in a streaming configuration (like CTR mode).

A properly designed CSPRNG can only be "cracked" in a few specific scenarios:

1. The primitive it's built on (or the streaming construction it's configured in) is broken, in which case the news for cryptography as a field is significantly bigger than the fact that an RNG has a flaw.

2. An attacker has exploited a systems flaw to directly disclose the contents of the memory the CSPRNG is operating out of, in which case you have bigger problems than your CSPRNG.

3. The secrets that key the generator have become predictable. This is in practice the only way CSPRNGs get broken (unintentionally), and, in practice, always means the CSPRNG wasn't initialized properly (the "cold start entropy problem").

You should use the getrandom() system call, or read from /dev/urandom, to the exclusion of all other mechanisms.


I'd have called that a PRNG, because to me there were only two main categories. Pseudo-random, where it's designed to be unpredictable, and actually random where it is based on an external hardware source of true random information.

A CSPRNG is surely a type of PRNG. Is that not right?

Their comment doesn't really seem correct to me. The title is "Cracking random number generators (xoroshiro128+)" which seems pretty accurate to me. The article definitely doesn't seem to say it's breaking anything other than a very specific, flawed random number generator.


I'd have added "Cryptographically secure" and not capitalized "pseudo", but that's small-stakes stuff. The point he's making is the most important safety point on this topic.


Duly noted!

I'm not in this field, but I know enough to know what not to do (most of the time).


>I'd have called that a PRNG, because to me there were only two main categories. Pseudo-random, where it's designed to be unpredictable, and actually random where it is based on an external hardware source of true random information.

PRNGs produce numbers that seem hard to predict. CSPRNGs product numbers that actually are hard to predict, assuming P != NP (kind of).


> A CSPRNG is surely a type of PRNG. Is that not right?

Yes. In the same way the POTUS limousine is a car


FWIW you rarely hear the term CSPRNG in crypto I find. I always call these PRNGs but I can see how having a naming distinction could help prevent misuse in the applied world.

Edit: thinking a bit more about it. I guess it wouldn't make sense to call anything "crypto" in crypto. It's like calling fries "french fries" in France. There they're just fries. I know this is a bad example because french fries are probably not from France :o)


But, it's important to make the decision because a "crypto" psudorandom number generator may be significantly slower than an insecure generator.

This is critical for performance-sensitive operations. For example, certain audio and video codecs need to simulate noise. In these cases, high performance is much more important than cryptographic security.


In the overwhelming majority of cases, cryptographic random bit generation performs perfectly adequately. Insecure random number generation is wildly overused in our industry. CSPRNGs should be the default application RNGs on most platforms, and should always be the default choice for developers.

Which makes stuff like PCG even weirder! Because in most cases, what you want is a somewhat slower generator that has better failsafe behavior.


Many microbenchmarks intended to measure other things become benchmarks of your RNG if you use anything slower than an LCG. This biases a lot of places towards using the poorest RNG they can get away with.

This is made worse by many purchasing decisions made based upon microbenchmarks with the requirements of "default settings" so defaulting to insecure is a sound business decision in more cases than you might think.


This is indeed a tragedy, because it could have been easily avoided by including LCG in microbenchmarks. LCG is less than ten lines, so even for very short microbenchmarks including RNG is very feasible. Alas, I guess such reasonable people don't write microbenchmarks in the first place.


I understand the "broken benchmarks" problem and I acknowledge that there are some cases that are so demanding and have such low security sensitivity that it makes sense to have an LCG in the standard library.

But I stand by my argument that the default platform RNG should be a CSPRNG, and that developers should reach for a CSPRNG by default.

Which makes all the attention we've been giving to stuff like xoroshiro128+ and PCG pretty confusing to me. It feels like people arguing very earnestly about non-problems, while ignoring a huge problem in our standard libraries.


>stuff like PCG

PCG is cryptographically secure, though. Or at least, it is as cryptographically secure as any other PRNG in the sense that nobody actually knows how to predict it, many have tried, nobody has succeeded, but nobody has proved it impossible.


So, it's "cryptographically secure" in the "sci.crypt proposal" sense.


Aren't cryptographic random number generators, still PRNGs.

It sounds a fun problem, predicting the future random numbers, going to have to have a play later at trying it.


Yes. GP is mistaken here; this is novel work that is somewhat concerning -- mostly in how it might apply to other similarly state-based RNGs.


I wouldn’t say this work is novel in the general case of “PRNGs are not CSPRNGs”. You can throw a constraint solver at most any PRNG and given sufficient output determine the state fairly easily. As a datapoint, doing this for xoroshiro took me half an hour: https://gist.github.com/karanlyons/805dbcc9e898dbd17e06f2627...


Heh, that sounds cool. I'll save opening that link for later.


Don’t worry, it’s safe: I didn’t put the actual solver, just proof that I solved it. Wouldn’t want to spoil the fun for anyone else :)


I made no comment on the work done here, it is novel and concerning if you use the outputs for important things. My comment is that non-cryptographic random number generators should not be used for security-critical functions.

These functions are specifically built for speed, not security.


Strong crypto RNGs use PRNGs but combines sources of entropy, environmental noise from devices such as the number of CPU cycles between user keystrokes. T̶h̶a̶t̶'̶s̶ ̶t̶h̶e̶ ̶d̶i̶f̶f̶e̶r̶e̶n̶c̶e̶ ̶b̶e̶t̶w̶e̶e̶n̶ ̶/̶d̶e̶v̶/̶r̶a̶n̶d̶o̶m̶ ̶a̶n̶d̶ ̶/̶d̶e̶v̶/̶u̶r̶a̶n̶d̶o̶m̶ ̶i̶n̶ ̶L̶i̶n̶u̶x̶.̶


I was wondering how you managed to strike out part of your comment when https://news.ycombinator.com/formatdoc doesn't mention any markup for that, but then I realized you were using the "combining long stroke overlay" Unicode character ( ̶). Nice trick.


After an initial seeding the only thing additional entropy adds is limiting the damage from a compromise of the internal state of the PRNG. And if the OS's internal PRNG state is compromised, what makes you think your process isn't?


What if you're using several PRNGs XORed together and reseeded frequently?


Why would you want to do that?


That would make much more difficult (if not impossible) to guess the internal state of all RNGs.


You can't guess the internal state of a CSPRNG based on the output. That's what makes it CS. The only way to get the internal state is to break the OS protection and look at the memory directly. And if the attacker can do that, then they can do it for the multiple PRNG version too.


I misunderstood the context in your replies. We were kind of talking about different topics.


No, that difference (between /dev/random and /dev/urandom) does not exist, has never existed and will never exist. Please don't spread those myths.


As I am uninformed on the subject, could you tell me the difference between /dev/random and /dev/urandom?


You're right, that was too short and thus too harsh. Please accept my apologies.

So a short roundup:

/dev/random and /dev/urandom used to be exactly the same (on Linux), except that /dev/random did some voodoo "entropy estimation" that the Linux kernel guys are totally in love with, but everyone else doesn't trust anyway. Even if there was a plausible model how to estimate entropy, which there isn't.

In the meantime things have changed quite a bit. But the main thing to know is the same: /dev/urandom is the device you want to use for cryptographic randomness. /dev/random is an oddity that will be there forever because Linux takes backwards compatibility (for user space) extremely seriously.

(On other Unixoid platforms you also want /dev/urandom)

If you can use syscalls and don't need a device, use getrandom(2) over /dev/urandom. It's better.

Oh, and please note that the Linux man pages have been updated! They now state clearly that /dev/urandom is suitable for cryptographic use. Of course, lots of old man pages floating around on the web.


Quite a long read, but I think it explains the situation quite well: https://www.2uo.de/myths-about-urandom/


Unfortunately, the article isn't in the best shape right now.

Back when it was written, things were clear: random and urandom are the same. Then came getrandom as a distraction. Now urandom is based on chacha. So it's different (but not worse – still, harder to explain).

The article's structure couldn't easily accomodate those changes, and time was and is in short supply, and so it's not wrong, but much less forceful and clear than it used to be. I hope it shapes up soon, but don't promise anything!

Still, I don't know a more up-to-date article. Maybe Thomas Pornin has something newer on StackOverflow?


> Now urandom is based on chacha

Did Linux follow the example set by OpenBSD?


I think so, yes. getrandom(2) is also an OpenBSD thing (getentropy), but of course they had to rename it and add a few options...



But there IS a difference. You should correct me by saying "both use entropy sources but /dev/random blocks (or used to block) unnecessarily when the kernel considers there's not enough entropy".

By your answers I don't know if still blocks or not.


Unless Quantum Uncertainty holds true, and your RNG uses Quantum randomness then all RNG are pseudo.


Can you crack this PRNG without knowing the seed?

    f(1) = 1 // seed

    f(n) = sha512(f(n-1) . f(1))


This is similar to Yarrow / Fortuna (internal state is a counter, output is the hash of the state) so I'm guessing it's not breakable, at least not trivially.


I guess it depends what you mean by “crack”. Given f(1), which I assume is public, you can predict all future outputs.


I said without knowing the seed, so f(1) is not public, only f(n) formula is.


f(1) is the first batch of output.


That is not what we mean by "crack". Read the article.


"Always use a cryptographic PSUEDO-RNG for important code!"

Just because it's "cryptographic" doesn't mean it's not pseudo-random.


The article is a serious waste of time.

It can be summarized as "Non cryptographic PRNGs can be predicted! Look, I cracked this one! I'm not going to tell you how I did it though."

There's no exposition describing non cryptographic PRNGs, nor any evidence given for why they're not sound beyond the author's assertion that he cracked one.

To be clear, non cryptographic PRNGs are often predictable, and shouldn't be used if that's a problem, but if you're interested in learning more about that, this article isn't going to help you much.

Skip the read.


Hey, author of the SMT attack here. There is probably a clever way to go after XorShift128+ as well, symbolic execution using an SMT solver is basically a brute-force solution.

I'll have to give this challenge a shot later.


As someone who first learned how to program by implementing PRNGs but never really digging deeper into it, I found this post very interesting to read. I do have an idea about some (small portion) of the things behind it, but I have no background in cryptography.

Looking at the other posts, it seems like most PRNGs are fine for non-cryptographic applications, but what are other ways to make PRNG's though? Everything I've learned (mostly simple stuff; Linear Congruential, Midsquare, etc.) seem to need to store a state to work, because otherwise, wouldn't you just output the same thing over and over again? I know there's stuff like /dev/random (though I'm unsure how that works), but that doesn't seem like a good idea for getting a lot of numbers.


Professor O'Neill (mentioned in the article) has written a PRNG [1]. The jury is still out on how powerful it is in general. There continue to be fights between what it means to be random for cryptographic purposes vs. numerical analysis purposes.

That said, the PDF on that site that serves as a writeup for PCG contains a nice discussion of the links between the size of the state held and the strength of the algorithm, including a discussion of the state of the art for crypto- and non-crypto- PRNGs.

[1] http://www.pcg-random.org/


There is in fact no real debate about what's required for an RNG to be suitable for security purpose. The standard for security is cryptographic. To design a new secure RNG, you effectively need to design a new cryptographic primitive (most likely, a new native stream cipher).

There may indeed be some debate about the requirements for non-security numerical analysis applications.

Most development platforms should be defaulting to secure random number generators, and most developers should be reaching for secure random number generators as their default choice. Neither PCG nor xorshiro128 are examples of these.


> Most development platforms should be defaulting to secure random number generators, and most developers should be reaching for secure random number generators as their default choice.

I was curious about this statement. So I did some research.

This page (http://vigna.di.unimi.it/xorshift/) indicates that xoroshiro128+ generates 64-bits in 0.81ns on a modern 3.6GHz CPU.

If I'm reading this page correctly (https://bench.cr.yp.to/results-stream.html) ChaCha20 gets about 0.8 cycles per byte these days on modern CPUs.

Running the math we get 9.88 GB/s for Xoroshiro128+ and 5.14 GB/s for ChaCha20 (assuming a 3.6GHz modern CPU for both).

Actually a _lot_ closer than I thought. It never occurred to me that a CSPRNG could compete, performance wise, with a non-CS PRNG.

I'm sure there's variation here. Sometimes CSPRNGs will have re-keying cycles, and probably most implementations aren't going to use the highly optimized version we see in the benchmark. I'm not sure if the Xoroshiro128+ benchmark I found used a version utilizing all the SIMD functionality of the CPU (like the ChaCha20 benchmark does). I'm also not sure if Xoroshiro128+ is the fastest PRNG or not.

But I have to say, if these numbers are accurate ... you're just plain right. There's no reason to default to a non-CSPRNG. CSPRNG is a safer default, and in the rare scenario that a developer needs more performance they can go seek out a specific PRNG for their needs.


I'm not even saying you should never use an LCG. Go ahead, if you're absolutely sure you need it, in the very specific places that you actually need it. But not only are CSPRNGs performance competitive on modern machines, but most places that need RNGs aren't in the performance hot-spot anyways.


This problem can be solved using Z3: https://gist.github.com/zb3/c59cf596ce80c501db5ca16c31a1c3a7

I don't know whether autor used solver or some other magic method... Solutions should be available to those who want to see them.


Does anyone know how the constants in xoroshiro128+ were chosen?


Ha ha! By going to your predictions page I can crack you! Great post.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: