Author's title should be "Cracking PSEUDO-random number generators" - We should all basically assume that any PRNG will be easily cracked like this and not use them for anything important to security!
Always use a cryptographic RNG for important code!
This shouldn't have been downvoted because it is exactly correct.
Cryptographic generators don't work like PCG and xoroshiro and Mersenne Twister. They're generally built by taking a cryptographically secure cipher or hash core, "keying" it with secret entropy, and running it in a streaming configuration (like CTR mode).
A properly designed CSPRNG can only be "cracked" in a few specific scenarios:
1. The primitive it's built on (or the streaming construction it's configured in) is broken, in which case the news for cryptography as a field is significantly bigger than the fact that an RNG has a flaw.
2. An attacker has exploited a systems flaw to directly disclose the contents of the memory the CSPRNG is operating out of, in which case you have bigger problems than your CSPRNG.
3. The secrets that key the generator have become predictable. This is in practice the only way CSPRNGs get broken (unintentionally), and, in practice, always means the CSPRNG wasn't initialized properly (the "cold start entropy problem").
You should use the getrandom() system call, or read from /dev/urandom, to the exclusion of all other mechanisms.
I'd have called that a PRNG, because to me there were only two main categories. Pseudo-random, where it's designed to be unpredictable, and actually random where it is based on an external hardware source of true random information.
A CSPRNG is surely a type of PRNG. Is that not right?
Their comment doesn't really seem correct to me. The title is "Cracking random number generators (xoroshiro128+)" which seems pretty accurate to me. The article definitely doesn't seem to say it's breaking anything other than a very specific, flawed random number generator.
I'd have added "Cryptographically secure" and not capitalized "pseudo", but that's small-stakes stuff. The point he's making is the most important safety point on this topic.
>I'd have called that a PRNG, because to me there were only two main categories. Pseudo-random, where it's designed to be unpredictable, and actually random where it is based on an external hardware source of true random information.
PRNGs produce numbers that seem hard to predict. CSPRNGs product numbers that actually are hard to predict, assuming P != NP (kind of).
FWIW you rarely hear the term CSPRNG in crypto I find. I always call these PRNGs but I can see how having a naming distinction could help prevent misuse in the applied world.
Edit: thinking a bit more about it. I guess it wouldn't make sense to call anything "crypto" in crypto. It's like calling fries "french fries" in France. There they're just fries. I know this is a bad example because french fries are probably not from France :o)
But, it's important to make the decision because a "crypto" psudorandom number generator may be significantly slower than an insecure generator.
This is critical for performance-sensitive operations. For example, certain audio and video codecs need to simulate noise. In these cases, high performance is much more important than cryptographic security.
In the overwhelming majority of cases, cryptographic random bit generation performs perfectly adequately. Insecure random number generation is wildly overused in our industry. CSPRNGs should be the default application RNGs on most platforms, and should always be the default choice for developers.
Which makes stuff like PCG even weirder! Because in most cases, what you want is a somewhat slower generator that has better failsafe behavior.
Many microbenchmarks intended to measure other things become benchmarks of your RNG if you use anything slower than an LCG. This biases a lot of places towards using the poorest RNG they can get away with.
This is made worse by many purchasing decisions made based upon microbenchmarks with the requirements of "default settings" so defaulting to insecure is a sound business decision in more cases than you might think.
This is indeed a tragedy, because it could have been easily avoided by including LCG in microbenchmarks. LCG is less than ten lines, so even for very short microbenchmarks including RNG is very feasible. Alas, I guess such reasonable people don't write microbenchmarks in the first place.
I understand the "broken benchmarks" problem and I acknowledge that there are some cases that are so demanding and have such low security sensitivity that it makes sense to have an LCG in the standard library.
But I stand by my argument that the default platform RNG should be a CSPRNG, and that developers should reach for a CSPRNG by default.
Which makes all the attention we've been giving to stuff like xoroshiro128+ and PCG pretty confusing to me. It feels like people arguing very earnestly about non-problems, while ignoring a huge problem in our standard libraries.
PCG is cryptographically secure, though. Or at least, it is as cryptographically secure as any other PRNG in the sense that nobody actually knows how to predict it, many have tried, nobody has succeeded, but nobody has proved it impossible.
I wouldn’t say this work is novel in the general case of “PRNGs are not CSPRNGs”. You can throw a constraint solver at most any PRNG and given sufficient output determine the state fairly easily. As a datapoint, doing this for xoroshiro took me half an hour: https://gist.github.com/karanlyons/805dbcc9e898dbd17e06f2627...
I made no comment on the work done here, it is novel and concerning if you use the outputs for important things. My comment is that non-cryptographic random number generators should not be used for security-critical functions.
These functions are specifically built for speed, not security.
Strong crypto RNGs use PRNGs but combines sources of entropy, environmental noise from devices such as the number of CPU cycles between user keystrokes. T̶h̶a̶t̶'̶s̶ ̶t̶h̶e̶ ̶d̶i̶f̶f̶e̶r̶e̶n̶c̶e̶ ̶b̶e̶t̶w̶e̶e̶n̶ ̶/̶d̶e̶v̶/̶r̶a̶n̶d̶o̶m̶ ̶a̶n̶d̶ ̶/̶d̶e̶v̶/̶u̶r̶a̶n̶d̶o̶m̶ ̶i̶n̶ ̶L̶i̶n̶u̶x̶.̶
I was wondering how you managed to strike out part of your comment when https://news.ycombinator.com/formatdoc doesn't mention any markup for that, but then I realized you were using the "combining long stroke overlay" Unicode character ( ̶). Nice trick.
After an initial seeding the only thing additional entropy adds is limiting the damage from a compromise of the internal state of the PRNG. And if the OS's internal PRNG state is compromised, what makes you think your process isn't?
You can't guess the internal state of a CSPRNG based on the output. That's what makes it CS. The only way to get the internal state is to break the OS protection and look at the memory directly. And if the attacker can do that, then they can do it for the multiple PRNG version too.
You're right, that was too short and thus too harsh. Please accept my apologies.
So a short roundup:
/dev/random and /dev/urandom used to be exactly the same (on Linux), except that /dev/random did some voodoo "entropy estimation" that the Linux kernel guys are totally in love with, but everyone else doesn't trust anyway. Even if there was a plausible model how to estimate entropy, which there isn't.
In the meantime things have changed quite a bit. But the main thing to know is the same: /dev/urandom is the device you want to use for cryptographic randomness. /dev/random is an oddity that will be there forever because Linux takes backwards compatibility (for user space) extremely seriously.
(On other Unixoid platforms you also want /dev/urandom)
If you can use syscalls and don't need a device, use getrandom(2) over /dev/urandom. It's better.
Oh, and please note that the Linux man pages have been updated! They now state clearly that /dev/urandom is suitable for cryptographic use. Of course, lots of old man pages floating around on the web.
Unfortunately, the article isn't in the best shape right now.
Back when it was written, things were clear: random and urandom are the same. Then came getrandom as a distraction. Now urandom is based on chacha. So it's different (but not worse – still, harder to explain).
The article's structure couldn't easily accomodate those changes, and time was and is in short supply, and so it's not wrong, but much less forceful and clear than it used to be. I hope it shapes up soon, but don't promise anything!
Still, I don't know a more up-to-date article. Maybe Thomas Pornin has something newer on StackOverflow?
But there IS a difference. You should correct me by saying "both use entropy sources but /dev/random blocks (or used to block) unnecessarily when the kernel considers there's not enough entropy".
By your answers I don't know if still blocks or not.
This is similar to Yarrow / Fortuna (internal state is a counter, output is the hash of the state) so I'm guessing it's not breakable, at least not trivially.
It can be summarized as "Non cryptographic PRNGs can be predicted! Look, I cracked this one! I'm not going to tell you how I did it though."
There's no exposition describing non cryptographic PRNGs, nor any evidence given for why they're not sound beyond the author's assertion that he cracked one.
To be clear, non cryptographic PRNGs are often predictable, and shouldn't be used if that's a problem, but if you're interested in learning more about that, this article isn't going to help you much.
Hey, author of the SMT attack here. There is probably a clever way to go after XorShift128+ as well, symbolic execution using an SMT solver is basically a brute-force solution.
As someone who first learned how to program by implementing PRNGs but never really digging deeper into it, I found this post very interesting to read. I do have an idea about some (small portion) of the things behind it, but I have no background in cryptography.
Looking at the other posts, it seems like most PRNGs are fine for non-cryptographic applications, but what are other ways to make PRNG's though? Everything I've learned (mostly simple stuff; Linear Congruential, Midsquare, etc.) seem to need to store a state to work, because otherwise, wouldn't you just output the same thing over and over again? I know there's stuff like /dev/random (though I'm unsure how that works), but that doesn't seem like a good idea for getting a lot of numbers.
Professor O'Neill (mentioned in the article) has written a PRNG [1]. The jury is still out on how powerful it is in general. There continue to be fights between what it means to be random for cryptographic purposes vs. numerical analysis purposes.
That said, the PDF on that site that serves as a writeup for PCG contains a nice discussion of the links between the size of the state held and the strength of the algorithm, including a discussion of the state of the art for crypto- and non-crypto- PRNGs.
There is in fact no real debate about what's required for an RNG to be suitable for security purpose. The standard for security is cryptographic. To design a new secure RNG, you effectively need to design a new cryptographic primitive (most likely, a new native stream cipher).
There may indeed be some debate about the requirements for non-security numerical analysis applications.
Most development platforms should be defaulting to secure random number generators, and most developers should be reaching for secure random number generators as their default choice. Neither PCG nor xorshiro128 are examples of these.
> Most development platforms should be defaulting to secure random number generators, and most developers should be reaching for secure random number generators as their default choice.
I was curious about this statement. So I did some research.
Running the math we get 9.88 GB/s for Xoroshiro128+ and 5.14 GB/s for ChaCha20 (assuming a 3.6GHz modern CPU for both).
Actually a _lot_ closer than I thought. It never occurred to me that a CSPRNG could compete, performance wise, with a non-CS PRNG.
I'm sure there's variation here. Sometimes CSPRNGs will have re-keying cycles, and probably most implementations aren't going to use the highly optimized version we see in the benchmark. I'm not sure if the Xoroshiro128+ benchmark I found used a version utilizing all the SIMD functionality of the CPU (like the ChaCha20 benchmark does). I'm also not sure if Xoroshiro128+ is the fastest PRNG or not.
But I have to say, if these numbers are accurate ... you're just plain right. There's no reason to default to a non-CSPRNG. CSPRNG is a safer default, and in the rare scenario that a developer needs more performance they can go seek out a specific PRNG for their needs.
I'm not even saying you should never use an LCG. Go ahead, if you're absolutely sure you need it, in the very specific places that you actually need it. But not only are CSPRNGs performance competitive on modern machines, but most places that need RNGs aren't in the performance hot-spot anyways.
Always use a cryptographic RNG for important code!