Cracking random number generators (xoroshiro128+)

pbnjay · on Aug 22, 2017

Author's title should be "Cracking PSEUDO-random number generators" - We should all basically assume that any PRNG will be easily cracked like this and not use them for anything important to security!

Always use a cryptographic RNG for important code!

tptacek · on Aug 22, 2017

This shouldn't have been downvoted because it is exactly correct.

Cryptographic generators don't work like PCG and xoroshiro and Mersenne Twister. They're generally built by taking a cryptographically secure cipher or hash core, "keying" it with secret entropy, and running it in a streaming configuration (like CTR mode).

A properly designed CSPRNG can only be "cracked" in a few specific scenarios:

1. The primitive it's built on (or the streaming construction it's configured in) is broken, in which case the news for cryptography as a field is significantly bigger than the fact that an RNG has a flaw.

2. An attacker has exploited a systems flaw to directly disclose the contents of the memory the CSPRNG is operating out of, in which case you have bigger problems than your CSPRNG.

3. The secrets that key the generator have become predictable. This is in practice the only way CSPRNGs get broken (unintentionally), and, in practice, always means the CSPRNG wasn't initialized properly (the "cold start entropy problem").

You should use the getrandom() system call, or read from /dev/urandom, to the exclusion of all other mechanisms.

IanCal · on Aug 22, 2017

I'd have called that a PRNG, because to me there were only two main categories. Pseudo-random, where it's designed to be unpredictable, and actually random where it is based on an external hardware source of true random information.

A CSPRNG is surely a type of PRNG. Is that not right?

Their comment doesn't really seem correct to me. The title is "Cracking random number generators (xoroshiro128+)" which seems pretty accurate to me. The article definitely doesn't seem to say it's breaking anything other than a very specific, flawed random number generator.

tptacek · on Aug 22, 2017

I'd have added "Cryptographically secure" and not capitalized "pseudo", but that's small-stakes stuff. The point he's making is the most important safety point on this topic.

pbnjay · on Aug 22, 2017

Duly noted!

I'm not in this field, but I know enough to know what not to do (most of the time).

alphaalpha101 · on Aug 22, 2017

>I'd have called that a PRNG, because to me there were only two main categories. Pseudo-random, where it's designed to be unpredictable, and actually random where it is based on an external hardware source of true random information.

PRNGs produce numbers that seem hard to predict. CSPRNGs product numbers that actually are hard to predict, assuming P != NP (kind of).

raverbashing · on Aug 22, 2017

> A CSPRNG is surely a type of PRNG. Is that not right?

Yes. In the same way the POTUS limousine is a car

baby · on Aug 22, 2017

FWIW you rarely hear the term CSPRNG in crypto I find. I always call these PRNGs but I can see how having a naming distinction could help prevent misuse in the applied world.

Edit: thinking a bit more about it. I guess it wouldn't make sense to call anything "crypto" in crypto. It's like calling fries "french fries" in France. There they're just fries. I know this is a bad example because french fries are probably not from France :o)

gwbas1c · on Aug 22, 2017

But, it's important to make the decision because a "crypto" psudorandom number generator may be significantly slower than an insecure generator.

This is critical for performance-sensitive operations. For example, certain audio and video codecs need to simulate noise. In these cases, high performance is much more important than cryptographic security.

tptacek · on Aug 22, 2017

In the overwhelming majority of cases, cryptographic random bit generation performs perfectly adequately. Insecure random number generation is wildly overused in our industry. CSPRNGs should be the default application RNGs on most platforms, and should always be the default choice for developers.

Which makes stuff like PCG even weirder! Because in most cases, what you want is a somewhat slower generator that has better failsafe behavior.

aidenn0 · on Aug 22, 2017

Many microbenchmarks intended to measure other things become benchmarks of your RNG if you use anything slower than an LCG. This biases a lot of places towards using the poorest RNG they can get away with.

This is made worse by many purchasing decisions made based upon microbenchmarks with the requirements of "default settings" so defaulting to insecure is a sound business decision in more cases than you might think.

sanxiyn · on Aug 22, 2017

This is indeed a tragedy, because it could have been easily avoided by including LCG in microbenchmarks. LCG is less than ten lines, so even for very short microbenchmarks including RNG is very feasible. Alas, I guess such reasonable people don't write microbenchmarks in the first place.

tptacek · on Aug 22, 2017

I understand the "broken benchmarks" problem and I acknowledge that there are some cases that are so demanding and have such low security sensitivity that it makes sense to have an LCG in the standard library.

But I stand by my argument that the default platform RNG should be a CSPRNG, and that developers should reach for a CSPRNG by default.

Which makes all the attention we've been giving to stuff like xoroshiro128+ and PCG pretty confusing to me. It feels like people arguing very earnestly about non-problems, while ignoring a huge problem in our standard libraries.

alphaalpha101 · on Aug 22, 2017

>stuff like PCG

PCG is cryptographically secure, though. Or at least, it is as cryptographically secure as any other PRNG in the sense that nobody actually knows how to predict it, many have tried, nobody has succeeded, but nobody has proved it impossible.

tptacek · on Aug 23, 2017

So, it's "cryptographically secure" in the "sci.crypt proposal" sense.

anfractuosity · on Aug 22, 2017

Aren't cryptographic random number generators, still PRNGs.

It sounds a fun problem, predicting the future random numbers, going to have to have a play later at trying it.

hlieberman · on Aug 22, 2017

Yes. GP is mistaken here; this is novel work that is somewhat concerning -- mostly in how it might apply to other similarly state-based RNGs.

karanlyons · on Aug 22, 2017

I wouldn’t say this work is novel in the general case of “PRNGs are not CSPRNGs”. You can throw a constraint solver at most any PRNG and given sufficient output determine the state fairly easily. As a datapoint, doing this for xoroshiro took me half an hour: https://gist.github.com/karanlyons/805dbcc9e898dbd17e06f2627...

anfractuosity · on Aug 22, 2017

Heh, that sounds cool. I'll save opening that link for later.

karanlyons · on Aug 22, 2017

Don’t worry, it’s safe: I didn’t put the actual solver, just proof that I solved it. Wouldn’t want to spoil the fun for anyone else :)

pbnjay · on Aug 22, 2017

I made no comment on the work done here, it is novel and concerning if you use the outputs for important things. My comment is that non-cryptographic random number generators should not be used for security-critical functions.

These functions are specifically built for speed, not security.

DiThi · on Aug 22, 2017

Strong crypto RNGs use PRNGs but combines sources of entropy, environmental noise from devices such as the number of CPU cycles between user keystrokes. T̶h̶a̶t̶'̶s̶ ̶t̶h̶e̶ ̶d̶i̶f̶f̶e̶r̶e̶n̶c̶e̶ ̶b̶e̶t̶w̶e̶e̶n̶ ̶/̶d̶e̶v̶/̶r̶a̶n̶d̶o̶m̶ ̶a̶n̶d̶ ̶/̶d̶e̶v̶/̶u̶r̶a̶n̶d̶o̶m̶ ̶i̶n̶ ̶L̶i̶n̶u̶x̶.̶

yorwba · on Aug 22, 2017

I was wondering how you managed to strike out part of your comment when https://news.ycombinator.com/formatdoc doesn't mention any markup for that, but then I realized you were using the "combining long stroke overlay" Unicode character ( ̶). Nice trick.

bradleyjg · on Aug 22, 2017

After an initial seeding the only thing additional entropy adds is limiting the damage from a compromise of the internal state of the PRNG. And if the OS's internal PRNG state is compromised, what makes you think your process isn't?

DiThi · on Aug 23, 2017

What if you're using several PRNGs XORed together and reseeded frequently?

bradleyjg · on Aug 23, 2017

Why would you want to do that?

DiThi · on Aug 23, 2017

That would make much more difficult (if not impossible) to guess the internal state of all RNGs.

bradleyjg · on Aug 23, 2017

You can't guess the internal state of a CSPRNG based on the output. That's what makes it CS. The only way to get the internal state is to break the OS protection and look at the memory directly. And if the attacker can do that, then they can do it for the multiple PRNG version too.

DiThi · on Aug 24, 2017

I misunderstood the context in your replies. We were kind of talking about different topics.

Tomte · on Aug 22, 2017

No, that difference (between /dev/random and /dev/urandom) does not exist, has never existed and will never exist. Please don't spread those myths.

andruby · on Aug 22, 2017

As I am uninformed on the subject, could you tell me the difference between /dev/random and /dev/urandom?

Tomte · on Aug 22, 2017

You're right, that was too short and thus too harsh. Please accept my apologies.

So a short roundup:

/dev/random and /dev/urandom used to be exactly the same (on Linux), except that /dev/random did some voodoo "entropy estimation" that the Linux kernel guys are totally in love with, but everyone else doesn't trust anyway. Even if there was a plausible model how to estimate entropy, which there isn't.

In the meantime things have changed quite a bit. But the main thing to know is the same: /dev/urandom is the device you want to use for cryptographic randomness. /dev/random is an oddity that will be there forever because Linux takes backwards compatibility (for user space) extremely seriously.

(On other Unixoid platforms you also want /dev/urandom)

If you can use syscalls and don't need a device, use getrandom(2) over /dev/urandom. It's better.

Oh, and please note that the Linux man pages have been updated! They now state clearly that /dev/urandom is suitable for cryptographic use. Of course, lots of old man pages floating around on the web.

filleokus · on Aug 22, 2017

Quite a long read, but I think it explains the situation quite well: https://www.2uo.de/myths-about-urandom/

Tomte · on Aug 22, 2017

Unfortunately, the article isn't in the best shape right now.

Back when it was written, things were clear: random and urandom are the same. Then came getrandom as a distraction. Now urandom is based on chacha. So it's different (but not worse – still, harder to explain).

The article's structure couldn't easily accomodate those changes, and time was and is in short supply, and so it's not wrong, but much less forceful and clear than it used to be. I hope it shapes up soon, but don't promise anything!

Still, I don't know a more up-to-date article. Maybe Thomas Pornin has something newer on StackOverflow?

snakeanus · on Aug 22, 2017

> Now urandom is based on chacha

Did Linux follow the example set by OpenBSD?

Tomte · on Aug 22, 2017

I think so, yes. getrandom(2) is also an OpenBSD thing (getentropy), but of course they had to rename it and add a few options...

snakeanus · on Aug 22, 2017

Also an interesting and related article https://sockpuppet.org/blog/2014/02/25/safely-generate-rando...

DiThi · on Aug 22, 2017

But there IS a difference. You should correct me by saying "both use entropy sources but /dev/random blocks (or used to block) unnecessarily when the kernel considers there's not enough entropy".

By your answers I don't know if still blocks or not.

csomar · on Aug 22, 2017

Unless Quantum Uncertainty holds true, and your RNG uses Quantum randomness then all RNG are pseudo.

imaginenore · on Aug 22, 2017

Can you crack this PRNG without knowing the seed?

    f(1) = 1 // seed

    f(n) = sha512(f(n-1) . f(1))

mdpopescu · on Aug 22, 2017

This is similar to Yarrow / Fortuna (internal state is a counter, output is the hash of the state) so I'm guessing it's not breakable, at least not trivially.

sdevlin · on Aug 22, 2017

I guess it depends what you mean by “crack”. Given f(1), which I assume is public, you can predict all future outputs.

imaginenore · on Aug 22, 2017

I said without knowing the seed, so f(1) is not public, only f(n) formula is.

benchaney · on Aug 22, 2017

f(1) is the first batch of output.

IshKebab · on Aug 22, 2017

That is not what we mean by "crack". Read the article.

VikingCoder · on Aug 22, 2017

"Always use a cryptographic PSUEDO-RNG for important code!"

Just because it's "cryptographic" doesn't mean it's not pseudo-random.

nolemurs · on Aug 22, 2017

The article is a serious waste of time.

It can be summarized as "Non cryptographic PRNGs can be predicted! Look, I cracked this one! I'm not going to tell you how I did it though."

There's no exposition describing non cryptographic PRNGs, nor any evidence given for why they're not sound beyond the author's assertion that he cracked one.

To be clear, non cryptographic PRNGs are often predictable, and shouldn't be used if that's a problem, but if you're interested in learning more about that, this article isn't going to help you much.

Skip the read.

carom · on Aug 22, 2017

Hey, author of the SMT attack here. There is probably a clever way to go after XorShift128+ as well, symbolic execution using an SMT solver is basically a brute-force solution.

I'll have to give this challenge a shot later.

xoroshiro · on Aug 22, 2017

As someone who first learned how to program by implementing PRNGs but never really digging deeper into it, I found this post very interesting to read. I do have an idea about some (small portion) of the things behind it, but I have no background in cryptography.

Looking at the other posts, it seems like most PRNGs are fine for non-cryptographic applications, but what are other ways to make PRNG's though? Everything I've learned (mostly simple stuff; Linear Congruential, Midsquare, etc.) seem to need to store a state to work, because otherwise, wouldn't you just output the same thing over and over again? I know there's stuff like /dev/random (though I'm unsure how that works), but that doesn't seem like a good idea for getting a lot of numbers.

andrewla · on Aug 22, 2017

Professor O'Neill (mentioned in the article) has written a PRNG [1]. The jury is still out on how powerful it is in general. There continue to be fights between what it means to be random for cryptographic purposes vs. numerical analysis purposes.

That said, the PDF on that site that serves as a writeup for PCG contains a nice discussion of the links between the size of the state held and the strength of the algorithm, including a discussion of the state of the art for crypto- and non-crypto- PRNGs.

[1] http://www.pcg-random.org/

tptacek · on Aug 22, 2017

There is in fact no real debate about what's required for an RNG to be suitable for security purpose. The standard for security is cryptographic. To design a new secure RNG, you effectively need to design a new cryptographic primitive (most likely, a new native stream cipher).

There may indeed be some debate about the requirements for non-security numerical analysis applications.

Most development platforms should be defaulting to secure random number generators, and most developers should be reaching for secure random number generators as their default choice. Neither PCG nor xorshiro128 are examples of these.

fpgaminer · on Aug 23, 2017

> Most development platforms should be defaulting to secure random number generators, and most developers should be reaching for secure random number generators as their default choice.

I was curious about this statement. So I did some research.

This page (http://vigna.di.unimi.it/xorshift/) indicates that xoroshiro128+ generates 64-bits in 0.81ns on a modern 3.6GHz CPU.

If I'm reading this page correctly (https://bench.cr.yp.to/results-stream.html) ChaCha20 gets about 0.8 cycles per byte these days on modern CPUs.

Running the math we get 9.88 GB/s for Xoroshiro128+ and 5.14 GB/s for ChaCha20 (assuming a 3.6GHz modern CPU for both).

Actually a _lot_ closer than I thought. It never occurred to me that a CSPRNG could compete, performance wise, with a non-CS PRNG.

I'm sure there's variation here. Sometimes CSPRNGs will have re-keying cycles, and probably most implementations aren't going to use the highly optimized version we see in the benchmark. I'm not sure if the Xoroshiro128+ benchmark I found used a version utilizing all the SIMD functionality of the CPU (like the ChaCha20 benchmark does). I'm also not sure if Xoroshiro128+ is the fastest PRNG or not.

But I have to say, if these numbers are accurate ... you're just plain right. There's no reason to default to a non-CSPRNG. CSPRNG is a safer default, and in the rare scenario that a developer needs more performance they can go seek out a specific PRNG for their needs.

tptacek · on Aug 23, 2017

I'm not even saying you should never use an LCG. Go ahead, if you're absolutely sure you need it, in the very specific places that you actually need it. But not only are CSPRNGs performance competitive on modern machines, but most places that need RNGs aren't in the performance hot-spot anyways.

zb3 · on Aug 23, 2017

This problem can be solved using Z3: https://gist.github.com/zb3/c59cf596ce80c501db5ca16c31a1c3a7

I don't know whether autor used solver or some other magic method... Solutions should be available to those who want to see them.

edflsafoiewq · on Aug 23, 2017

Does anyone know how the constants in xoroshiro128+ were chosen?

aj7 · on Aug 22, 2017

Ha ha! By going to your predictions page I can crack you! Great post.