A comment on Hacker News complained that the code in my previous blog post does not seed the
std::mt19937
random number engine properly. The code was taken directly from a CppCon presentation, so I don’t want to take the blame, but the comment is right — the initialization code can be improved.State size and seeding
The initialization in the blog post was done asstd::random_device rd; std::mt19937 gen(rd());which seeds the
std::mt19937
random number engine with a random 32-bit value. The problem with this is that that the Mersenne twister has 19968 bits of internal state so it can generate \(2^{19968}\) streams of random values, but we can only reach \(2^{32}\) of those states when initializing with a 32-bit value.This is not necessarily a problem. Let’s say the random numbers are used for generating input data in unit tests. The test suite is probably not run more than a few thousand times, so it does not matter that it only can create \(2^{32}\) different test runs. But there are use-cases where this is a problem.
The random number engine can be seeded with more data by using
std::seed_seq
, and the code below seeds the std::mt19937
with the same number of bits as are in the statestd::random_device rd; std::array<int, std::mt19937::state_size> seed_data; std::generate_n(seed_data.data(), seed_data.size(), std::ref(rd)); std::seed_seq seq(std::begin(seed_data), std::end(seed_data)); std::mt19937 gen(seq);
std::random_device
One other potential problem is the quality of the seed values. The idea behind std::random_device
is that it returns non-deterministic random numbers, but it is allowed to return deterministic values (e.g. if a non-deterministic source is not available to the implementation). I’m not a big fan of this functionality — it either does exactly what you want (generates non-deterministic values) or it does the opposite (generates deterministic values), and there is no way you can determine which.1This is probably not a problem when developing for the big platforms, but there may be surprises when running the code in other environments — at least old versions of libstdc++ on MinGW always return the same sequence of values...
1. The
std::random_device
can return an estimate of the entropy, and it is required to return 0 if the values are generated deterministically. But it is not required to return non-zero for the non-deterministic case, and e.g. libstdc++ is conservative and always estimates the entropy as 0, even when /dev/urandom
or the x86 RDRND instruction are used.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.