[EDIT: If you're interested in this sort of thing, then you might want to read my blog.]
Jytter is a true random number generator (TRNG) which conforms to the reflexive density constant and runs in X86 and X64 userspace, but could easily be adapted to other processor architectures. It uses no IO and, apart from saving and restoring registers at the beginning and end of the code, no memory whatsoever. Whereas existing TRNGs tend to use elaborate physical phenomena, Jytter uses only the CPU core timer, in such a way as to ensure the accrual of a certain level of entropy before returning its result.
If you're looking for a very fast pseudorandom number generator which also conforms to the reflexive density constant, have a look at LMD2 and its long nonzero sequence adaptation.
The text below was adapted from a white paper I wrote on October 15, 2011, but didn't get around to publishing until now.
1. Introduction
A TRNG converts random physical information into random virtual information, ideally producing uniformly distributed random numbers.
Historically, TRNGs have tended to measure macroscopic phenomena, such as network states. Such operations can require substantial time due to operating system calls and IO requirements. Worse, they can suffer from high predictability during quiescent periods, for example, shortly after cold boot, or during idle hours of the night. Ultimately, they may be impossible to run under certain scenarios due to hardware malfunctions or lack of execution privilege.
Jytter has been designed from the opposite philosophical perspective: find the simplest, most accessible source of randomness available to userspace, and exploit it in the simplest possible algorithm which will yield randomness of a certain quality. The point is to be able to characterize the randomness under conditions of maximum quiescence -- not to create an algorithm so complex that it appears to produce high quality randomness under typical use scenarios.
2. Timer Phase and Interrupt Duration Vectors
A computer may be crudely modeled as a CPU, a memory, and a set of external periodic timers. A "timer" may actually be a harddrive, which interrupts every 128 ms in order to inform the CPU that another block has been read into memory. Or perhaps it's a network device, which interrupts every 71 ms to indicate that another packet has been received. (Of course these devices are not so precisely periodic in real life, but that fact only improves randomness quality, so let's ignore it.)
We can think of many of these "timers": (1) harddrives, (2) network devices, (3) time-of-day clocks, (4) video devices, (5) sound devices, (6) USB controllers, etc.
Imagine that each timer is firing with perfect regularity. Suppose, furthermore, that the CPU contains a timer (an actual clock, as opposed to one of the devices above) which ticks once every ms, that period being shorter than any of the external timer periods. Now, the only thing that we don't know is each external timer's phase.
Back to the harddrive that "ticks" every 128 ms. Suppose we take note of the time, and wait for the tick. After 29 ms, it interrupts the CPU. Bingo! We have a true random number, which is the phase of the harddrive interrupt. In this case, the phase would be 99 ms. (99+29=128.) Furthermore, it's good for 7 bits of randomness, because the period is 128 ms.
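In symbols, under the idealized model above (a CPU timer resolution of 1 ms):

```latex
\text{phase} = \text{period} - \text{wait} = 128 - 29 = 99\ \text{ms},
\qquad
\text{bits} = \log_2\!\left(\frac{\text{period}}{\text{resolution}}\right)
            = \log_2\!\left(\frac{128\ \text{ms}}{1\ \text{ms}}\right) = 7.
```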
Realistically, this doesn't quite work, because we don't know the expected period. In userspace, all we see is some lost time due to the interrupt occurring at the kernel level; we don't know which device actually interrupted us because the OS prevents us from knowing that. How can we use this interrupt, then, to provide some randomness?
Well, there are 3 things that we do know from userspace: (1) the timestamp counter (TSC) value when we entered the TRNG, (2) the TSC value when we last observed it (before the interrupt), and (3) the TSC value now (after the interrupt).
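For concreteness, here is a minimal userspace sketch of where those three values come from. It's C rather than Jytter's register-only assembly, and __rdtsc() is simply the GCC/Clang intrinsic for the RDTSC instruction:

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() on GCC/Clang: reads the X86 timestamp counter */

int main(void)
{
    uint64_t tsc_entry = __rdtsc();   /* (1) TSC when we entered the TRNG            */
    uint64_t tsc_prev  = __rdtsc();   /* (2) TSC when we last observed it            */
    uint64_t tsc_now   = __rdtsc();   /* (3) TSC now; in the real loop, (2) and (3)
                                             are successive reads inside a polling
                                             loop, and a large gap between them means
                                             the kernel stole cycles to service an
                                             interrupt                               */

    uint64_t duration = tsc_now - tsc_prev;    /* interrupt duration, as seen by us  */
    uint64_t phase    = tsc_now - tsc_entry;   /* "phase" relative to TRNG entry     */

    printf("duration = %llu cycles, phase = %llu cycles\n",
           (unsigned long long)duration, (unsigned long long)phase);
    return 0;
}
```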
First, consider the difference between TSC values (3) and (2). This difference is a determinant of whether or not we have observed this particular interrupt before. Specifically, if we've seen this same difference before, then it's possible that we've seen this same interrupt before, in which case, we ignore it. But if the difference has never been observed before, then there is something new about this interrupt. It may in fact be due to the same physical source as a prior interrupt (e.g. the harddrive), but traffic congestion on the way to the CPU has caused it to take longer to complete, which in itself adds randomness -- maybe only an extra bit, but more randomness nonetheless. And by the way, many of the shorter-duration events will be due to sources other than quasiperiodic device interrupts, such as memory line loads or CPU pipeline stalls. We don't care. The same reasoning applies.
Secondly, think back to the phases of the timers above. While we don't know the phase of this interrupt (because we don't know the period), we do in fact have equivalent information, which is the difference between TSC values (3) and (1). If we evaluate the product of (TSC at (3) minus TSC at (1)) and (the interrupt duration just measured above, which is TSC at (3) minus TSC at (2)), then we have an intermediate true random value incorporating timer phase and interrupt duration of one particular "timer". In other words, we consider (TSC at (1)) to be 0. (In practice, we don't bother to subtract it from (TSC at (3)) because doing so is a waste of time. But when characterizing Jytter's randomness quality, it's paramount to do so, because we don't want to mistake the continuously changing TSC values for actual randomness.)
It is the linear combination of products of the form (TSC at (3) minus TSC at (1)) and (TSC at (3) minus TSC at (2)) which ultimately serves as the output of the TRNG.
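Roughly, writing t1 for the TSC at (1), and t2,k and t3,k for the values at (2) and (3) surrounding the k-th accepted event:

```latex
\text{output} \;\approx\; \sum_{k} \bigl(t_{3,k} - t_1\bigr)\,\bigl(t_{3,k} - t_{2,k}\bigr)
```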
Jytter is a bit more complex, but essentially, it's evaluating the dot product of the phases (TSC at (3) minus TSC at (1)) of several quasiperiodic timers, with their respective interrupt durations (TSC at (3) minus TSC at (2)). In particular, it doesn't do exactly the above; it actually takes a hash of the products before summing them into a dot product, because doing so eliminates the need for all memory access whatsoever, or even conditional execution ("if") paths of different lengths, and thus reduces the incidence of pseudorandom timing masquerading as true random timing, due to its own execution path variances. This means that it does at least as much work as is required in order to obtain randomness of the quality of the above algo. We explore this more below.
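Here is a minimal C sketch of the "hash each product, then sum" idea. The mixing function below is purely a placeholder (the MurmurHash3 32-bit finalizer, chosen because it is branch-free and memory-free); the hash Jytter actually uses is not specified here, and the arrays exist only to make the illustration readable -- the real code keeps everything in registers and never touches memory.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder mixing function (MurmurHash3 32-bit finalizer) -- an assumption
   for illustration only; Jytter's actual hash is not specified here. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;  x *= 0x85EBCA6Bu;
    x ^= x >> 13;  x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

/* "Hash each product, then sum": phase[i] holds (TSC at (3) minus TSC at (1))
   and duration[i] holds (TSC at (3) minus TSC at (2)) for the i-th accepted
   event. Arrays are used purely for readability in this sketch. */
static uint32_t hashed_dot_product(const uint32_t *phase,
                                   const uint32_t *duration, int n)
{
    uint32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += mix32(phase[i] * duration[i]);
    return sum;
}

int main(void)
{
    /* Toy values, purely for demonstration. */
    uint32_t phase[]    = { 12345u, 67890u, 24680u };
    uint32_t duration[] = { 17u, 41u, 29u };
    printf("0x%08X\n", (unsigned)hashed_dot_product(phase, duration, 3));
    return 0;
}
```

Because the hash is evaluated inline on each product, every iteration does the same amount of work regardless of the data, which is exactly the property described above: no memory access and no data-dependent branches to inject the TRNG's own timing back into its measurements.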
3. Implementation
Realistically, Jytter can't properly evaluate the above dot product exactly as prescribed. One reason is that, due to practical limits of characterization, it can only produce 32 bits at a time, but the TSC alone is 64 bits. Another reason is that we don't want Jytter to use memory at all, lest we create internal timing complexity which persuades us that Jytter is less predictable than it actually is; the lack of memory means that we are limited to looking at the low 5 bits of interrupt duration (due to the limitations of a 32-bit bitmap implemented in a single CPU register), so occasionally we confuse 2 different interrupts for one and the same (but again, that only improves randomness because it makes Jytter run longer than necessary).
In its usual implementation, it waits for the occurrence of at least 16 unique interrupt sources (i.e. 16 bits set in the register bitmap) before issuing a result. Thus each such unique event is valued at merely 2 bits of randomness, which is very conservative. (Some events may be worth less than 2 bits of randomness, but the average is probably in excess of 4 bits, so we're safe in this assumption.) Furthermore, it integrates a hash of the observed TSC value on every iteration of its loop, whether or not a new event was observed. This is done in such a way that the exact order of observed interrupt durations matters, even if we're seeing the same 3 previously observed events occur over and over again, while waiting for a new unique interrupt duration to appear. Even very low-level phenomena such as memory loads tend to occur at slightly unpredictable rates, which allows Jytter to benefit from this ordering sensitivity.
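Putting the pieces together, here is an illustrative C reconstruction of the loop just described: a 32-bit register bitmap keyed on the low 5 bits of each interrupt duration, a requirement of 16 unique bits before returning, and a hash folded in on every iteration. It is a sketch under the same assumptions as above (placeholder mix32() hash, C locals standing in for registers), not the actual Jytter assembly.

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() on GCC/Clang */

/* Placeholder hash (MurmurHash3 32-bit finalizer) -- an assumption for
   illustration; Jytter's actual hash is not specified here. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;  x *= 0x85EBCA6Bu;
    x ^= x >> 13;  x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

static uint32_t jytter_sketch(void)
{
    uint32_t bitmap = 0;   /* 32-bit bitmap: one bit per duration bucket        */
    int      unique = 0;   /* unique duration buckets seen so far               */
    uint32_t result = 0;   /* accumulating output                               */
    uint64_t tsc_entry = __rdtsc();   /* TSC at (1): entry to the TRNG          */
    uint64_t tsc_prev  = tsc_entry;   /* TSC at (2): last observation           */

    while (unique < 16) {             /* wait for 16 unique interrupt durations */
        uint64_t tsc_now  = __rdtsc();                     /* TSC at (3): now   */
        uint32_t duration = (uint32_t)(tsc_now - tsc_prev);
        uint32_t bucket   = 1u << (duration & 31);  /* low 5 bits of duration   */

        if (!(bitmap & bucket)) {     /* a duration we have not seen before     */
            bitmap |= bucket;
            unique++;
        }
        /* Fold in a hash on every iteration, new event or not, so that the
           exact ordering of observed durations affects the result. */
        result += mix32((uint32_t)(tsc_now - tsc_entry) * duration);

        tsc_prev = tsc_now;
    }
    return result;
}

int main(void)
{
    printf("0x%08X\n", (unsigned)jytter_sketch());
    return 0;
}
```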
Best of all, it can be run in userspace because it merely reads the TSC, and takes only as much time as is required to observe 16 unique interrupt durations (or a few more, on account of the occasional hash collision).