Friday, August 24, 2012

Characterization

This post won't make sense unless you've read the previous one.

Randomness quality is ultimately a subjective consideration, as it's always possible that nonobvious patterns lurk below the surface of a TRNG which make it appear much better than it is, or that obvious patterns would emerge during periods of atypical quiescence in the underlying physical system. As such, the quality of any TRNG should be evaluated under circumstances of maximum quiescence, e.g. just after cold boot, during periods of maximum device idleness, etc.

Jytter is designed to compensate for quiescence by accumulating entropy for as long as necessary to ensure quality (most importantly, negligible deviation from the reflexive density constant). Indeed, it runs much faster in a virtual machine under Windows (high timing entropy) than on a minimal Linux installation (low timing entropy). For a cool demonstration of this systemic entropy response effect, compare the time taken to generate 1000 random numbers with and without a video playing in the background.

Before I discovered the reflexive density constant (in a personal sense, as others probably knew about it before me), I had devised a related test, described below as the LMICBR test. It can be considered a useful adjunct to reflexive density constant compliance and other tests such as the famous Diehard Tests. Also discussed below are some nonobvious sources of systemic noise which should be attenuated, prior to evaluating any TRNG, to the maximum extent that could plausibly occur during actual usage.

The text below was adapted from a white paper I wrote on October 15, 2011, but didn't get around to publishing until now.

1. Introduction

In order to fully characterize Jytter, even on a single model of CPU, we would need to run it trillions of times, as it can produce over 4 billion unique outputs (2^32, to be exact), and we would need to obtain some statistically significant measure of the frequency of each output, all of which should be asymptotically identical in the ideal case. In practice, because the execution time of one iteration of Jytter is on the order of 10-100ms, regardless of CPU speed, it would take too many machines and/or too much time to conduct this experiment.

We present herein a quick and dirty shortcut that can give us substantial statistical insight in a matter of machine-weeks. This is described below, after we present some necessary considerations pertinent to the characterization process.

2. Experimental Quality Safeguards

2.0. CPU Model Variability

Clearly, every model of CPU has its own timing idiosyncrasies. Jytter should be characterized on as many as possible, relative to those on which it would be expected to run. For the paranoid, new CPU models should continue to be characterized as they become available.

2.1. Core Frequency Throttling

Empirically, CPUs maintain a constant timestamp counter (TSC) tick rate, regardless of whether or not they are running at maximum core frequency. They do this so that physical time appears to pass at the same rate, insofar as TSC clients (such as Jytter) are concerned, regardless of core frequency. This is obviously important when time-sensitive processes are being run.

The downside of this fact, as applies to Jytter, is that, when the CPU is operating at less than maximum frequency, the low bits of the TSC become more predictable, if not constant. Worse still, the throttling factor is unpredictable: while 2 or 4 is common, 2^10 is not out of the question during periods of extreme idleness. Even odd divisors such as 3 can occur, making the low bits of the TSC nonconstant, but still highly predictable.

Because Jytter must be competent to run, regardless of CPU clock frequency, any experiment seeking to ascertain Jytter's output noise quality should be run at all supported CPU frequencies. For maximum effect, it should be run during CPU frequency transitions ("throttling" events) as well. (In practice, this is probably of little concern, because a frequency transition inevitably entails at least as much event duration variability as does constant-frequency execution, as such transitions severely disrupt the frequency and phase relationships between the CPU core and external devices. So the generated noise should be less predictable than in the constant-frequency case. Nevertheless, it is possible that Jytter would mistake the same events for different ones at either side of the frequency transition(s).)

2.2. Initial Timestamp Counter (TSC) Uncertainty

Now, Jytter is designed to provide sufficient randomness so as not to reduce, in any statistically significant way, its set of possible outputs, when run in the presence of an Internet-remote attacker. (A local attacker can do anything to the physical machine, so randomness quality is of the least concern in such a case.) Given variances in boot time and Internet latency, it's hard to imagine that an attacker could know the TSC value to within better than about 2^24 ticks. (This is a much larger envelope than Network Time Protocol (NTP) would suggest; however, there is only a weak association between the time-of-day clock and the TSC on most operating systems.) Machines which boot faster, up to the point at which Jytter is first executed, and machines which contain less timing variability (such as those featuring solid-state media, or minimal network interaction) will of course have less uncertainty in their TSC state than would otherwise be the case. Still, 2^24 ticks is only ~10ms on a modern CPU. Faster CPUs will only increase the uncertainty, as network latency is not improving as fast as core speed.

Given this, prior to running each behavioral characterization iteration of Jytter, the TSC should be zeroed above bit 23, reflecting the "zero point uncertainty" described above. Because a TSC write operation cannot be performed in user space, it is recommended that the characterization occur in a customized kernel. Failing that, Jytter could be modified to subtract the starting TSC value (above bit 23 only) from each TSC read, which would of course modify its exact execution path. Should Jytter prove sufficiently robust under such quiescent conditions, then it will certainly stand up in a noisier system.

The TSC masking code is as follows. Note that it destroys EAX, EDX, and ECX (or RAX, RDX, and RCX, in a 64-bit environment).

  rdtsc                 ; read the TSC into EDX:EAX (EDX = high 32 bits, EAX = low 32 bits)
  and eax, (1<<24)-1    ; keep only bits 0-23 of the low half
  xor edx, edx          ; zero the high 32 bits
  mov ecx, 0x10         ; MSR 0x10 is IA32_TIME_STAMP_COUNTER
  wrmsr                 ; write EDX:EAX back to the TSC

If you think that the attack uncertainty is greater than 24 bits, then increase the "24" in the code above. Or decrease it in the opposite case. In any event, do this prior to generating each true random number. Note that the TSC value is stored in Model-Specific Register 0x10, hence the value in ECX.
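
If a custom kernel isn't an option, the user-space alternative mentioned above (subtracting the starting TSC value, above bit 23 only, from each subsequent read) amounts to nothing more than the following arithmetic. This is a purely illustrative Python sketch; read_tsc() is a hypothetical stand-in for RDTSC, implemented here with a monotonic clock only so that the snippet runs.

  import time

  def read_tsc():
      # Hypothetical stand-in for RDTSC; a real deployment would read the raw TSC.
      return time.monotonic_ns()

  MASK_LOW_24 = (1 << 24) - 1

  # Record the starting TSC with its low 24 bits cleared...
  tsc_start_high = read_tsc() & ~MASK_LOW_24

  def masked_tsc():
      # ...and subtract it from every later read, so that only the ~2^24 ticks
      # of "zero point uncertainty" remain in the value consumed downstream.
      return read_tsc() - tsc_start_high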

2.3. Driver Event Noise Minimization

The more unpredictable the devices which are connected to a machine, the better Jytter's input noise quality will be. In this case, the intent of characterization is to show acceptable performance under worst-case circumstances. For this reason, it is suggested that the experiment be run on a relatively quiescent system, as described in 2.1. Ideally, this system would be the minimally complex system on which Jytter is intended to be deployed, in the presence of only such other applications or operating system functions as would be expected to be running concurrently in all reasonable use scenarios. For example, the network driver may always be running concurrently with Jytter, but the audio input device may not be, so don't run the latter in the experiment. Similarly, while the network driver may be running, it should probably be disconnected from the network, for greater quiescence.

3. Statistical Methods

3.1. Ideal Case

As mentioned in the Introduction, the "right" way to characterize Jytter is to approximate how often each of its 2^32 possible output states occurs under the most timing-predictable realistic use scenarios. If, in fact, only 2^29 of those states turned out to occur, then we would say that Jytter is good for 29 bits of randomness, as opposed to 32.

Unfortunately, such a brute-force experiment is too resource-intensive to be realistic.

3.2. Basic Noise Quality Tests

Let's assume for a moment that Jytter produces 32 bits of perfect noise per iteration, i.e. it's an unbiased 32-bit TRNG. In other words, none of its 2^32 possible outputs will tend to occur more than any other, if sampled over a sufficiently large number of iterations.

It's assumed that any Jytter characterization would involve the usual tests of unbiased noise quality: tests for the bias of any bit, tests for biases in the differences between successive outputs, and tests for the correlation between pairs of bits. Other good tests may come to mind.
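
For concreteness, here is a minimal sketch of such tests in Python (purely illustrative, and not part of Jytter; XOR is used here as the "difference" between successive outputs, and judging how close to 0.5 is close enough would require a proper significance test, which is omitted).

  # Basic noise-quality checks over a list of 32-bit outputs. Each returned
  # fraction should be near 0.5 for an unbiased generator.

  def bit_bias(samples, n=32):
      # Fraction of 1s in each bit position.
      return [sum((s >> b) & 1 for s in samples) / len(samples) for b in range(n)]

  def successive_difference_bias(samples, n=32):
      # Bit biases of the XOR of each output with its successor.
      return bit_bias([a ^ b for a, b in zip(samples, samples[1:])], n)

  def bit_pair_agreement(samples, i, j):
      # Fraction of samples in which bits i and j agree (a simple correlation proxy).
      return sum(((s >> i) & 1) == ((s >> j) & 1) for s in samples) / len(samples)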

Unfortunately, some TRNGs do very well on the above tests, but when observed over many iterations, turn out to heavily favor certain 32-bit states over others.

One excellent test, which can easily reveal a bad TRNG which passes the above tests, is what I call the Limit Median Iteration Count Before Repetition Test.

3.3. Limit Median Iteration Count Before Repetition (LMICBR) Test

[EDIT: If this is of interest to you, then see also the Scintillating Entropy Test.]

Think of an ideal 1-bit TRNG. It would produce a 0 half of the time and a 1 half of the time. Consequently, as you can see here, the average number of 0s in a row will be 2, starting from the 1st 0 immediately following a 1 (run lengths are geometrically distributed, with mean 2). That is to say, its "iteration count before repetition" (ICBR) will have an average of 2 over many iterations. Any deviation therefrom, measured over a large number of iterations, would imply that the randomness value of an iteration would be less than 1 bit, and thus that the TRNG is not in fact ideal.
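
A quick simulation (illustrative Python, not part of Jytter) bears this out; the average length of a run of 0s from an unbiased bit source converges to 2.

  import random

  def average_zero_run_length(trials=10**6):
      # Generate random bits and average the lengths of maximal runs of 0s.
      bits = [random.getrandbits(1) for _ in range(trials)]
      runs, current = [], 0
      for b in bits:
          if b == 0:
              current += 1
          elif current:
              runs.append(current)
              current = 0
      return sum(runs) / len(runs)   # approaches 2.0 as trials grows

  print(average_zero_run_length())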

Now, the median of a sorted set consisting of an odd number of integers is simply the middle integer. For example, consider the set {-2, 0, 5, 8, -2, 1, 5}, which has an odd number of members (7). Sorted, it looks like: {-2, -2, 0, 1, 5, 5, 8}. The median is 1 because 3 integers are below it, and 3 are above. We could add 2 more 1s to the set, and the median would still be 1. In other words, the median need not be unique.

The "limit median" of such a set is herein defined as the value of the middle integer as the number of members in the set approaches (odd) infinity.

Consider a set consisting of an odd number of ICBRs. For a 1-bit TRNG, it might look like {2, 3, 1, 2, 9, 1, 4}. In this case, the median is 2. However, in the limit that the trial count approaches (odd) infinity, there is no predictable median, because 1 and 2 occur with equal probability (as median ICBRs; not as ICBRs per se, in which case 1 is 50% likely and 2 is 25% likely). However, if we consider n-bit TRNGs, n>1, then we do not have this problem, as we will show below.

Consider then, an n-bit TRNG, n>1. Consider a sorted set of its observed ICBRs, as the size of this set approaches (odd) infinity. Then we can exactly predict its median, which we call the "limit median iteration count before repetition" (LMICBR).

Given n, we need to find the least value m such that the probability of surviving at least m+1 iterations without repeating any outputs is <0.5. (For n>1, there is no such probability exactly equal to 0.5 -- unlike for n=1 -- as shown below.) Because less than half of the ICBRs will be >= (m+1), the LMICBR is then simply m. (More formally, we can think of m as the (noninteger) expected number of iterations after which the survival probability is exactly 0.5, but that adds little value for our purposes here, and in any event is immaterial in the asymptotic case.)

3.3.1. Computing the LMICBR of a TRNG

Consider a sequence of outputs of an n-bit TRNG, n>1:

What's the probability, p, that the 1st output will be unique? 100%, of course!

  p1 = 1

What's the probability that the 2nd output will not equal the 1st?

  p2 = 1-(1/(2^n))

And the chance that the 3rd will match neither the 1st nor the 2nd?

  p3 = 1-(2/(2^n))

So if no iterations have yet occurred, then the probability that we will make
it through 3 iterations without generating a repeated value is:

  p = p1p2p3

  p = (1)(1-(1/(2^n)))(1-(2/(2^n)))

You get the idea. Generalizing, the probability that we will make it through
m+1 iterations without generating a repeated value is:

  p = 1 if m = 0, else product(1-(i/(2^n))) for i=(1 through m)

Alternatively:

  p = 1 if m = 0, else (product((2^n)-i) for i=(1 through m)) / (2^(mn))

The LMICBR of an n-bit TRNG is thus the value of m for which p is as close as possible to 0.5, without being >0.5. (The product above cannot generate a probability of exactly 0.5, because for n>1 the numerator contains the factor (2^n)-1, which is odd and greater than 1, so the numerator cannot be a power of 2. So we need not consider that case.)

For a 32-bit TRNG such as Jytter, Wolfram Alpha shows here and here that 77,163 is the least value of m for which p<0.5. This implies that the fraction of output sequences of length >=77,164 which contain only unique values is <0.5. Thus the LMICBR is 77,163.
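
The same figure can be reproduced with a short search over m. This is a sketch in ordinary double-precision Python; the table below was computed with interval-based floating point, so treat this as an approximation near the 0.5 threshold.

  import math

  def lmicbr(n):
      # Least m for which p = product(1-(i/(2^n))) for i=(1 through m) drops below 0.5.
      space = 2.0 ** n
      log_p = 0.0                      # running log of the survival probability p
      m = 0
      while True:
          m += 1
          log_p += math.log1p(-m / space)
          if log_p < math.log(0.5):
              return m

  print(lmicbr(32))                    # 77163, matching the table below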

3.3.2. Table of LMICBRs of n-Bit TRNGs

These values were obtained with interval-based floating point math, using the formula described in Section 3.3.1. They are exactly correct, based on the above definition of an LMICBR.

  (n, LMICBR)

  2, 2
  3, 3
  4, 4
  5, 6
  6, 9
  7, 13
  8, 19
  9, 26
  10, 37
  11, 53
  12, 75
  13, 106
  14, 150
  15, 213
  16, 301
  17, 426
  18, 603
  19, 852
  20, 1205
  21, 1705
  22, 2411
  23, 3410
  24, 4822
  25, 6820
  26, 9645
  27, 13640
  28, 19290
  29, 27281
  30, 38581
  31, 54562
  32, 77163
  33, 109124
  34, 154325
  35, 218249
  36, 308651
  37, 436498
  38, 617302
  39, 872997
  40, 1234604
  41, 1745993
  42, 2469208
  43, 3491987
  44, 4938415
  45, 6983974
  46, 9876831
  47, 13967948
  48, 19753662
  49, 27935897
  50, 39507324
  51, 55871794
  52, 79014649
  53, 111743588
  54, 158029298
  55, 223487176
  56, 316058596
  57, 446974353
  58, 632117192
  59, 893948707
  60, 1264234385
  61, 1787897413
  62, 2528468770

3.3.3. LMICBR and TRNG Quality

If the ICBR of a TRNG is tested over a large number of iterations, thereby affording an accurate estimate of its LMICBR, then its quality can be measured in terms of the number of bits which that TRNG is worth. For example, Jytter should ideally produce an LMICBR of around 77K. But perhaps it produces only 7K (which would be astoundingly bad). Looking at the table above, we find that an LMICBR of 7K is worth about 25 bits. In other words, Jytter seems to be getting attracted to 2^25 of its 2^32 possible outputs. Call it twice, xoring the outputs together, and you probably have a very unbiased 32-bit value.
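
A sketch of how such a measurement might be automated follows (illustrative Python; next_output() is a hypothetical stand-in for one iteration of the TRNG under test, shown here as Python's own PRNG, and 1001 is just an arbitrary odd trial count -- expect this to be slow).

  import random
  import statistics

  def next_output():
      # Hypothetical stand-in for one iteration of the TRNG under test.
      return random.getrandbits(32)

  def measure_icbr():
      # Count how many outputs occur before the first repeated value.
      seen = set()
      while True:
          x = next_output()
          if x in seen:
              return len(seen)
          seen.add(x)

  icbrs = [measure_icbr() for _ in range(1001)]   # odd trial count, as in 3.3
  print(statistics.median(icbrs))                 # near 77163 for an ideal 32-bit TRNG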

At the other extreme, it's possible to generate LMICBRs which are greater than the ideal values listed above. (Think of an incrementing counter trying to pass itself off as a TRNG.) In this case, the TRNG probably needs to be redesigned.

Note that the shape of the ICBR histogram is also of import, because a pseudorandom number generator could easily be constructed which has a cycle length equal to its ideal LMICBR! The ideal ICBR histogram can be derived from the equation for p in 3.3.1. (Basically, the horizontal axis is m and the vertical axis is (p(m)-p(m+1)).)
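
That ideal histogram is easy to tabulate from the equation for p (a double-precision sketch, following the recipe above verbatim).

  def ideal_icbr_histogram(n, m_max):
      # Vertical-axis value at each m is p(m)-p(m+1), with p as defined in 3.3.1.
      space = 2.0 ** n
      p = 1.0                                      # p(0)
      hist = []
      for m in range(0, m_max + 1):
          p_next = p * (1.0 - (m + 1) / space)     # p(m+1) from p(m)
          hist.append(p - p_next)
          p = p_next
      return hist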

3.3.4. The Mysterious LMICBR Constant

Look at the table in 3.3.2. Note that as the number of bits grows, the ratio between successive LMICBRs appears to approach the square root of 2. Why that is the case is a fascinating mystery to this author that might be solved by examination of the binomial products involved...
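
For reference, the standard birthday-problem approximation of the product in 3.3.1 is at least consistent with that ratio (a heuristic, not a proof): for large 2^n,

  p ~= e^(-(m(m+1))/(2^(n+1)))

which crosses 0.5 when m is roughly sqrt((ln 2)(2^(n+1))) -- a quantity that grows by a factor of sqrt(2) each time n increases by 1, and which evaluates to about 77,163 for n=32, in agreement with the table.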
