Tuesday, November 27, 2012

Jytter Ported to Mac OS X

Thanks in part to a tweet from Dan Kaminsky, one kind soul has done the work, so you don't have to. Instructions are posted on his site. Just remember that "_32_" is for 32-bit build environments, so you may have to use "_64_".

Friday, August 24, 2012

C Source Code, Version 1

Sorry, this has been superseded by the latest version.

Characterization

This post won't make sense unless you've read the previous one.

Randomness quality is ultimately a subjective consideration, as it's always possible that nonobvious patterns lurk below the surface of a TRNG, making it appear much better than it is, or that obvious patterns would emerge during periods of atypical quiescence in the underlying physical system. As such, the quality of any TRNG should be evaluated under circumstances of maximum quiescence, e.g. just after cold boot, during periods of maximum device idleness, etc.

Jytter is designed to compensate for quiescence by accumulating entropy for as long as necessary to ensure quality (most importantly, negligible deviation from the reflexive density constant). Indeed, it runs much faster in a virtual machine under Windows (high timing entropy) than on a minimal Linux installation (low timing entropy). For a cool demonstration of this systemic entropy response effect, compare the time taken to generate 1000 random numbers with and without a video playing in the background.

Before I discovered the reflexive density constant (in a personal sense, as others probably knew about it before me), I had devised a related test, described below as the LMICBR test. It can be considered as a useful adjunct to reflexive density constant compliance and other tests such as the famous Diehard Tests. Also discussed below are some nonobvious sources of systemic noise which should be attenuated, prior to evaluating any TRNG, to the greatest extent that could plausibly occur during actual usage scenarios.

The text below was adapted from a white paper I wrote on October 15, 2011, but didn't get around to publishing until now.

1. Introduction

In order to fully characterize Jytter, even on a single model of CPU, we would need to run it trillions of times, as it can produce over 4 billion unique outputs (2^32, to be exact), and we would need to obtain some statistically significant measure of the frequency of each output, all of which should be asymptotically identical in the ideal case. In practice, because the execution time of one iteration of Jytter is on the order of 10 to 100 ms, regardless of CPU speed, it would take too many machines and/or too much time to conduct this experiment.

We present herein a quick and dirty shortcut that can give us substantial statistical insight in a matter of machine-weeks. This is described below, after we present some necessary considerations pertinent to the characterization process.

2. Experimental Quality Safeguards

1.0. CPU Model Variability

Clearly, every model of CPU has its own timing idiosyncrasies. Jytter should be characterized on as many as possible, relative to those on which it would be expected to run. For the paranoid, new CPU models should continue to be characterized as they become available.

1.1. Core Frequency Throttling

Empirically, CPUs maintain a constant timestamp counter (TSC) tick rate, regardless of whether or not they are running at maximum core frequency. They do this so that physical time appears to pass at the same rate, insofar as TSC clients (such as Jytter) are concerned, regardless of core frequency. This is obviously important when time-sensitive processes are being run.

The downside of this fact, as applies to Jytter, is that, when the CPU is operating at less than maximum frequency, the low bits of the TSC become more predictable, if not constant. Worse still, the throttling factor is unpredictable: while 2 or 4 is common, 2^10 is not out of the question during periods of extreme idleness. Even odd divisors such as 3 can occur, making the low bits of the TSC nonconstant, but still highly predictable.

Because Jytter must be competent to run, regardless of CPU clock frequency, any experiment seeking to ascertain Jytter's output noise quality should be run at all supported CPU frequencies. For maximum effect, it should be run during CPU frequency transitions ("throttling" events) as well. (In practice, this is probably of little concern, because a frequency transition inevitably entails at least as much event duration variability as does constant-frequency execution, as such transitions severely disrupt the frequency and phase relationships between the CPU core and external devices. So the generated noise should be less predictable than in the constant-frequency case. Nevertheless, it is possible that Jytter would mistake the same events for different ones at either side of the frequency transition(s).)

1.2. Initial Timestamp Counter (TSC) Uncertainty

Now, Jytter is designed to provide sufficient randomness so as not to reduce, in any statistically significant way, its set of possible outputs, when run in the presence of an Internet-remote attacker. (A local attacker can do anything to the physical machine, so randomness quality is of the least concern in such a case.) Given variances in boot time and Internet latency, it's hard to imagine that an attacker would know the TSC value to better than about 1 part in 2^24. (This is a much larger envelope than Network Time Protocol (NTP) would suggest; however, there is only a weak association between the time-of-day clock and the TSC on most operating systems.) Machines which boot faster, up to the point at which Jytter is first executed, and machines which contain less timing variability (such as those featuring solid-state media, or minimal network interaction) will of course have less uncertainty in their TSC state than would otherwise be the case. Still, 2^24 ticks is only ~10ms on a modern CPU. Faster CPUs will only increase the uncertainty, as network latency is not improving as fast as core speed.

Given this, prior to running each behavioral characterization iteration of Jytter, the TSC should be zeroed above bit 23, reflecting the "zero point uncertainty" described above. Because a TSC write operation cannot be performed in user space, it is recommended that the characterization occur in a customized kernel. Failing that, Jytter could be modified to subtract the starting TSC value (above bit 23 only) from each TSC read, which would of course modify its exact execution path. Should Jytter prove sufficiently robust under such quiescent conditions, then it will certainly stand up in a noisier system.

The TSC masking code is as follows. Note that it destroys EAX, EDX, and ECX (or RAX, RDX, and RCX, in a 64-bit environment).

  rdtsc                 ; EDX:EAX = current TSC value
  and eax, (1<<24)-1    ; keep only the low 24 bits
  xor edx, edx          ; zero the high 32 bits
  mov ecx, 0x10         ; MSR 0x10 holds the TSC
  wrmsr                 ; write EDX:EAX back to the TSC

If you think that the attack uncertainty is greater than 24 bits, then increase the "24" in the code above. Or decrease it in the opposite case. In any event, do this prior to generating each true random number. Note that the TSC value is stored in Model-Specific Register 0x10, hence the value in ECX.

1.3. Driver Event Noise Minimization

The more unpredictable the devices which are connected to a machine, the better Jytter's input noise quality will be. In this case, the intent of characterization is to show acceptable performance under worst-case circumstances. For this reason, it is suggested that the experiment be run on a relatively quiescent system, as described in 1.1. Ideally, this system would be the minimally complex system on which Jytter is intended to be deployed, in the presence of only such other applications or operating system functions as would be expected to be running concurrently in all reasonable use scenarios. For example, the network driver may always be running concurrently with Jytter, but the audio input device may not be, so don't run the latter in the experiment. Similarly, while the network driver may be running, it should probably be disconnected from the network, for greater quiescence.

2. Statistical Methods

2.1. Ideal Case

As mentioned in 1.0, the "right" way to characterize Jytter is to approximate how often each of its 2^32 possible output states occurs under the most timing-predictable realistic use scenarios. If, in fact, only 2^29 of those states turned out to occur, then we would say that Jytter is good for 29 bits of randomness, as opposed to 32.

Unfortunately, such a brute-force experiment is too resource-intensive to be realistic.

2.2. Basic Noise Quality Tests

Let's assume for a moment that Jytter produces 32 bits of perfect noise per iteration, i.e. it's an unbiased 32-bit TRNG. In other words, none of its 2^32 possible outputs will tend to occur more than any other, if sampled over a sufficiently large number of iterations.

It's assumed that any Jytter characterization would involve the usual tests of unbiased noise quality: tests for the bias of any bit, tests for biases in the differences between successive outputs, and tests for the correlation between pairs of bits. Other good tests may come to mind.

Unfortunately, some TRNGs do very well on the above tests, but when observed over many iterations, turn out to heavily favor certain 32-bit states over others.

One excellent test, which can easily reveal a bad TRNG which passes the above tests, is what I call the Limit Median Iteration Count Before Repetition Test.

2.3. Limit Median Iteration Count Before Repetition (LMICBR) Test

[EDIT: If this is of interest to you, then see also the Scintillating Entropy Test.]

Think of an ideal 1-bit TRNG. It would produce a 0 half of the time and a 1 half of the time. Put another way, as you can see here, the average number of 0s in a row will be 2, starting from the 1st 0 immediately following a 1. That is to say, its "iteration count before repetition" (ICBR) will have an average of 2 over many iterations. Any deviation therefrom, measured over a large number of iterations, would imply that the randomness value of an iteration would be less than 1 bit, and thus that the TRNG is not in fact ideal.
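The expected run length of 2 can be checked numerically. A throwaway sketch (not Jytter code): run lengths of 0s from a fair bit source follow a geometric distribution, P(length = k) = 2^-k, whose mean is 2.

```python
# Expected run length of 0s from an ideal 1-bit TRNG:
# sum over k of k * 2^-k, which converges to 2.
expected = sum(k * 0.5 ** k for k in range(1, 200))
print(expected)
```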

Now, the median of a sorted set consisting of an odd number of integers, is simply the middle integer. For example, consider the set {-2, 0, 5, 8, -2, 1, 5}, which has an odd number of members (7). Sorted, it looks like: {-2, -2, 0, 1, 5, 5, 8}. The median is 1 because 3 integers are below it, and 3 are above. We could add 2 more 1s to the set, and the median would still be 1. In other words, the median need not be unique.
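The middle-element definition can be written directly. A minimal sketch (the function name is mine, not from the white paper):

```python
def median_odd(values):
    # Median of a set with an odd number of members: the middle
    # element after sorting.
    s = sorted(values)
    return s[len(s) // 2]

print(median_odd([-2, 0, 5, 8, -2, 1, 5]))        # -> 1 (the example set)
print(median_odd([-2, 0, 5, 8, -2, 1, 5, 1, 1]))  # -> 1 (2 more 1s added)
```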

The "limit median" of such a set is herein defined as the value of the middle integer as the number of members in the set approaches (odd) infinity.

Consider a set consisting of an odd number of ICBRs. For a 1-bit TRNG, it might look like {2, 3, 1, 2, 9, 1, 4}. In this case, the median is 2. However, in the limit that the trial count approaches (odd) infinity, there is no predictable median, because 1 and 2 occur with equal probability (as median ICBRs; not as ICBRs per se, in which case 1 is 50% likely and 2 is 25% likely). However, if we consider n-bit TRNGs, n>1, then we do not have this problem, as we will show below.

Consider then, an n-bit TRNG, n>1. Consider a sorted set of its observed ICBRs, as the size of this set approaches (odd) infinity. Then we can exactly predict its median, which we call the "limit median iteration count before repetition" (LMICBR).

Given n, we need to find the least value m, such that the probability of surviving at least m+1 iterations without repeating any outputs is <0.5. (For n>1, there is no such probability exactly equal to 0.5 -- unlike for n=1 -- as shown below.) Because less than half of the ICBRs will be >= (m+1), the LMICBR is then simply m. (More formally, we can think of m as the (noninteger) expected number of iterations after which the survival probability is exactly 0.5, but that adds little value to our purposes here, and in any event is immaterial in the asymptotic case.)

2.3.1. Computing the LMICBR of a TRNG

Consider a sequence of outputs of an n-bit TRNG, n>1:

What's the probability, p, that the 1st output will be unique? 100%, of course!

  p1 = 1

What's the probability that the 2nd output will not equal the 1st?

  p2 = 1-(1/(2^n))

And the chance that the 3rd will match neither the 1st nor the 2nd?

  p3 = 1-(2/(2^n))

So if no iterations have yet occurred, then the probability that we will make it through 3 iterations without generating a repeated value is:

  p = p1p2p3

  p = (1)(1-(1/(2^n)))(1-(2/(2^n)))

You get the idea. Generalizing, the probability that we will make it through m+1 iterations without generating a repeated value is:

  p = 1 if m = 0, else product(1-(i/(2^n))) for i=(1 through m)

Alternatively:

  p = 1 if m = 0, else (product((2^n)-i) for i=(1 through m))/(2^(mn))

The LMICBR of an n-bit TRNG is thus the value of m for which p is as close as possible to 0.5, without being >0.5. (The product above cannot generate a probability of exactly 0.5 because (2^n-1) is not a power of 2 for n>1. So we need not consider that case.)

For a 32-bit TRNG such as Jytter, Wolfram Alpha shows here and here that 77,163 is the least value of m for which p<0.5. This implies that the fraction of output sequences of length >=77,164 which contain only unique values is <0.5. Thus the LMICBR is 77,163.
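The same value can be reproduced by iterating the survival product directly. A sketch (the function name is mine; double-precision floats suffice here, since p is not pathologically close to 0.5 at the threshold):

```python
def lmicbr(n):
    # Least m such that the probability of surviving m+1 iterations
    # without a repeat, p = product(1 - i/2^n) for i=1..m, drops below 0.5.
    N = 2.0 ** n
    p, m = 1.0, 0
    while p >= 0.5:
        m += 1
        p *= 1.0 - m / N
    return m

print(lmicbr(32))  # -> 77163
```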

2.3.2. Table of LMICBRs of n-Bit TRNGs

These values were obtained using interval-based floating point math, using the formula described in Section 2.3.1. They are exactly correct, based on the above definition of an LMICBR.

  (n, LMICBR)

  2, 2
  3, 3
  4, 4
  5, 6
  6, 9
  7, 13
  8, 19
  9, 26
  10, 37
  11, 53
  12, 75
  13, 106
  14, 150
  15, 213
  16, 301
  17, 426
  18, 603
  19, 852
  20, 1205
  21, 1705
  22, 2411
  23, 3410
  24, 4822
  25, 6820
  26, 9645
  27, 13640
  28, 19290
  29, 27281
  30, 38581
  31, 54562
  32, 77163
  33, 109124
  34, 154325
  35, 218249
  36, 308651
  37, 436498
  38, 617302
  39, 872997
  40, 1234604
  41, 1745993
  42, 2469208
  43, 3491987
  44, 4938415
  45, 6983974
  46, 9876831
  47, 13967948
  48, 19753662
  49, 27935897
  50, 39507324
  51, 55871794
  52, 79014649
  53, 111743588
  54, 158029298
  55, 223487176
  56, 316058596
  57, 446974353
  58, 632117192
  59, 893948707
  60, 1264234385
  61, 1787897413
  62, 2528468770

2.3.3. LMICBR and TRNG Quality

If the ICBR of a TRNG is tested over a large number of iterations, thereby affording an accurate estimate of its LMICBR, then its quality can be measured in terms of the number of bits which that TRNG is worth. For example, Jytter should ideally produce an LMICBR of around 77K. But perhaps it produces only 7K (which would be astoundingly bad). Looking at the table above, we find that an LMICBR of 7K is worth about 25 bits. In other words, Jytter seems to be getting attracted to 2^25 of its 2^32 possible outputs. Call it twice, xoring the outputs together, and you probably have a very unbiased 32-bit value.
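The table lookup in this example can also be approximated in closed form by inverting the birthday-style estimate m ~ sqrt(2*ln2*2^n). (This inversion is my own shortcut, not part of the white paper.)

```python
import math

def effective_bits(observed_lmicbr):
    # Invert m ~ sqrt(2 * ln 2 * 2^n) to get n ~ log2(m^2 / (2 ln 2)).
    return math.log2(observed_lmicbr ** 2 / (2 * math.log(2)))

print(round(effective_bits(7000)))   # -> 25, matching the example above
print(round(effective_bits(77163)))  # -> 32
```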

At the other extreme, it's possible to generate LMICBRs which are greater than the ideal values listed above. (Think of an incrementing counter trying to pass itself off as a TRNG.) In this case, the TRNG probably needs to be redesigned.

Note that the shape of the ICBR histogram is also of import, because a pseudorandom number generator could easily be constructed which has a cycle length equal to its ideal LMICBR! The ideal ICBR histogram can be derived from the equation for p in 2.3.1. (Basically, the horizontal axis is m and the vertical axis is (p(m)-p(m+1)).)
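A sketch of computing that ideal histogram (the function name is mine):

```python
def ideal_icbr_pmf(n, m_max):
    # P(ICBR at m) = p(m) - p(m+1), where p(m) is the survival
    # probability from Section 2.3.1 (with p(0) = 1).
    N = 2.0 ** n
    p, pmf = 1.0, []
    for m in range(m_max):
        p_next = p * (1.0 - (m + 1) / N)  # p(m+1) = p(m) * (1 - (m+1)/2^n)
        pmf.append(p - p_next)
        p = p_next
    return pmf
```

For n=8, summing the PMF out to m=256 gives 1, because the factor (1 - 256/256) forces p(256) to 0 and the differences telescope.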

2.3.4. The Mysterious LMICBR Constant

Look at the table in 2.3.2. Note that as the number of bits grows, the ratio between successive LMICBRs appears to approach the square root of 2. Why that is the case is a fascinating mystery to this author, one that might be solved by examination of the binomial products involved...
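One plausible route, sketched here as my own speculation rather than a proof: approximating the survival product by an exponential, p(m) ~ exp(-m(m+1)/2^(n+1)), and setting p(m)=0.5 gives m ~ sqrt(2*ln2*2^n), an estimate which grows by exactly sqrt(2) each time n increments.

```python
import math

def lmicbr_estimate(n):
    # Birthday-bound approximation: p(m) = product(1 - i/2^n) is roughly
    # exp(-m^2 / 2^(n+1)), so p = 0.5 near m = sqrt(2 * ln 2 * 2^n).
    return math.sqrt(2 * math.log(2) * 2 ** n)

print(round(lmicbr_estimate(32)))                 # -> 77163, as tabulated
print(lmicbr_estimate(33) / lmicbr_estimate(32))  # ~1.41421..., i.e. sqrt(2)
```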

Theory

[EDIT: If you're interested in this sort of thing, then you might want to read my blog.]

Jytter is a true random number generator (TRNG) which conforms to the reflexive density constant and runs in X86 and X64 userspace, but could easily be adapted to other processor architectures. It uses no IO and, apart from saving and restoring registers at the beginning and end of the code, no memory whatsoever. Whereas existing TRNGs tend to use elaborate physical phenomena, Jytter uses only the CPU core timer, in such a way as to ensure the accrual of a certain level of entropy before returning its result.

If you're looking for a very fast pseudorandom number generator which also conforms to the reflexive density constant, have a look at LMD2 and its long nonzero sequence adaptation.

The text below was adapted from a white paper I wrote on October 15, 2011, but didn't get around to publishing until now.

1. Introduction

A TRNG converts random physical information to random virtual information, ideally producing uniformly distributed random numbers.

Historically, TRNGs have tended to measure macroscopic phenomena, such as network states. Such operations can require substantial time due to operating system calls and IO requirements. Worse, they can suffer from high predictability during quiescent periods, for example, shortly after cold boot, or during idle hours of the night. Ultimately, they may be impossible to run under certain scenarios due to hardware malfunctions or lack of execution privilege.

Jytter has been designed from the opposite philosophical perspective: find the simplest, most accessible source of randomness available to userspace, and exploit it in the simplest possible algorithm which will yield randomness of a certain quality. The point is to be able to characterize the randomness under conditions of maximum quiescence -- not to create an algorithm so complex that it appears to produce high quality randomness under typical use scenarios.

2. Timer Phase and Interrupt Duration Vectors

A computer may be crudely modeled as a CPU, a memory, and a set of external periodic timers. A "timer" may actually be a harddrive, which interrupts every 128 ms in order to inform the CPU that another block has been read into memory. Or perhaps it's a network device, which interrupts every 71 ms to indicate that another packet has been received. (Of course these devices are not so precisely periodic in real life, but that fact only improves randomness quality, so let's ignore it.)

We can think of many of these "timers": (1) harddrives, (2) network devices, (3) time-of-day clocks, (4) video devices, (5) sound devices, (6) USB controllers, etc.

Imagine that each timer is firing with perfect regularity. Suppose, furthermore, that the CPU contains a timer (an actual clock, as opposed to one of the devices above) which ticks once every ms, that period being shorter than any of the external timer periods. Now, the only thing that we don't know is each external timer's phase.

Back to the harddrive that "ticks" every 128 ms. Suppose we take note of the time, and wait for the tick. After 29 ms, it interrupts the CPU. Bingo! We have a true random number, which is the phase of the harddrive interrupt. In this case, the phase would be 99 ms. (99+29=128.) Furthermore, it's good for 7 bits of randomness, because the period is 128 ms.

Realistically, this doesn't quite work, because we don't know the expected period. In userspace, all we see is some lost time due to the interrupt occurring at the kernel level; we don't know which device actually interrupted us because the OS prevents us from knowing that. How can we use this interrupt, then, to provide some randomness?

Well, there are 3 things that we do know from userspace: (1) the timestamp counter (TSC) value when we entered the TRNG, (2) the TSC value when we last observed it (before the interrupt), and (3) the TSC value now (after the interrupt).

First, consider the difference between TSC values (3) and (2). This difference is a determinant of whether or not we have observed this particular interrupt before. Specifically, if we've seen this same difference before, then it's possible that we've seen this same interrupt before, in which case, we ignore it. But if the difference has never been observed before, then there is something new about this interrupt. It may in fact be due to the same physical source as a prior interrupt (e.g. the harddrive), but traffic congestion on the way to the CPU has caused it to take longer to complete, which in itself adds randomness -- maybe only an extra bit, but more randomness nonetheless. And by the way, many of the shorter-duration events will be due to sources other than quasiperiodic device interrupts, such as memory line loads or CPU pipeline stalls. We don't care. The same reasoning applies.

Secondly, think back to the phases of the timers above. While we don't know the phase of this interrupt (because we don't know the period), we do in fact have equivalent information, which is the difference between TSC values (3) and (1). If we evaluate the product of (TSC at (3) minus TSC at (1)) and (the interrupt duration just measured above, which is TSC at (3) minus TSC at (2)), then we have an intermediate true random value incorporating timer phase and interrupt duration of one particular "timer". In other words, we consider (TSC at (1)) to be 0. (In practice, we don't bother to subtract it from (TSC at (3)) because doing so is a waste of time. But when characterizing Jytter's randomness quality, it's paramount to do so, because we don't want to mistake the continuously changing TSC values for actual randomness.)

It is the linear combination of products of the form (TSC at (3) minus TSC at (1)) and (TSC at (3) minus TSC at (2)) which ultimately serves as the output of the TRNG.

Jytter is a bit more complex, but essentially, it's evaluating the dot product of the phases (TSC at (3) minus TSC at (1)) of several quasiperiodic timers, with their respective interrupt durations (TSC at (3) minus TSC at (2)). In particular, it doesn't do exactly the above; it actually takes a hash of the products before summing them into a dot product, because doing so eliminates the need for all memory access whatsoever, or even conditional execution ("if") paths of different lengths, and thus reduces the incidence of pseudorandom timing masquerading as true random timing, due to its own execution path variances. This means that it does at least as much work as is required in order to obtain randomness of the quality of the above algo. We explore this more below.
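As an illustrative, deliberately simplified userspace sketch of the loop just described -- not Jytter's actual code -- the following uses Python's perf_counter_ns in place of RDTSC, a made-up 32-bit mixing hash, and an iteration cap as a safety net for coarse clocks (none of which the real register-only implementation needs):

```python
import time

def hash32(x):
    # Simple 32-bit mixing hash (an assumption; Jytter's real hash differs).
    x = (x * 2654435761) & 0xFFFFFFFF
    x ^= x >> 16
    return (x * 2246822519) & 0xFFFFFFFF

def jytter_sketch(unique_events=16, max_iters=1_000_000):
    # Accumulate hashed (phase * duration) products until enough distinct
    # event durations have been observed, per the scheme described above.
    start = time.perf_counter_ns()          # stands in for TSC at (1)
    prev = start                            # stands in for TSC at (2)
    seen = 0                                # 32-bit bitmap of durations
    acc = 0
    for _ in range(max_iters):
        now = time.perf_counter_ns()        # stands in for TSC at (3)
        duration = now - prev               # (3) - (2): event duration
        phase = now - start                 # (3) - (1): timer "phase"
        acc = (acc + hash32(phase * duration)) & 0xFFFFFFFF
        seen |= 1 << (hash32(duration) >> 27)   # top 5 hash bits -> slot
        if bin(seen).count("1") >= unique_events:
            break
        prev = now
    return acc
```

On a typical machine the loop exits after a few dozen iterations; during quiescent periods it simply runs longer, which mirrors the systemic entropy response described earlier.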

3. Implementation

Realistically, Jytter can't properly evaluate the above dot product exactly as prescribed. One reason is that, due to practical limits of characterization, it can only produce 32 bits at a time, but the TSC alone is 64 bits. Another reason is that we don't want Jytter to use memory at all, lest we create internal timing complexity which persuades us that Jytter is less predictable than it actually is; the lack of memory means that we are limited to looking at the low 5 bits of interrupt duration (due to the limitations of a 32-bit bitmap implemented in a single CPU register), so occasionally we confuse 2 different interrupts for one and the same (but again, that only improves randomness because it makes Jytter run longer than necessary).

In its usual implementation, it waits for the occurrence of at least 16 unique interrupt sources (i.e. 16 bits set in the register bitmap) before issuing a result. Thus each such unique event is valued at merely 2 bits of randomness, which is very conservative. (Some events may be worth less than 2 bits of randomness, but the average is probably in excess of 4 bits, so we're safe in this assumption.) Furthermore, it integrates a hash of the observed TSC value on every iteration of its loop, whether or not a new event was observed. This is done in such a way that the exact order of observed interrupt durations matters, even if we're seeing the same 3 previously observed events occur over and over again, while waiting for a new unique interrupt duration to appear. Even very low-level phenomena such as memory loads tend to occur at slightly unpredictable rates, which allows Jytter to benefit from this ordering sensitivity.

Best of all, it can be run in userspace because it merely reads the TSC, and takes only as much time as is required to observe 16 unique interrupt durations (or a few more, on account of the occasional hash collision).