If you’re a software person, read the post. Linus Åkesson, this year’s winner, is brilliant, and I want you to appreciate just how much.
If you aren’t, stick around, because you should appreciate Mr. Åkesson’s brilliance, too. I’m going to explain this in layman’s terms. The International Underhanded C Code Competition, or IOUCCC, is an annual competition where software engineers and computer scientists from all the world over compete to write the most innocent-looking yet fiendishly guilty C programs possible. This year’s challenge is as follows:
There are two countries, which have signed a nuclear disarmament treaty. Each wants to carry out inspections to ensure that the other side is decommissioning real bombs, and not fakes, but neither side wants to expose the actual gamma ray spectra produced by their real bombs—that’s sensitive information on their nuclear programs. So, they have need of a tool which takes an observed spectrum and compares it to a reference spectrum, returning either yes (if the observed spectrum matches the reference spectrum, and is therefore a bomb), or no (if that is not so). Here’s the underhanded bit: one country wants to cheat. They want to keep more bombs than the treaty permits, but they have to fool the inspectors—that is, they have to figure out a way to show inspectors fake decommissioned bombs, which nevertheless yield a ‘yes’ when inspected with the tool. They can’t just fake the reading, because the reading has two parts: first, the placement of the peaks in the spectrum and their relative sizes, which only come from the specific mixes of fissile materials used in bombs; and second, the overall radioactive energy level of the bomb, which only comes from having a large amount of radiation coming from the bomb. It’s easy to replicate the peaks with a small amount of fissile material, and easy to replicate the total energy with an x-ray source, but impossible to combine them to look like a bomb without being a bomb.
We need to establish some background information about how spectra appear to digital tools. The way you collect a spectrum is to point a detector at a radioactive source of some sort. Energetic particles strike the detector face. An event is recorded, and deposited into a ‘bin’ based on the energy of the event. The collection of bins composes the spectrum. So, a spectrum, to a computer, looks like this: energies between 0 and 100, 20 events; energies between 101 and 200, 40 events; energies between 201 and 300, 153 events; and so on. In C, the easiest way to represent this is an array—that is, a region of computer memory which holds a list of numbers all in a row. Since, to a computer, numbers are a fixed number of bits (binary digits), the numbers don’t need to have any space between them; number N+1 is a fixed length from the start of number N.
We also need to talk briefly about floating point numbers. Floating point numbers are how computers represent decimals (generally speaking). Since there are an infinite number of real numbers between any two real numbers, a computer obviously can’t represent them precisely. Instead, it uses scientific notation, akin to this example: , or 120. (In binary, obviously, but that’s not important yet.) We call the 1.2 part the mantissa, and the 2 the exponent. (For a computer, the base of the exponent is always two, because of binary.) Moreover, they are arranged like this: sign bit, exponent bits (highest place to lowest place), mantissa bits (highest place to lowest place), and the mantissa is always assumed to be one, unless the exponent is zero. (Instead of writing 1.2, we just write .2, and remember that we usually put a one before the decimal point.)
Finally, we need to discuss how one might want to compare two spectra. First, we’ll introduce a quantity called the dot product: if you have two lists of numbers of the same length, and go number-by-number, multiplying the first by the first, the second by the second, the third by the third, and so on, then add all of those products, the number you get is called the dot product. If you take the dot product of a list of numbers with itself, then take the square root of the dot product, you get the magnitude. A list of numbers, each entry in which has been divided by the list’s magnitude, is said to be normalized. Finally, if you take the dot product of two normalized lists of numbers, you get a number between 0 and 1. If you interpret the lists of numbers as multidimensional vectors (remember your geometry?) the number you just got is the cosine of the angle between them. For our purposes, it’s a number called the spectral contrast angle. If it’s 1, the two vectors are completely identical. If it’s -1, they’re completely different. So, when we get our two spectra as lists of numbers, we want to calculate the spectral contrast angle. If it’s sufficiently similar, then the spectra match.
So, Mr. Åkesson’s program takes two lists of numbers, representing spectra. It runs some preprocessing steps: smoothing the data, taking its first-order derivative to remove any constant-slope noise1, and smoothing the derivative to obtain a spectral fingerprint: a representation which contains the locations and relative heights of the peaks. Before going any further, it checks to see whether the total energy in the spectrum is above a threshold—a decommissioned bomb will still have a small signature, because of the remains of the fissile materials inside—then hands off the data to another source code file, which handles normalizing the spectra and doing the dot product stuff.
The underhandedness comes in that handoff. The data is defined as a list of numbers of type float_t. float_t is, in turn, defined in a header file (a file which holds definitions) as a double, a 64-bit (i.e. double-precision) floating point number, of which one bit is the sign of the number, 11 bits are the exponent, and the remaining bits are the mantissa. It is important to note a few things about doubles and Mr. Åkesson’s code: first, data in a double-precision number is left-justified. In cases where a given number can be expressed with sufficient precision, the bits on the right-hand side of the double’s position in memory are set to 0, unused. Second, the values in this particular program are generally fairly small—no more than a few thousand. Third, the preprocessing has the effect of making the numbers smaller: a derivative, as a rate of change, will always be smaller than the total value, except when it changes instantly. So, after the preprocessing, you have a list of double-precision floating point numbers, whose values range from about 0 to 100. Representing these numbers is simple, and doesn’t require using very much of the 64-bit length of the number.
The list is passed to the second file as the argument to a function, still a list of numbers of type float_t. The thing is, in this file, float_t means something else. This file does not include the definitions header from the first file, because it doesn’t need it. Instead, it includes a standard math library header file, and the math library header file defines float_t as a 32-bit (i.e. single-precision) floating point number, of which one bit is a sign bit, eight bits are the exponent, and the remaining bits are the mantissa. So, instead of a list of 64-bit numbers of length n, you have a list of 32-bit numbers of length 2n—but you’ve told the code you have a list of size n, so the code only looks at n numbers. That is, the code only looks at the first half of the spectrum. This lets you evade the total energy threshold without having to have a bunch of radioactive material in the bomb: place an x-ray source in your fake bomb. The total energy goes up so that it passes the threshold, and it won’t be considered in the peak-matching code, because it’s on the right side (the second half, the high-energy side) of the spectrum.
You still have to get the peak matching right, but the underhandedness has a second, subtler component which will help. It has to do with what happens when you convert a 64-bit floating point number into a 32-bit floating point number. Remember that, when a floating-point number doesn’t need its full length to express a number, all of the significant bits are on the left side of the number. Therefore, representing numbers between 1 and 100, especially whole numbers (which the preprocessing helps to ensure) doesn’t take anywhere near 64 bits. It doesn’t even take 32 bits. So, when you turn an array of 64-bit numbers into an array of 32-bit numbers, you split each 64-bit number in half. The second half is all zero bits, and ends up being zero. Since this happens to both the reference and the observed spectra, every other number in both arrays ends up being zero, and zero is the same as zero, so we don’t need to look at them any more.
The first half is the interesting half. The sign bit works exactly the same. The exponent, though, gets shortened from 11 bits to 8 bits, and the three bits that we leave off are the highest bits. Removing the low three bits moves the high bit over three places, which is the same as dividing by eight2. The three bits which were the end of the exponent become the beginning of the mantissa. This is fine: it preserves the relative sizes of numbers. (The least significant digits of the exponent are more significant than the most significant bits of the mantissa; you’ll just have to trust me here, or do some examples in base 10 yourself, following the rules I gave for floating point numbers.)
So what we’ve done is dramatically reduced the sizes of each energy level bin in the spectrum. Furthermore, we’ve also compressed their dynamic range: the difference between the lowest peaks and the highest peaks is much smaller, and so they resemble each other much more closely. The recipe for a fake bomb, then, is to take a small amount of fissile material and a powerful x-ray source, and put them in the casing. The x-ray source takes care of the total energy level, and the dynamic range compression from the 64-bit to 32-bit switch takes care of the low-energy peaks.
Congratulations! You just convinced the other country that you’re about to decommission a real bomb, when in actuality, it’s a fake.
Congratulations also to Linus Åkesson, who wrote such a clever piece of deception into a mere 66 lines, and did so with such panache I could not resist writing on it.
1. This appears in real-world spectra, because there’s naturally a lot more low-energy noise than high-energy noise. In support of this assertion, I offer the evidence that you are not suffering from radiation sickness right now. (Probably.)
2. 1000 is 8, in binary. (From right to left, you have the 1s-place, the 2s-place, the 4s-place, and the 8s-place.) Remove three zeros, and you have a 1 in the 1s place: 1.