Another type of line coding, Data Whitening or Scrambling, addresses the issue that radio transmission requires the data bits to alternate often.
What It Does
Whitening addresses all three major issues:
- DC-bias is less likely. Unlike Manchester Coding, a DC-bias is not guaranteed to be removed, but it is statistically improbable.
- Long bit runs are less likely. Some long runs can still occur with whitening, but the average run length is reduced. More importantly, bit transitions on the input and output are decoupled from each other, so long bit runs in the input data do not cause long bit runs in the output.
- Power spectral density is more likely to be evenly distributed. Whitening tends to distribute the data evenly across the radio channel’s frequency bandwidth, allowing the transmitter to run at a higher power without violating FCC regulations.
- No extra data bandwidth is used. Manchester encoding doubles the data bandwidth: every byte in is two bytes out. Whitening does not: every byte in corresponds to one byte out.
How It Does It
Whitening is so named because it introduces a pseudo-random element that makes the output appear more like white noise – uniformly distributed.
Whitening is typically implemented by using a maximal Linear-Feedback Shift Register (LFSR) to generate a repeating, pseudo-random pattern of bits that are XORed with the input data. Since XOR is reversible, the receiver can remove the ‘noise’ from the data with its own LFSR that is synced to the transmitter’s.
An LFSR is a shift-register: a sequence of bits that are shifted down on each cycle, and the ‘output’ bit (the one that would be shifted out of the register) is XORed with some of the higher bits in the register and shifted back into the top as the ‘input’ bit.
An example LFSR used for whitening 8-bit data is a 9-bit shift register with XOR taps on the 0th and 5th bits. That is, the bit being shifted off (index 0) is XORed with the bit at index 5 and fed back into the top, index 8.
That particular configuration implements a ‘maximal LFSR’, which means it cycles through every possible combination of a 9-bit number, except 0, with no repeats. The example LFSR starts over every 511 cycles. It must be seeded with some non-zero number. An LFSR filled with all 0s gets ‘stuck’ and never recovers.
Maximal LFSRs have some neat mathematical properties regarding the length of bit runs. For an N-bit maximal LFSR, there will be one run of N bits, one run of N-1 bits, …, ¼th of all runs will be 2 bits, and ½ of all runs will be 1 bit.
Input bytes are XORed with the least-significant 8-bits of current value of the LFSR register, and then the LFSR register is shifted 8 times. The pseudo-random distribution from the LFSR mixes up the bits of the input data across the 8-bit range, so big blocks of identical bytes (i.e. 0x00 or 0xFF padding) gets spread out.
An LFSR improves the resulting radio transmission on average, but does not guarantee anything. Two specific cases should be considered:
- If your input data is exactly the output of the LFSR register, each data byte will be XORed with itself. The output stream will be 100% zeroes. This kills the radio.
- If your input data is exactly the complement of the output of the LFSR register, each data byte will be XORed with its complement. The output stream will be 100% ones. This kills the radio.
This script is not intended to be fast, readable, or perfect, but does some statistical analysis of whitening data with the 9-bit LFSR described above. Generated images are shown at the bottom of this post.
The following images show random bytes generated with various probability distributions before and after whitening. Note that, in all cases, whitening results in more uniform distribution of bytes.