WavPack Technical Description

Introduction

This document is provided for those interested in the technical aspects of WavPack or audio compression in general. It is not detailed enough for implementing WavPack independently or decoding WavPack files, but it should give a clear idea of the technology used for both the lossless and lossy modes. This document is based on WavPack version 3.2 and is therefore quite dated. For more recent technical information, refer to the source code.
In general, lossless compressors of digital signals consist of two parts: a predictor and a coder. The predictor uses some number of previous digital samples to calculate a best guess (or "prediction") of what the next sample will be. The compressor then subtracts this estimate from the actual sample to obtain an "error" value. Finally, these error values (which should be much smaller numbers than the original data) are sent to a coder which converts the data into a optimized bitstream.
The prediction and subtraction operation just described is sometimes know as "decorrelation" because the idea is to remove all sample to sample correlation. In fact, if the predictor were perfect, the output of the error values would be indistinguishable from random noise, which is by definition the absence of correlation.
Once the data values are as small in magnitude as possible, they are sent to the coder for translation into a bitstream. It would seem that the most efficient method would be to simply transmit the significant bits from each sample, and in fact this would be very efficient. However, the problem is that when the decoder tried to interpret this data it would not be able to tell where one sample ended and the next began.
The most popular scheme for transmitting these types of values (at least in lossless audio compressors) is called Rice coding. I won't go into great detail because information is readily available on this method, but the idea is that a buffer of samples is accumulated before transmission. Once the average magnitude of these samples is known, a determination is made as to how many bits of each sample will be transmitted directly, which we will call 'k'. Then, for each sample, the least significant 'k' bits are sent, followed by a number of 1's corresponding to the magnitude of the value that is greater than 'k' bits. Finally, a terminating 0 is sent.

WavPack's Technology

To ensure high-speed operation, WavPack uses a very simple predictor that is implemented entirely in integer math. In its "fast" mode the prediction is simply the arithmetic extrapolation of the previous two samples. For example, if the previous two samples were -10 and 20, then the prediction would be 50. For the default mode a simple adaptive factor is added to weigh the influence of the earlier sample on the prediction. In our example the resulting prediction could then vary between 20 for no influence to 50 for full influence. This weight factor is constantly updated based on the audio data's changing spectral characteristics, which is why it is called "adaptive".
The prediction generated is then subtracted from the actual sample to be encoded to generate the error value. In mono mode this value is sent directly to the coder. However, stereo signals tend to have some correlation between the two channels that can be further exploited. Therefore, two error values are calculated that represent the difference and average of the left and right error values. In the "fast" mode of operation these two new values are simply sent to the coder instead of the left and right values. In the default mode, the difference value is always sent to the coder along with one of the other three values (average, left, or right). An adaptive algorithm continuously determines the most efficient of the three to send based on the changing balance of the channels.
I have developed a unique data encoder for WavPack that I believe is better than Rice coding in two different areas. It is impossible to encode more efficiently than Rice coding because it represents the optimal bit coding (sometimes known as the Huffman code) for this type of data. My encoder is slightly less efficient than this, but only by about 0.15 bits/sample (or less than 1% for 16-bit data). The first advantage of my coder is that it does not require the data to be buffered ahead of encoding, instead it converts each sample directly to bitcodes. This is more computationally efficient and it is better in some applications where coding delay is critical. The second advantage is that it is easily adaptable to lossy encoding because all significant bits (except the implied "one" MSB) are transmitted directly. In this way it is possible to only transmit, for example, the 3 most significant bits (with sign) of each sample. In fact, it is possible to transmit only the sign and implied MSB for each sample with an average of only 3.65 bits/sample.
This coding scheme is used to implement the "lossy" mode of WavPack. In the "fast" mode the output of the non-adaptive decorrelator is simply rounded to the nearest codable value for the specified number of bits. In the default mode the adaptive decorrelator is used (which reduces the average noise about 1 dB) and also both the current and the next sample are considered in choosing the better of the two available codes (which reduces noise another 1 dB).
I have decided to not use any floating-point arithmetic in WavPack's data path because I believe that integer operations are less susceptible to subtle chip to chip variations that could corrupt the lossless nature of the compression, the recent Pentium floating point bug being a blatant example of this. It is possible that a lossless compressor that used floating-point math could generate different output when running on that faulty Pentium. Even disregarding actual bugs, floating-point math is complicated enough that there could be subtle differences between "correct" implementations that could cause trouble for this type of application. To further ensure confidence in the integrity of WavPack's compression, I have included a 32-bit error detection code.

Summary

To achieve very fast and reliable operation for lossless audio compression, WAVPACK utilizes only integer math and employs no pre-scanning of sample data (sample in --> bits out). A variable "lossy" compression mode is also implemented through the use of a novel encoding scheme having several advantages to the standard Rice method.
Home