The residuals between the original and the predicted image are those that correspond to the less frequent predictor configurations. Efficiently coded residuals constitute the output image. To our knowledge, the performance of the proposed compression algorithm is comparable to the current state of the art. Especially good results were obtained for binary images, greylevel cartoons and man-made drawings.
Data compression can be viewed as a process producing a model of the input data together with associated residuals. An improved data model yields better compression results.
Redundant information in generic raster images is due to high correlation between neighbouring cells^{1}. In other words, the local structure and similarity are the main source of redundancy in generic imagery. The proposed image compression method examines the neighbourhood of the current cell using an adaptive probe, creates a nonlinear predictor for the current cell value, and stores residuals of the predictor in the resulting compressed image.
In the compression phase, an image is systematically traversed using an adaptive probe and, for the current cell, a nonlinear predictor is created by examining the local neighbourhood. The estimated value is compared to the actual value of the cell. Where there is a discrepancy between the estimated and the actual value, the residual (binary value one) is stored in the position of the current cell. If the predictor provides a good estimate, a precise match will be found for the majority of the cells and the resulting compressed image will be small. The description of the predictor and the positions of the residuals in the image are stored together in the output file. Three predictors of increasing complexity are described in this paper: (i) a predictor for binary images, (ii) a predictor for greylevel images, and (iii) an adaptive predictor for greylevel images.
The reduced data are efficiently coded and, together with the predictor description, constitute the compressed image.
The paper is organized as follows. Section lists the current most efficient lossless compression methods and these are later used for comparison with the proposed methods. Section explains the basic predictor as a set of probe pairs that differ in one bit only. Section generalizes the predictor for the greylevel image that is treated as stacked bit planes. Section further improves the predictor by allowing it to adapt to the actual image data. Section proposes an efficient way to encode the sparse matrix of predictor residuals. In Section we present experimental results. In the last section we draw conclusions.
A preliminary version of the paper was published in [].
The IBM Q80 coder (lossless JPEG) [] is obsolete and outperformed by the conceptually similar Portable Network Graphics (PNG).
PNG [] consists of two steps. In the first step, the difference image D_{x}(x,y) = f(x-1,y) - f(x,y) is computed and a matrix containing only differences is created. In the second step, this matrix is encoded using an LZ77-based dictionary method (DEFLATE). The two steps may be connected by a pipe and run in parallel. The source code of the algorithm is freely available. The disadvantage is that the second step is not tuned to 2D image data. PNG is based on the assumption that regions with small 1st derivatives prevail over regions with large 1st derivatives. The context of images is usually richer.
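The two steps can be sketched as follows for an 8-bit greyscale image in a NumPy array; zlib serves here as a stand-in for the DEFLATE back end (real PNG additionally selects a filter per row):

    import zlib
    import numpy as np

    def png_like_compress(f):
        # Step 1: horizontal difference image D(x, y) = f(x-1, y) - f(x, y);
        # the first column is kept verbatim (it has no left neighbour).
        d = f.astype(np.int16)
        d[:, 1:] = d[:, :-1] - d[:, 1:]
        # Step 2: generic dictionary/entropy coding of the difference matrix.
        return zlib.compress((d % 256).astype(np.uint8).tobytes())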
Paper [] discusses several simple compression methods (PVRG-JPEG) proposed by the Portable Video Research Group at Stanford. Each method was tried on the image and the best one was selected for the actual compression.
The second generation lossless compression methods are usually divided into two subgroups:
The rounding transform from [] uses circular-shaped hierarchical estimators. The compression ratio does not rank the method among the best.
One more method is relevant to ours: the Logic coding method [], which treats the image as separate bit planes and uses Boolean logic for image compression.
The initial idea comes from Schlesinger [], who proposed a new representation scheme for binary images. Let us denote the method by Schl2; the number 2 refers to its two degrees of freedom (DOF).
The Schl2 method takes a binary image f as input. The support of the image is T = {(x,y): 0 ≤ x ≤ M; 0 ≤ y ≤ N}, where M is the width of the image and N is the height. The image is a mapping f: T → {0,1}. Without loss of generality, we assume that the leftmost column and the bottom row of the image have value zero, i.e. f(0,y) = f(x,0) = 0.
The Schl2 method traverses the binary image f by a probe consisting of 2×2 cells. Its representative cell (x,y) is placed in the upper right corner of the probe. The value of the actual (estimated) cell is p_{4} = f(x,y). Let the values of the cells at the remaining positions of the probe be denoted p_{1} = f(x,y-1); p_{2} = f(x-1,y-1); p_{3} = f(x-1,y); see Fig. .
Schlesinger designed the predictor (estimator, probe) e in the following way. All 16 possible combinations of cell values within the 2×2 probe are considered.
Probes are arranged into two rows and eight columns in such a way that probes in the same column differ in the upper right bit only, see Fig. . A frequency analysis of the predictor configurations was performed on many different images with the aim of finding out which configuration in each pair occurs more often. The more frequent configurations are highlighted in Fig. . The probes in column 5 will be discussed shortly.
The idea of this compression method is to store residuals of the less frequent probes, i.e. to store the value one at the representative point of such a probe. The arrangement of the probes allows us to reconstruct the image from the sparse matrix of stored residuals only.
Schlesinger tested the predictor on many images and the more frequent probe of each pair was found through extensive experiments. In the compression phase, only the residuals between predicted and real values are stored. This typically yields a much sparser image.
Actually, the probe configurations proposed by Schlesinger are suboptimal with respect to the relative frequencies of the different configurations. The probes in Fig. , column 5 would have to be highlighted in the opposite manner if the probe statistics were to be optimal for the binary images tested by Schlesinger. A substantial advantage of the suboptimal configurations, independent of the image content, is that standard set operations (intersection, union, negation) and binary morphology operations can be performed directly on the compressed images.
Our compression method for binary images, abbreviated FH2, takes a step beyond Schlesinger's. The probe frequency analysis is performed for each image. The result is stored in a frequency table consisting of two rows and eight columns. A value in the table represents the number of occurrences of the corresponding probe in the image. The maxima in the columns of the frequency table point to the optimal predictor configurations. These are marked.
One bit is needed to distinguish between two probes in one column. Thus one byte describes all eight pairs of probes. This information will be added to the compressed image to store the predictor description.
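For illustration, the descriptor byte can be packed and unpacked as follows (a minimal sketch; the bit order within the byte is our assumption):

    def pack_predictor(predict):
        # predict[c] is the more frequent value of p4 for context c (0..7).
        byte = 0
        for c, bit in enumerate(predict):
            byte |= bit << c
        return byte

    def unpack_predictor(byte):
        return [(byte >> c) & 1 for c in range(8)]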
In the compression phase, the image is traversed in a bottom-up manner and from right to left. In each position of the image, a check is made to determine whether the more frequent or the less frequent probe appeared. In the former case, the value 0 is written to the representative cell; in the latter case, the value one is stored, corresponding to the residual of the predictor. The algorithm traverses the whole image and produces an image of residuals. This reduced image is typically much sparser than the input image due to the estimation abilities of the predictor.
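A minimal sketch of both FH2 passes, assuming a binary image stored as f[y][x] with row 0 as the zero-valued bottom row and column 0 as the zero-valued leftmost column; the packing of the context (p_3, p_2, p_1) into a 3-bit index is our assumption, and the traversal order of these passes is immaterial:

    def fh2_compress(f, width, height):
        def ctx(x, y):
            # 3-bit context index built from the left, diagonal and lower cells.
            return (f[y][x - 1] << 2) | (f[y - 1][x - 1] << 1) | f[y - 1][x]
        # Pass 1: frequency analysis of the eight probe pairs.
        freq = [[0, 0] for _ in range(8)]            # freq[context][p4]
        for y in range(1, height):
            for x in range(1, width):
                freq[ctx(x, y)][f[y][x]] += 1
        # One bit per pair: the more frequent value of p4 is the prediction.
        predict = [int(freq[c][1] > freq[c][0]) for c in range(8)]
        # Pass 2: write a residual (one) wherever the prediction fails.
        residuals = [[0] * width for _ in range(height)]
        for y in range(1, height):
            for x in range(1, width):
                residuals[y][x] = f[y][x] ^ predict[ctx(x, y)]
        return predict, residuals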
The decompression phase has at hand the description of the predictor (one byte) and the image of residuals. The leftmost column and the bottommost row are known to be zero-valued. The probe starts at the bottom left corner with the values p_{1}, p_{2}, and p_{3} known. These three values uniquely determine one of the eight columns of the predictor. The value of the stored residual distinguishes the correct probe of the pair. The decompressed value p_{4} is set accordingly and overwrites the residual at the position of p_{4}. The probe traverses the image row-wise from left to right and from bottom to top, until the original image is restored.
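A matching decompression sketch under the same assumptions; since the context of cell (x,y) is already reconstructed when the cell is visited, the residual can be overwritten in place:

    def fh2_decompress(residuals, predict):
        f = residuals                            # reconstructed in place
        for y in range(1, len(f)):               # bottom to top
            for x in range(1, len(f[0])):        # left to right
                c = (f[y][x - 1] << 2) | (f[y - 1][x - 1] << 1) | f[y - 1][x]
                f[y][x] ^= predict[c]            # residual XOR prediction = p4
        return f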
A greylevel image can be treated as stacked bit planes. The predictor described in the previous section is generalized to cope with the current bit plane and the closest bit plane above it. The predictor context encompassed by the volumetric probe has three degrees of freedom (DOF); therefore we abbreviate the proposed compression algorithm FH3. The arrangement of cells in the probe is shown in Fig. . The processed bit plane is the lower one. Its representative point, i.e. the predicted cell, is denoted by p_{8} and is highlighted. The idea of the compression remains the same as in the binary case.
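A minimal sketch of the bit-plane decomposition, assuming an 8-bit image stored in a NumPy uint8 array, with plane 7 the most significant:

    import numpy as np

    def bit_planes(img):
        # Returns eight binary arrays, one per bit plane.
        return [((img >> b) & 1).astype(np.uint8) for b in range(8)]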
The frequency analysis of probe configurations is performed when passing through the image for the first time. The process is analogous to the binary image case described in Section 3. Here 2^{7} bits = 16 bytes are needed to store the predictor description, one bit for each of the 2^{7} configuration pairs. The image of the processed bit plane is reduced and residuals are stored in the same way as in the binary image case. This requires a second pass through the original image data. The result is a sparse matrix with a small number of residuals.
The original image can be uniquely reconstructed using the residuals and the description of the predictor stored in the compressed image. The compression process as described above starts from the top bit plane using the FH2 algorithm (since there is no data available above the top plane). Then the FH3 algorithm is applied to the remaining bit planes.
More complicated probes spanning several bit planes can also be used, see Fig. . The rightmost probe is used for processing the top bit plane. The probe to its left is used for coding the bit plane below the top plane, and so on.
The size of the predictor support is limited, from a practical point of view, by the size of the frequency table. The size of the frequency table grows exponentially: it has 2^{k} entries, where k is the number of cells of the predictor (the predicted representative cell of the probe is not counted). For example, for 20 cells the frequency table has 2^{20}, i.e. about a million, entries. This would be too much. Of course, not all predictor cells carry the same amount of information. The trick we propose is to adaptively throw away some cells from a relatively large probe, according to the content of the particular image. In the sequel, let us denote such an adaptive predictor by FHAdapt.
The construction of predictor probes is similar to the procedure described earlier. The probes have more cells, e.g. 20, and may span n DOF (degrees of freedom). In the case of an RGB colour image, there are 3 DOF in the intensity image corresponding to one of the colour components. Two more DOF are added by the context in the same bit plane of the other two colour components. Probes are again arranged in pairs that differ only in the value of the actual (estimated) cell. This facilitates a unique reconstruction of the original image in the decompression phase.
The frequency analysis comes next. The result of the analysis is the frequency table of size 2^{k+1}, where k is the number of support cells of the predictor. Note that the representative cell is not among the k support cells of the probe. The frequency table is usually rather big. The aim is to select the subset of i cells, i < k, containing most of the information. This reduces the length of the frequency table to 2^{i+1}.
The selection step does not require another pass through the original image and is thus independent of the image size. Predictor cells may be viewed as features in a statistical pattern recognition task. The cells contribute to the estimate of the representative point of the probe. After the frequency analysis, it is known how often each predictor probe occurs in the image. Moreover, the cells can be ordered according to their importance for the estimation of the representative cell. The proposed algorithm decouples the influences of individual cells from the frequency table.
Let us explain the decoupling in the specific case of a binary image. The aim is to reduce the number of probe cells by one, from four to three. All configurations of the probe were shown in Fig. . Assume for a moment that we would like to decouple the influence of cell p_{3} (the top left corner of the probe, Fig. ). If this feature were omitted, the probe pairs would merge as shown in Fig. . The frequency of a new probe configuration is the sum of the frequencies of its two constituting configurations, which differ only in the value of the removed cell.
An appropriate representation of the frequency table facilitates the use of very efficient bit operations. Recall that the representative cell has the highest index. Let the frequency table be stored in a vector q = (q_{0}, …, q_{2^{k+1}-1}) with 2^{k+1} elements. The first 2^{k} elements of the vector q correspond to the frequencies of the top row of probe configurations.
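With this representation, removing a context cell c merges pairs of entries whose indices differ only in bit c, separately in each half of q. A sketch (the compacted re-indexing of the result is our assumption):

    def remove_cell(q, k, c):
        # q has 2**(k+1) entries; its halves correspond to the two values of
        # the representative cell. Removing context cell c sums entries whose
        # context indices differ only in bit c.
        b = 1 << k
        merged = []
        for half in (0, b):
            for j in range(b):
                if j & (1 << c) == 0:        # visit each merged pair once
                    merged.append(q[half + j] + q[half + (j ^ (1 << c))])
        return merged                         # length 2**k: one cell fewer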
Let e_{all} be the number of occurrences of the less frequent configurations of the probes. Furthermore, let b be the index of the first element of the second half of the frequency table, b = 2^{k}, and c the index of the removed cell. The value e_{all} is calculated as
e_{all} = ∑_{j=0}^{2^{k}-1} min( q_{j}, q_{j+b} ).

The estimativeness e_{c} of cell c is

e_{c} = ^{1}/_{2} ∑_{j=0}^{2^{k}-1} [ min( q_{j} + q_{j⊕2^{c}}, q_{j+b} + q_{(j⊕2^{c})+b} ) - min( q_{j}, q_{j+b} ) - min( q_{j⊕2^{c}}, q_{(j⊕2^{c})+b} ) ].
The factor ^{1}/_{2} in front of the sum comes from the fact that each value within the sum is counted twice. The estimativeness e_{c} expresses the increase in the number of residuals when the cell c is omitted. The operator ⊕ denotes the logical eXclusive OR.
The cell with the lowest estimativeness e_{c} is omitted from the probe. This algorithm can be applied iteratively: in each iteration, the cell corresponding to the minimal value e_{c} is thrown away.
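A sketch of the greedy selection, assuming the reconstruction of e_{all} and e_{c} given above:

    def e_all(q, k):
        # Residual count: the less frequent probe of each pair contributes.
        b = 1 << k
        return sum(min(q[j], q[j + b]) for j in range(b))

    def e_cell(q, k, c):
        # Increase in the number of residuals if context cell c were removed.
        b, m = 1 << k, 1 << c
        total = 0
        for j in range(b):
            total += (min(q[j] + q[j ^ m], q[j + b] + q[(j ^ m) + b])
                      - min(q[j], q[j + b]) - min(q[j ^ m], q[(j ^ m) + b]))
        return total // 2                    # each merged pair is visited twice

    def least_informative_cell(q, k):
        return min(range(k), key=lambda c: e_cell(q, k, c))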
Finally, we define the termination criterion for the removal of cells. The number of residuals e_{all} increases monotonically as cells are thrown away one by one, while the number of entries in the frequency table, and thus its length, decreases monotonically. We seek the extremum of the difference between these two quantities. If the increase in the length of the coded residuals (given by the value e_{c}) is bigger than the decrease in the table length, or the table is empty, the algorithm terminates.
The positions of the residuals in the sparse matrix are encoded as the distances l_{1}, …, l_{k} between consecutive residuals in the scan order.
The question of how to code the distances l_{1}, …, l_{k} efficiently is important. Traditional Huffman coding is not suitable because the maximal possible length l_{max} = M·N + 1 is too big. The time needed to compute the Huffman table also matters: the best algorithms need O(n log n) time, where sorting the items costs O(n log n) and computing the Huffman table from sorted items costs O(n). The cost O((M·N+1) log(M·N+1)) would be too high for all practical purposes.
It is desirable to find a suitable code for representing big integer numbers, since large values need to be encoded rather often. The Elias code [] inspired us. The Elias code is composed of two parts: the first uses a unary and the second a binary code. We propose a new code, called the code with logarithmic growth, inspired by Elias' code. The length of this code grows logarithmically with the encoded number l_{i}.
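The construction behind column (a) of the code table below can be read off directly: k ones, a zero, and then k+1 bits giving the offset within the k-th range, which starts at the value 2^{k+1} - 1. A sketch of an encoder reproducing that column:

    def log_code(n):
        # Code with logarithmic growth, K = 0 variant; length ~ 2*log2(n).
        assert n >= 1
        k = 0
        while n >= (1 << (k + 2)) - 1:       # find the range containing n
            k += 1
        offset = n - ((1 << (k + 1)) - 1)    # position within the range
        return "1" * k + "0" + format(offset, "0{}b".format(k + 1))

For example, log_code(5) yields 1010 and log_code(15) yields 11100000, in agreement with the table.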
We combined Huffman coding and the prefix code with logarithmic growth for better compression. The combination of the Huffman code and the prefix code with logarithmic growth is illustrated in Fig. . The number K is the highest number coded by the Huffman code. If K ≠ 0, we choose a modified code with logarithmic growth. This code can be found in the column round(log_{2}(K)).
There are three options, see Tab. : (a) K = 0, i.e. only the code with logarithmic growth is used (basic option); (b) K is fixed. Experiments have shown that the best values are typically K = 32 or K = 64; (c) K is set optimally according to the data to be compressed. This option is described next.
No. | Binary    | (a) K = 0 | (b) Fixed K | (c) Optimal K
 1  | 00        | 00        | 000         | 0000
 2  | 01        | 01        | 001         | 0001
 3  | 100       | 1000      | 010         | 0010
 4  | 101       | 1001      | 011         | 0011
 5  | 11000     | 1010      | 10000       | 0100
 6  | 11001     | 1011      | 10001       | 0101
 7  | 11010     | 110000    | 10010       | 0110
 8  | 11011     | 110001    | 10011       | 0111
 9  | 1110000   | 110010    | 10100       | 100000
10  | 1110001   | 110011    | 10101       | 100001
11  | 1110010   | 110100    | 10110       | 100010
12  | 1110011   | 110101    | 10111       | 100011
13  | 1110100   | 110110    | 1100000     | 100100
14  | 1110101   | 110111    | 1100001     | 100101
15  | 1110110   | 11100000  | 1100010     | 100110
16  | 1110111   | 11100001  | 1100011     | 100111
17  | 111100000 | 11100010  | 1100100     | 101000
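One plausible way to combine the two codes is sketched below. It assumes an explicit ESCAPE symbol for values above K and reuses log_code from the sketch above for the overflow; the paper instead reads the modified code directly from the column round(log_{2}(K)) of the table:

    from collections import Counter
    import heapq

    def build_huffman(freqs):
        # Standard Huffman construction returning {symbol: bitstring};
        # assumes at least two distinct symbols.
        heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        i = len(heap)
        while len(heap) > 1:
            fa, _, a = heapq.heappop(heap)
            fb, _, b = heapq.heappop(heap)
            code = {s: "0" + c for s, c in a.items()}
            code.update({s: "1" + c for s, c in b.items()})
            heapq.heappush(heap, (fa + fb, i, code))
            i += 1
        return heap[0][2]

    def encode_distances(distances, K):
        # Values up to K get Huffman codes; K + 1 acts as the ESCAPE symbol.
        freqs = Counter(min(d, K + 1) for d in distances)
        table = build_huffman(freqs)
        out = []
        for d in distances:
            out.append(table[d] if d <= K
                       else table[K + 1] + log_code(d - K))
        return "".join(out)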
The length L_{1} of the Huffman code table is an increasing function of K. The total length of the compressed residuals is the sum of L_{1}(K) and the length L_{2}(K) of the data encoded with the threshold K; the optimal K minimizes this sum.
The relationship between the number K and the length of encoded residuals is illustrated in Fig. . Eight curves, one for each bit plane of a test image, are shown.
The goals of the experiments were to (1) test the properties of the proposed method, (2) compare it with the results of others, and (3) find possibilities for further improvement.
Let us illustrate the behaviour of the proposed FH3 compression algorithm on the standard greylevel test image Lena of size 256×256×8 bits, see Fig. . Selected bit planes of the original and compressed image are shown in Fig. . Note that the compression is more effective on the bit planes corresponding to the more significant bits, as there are usually fewer changes there. Note also how well the residuals express perceptually significant features in the image. This experiment is quantified in Tab. . Notice how the number of residuals increases from the most significant bit plane to the least significant one; it is difficult for the estimator to represent randomness. The estimativeness e_{c} gives a clear insight into the behaviour of the predictor. Cells 6 and 4 from the neighbouring bit planes contain the highest amount of information.
Bitplane           | bit 7 | bit 6 | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0
FH2 residuals      |  2010 |  5477 | 10177 | 15067 | 21865 | 28595 | 31771 | 32214
FH3 residuals      |     – |  3841 |  5935 |  9421 | 15045 | 21690 | 28622 | 31272
estimativeness e_1 |     – |    25 |   160 |   218 |   233 |    91 |   125 |   320
estimativeness e_2 |     – |   616 |  2165 |  3108 |  4930 |  4919 |  2521 |   457
estimativeness e_3 |     – |    52 |   180 |   219 |   233 |    91 |   128 |   352
estimativeness e_4 |     – |  1892 |  4101 |  5543 |  6611 |  6600 |  2813 |   447
estimativeness e_5 |     0 |    14 |   123 |   143 |    92 |    26 |   112 |   361
estimativeness e_6 |  1457 |  2097 |  3477 |  4169 |  5807 |  5031 |  2312 |   467
estimativeness e_7 |     0 |    23 |   118 |   136 |   170 |    85 |    71 |   220
The proposed adaptive compression method FHAdapt was compared on several images from the standard JPEG image set []. The methods used for comparison were: Segment from [], PVRG-JPEG from the Portable Video Research Group at Stanford [], the rounding transform from [], and Logic coding from []. Results were taken directly from the papers. The only implementation available to us was that of Callic [].
The quantitative measure used for comparison of the different methods is the compression efficiency CE []
CE = (s_{o} - s_{c}) / s_{o} · 100 %,

where s_{o} is the size of the original image and s_{c} the size of the compressed image. For example, the technical drawing in Tab.  compresses from 106940 bytes to 7297 bytes, i.e. CE = (106940 - 7297)/106940 · 100 % ≈ 93.18 %.
The CE values in Tab.  show how our method performs in comparison with others. The FH3 method was outperformed only by Segment and PNG in the case of the image of a monkey (Baboon). The reason is that the predictor cannot cope with the complicated structure of the monkey's fur. Segment and the rounding transform also perform better on the Lena image. Unfortunately, we were not able to experiment with Segment because its code was not available. The PNG compression is very similar to the IBM Q80 coder []; IBM Q80 is obsolete and its implementation was not accessible.
Method/Image | Lena (CE [%]) | Baboon (CE [%]) | Boats (CE [%])
PVRG-JPEG    | 22.7–29.4     | 6.7–13.7        | 24.9–33.1
Logic        | 26.1          | 12.6            | 29.3
Segment      | 54.6          | 36.3            | –
Rounding Tr. | 37.6          | –               | –
PNG          | 20.7          | 41.7            | 33.7
FH3          | 34.0          | 19.1            | 41.2
Tab.  compares our FHAdapt method with other common methods on the Lena image and on a technical drawing, see Fig. . The latter image is perfectly suited to our compression method, as can be seen from the results.
Type of data | Lena [bytes] | CE Lena [%] | Drawing [bytes] | CE Drawing [%]
Uncompressed | 266656       | 0           | 106940          | 0
ZIP          | 225007       | 15.62       | 10224           | 90.44
ARJ          | 228389       | 14.35       | 10380           | 90.29
PCX          | 274256       | -2.85       | 38624           | 63.88
GIF          | 278771       | -4.54       | 17290           | 83.83
FHAdapt      | 175434       | 34.21       | 7297            | 93.18
Tab.  illustrates the influence of three residual coding methods on three different predictors. This gives an understanding of the improvement each individual method brings. The values in the table are the lengths of the compressed 512×512 image Lena in bytes.
Method/Coding | Exp. code only | Fixed Huffman | Moving Huffman
FH2           | 203000         | 200758        | 200599
FH3           | 178655         | 176021        | 175922
FHAdapt       | 178130         | 175541        | 175432
Tests were performed on an Intel Pentium 120 MHz computer. The computation time was measured using a RAM disk to avoid randomness in disk access. The compression of the image Lena (512×512×8 bits) takes 6.2 seconds using the FH3 method. The decompression is approximately twice as fast, i.e. in the Lena case it takes 2.7 seconds. Loading/storing the data without compression takes 0.1 seconds.
We have proposed a new lossless image compression method that in many respects outperforms other methods considered state of the art. The basic idea of the method is a predictor whose configurations are paired so that they differ in one bit only.
The advantages are:
A fast implementation is possible since the operations used are very simple, e.g. SHL, XOR, ADD, and table lookup.
Substantial parts of the compression and decompression phases can run in parallel.
The predictor that constitutes the core of the method can be extended to handle more degrees of freedom, e.g. for colour images or image sequences.
When the resolution of the image increases, the compression ratio does not degrade, as happens with pyramidal algorithms, e.g. MLP (Multi-Level Progressive) [].
The disadvantages of the method are:
The decompression algorithm is very sensitive to errors in the compressed data. Of course, this disadvantage holds for most compression methods.
We encourage the reader to test the method, see the www page
http://cmp.felk.cvut.cz/~fojtik/.
This research was supported by the Czech Ministry of Education grant VS96049, the Grant Agency of the Czech Republic, grants 102/97/0480, 102/97/0855 and the European Union grant Copernicus CP941068.
^{1} The term cell is used instead of pixel, which is common for 2D images, as it also covers higher dimensions.