We suggest a preprocessing phase that (a) analyses statistics of the adjacency relations of index values, (b) performs palette optimization, and (c) permutes the palette indices to obtain a smoother image. A smoother image lets lossless image compression methods produce less output data.
Optimal permutation of palette indices is an NP-complete combinatorial optimization problem. Instead of checking all possibilities, we propose a reasonable initial guess followed by a fast suboptimal hill-climbing optimization.
The proposed permutation of indices should enhance the performance of most lossless compression methods applied after it. To our knowledge, the proposed reordering followed by our own nonlinear compression technique [,] yields the best compression. Experiments with various images show that the index reordering provides data savings from 10% to 50%.
Key words: lossless image compression, palette image, pseudocolor image, statistical techniques in image processing.
Let us assume that the palette image is the mapping color = [R,G,B] = palette(f(x,y)), where [R,G,B] are the individual color components, i.e. three intensity images. The output of the function f(x,y) is an index; this is the reason why we call this function the palette index function in the sequel. The image support is T = {(x,y): 0 ≤ x ≤ M; 0 ≤ y ≤ N}, and the palette is the look-up table with [R,G,B] entries.
This report discusses lossless compression of pseudocolor images (images with a palette); see also [] for a description of this color representation. Our lossless compression method [,] and many other compression techniques are based on an assumption of continuity of the image function. Typical palette index functions break this assumption.
The palette is usually created from a true color RGB image (camera, color scanner) or by an interactive painting program. The algorithm that quantizes the original true color image introduces discontinuities into its output index function f(x,y). This effect can significantly decrease the compression ratio.
There are many available compression algorithms for gray level images and true color images. Pseudocolor images are usually compressed by the same algorithms as gray level ones. In these cases the compression ratio depends on the first order entropy (we will call it smoothness in the sequel).
The first order entropy H_{1} can be defined on the alphabet D consisting of the 2K−1 possible differences between the elements of an alphabet S. The alphabet S contains K indices. See [] for a more detailed explanation.
 H_{1} = − ∑_{d ∈ D} p(d) log_{2} p(d)    (1)
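As an illustration, the smoothness measure of Eq. (1) could be estimated for a row-wise scan of an index image as in the following sketch (the function name and the restriction to horizontal differences are our assumptions, not the report's code):

```python
import math
from collections import Counter

def first_order_entropy(image):
    """First-order entropy H_1 of the differences between horizontally
    neighbouring palette indices (Eq. (1)): lower means smoother."""
    diffs = Counter()
    for row in image:
        for a, b in zip(row, row[1:]):
            diffs[b - a] += 1
    total = sum(diffs.values())
    return -sum(n / total * math.log2(n / total) for n in diffs.values())

# a smooth index image concentrates its differences on few values
smooth = [[0, 0, 1, 1], [0, 0, 1, 1]]
noisy = [[0, 3, 1, 2], [3, 0, 2, 1]]
assert first_order_entropy(smooth) < first_order_entropy(noisy)
```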
A simple way to increase the smoothness of an image function was proposed in [] for compression purposes. A brightness index Y, calculated e.g. as Y = R + G + B or Y = 0.3R + 0.6G + 0.1B, was attached to each color entry, and the palette entries were sorted according to this index. Unused palette entries could be removed and duplicated entries merged.
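This luminance-sorting baseline could be sketched as follows (function and variable names are ours; the Y weights are the ones quoted above):

```python
def sort_palette_by_luminance(palette, image):
    """Sort palette entries by Y = 0.3R + 0.6G + 0.1B and remap the index
    image so that displayed colours stay identical pixel by pixel."""
    order = sorted(range(len(palette)),
                   key=lambda i: (0.3 * palette[i][0]
                                  + 0.6 * palette[i][1]
                                  + 0.1 * palette[i][2]))
    new_palette = [palette[i] for i in order]
    remap = {old: new for new, old in enumerate(order)}   # old index -> new
    new_image = [[remap[v] for v in row] for row in image]
    return new_palette, new_image

palette = [(255, 255, 255), (0, 0, 0), (128, 128, 128)]
image = [[0, 1], [2, 0]]
new_pal, new_img = sort_palette_by_luminance(palette, image)
# the colours of all pixels are preserved by the reordering
assert all(new_pal[new_img[r][c]] == palette[image[r][c]]
           for r in range(2) for c in range(2))
```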
Reordering of the palette is useful mainly in the following cases; the first two of them are mentioned in [].
The requirements of the first two approaches are not fully in contradiction. In this report, we discuss the first one.
The work most closely related to our contribution is []. There, a linear predictor is assumed as the lossless compression technique and index reordering is formulated as an optimization problem. Three heuristic solutions are proposed: two of them are very expensive because they rely on simulated annealing, and the third, based on a greedy algorithm, produces worse results.
Another related paper [] describes lossy compression of palette images. The proposed lossy compression starts by constructing the shortest route among the colors in the RGB (or LUV) color space; this route is depicted in Fig. 1.4. Colors close to each other (on this route) are grouped into clusters, and each cluster corresponds to exactly one new color.
The basic idea is to reestablish the smoothness of the palette index function f(x,y) in the preprocessing step, see Fig. 1. Both the newly created palette index function f′(x,y) and the look-up table palette′ are modified in such a way that the resulting colors, i.e. [R,G,B] values, remain the same for all pairs of corresponding pixels in both images.
 palette(f(x,y)) = palette′(f′(x,y))  for all (x,y) ∈ T    (2)
This report proposes a computationally efficient method that finds the modified palette index function f′(x,y) and the new palette′ automatically.
Every multilevel image^{2} (as our f(x,y) is) can be treated as stacked bit planes. This allows the method to optimize bit planes individually.
The proposed method (a) performs a statistical analysis of the adjacency of the intensity values in the neighborhood of each pixel, and then (b) minimizes the number of value changes (i.e. the first order entropy H_{1}) in the current bit plane by rearranging the bit planes below it. This smooths the palette index function f′(x,y). The proposed algorithm starts from the most significant bit plane and proceeds to the bit planes below it.
Let u, v be two values (i.e. indices into palette) of the palette index function f(x,y) in two different but neighboring pixels. Several cases of neighborhood of the current pixel (x,y) are shown in Fig. 2. The discussion about creating neighbourhood relation is placed in the section .
The symmetric relation clasp(u,v) = clasp(v,u) tells whether two indices u, v are adjacent. Let us call clasp the index adjacency relation; we use this intuitive name because it evokes a thing that fastens two regions with index values u, v together. Simple statistics of the local index adjacency relation will serve as a measure of the smoothness of the palette index function f(x,y).
We are interested in the number of occurrences of the index adjacency relation in the whole image f(x,y). This information is stored in the index adjacency table S(i,j). The table S(i,j) is square; its number of rows and columns equals the range (number of values) of the palette index function f(x,y), so the size of S(i,j) is 256 × 256 for typical palette images. The table entries tell us how many times the indices u and v are adjacent in the image. The statistical information stored in S(i,j) resembles the more general co-occurrence matrix often used in texture analysis [].
The index adjacency table S is symmetric, so only its lower (or upper) triangular part needs to be stored. The values on the main diagonal (the identity relation) are not needed by the proposed algorithm: a diagonal entry only counts how many equally valued indices are close to each other, which has no influence on the optimization step^{3}. We therefore ignore the diagonal in the further text and assume it is set to zero. Computing with a nonzero diagonal is also possible, but the final equations would be more complicated.
In our case, the whole square table is stored, as this eases further computations. We further assume, for computational purposes, that the number of index values is a power of two. If this is not the case, the table S can be extended by dummy zeros.
The index adjacency table S is created by a one-pass traversal of the image according to the neighborhood mask. If the mask of Fig. 2(a) is chosen, the cells S(f(i,j),f(i+1,j)), S(f(i,j),f(i,j+1)), S(f(i,j),f(i+1,j+1)), and also their symmetric counterparts S(f(i+1,j),f(i,j)), S(f(i,j+1),f(i,j)), S(f(i+1,j+1),f(i,j)), are incremented by one at each position of the neighborhood mask.
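A possible one-pass construction of S for the mask of Fig. 2(a), keeping the table fully symmetric and the diagonal at zero as assumed above (a sketch, not the report's implementation):

```python
def adjacency_table(image, K):
    """Index adjacency table S for the mask of Fig. 2(a): every pixel is
    clasped with its right, lower, and lower-right neighbours.  Both
    symmetric entries are incremented; diagonal entries are left at zero."""
    S = [[0] * K for _ in range(K)]
    rows, cols = len(image), len(image[0])
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((0, 1), (1, 0), (1, 1)):
                ni, nj = i + di, j + dj
                if ni < rows and nj < cols:
                    u, v = image[i][j], image[ni][nj]
                    if u != v:          # the identity relation is not used
                        S[u][v] += 1
                        S[v][u] += 1
    return S

S = adjacency_table([[0, 1], [1, 1]], 2)
assert S[0][1] == 3 and S[1][0] == 3   # three clasps between indices 0 and 1
```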
The statistics stored in the index adjacency table S will be used to find the optimal palette index function f′(x,y). How this is done is described in the following section.
The user can tune our method by changing the index adjacency relation. Several possible neighborhood relations are depicted in Fig. 2. We will now discuss how to create these configurations and other ones.
The maximum possible number of clasps in one iteration is depicted in Fig. 3. There are four possible clasps per box; they are labeled by the numbers 1, 2, 3, 4 and colored black for one box. Clasps applied from other boxes are gray and clasps that disappear are white. The user can select any combination of them. Our experiments show that the clasps in the horizontal (No. 1) and vertical (No. 2) directions carry the most information; these should not be omitted. We also chose to include clasp No. 4 due to the structure of our binary predictor.
Notice that several other configurations produce the same results; we present two of them, in Fig. 3(a) and Fig. 3(b). Configurations derived from these do not compute any relation twice. The configurations of Fig. 2(a) and Fig. 2(d) are nearly the same; the only difference occurs when processing the first and last row and/or column.
It seems that the case in Fig. 2(d) is the best one for our compression algorithm. This becomes more apparent when one recalls the structure of our predictor from [], which is depicted in Fig. 4 and is primarily used by []. Another configuration may be better for other purposes.
Further optimization does not depend on the selected local configuration.
We show one example for a better understanding of our concept of creating the index adjacency table. Tab. shows an initial index function with indices from 0 to 7 that was selected randomly. The index adjacency table for this image, according to the local configuration of Fig. 2(d), is depicted in Tab. . The index function shown in Tab. is the one obtained after optimization.
1  3  5  7  5  3  1  0  0  0  0  0  2  2  2  2 
3  1  0  0  0  0  0  0  0  0  0  0  2  2  2  2 
5  0  1  0  0  0  0  0  0  0  0  0  2  2  2  2 
7  0  0  1  0  0  0  0  0  0  0  0  2  2  2  2 
5  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0 
3  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0 
1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0 
0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0 
0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0 
0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0 
0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  2 
4  0  4  4  0  0  0  0  0  0  0  1  0  0  0  4 
0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  6 
4  4  4  4  0  0  0  0  0  0  0  0  0  1  0  4 
0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  2 
4  4  4  4  0  0  0  0  0  0  2  4  6  4  2  1 
0  4  5  7  5  4  0  1  1  1  1  1  2  2  2  2 
4  0  1  1  1  1  1  1  1  1  1  1  2  2  2  2 
5  1  0  1  1  1  1  1  1  1  1  1  2  2  2  2 
7  1  1  0  1  1  1  1  1  1  1  1  2  2  2  2 
5  1  1  1  0  1  1  1  1  1  1  1  1  1  1  1 
4  1  1  1  1  0  1  1  1  1  1  1  1  1  1  1 
0  1  1  1  1  1  0  1  1  1  1  1  1  1  1  1 
1  1  1  1  1  1  1  0  1  1  1  1  1  1  1  1 
1  1  1  1  1  1  1  1  0  1  1  1  1  1  1  1 
1  1  1  1  1  1  1  1  1  0  1  1  1  1  1  1 
1  1  1  1  1  1  1  1  1  1  0  1  1  1  1  2 
3  1  3  3  1  1  1  1  1  1  1  0  1  1  1  3 
1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  6 
3  3  3  3  1  1  1  1  1  1  1  1  1  0  1  3 
1  1  1  1  1  1  1  1  1  1  1  1  1  1  0  2 
1  3  3  3  1  1  1  1  1  1  2  3  6  3  2  0 
0  1  2  3  4  5  6  7  
0  467  58  17  6  44  8  3  4 
1  58  14  3  6  0  0  0  0 
2  17  3  27  0  2  0  0  0 
3  6  6  0  0  0  4  0  0 
4  44  0  2  0  6  0  2  0 
5  8  0  0  4  0  0  0  4 
6  3  0  0  0  2  0  0  0 
7  4  0  0  0  0  4  0  0 
Let us start with an informal description of the optimization task that should be performed. A pseudocolor image depicting green and red peppers was chosen as an example, see the first row of Fig. , where the pseudocolor image is shown in intensity values only. The difficulty with lossless compression algorithms for pseudocolor images is that there are too many discontinuities in the palette index function f(x,y). Let us show them in the individual bit planes of f(x,y) for the pepper image. Three bit planes #8, #5 and #1 out of all eight bit planes of the palette index function f(x,y) are displayed in the leftmost column of Fig. . Even the most significant bit plane #8 changes too often.
Our aim is to reorder the indices of the palette index function f(x,y) in such a manner that the resulting binary images in the individual bit planes consist of the smallest number of large regions. Larger and smoother regions ease the further image compression. The number of possible index rearrangements is huge: n!, where n is the number of indices.
The top bit plane is the most significant and is thus processed first. Our algorithm is designed in such a way that the lower bit planes can be modified similarly while the already modified bit planes above remain intact.
Processing of bit planes is a combinatorial optimization problem. To simplify it, we assume that all K indices are divided into two groups G_{1} and G_{2}, each consisting of half of the entries, i.e. k = K/2. The number of possible combinations is still tremendous in the typical case of 256 indices and the top bit plane:
 combinations(8, 1) = C(256, 128) ≈ 5.8 · 10^{75}
The function combinations has two arguments. The first argument MP = MaxPlanes gives the number of bit planes log_{2} K, and the second one, PP = ProcessedPlane, denotes the actual bit plane. For bit planes below the top one, with all groups G_{1} and G_{2} having the same number of indices, the number of possible arrangements is given by the following more general equation:
 combinations(MP, PP) = C(2^{MP−PP+1}, 2^{MP−PP})^{2^{PP−1}}    (3)
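Under the assumption that K = 2^{MP} and that each of the 2^{PP−1} current groups splits into two equal halves, the count of arrangements could be evaluated as (a sketch; the formula is our reconstruction of Eq. (3)):

```python
import math

def combinations(MP, PP):
    """Number of possible index arrangements when processing bit plane PP
    (PP = 1 is the top plane) of an image with K = 2**MP indices: each of
    the 2**(PP-1) current groups is split into two equal halves."""
    group = 2 ** (MP - PP + 1)                    # size of each group split
    return math.comb(group, group // 2) ** (2 ** (PP - 1))

# the top bit plane of a 256-colour image alone gives a huge search space
assert combinations(8, 1) == math.comb(256, 128)
assert combinations(8, 1) > 10 ** 75
```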
Let us assume that there is some initial division of the indices into two groups G_{1} and G_{2}. In each group, a single candidate index is found that fits its current group least of all indices. These two indices are swapped between the groups, and the process is repeated until the global criterion describing the division into two groups is minimized. This minimization would work perfectly if the search space had no local extrema. Unfortunately, the number of local extrema seems to be very high for real images and our optimization space, so the algorithm may get stuck in a local minimum. On the other hand, experiments have shown that even this simple minimization yields a much smoother palette index function f′(x,y). Moreover, we use some heuristic rules to avoid some local minima.
A good initial estimate of the division of the K indices into the groups G_{1} and G_{2} helps us start nearer to the global minimum. Our division is based on the strategy that light colors create one group and dark colors the second group. Such an initial division is very close to the situation when indices are sorted according to their intensity.
The adjacency relations between values of the palette index function in the local neighborhood (recall Fig. 2) are used to define the optimization criterion. The index adjacency relation was denoted clasp(u,v) in Section 2.1. Indices are split into two disjoint groups G_{1} and G_{2} with the same cardinality. This requirement is not strict, and our approach could be generalized to any division. The choice of two groups matches the binary representation used in computers and is also the minimal possible number of groups.
The number of connections clasp(u,v), u ∈ G_{1}, v ∈ G_{2}, u ≠ v tells how many relations exist between the two distinct groups. The number of relations clasp(u,v), u ∈ G_{1}, v ∈ G_{1}, u ≠ v tells how many relations lie within the group G_{1}, and similarly for G_{2}. These three numbers build up a quality measure of the grouping of indices into G_{1} and G_{2} and help to find the best candidates for swapping between G_{1} and G_{2}. The good news is that all needed statistics can be efficiently extracted from the index adjacency table S.
The image in Fig. 5 shows one bit plane of a hypothetical image before and after sorting indices. Note that the number of white (black) pixels is not preserved. The simple configuration of Fig. 2(b) was used only to keep the shown image simple.
If nothing were known about dependencies among indices, simulated annealing or another global optimization would be a possible way of reorganizing indices [,]. The search space can be imagined as a space with n! states; every permutation of indices corresponds to one state in this hypothetical space. Any knowledge significantly decreases our optimization space. Do we really know nothing about the indices?
Every combination of indices corresponds to one state in the hypothetical space, and we want to define a metric on this space that measures the relation between two states (possible configurations of indices). We define the distance between two states as the minimal number of index swaps needed to traverse from one state to the other. The reverse traversal is also possible and has the same length: applying the movements in reverse order to the second state guides us back to the first state along the shortest way.
Assuming that the indices are divided into two groups with the same number of entries, the number of possible moves from a state is (n/2)^{2}. This is still a big number, but much smaller than n!. Recall that the number of moves does not depend on the number of states; only in the case when we cannot find any dependence among states does it equal the number of states, i.e. n!.
The idea of a state space allows us to discuss further how to choose the optimum in a space with a factorial number of states.
If the index k belongs to the group G_{1}:
 w_{k}^{+} = ∑_{i ∈ G_{1}} S(k,i),   w_{k}^{−} = ∑_{i ∈ G_{2}} S(k,i),   w_{k} = w_{k}^{−} − w_{k}^{+}    (4)
It can be seen that w_{k}^{+} is the number of clasps (satisfied index adjacency relations) within the own group G_{1} of index k, while w_{k}^{−} is the number of clasps between the own group G_{1} and the alien group G_{2}. Their difference w_{k} measures how badly the index k fits its group. The worst-fitting index within group G_{1} (the one with the largest w_{k}) will be selected as a candidate for swapping.
Similarly for group G_{2}, if the index k belongs to G_{2}:
 w_{k}^{+} = ∑_{i ∈ G_{2}} S(k,i),   w_{k}^{−} = ∑_{i ∈ G_{1}} S(k,i),   w_{k} = w_{k}^{−} − w_{k}^{+}    (5)
The qualities of all indices constitute the indices quality vector w = (w_{1}, w_{2}, ..., w_{K}).
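With the convention that w_{k} counts clasps to the alien group minus clasps within the own group (so a large w_{k} marks a badly placed index, and the global criterion Q = ∑ w_{k} is minimized), the vector w could be computed from S as in this sketch:

```python
def quality_vector(S, G1, G2):
    """Indices quality vector w (Eqs. (4) and (5)): for every index k,
    clasps to the alien group minus clasps within the own group."""
    w = [0] * len(S)
    for own, alien in ((G1, G2), (G2, G1)):
        for k in own:
            w[k] = sum(S[k][i] for i in alien) - sum(S[k][i] for i in own)
    return w

# index 0 clasps mostly with the alien group, so it gets the largest w
S = [[0, 1, 5, 0],
     [1, 0, 0, 0],
     [5, 0, 0, 2],
     [0, 0, 2, 0]]
assert quality_vector(S, [0, 1], [2, 3]) == [4, -1, 3, -2]
```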
We have seen how the quality w_{k} enables us to find candidates for swapping. A global optimization criterion is needed to know when to stop the swapping iterations. We use a slightly modified hill climbing optimization.
The precalculated index adjacency table S(i,j) contains the needed information, see Fig. (a), where four distinct parts W_{1}, W_{21}, W_{12}, and W_{2} are illustrated. Note that W_{12} = W_{21} due to the symmetry of the table S. The entries of the table S(i,j) are summed within the parts according to the labeling:
 W_{1} = ∑_{i ∈ G_{1}} ∑_{j ∈ G_{1}} S(i,j),   W_{12} = ∑_{i ∈ G_{1}} ∑_{j ∈ G_{2}} S(i,j),   W_{21} = ∑_{i ∈ G_{2}} ∑_{j ∈ G_{1}} S(i,j),   W_{2} = ∑_{i ∈ G_{2}} ∑_{j ∈ G_{2}} S(i,j)    (6)
If any label occurs more than once in the table S, then its value is the sum of all areas with this label in the matrix. This situation occurs for bit planes below the top one, see Fig. .
When we know the quality of each individual index, we can compute the quality of a whole group of indices as the sum of the qualities of its members. The group qualities Q(G_{1}) and Q(G_{2}) are defined as:
 Q(G_{1}) = ∑_{k ∈ G_{1}} w_{k} = W_{12} − W_{1}    (7)
 Q(G_{2}) = ∑_{k ∈ G_{2}} w_{k} = W_{21} − W_{2}    (8)
The global optimization criterion Q = Q(G_{1}) + Q(G_{2}) is the sum of the qualities of all indices in both groups. It evaluates the whole index adjacency table S according to the labeling above.
Notice that our approach is strictly hierarchical in terms of the amount of data: it hierarchically reduces it. At the beginning we have the whole pseudocolor image. From it we compute the index adjacency table S, which needs less storage space. The qualities of the indices, stored in the indices quality vector w, can be computed from the table S alone. Finally, the global criterion Q is obtained from the vector w. Such an approach eases the computation.
The worst index in each group (the one with the largest value w_{k}) is found, and the worst indices are swapped between the groups. Let us suppose that we have two indices, one from each group, a ∈ G_{1} and b ∈ G_{2}, and we want to swap them. Swapping two indices is depicted in Fig. (b)–(c).
The new values of the qualities of the swapped indices are:
 w′_{a} = −w_{a} + 2S(a,b)    (9)
 w′_{b} = −w_{b} + 2S(a,b)    (10)
Let us analyze the change of the global optimization criterion Q. Let Q^{*} be the evaluation of the remainder of the matrix without the rows and columns a and b. The optimization criterion can be written as^{4}:
 Q = Q^{*} + 2(w_{a} + w_{b}) − 2S(a,b)    (11)
After swapping (notice that Q^{*} and Q′^{*} are the same, because swapping a and b changes only w_{a}^{+}, w_{a}^{−}, w_{b}^{+}, w_{b}^{−}):
 Q′ = Q′^{*} + 2(w′_{a} + w′_{b}) − 2S(a,b) = Q^{*} + 2(w′_{a} + w′_{b}) − 2S(a,b)    (12–14)
Subtracting equation (11) from equation (14) gives:
 D = Q′ − Q = 2(w′_{a} + w′_{b}) − 2(w_{a} + w_{b})    (15–16)
All actions that must be done after swapping two indices are called a transaction. A transaction includes the recomputation of the indices quality vector w and of the global criterion Q.
If the transaction decreases the global criterion, then Q′ < Q, i.e. D = Q′ − Q < 0. The value D tells how many clasps between the groups will appear (D > 0) or disappear (D < 0) in the image. Substituting equations (9) and (10) into equation (16) yields the value D:
 D = −4 (w_{a} + w_{b} − 2S(a,b))    (17)
Notice that only the values w_{k} are needed, not their components w_{k}^{−} and w_{k}^{+}. We want to find a couple of indices with the smallest value D. Of course, this couple could be found by examining all possibilities, but such an algorithm is computationally very expensive. We use a simpler algorithm that takes only the worst index of each group according to the indices quality vector w. Each chosen index is paired with an index from the other group such that the value D is minimal for that couple. Finally, of the two resulting couples we choose the one with the smaller D.
The swapping step is repeated as long as the optimization criterion decreases (i.e. D < 0 for some combination a ∈ G_{1} and b ∈ G_{2}); then the algorithm stops and the division into two groups in the actual bit plane is finished. The algorithm is iterative, but it must stop after a finite number of steps: the overall criterion Q cannot be smaller than minus the sum of all clasps in the image, and every transaction must decrease Q by at least a fixed integer step, so the maximal number of transactions can be estimated. This estimate is very pessimistic; in practice the algorithm stops after a small number of swaps (typically 5–10).
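The whole swapping iteration, as we read it, could be sketched as follows. The misfit convention w_{k} = (cross-group clasps) − (within-group clasps) is assumed, and for clarity w is recomputed from S every round instead of being updated incrementally:

```python
def optimize_split(S, G1, G2):
    """Hill-climbing division of the indices into two groups.

    Swapping a in G1 with b in G2 changes the criterion by
    D = -4 * (w_a + w_b - 2 * S[a][b]); a swap is accepted while some
    D < 0, i.e. while cross-group clasps can still be removed."""
    G1, G2 = list(G1), list(G2)

    def w(k, own, alien):
        return sum(S[k][i] for i in alien) - sum(S[k][i] for i in own)

    def D(a, b):
        return -4 * (w(a, G1, G2) + w(b, G2, G1) - 2 * S[a][b])

    while True:
        a = max(G1, key=lambda k: w(k, G1, G2))   # worst-fitting index of G1
        b = max(G2, key=lambda k: w(k, G2, G1))   # worst-fitting index of G2
        # pair each candidate with its best partner from the other group,
        # then keep the couple with the smaller D
        couples = [(a, min(G2, key=lambda v: D(a, v))),
                   (min(G1, key=lambda u: D(u, b)), b)]
        best = min(couples, key=lambda c: D(c[0], c[1]))
        if D(best[0], best[1]) >= 0:
            return G1, G2                         # no swap helps any more
        i, j = G1.index(best[0]), G2.index(best[1])
        G1[i], G2[j] = best[1], best[0]

# two clusters {0, 2} and {1, 3} that start split across the groups
S = [[0, 1, 9, 0],
     [1, 0, 0, 9],
     [9, 0, 0, 1],
     [0, 9, 1, 0]]
g1, g2 = optimize_split(S, [0, 1], [2, 3])
assert {frozenset(g1), frozenset(g2)} == {frozenset({0, 2}), frozenset({1, 3})}
```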
When indices are swapped, all items of the indices quality vector w must be updated too. The new items w_{a} and w_{b} were already defined. The remaining items w_{x} are updated by the following equations:
 w′_{x} = w_{x} + 2S(x,a) − 2S(x,b)  for x ∈ G_{1}, x ≠ a    (18)
 w′_{x} = w_{x} − 2S(x,a) + 2S(x,b)  for x ∈ G_{2}, x ≠ b    (19)
All these operations can be encapsulated into one symbolic operator:
 (G′_{1}, G′_{2}, w′, Q′) = Swap(a, b, G_{1}, G_{2}, w, Q, S)    (20)
Here G_{1} and G_{2} are the groups, a, b are the swapped indices, and S is the matrix describing the relations among indices.
The computational cost can be derived from equations (19) and (10). The cost of swapping two indices while maintaining the global criterion Q and the indices quality vector w is:
Additions and subtractions  Table lookups
(2n − 2) + 2  (2n − 2) + 2
This means that the complexity depends only linearly on the number of indices n; no additional computation is required for one iteration step. This is true, however, only if the indices a and b are already known. Finding them costs something too.
The algorithm proceeds from the most significant bit plane to the less significant ones. In the bit plane below the top one, the task is to split both groups G_{1} and G_{2} into four groups. The approach is similar to that described above, but the computation is slightly more complicated. See Figure 3.8 for the arrangement of the groups in the matrix S for the second bit plane (below the top one).
The previous group G_{1} is divided into two groups G_{11} and G_{12}, and the group G_{2} is divided into G_{21} and G_{22}. Indices may be transferred only between G_{11} and G_{12}, and between G_{21} and G_{22}. Each group has two parts: a movable one and an unmovable one. When we optimize the groups G_{11} and G_{12}, the group G_{11} has G_{21} as its unmovable part and the group G_{12} has G_{22} as its unmovable part.
Indices can be swapped only between the movable parts, but the computation must involve both parts. Thus all previous equations remain unchanged; only the selection phase, which finds the pair of indices to swap, changes.
It may look as if the number of groups grows exponentially for lower planes, but this is not the case. The previous groups are split into two parts, one pair of groups is selected as movable, and all other subgroups are merged into their unmovable residuals.
In the previous text we split the set of indices into two disjoint groups of the same cardinality. This property was very useful for the preceding computation, where a preprocessing step for a binary predictor was made. Here we show how this constraint can be relaxed. The following simplest operator^{5} rounds our approach out into a complete system: all other operators can be created from this one.
Unfortunately, we have not yet found an application that requires this operator, but we hope it will be useful for some purposes.
We have defined the operator for swapping indices between two groups. Multiple applications of this operator allow rearranging indices inside groups while preserving the cardinality of the groups. For changing the cardinality, we also need an operator that removes one index from one group and inserts it into another. Let us suppose that the total number of indices (over all groups) remains the same in this step. This operator allows us to change the cardinality of the groups while checking and maintaining the optimization criterion Q (i.e. the current number of clasps between the two groups). This is computationally efficient because the criterion need not be computed from scratch; a simple update suffices.
Please notice the image in Fig. 7(a), where the items belonging to the index w are highlighted. We need to remove the index w from group 2 and insert it into group 1, as shown in Fig. 7(b). The components of the quality of this index change as follows:
 w′_{w}^{+} = w_{w}^{−},   w′_{w}^{−} = w_{w}^{+},   w′_{w} = −w_{w}    (21)
To compute the change of the global optimization criterion, we need the changes of the partial criteria W_{1}, W_{12}, W_{21}, W_{2} (recall Fig. 6(a)):
 W′_{1} = W_{1} + 2w_{w}^{−},   W′_{2} = W_{2} − 2w_{w}^{+},   W′_{12} = W_{12} − w_{w}^{−} + w_{w}^{+},   W′_{21} = W_{21} − w_{w}^{−} + w_{w}^{+}    (22)
The global optimization criterion is:
 Q′ = W′_{12} + W′_{21} − W′_{1} − W′_{2} = Q − 4w_{w}    (23–24)
The value D tells how many clasps will appear (D > 0) or disappear (D < 0) in the image after one step of transferring an index. We use equation (24) to compute the value D:
 D = Q′ − Q = −4 w_{w}    (25)
The items of the indices quality vector w are updated as follows:
 w′_{x} = w_{x} − 2S(x,w)  for x ∈ G_{1}    (26)
 w′_{x} = w_{x} + 2S(x,w)  for x ∈ G_{2}, x ≠ w    (27)
The symbolic equation describing the operator Move is:
 (G′_{1}, G′_{2}, w′, Q′) = Move(w, G_{2} → G_{1}, w, Q, S)    (28)
The cardinality of both groups can be freely changed with this additional operator, while the number of clasps between the groups is maintained through all steps. The previous swap operator can be derived from this one by applying it twice, once in each direction. Therefore this more general operator alone suffices for permuting indices.
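A sketch of the Move operator with the incremental updates described above (same misfit convention for w; all names are ours, not the report's code):

```python
def move(S, src, dst, u, w, Q):
    """Move operator: transfer index u from group src to group dst while
    incrementally updating the quality vector w and the global criterion Q
    (Eqs. (21)-(27)).  Returns the new value of Q."""
    src.remove(u)
    dst.append(u)
    Q += -4 * w[u]                 # D = -4 * w_u
    for x in src:
        w[x] += 2 * S[x][u]        # clasps to u become cross-group clasps
    for x in dst:
        if x != u:
            w[x] -= 2 * S[x][u]    # clasps to u become within-group clasps
    w[u] = -w[u]                   # the components of w_u exchange
    return Q

S = [[0, 1, 5, 0],
     [1, 0, 0, 0],
     [5, 0, 0, 2],
     [0, 0, 2, 0]]
G1, G2 = [0, 1], [2, 3]
w = [4, -1, 3, -2]                 # qualities for this division of indices
Q = move(S, G1, G2, 0, w, sum(w))
assert (G1, G2) == ([1], [2, 3, 0])
assert w == [-4, 1, -7, -2] and Q == sum(w) == -12
```

Deriving the swap operator amounts to calling `move` twice, once in each direction, as stated above.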
 Swap(a, b, G_{1}, G_{2}, w, Q, S) = Move(b, G_{2} → G_{1}, ·) ∘ Move(a, G_{1} → G_{2}, ·)    (29)
In the case where we strictly want to preserve the cardinality of the groups (as in the previous palette sorting case), the composed operator is better. With the Move operator alone, the cardinality changes after every step, and we are then interested only in every even step. The composed operator does not allow changing the cardinality in any way, which significantly eases the computation.
In the previous text we said that we use our compression method [,], based on Schlesinger's predictor, after the proposed preprocessing phase. We cannot claim that the two methods are not tuned to each other. Therefore we discuss how the number of residuals depends on the number of clasps in each bit plane.
Let us suppose that the original corner Schlesinger predictor is used; any other predictor (with a bigger probe) must produce better or equal results. We also suppose that the predictor is optimally tuned, otherwise it is impossible to say anything. The dependence between the number of clasps and the number of residuals is depicted in Fig. 8. Please note that two objects (of similar total area) produce more clasps than one object.
The minimal number of residuals per one isolated object that does not touch the image border is 4; an object producing this combination of residuals must be a square. Our optimization is based on the assumption that the objects change only moderately during one iteration step. Only a large number of much smaller objects can have the same number of clasps as one big object.
Basically, the clasps measure the length of the boundary between the black and white parts of a binary image.
The total length of the compressed data does not depend simply on the sum of residuals over all bit planes: the dependence between the final compressed size and the number of residuals is not linear, although it is monotonic. Therefore it is best to minimize the number of residuals in the highest bit plane first.
For effective computation, it is not possible to physically exchange both rows and columns of the table S during each swap; if it were performed like that, the algorithm would be very slow.
Instead we use a reindexing vector r, see Fig. 9. We look into the table S through this vector: S(i,j) = S^{#}(r(i), r(j)), where S^{#} is the original unchanged matrix resulting from the first computation step. It is now very simple to swap one couple of rows and columns: such an operation needs only changing two values in the vector r.
At the end of all optimizations, the original image data are remapped according to the reindexing vector. The table S can be forgotten at this point.
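The reindexing trick could be wrapped, for example, like this (a sketch; the class name is ours):

```python
class ReindexedTable:
    """View of the adjacency table through a reindexing vector r:
    S(i, j) = S_orig[r[i]][r[j]].  Swapping one pair of rows and columns
    then costs only two assignments in r instead of O(n) element moves."""

    def __init__(self, S):
        self.S_orig = S                 # the original, unchanged matrix S#
        self.r = list(range(len(S)))    # identity reindexing at the start

    def __call__(self, i, j):
        return self.S_orig[self.r[i]][self.r[j]]

    def swap(self, a, b):
        self.r[a], self.r[b] = self.r[b], self.r[a]

S = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
T = ReindexedTable(S)
T.swap(0, 2)                            # exchange rows/columns 0 and 2
assert T(0, 1) == 3 and T(2, 1) == 1 and T(0, 2) == 2
```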
The proposed method was tested on six pseudocolor images obtained from the web. The results are summarized in Table . The first column gives the image names. The second column shows the sizes of the input uncompressed images, i.e. rows × columns × number of bits per pixel. The third column gives the number of bytes of the uncompressed images, including the palette, stored in BMP format. The fourth column, with header FH, gives the number of bytes after our own FH compression [,] was applied. The fifth column gives the sizes of the images when the palette was first reordered according to intensity []. The rightmost column shows the influence of the index and palette modification, i.e. the contribution of the palette rearranging method suggested in this report; the file lengths are in bytes, including the palette, after the palette modification and FH compression.
Image name  Image size  Uncompressed  FH  Y + FH  Opt. + FH 
Descent  320x200x8  65078  24334  27115  21132 
Garfield  640x480x4  153718  3323    2955 
Lena, color  512x512x8  263222  235579  185275  154401 
Lynne  320x200x8  65078  59016  43506  38769 
Peppers  512x512x8  263222  206349  165315  129513 
Tartan  256x256x4  32886  1596    1478 
We also wanted to learn how the proposed palette modification method helps lossless compression when other algorithms are used.
The quantitative measure used for the comparison of different methods is the compression efficiency CE []:
 CE = (1 − L_{compressed} / L_{uncompressed}) · 100%    (30)
The gain in compression for palette color images is measured as a ratio:
 Gain = (L_{plain} − L_{reordered}) / L_{reordered} · 100%    (31)
where L_{plain} is the compressed size without index reordering and L_{reordered} is the compressed size after it.
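One reading of the two measures, assuming the gain is taken relative to the reordered size (this reproduces several of the tabulated values, e.g. the Pepper row of the FH results), is:

```python
def compression_efficiency(uncompressed, compressed):
    """CE of Eq. (30): the saved fraction of the original size, in percent."""
    return 100.0 * (1 - compressed / uncompressed)

def gain(plain, reordered):
    """Gain of Eq. (31): relative saving of the reordering, in percent."""
    return 100.0 * (plain - reordered) / reordered

assert compression_efficiency(100, 25) == 75.0
# Pepper with FH: 206349 bytes plain vs 129513 bytes after reordering
assert round(gain(206349, 129513), 1) == 59.3
```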
Experiments showing how the proposed modification of the palette index function and palette works with other lossless compression algorithms are given in Table . Besides our own FH algorithm, Calic [] and PNG (Portable Network Graphics) [] were tested. The poor gains in the PNG case are caused by the fact that its compression method already performs a simple rearrangement of indices. Implementations of the two latter methods were obtained from the web. The Calic method supports only images with 256 colors; thus some entries in Table remain labeled unknown.
Image name  CE (FH)  Gain (FH)  CE (Calic)  Gain (Calic)  CE (PNG)  Gain (PNG)
Descent  67.6%  13.6%  50.8%  13.1%  66.5%  0.51%
Garfield  98.1%  13.4%  unknown  unknown  97.2%  2.0%
Lena, color  41.3%  52.2%  33.4%  39.1%  32.4%  0%
Lynne  40.4%  52.2%  30.6%  15.7%  24.5%  0%
Pepper  50.1%  59.3%  41.9%  49.0%  45.3%  0%
Tartan  95.0%  11.0%  unknown  unknown  92.6%  3.2%
The results of lossless compression methods improved significantly when the palette index function f(x,y) and the palette were modified using the method suggested in this report. The best results were obtained with our FH algorithm [,].
Let us visualize results of the palette modification algorithm in pictorial form. The image with several red and green peppers (called Pepper) was chosen as an example. The original color image is depicted in the image . Three different palette index functions are shown in the first row of Figure . These images were obtained by truncating palette and adding monotonic gray palette. The gray level image that looks most similarly to the original pseudocolor image is displayed in Figure (b), first row. Colors were converted to intensities Y = R + G + B and the palette index function was sorted according to intensity Y. Figure (a), first row, illustrates the original palette index function. Notice very many changes in it. The palette index function that is the outcome of the proposed optimization algorithm is shown in Figure (c), first row.
Nine binary images in Figure , rows 2 to 4, provide more intuitive insight into the results. Each column shows bit planes #8, #5, and #1 of the corresponding intensity image from the first row of Figure . The leftmost column shows three bit planes of the original palette index function. The middle column visualizes the performance of the much simpler rearrangement of the palette according to intensity Y []. The rightmost column demonstrates the results of the palette optimization described in this paper. Notice that even bit plane #5 is relatively smooth when our modification is used. Keep in mind that all three index functions reproduce the same color image.
(a) Original f(x,y) (b) f(x,y) sorted by Y (c) Optimized f(x,y)
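The simple baseline rearrangement used for column (b), sorting the palette by intensity Y = R + G + B and remapping the index image accordingly, can be sketched as follows. The function and variable names are our own illustration, not the paper's code:

```python
def sort_palette_by_intensity(f, palette):
    """Reorder palette entries by intensity Y = R + G + B and remap
    the palette index function f(x, y) accordingly (the baseline
    rearrangement shown in Figure, column (b)).

    f       : image as a list of rows of palette indices
    palette : list of [R, G, B] entries
    """
    K = len(palette)
    # order: new position -> old index, sorted by intensity Y
    order = sorted(range(K), key=lambda i: sum(palette[i]))
    # reindexing vector r: old index -> new position
    r = [0] * K
    for new, old in enumerate(order):
        r[old] = new
    new_f = [[r[i] for i in row] for row in f]
    new_palette = [palette[old] for old in order]
    return new_f, new_palette
```

Because both the index function and the palette are permuted consistently, the displayed color image is unchanged; only the index statistics become smoother.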
The proposed palette index function optimization algorithm is reasonably fast. It runs in about one second for a 512×512×8-bit pseudocolor image on an Intel Pentium with a 200 MHz clock. For better insight into the behaviour of the method, the total time is split into three parts, see Table . The first part (second column) measures the time needed to compute the matrix S; it depends on the size of the image. The second part reflects the main computation time, which depends on the number of indices only. The last part gives the time for remapping the data according to the reindexing vector r; this part can be omitted in several cases.
Image name    Compute S   Optimize   Remap   Total
Lynne         0.22        0.88       0.16    1.26
Park          0.55        1.04       0.55    2.14
(times in seconds)
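The first two timed phases can be sketched as follows. Both the 4-connected neighbourhood for S and the pairwise-swap cost sum_{i,j} S[i][j]·|p[i] − p[j]| are our own illustrative choices; the paper's actual criterion and moves may differ:

```python
def adjacency_matrix(f, K):
    """Phase 1: count how often index i appears next to index j over
    4-connected pixel pairs (illustrative; the paper's exact
    neighbourhood definition may differ).

    f : image as a list of rows of palette indices
    K : number of palette entries
    """
    S = [[0] * K for _ in range(K)]
    rows, cols = len(f), len(f[0])
    for y in range(rows):
        for x in range(cols):
            i = f[y][x]
            if x + 1 < cols:                 # horizontal neighbour
                j = f[y][x + 1]
                S[i][j] += 1
                S[j][i] += 1
            if y + 1 < rows:                 # vertical neighbour
                j = f[y + 1][x]
                S[i][j] += 1
                S[j][i] += 1
    return S

def hill_climb(S, max_passes=10):
    """Phase 2: greedy hill climbing over permutations p of the
    indices.  A swap of two positions is kept whenever it lowers the
    smoothness cost sum_{i,j} S[i][j] * |p[i] - p[j]|.  Both the cost
    and the move set are illustrative stand-ins for the optimization
    described in the paper.
    """
    K = len(S)
    p = list(range(K))                       # old index -> new position

    def cost():
        return sum(S[i][j] * abs(p[i] - p[j])
                   for i in range(K) for j in range(K))

    best = cost()
    for _ in range(max_passes):
        improved = False
        for a in range(K):
            for b in range(a + 1, K):
                p[a], p[b] = p[b], p[a]      # try a swap
                c = cost()
                if c < best:
                    best, improved = c, True
                else:
                    p[a], p[b] = p[b], p[a]  # revert
        if not improved:
            break                            # local minimum reached
    return p
```

Recomputing the full cost after every trial swap is quadratic in the number of indices; a practical implementation would re-evaluate only the terms affected by the swap, in the spirit of the criterion split mentioned in the footnotes.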
We have described a method that optimizes the palette index function of a pseudocolor image. The modification aims at a smoother palette index function; such smoothing consequently yields better results from lossless compression. The method can be used before any lossless compression technique is applied. We believe that the suggested optimization of the palette index function can easily be incorporated into standard lossless compression methods for pseudocolor images. No reverse step is needed during the decompression phase, so no additional software is required in image viewers. The method forgets the original positions of the indices; this knowledge is insignificant in many cases. Only if the original positions of the indices must remain recoverable does the reindexing vector have to be stored.
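When the original index positions must remain recoverable, it suffices to store the reindexing vector alongside the image and invert it on demand. A minimal sketch, with a function name of our own choosing:

```python
def invert_permutation(r):
    """Invert a reindexing vector r (old index -> new position) so
    that the original palette index function can be restored when
    the original index positions matter."""
    inv = [0] * len(r)
    for old, new in enumerate(r):
        inv[new] = old
    return inv
```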
The method can be used in practice for palette images with 4, 16, or 256 colors and of any size. If there are many more colors (e.g. 16 bits per pixel), the index adjacency table S becomes too large to store in memory. Theoretically, however, the proposed method works even in this case.
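A back-of-the-envelope calculation shows why the table S becomes prohibitive for deep palettes; the 4-byte counter size is our own assumption:

```python
def s_table_bytes(bits_per_pixel, bytes_per_count=4):
    """Memory needed for the K-by-K adjacency table S, assuming each
    entry is a 4-byte counter (an illustrative assumption)."""
    K = 2 ** bits_per_pixel
    return K * K * bytes_per_count

# 8 bits per pixel (256 colors): 256 * 256 * 4 bytes = 256 KiB
# 16 bits per pixel: 65536 * 65536 * 4 bytes = 16 GiB
```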
Tests on real images demonstrate a compression improvement between 10% and 50%. The actual improvement depends on how the original color image was quantized when the pseudocolor image was created and, of course, on the compression method used.
Readers who want to test the method are advised to consult the web page http://cmp.felk.cvut.cz/~fojtik/ for our implementation.
The planned future work is to: (a) study the optimization more carefully and learn whether there is a computationally feasible way to overcome local minima; (b) formalize a good initial guess to be used before the iterative optimization starts.
^{1} This research was supported by the Czech Ministry of Education grant VS96049 and by the Grant Agency of the Czech Republic grants 102/97/0480 and 102/97/0855.
^{2} This image has the same data ordering as a gray-level one, but without a gray interpretation.
^{3} The optimization step is explained in Section .
^{4} We split the overall criterion into two parts. The first part depends on the rows and/or columns a, b; the second part, Q^{*}, is independent of them.
^{5} An operator is, in general, a function that transforms data. Such functions (operators) can be combined to build more complicated ones. In our case, the operator performs one step in rearranging the indices.