## Arithmetic Coding

Ever since I heard about Arithmetic Coding in a presentation I have wanted to try implementing it. On the surface it seems like a simple enough of an idea but I couldn’t figure out where to start. I looked at a few different implementations and one was the most helpful in explaining how it was done. It is the one found on Michael Dipperstein’s site and he gives a very thorough explanation of how it is done. Another really good resource was this paper (Arithmetic Coding Revealed) that I found invaluable while trying to understand what was going on.

During my research into the topic, I found a few things confusing. One was that there was a +1 offset of the high bound, but this was explained by the fact that we wanted to keep an open interval between 0 and 1 i.e. [0, 1). The other puzzled me for a much longer time, and that was that we were using integer division in the algorithm. How can we claim that we are computing, what i would call, the “correct” number when we are discarding a big part of the intermediate result? And then I understood that we are not.

None of the implementations actually compute the “correct” decimal number which represents the compressed form of our message. They get close but they actually drift away from the correct number by a small margin and I believe that the reason this is done is simply for performance.

The usual algorithm used is described on Michael’s page and I will copy it here:

```
lower bound = 0
upper bound = ∑ci
while there are still symbols to encode
current range = upper bound - lower bound
upper bound = lower bound + ((current range × upper bound of new symbol) ÷ ∑ci) - 1
lower bound = lower bound + (current range × lower bound of new symbol) ÷ ∑ci
end while
```

but you see, here he is trying to keep his implementation as accurate as possible. This becomes even more evident if you read the Arithmetic Coding Revealed that I have linked to in the first paragraph. The way they do it is by computing a *step* instead of the *range* like Michael does.

```
mStep = (mHigh - mLow + 1) / total;
mHigh = mLow + (mStep * high_count) - 1;
mLow = mLow + (mStep * low_count);
```

Here the division by total truncates our range first (losing precision) and then further magnifies the error by multiplying it by *high_count*. They justify this by saying that this way you can keep more bits in your register and hence get better accuracy but I would have to disagree. The larger the total the larger the error. Similarly, if you have a relatively uneven distribution of symbols where one symbol has a much higher probability than the others, every time you encode that symbol you are losing more accuracy than when you are encoding the symbols with lower probabilities. This means that the better your statistical representation of your dataset the worse your encoding will be… At least Michael’s implementation had a uniform error across all the symbols that was lower than the error of this second implementation. The reason the algorithm works is that the decoding process carries through the same error in its implementation.

Though I understand their motivation, I didn’t like the implementation and I had to go back a step and try and implement the slower but more accurate encoding method. I started thinking about how I would carry through all the *remainders* and came up with the following approach:

```
mRange = mHigh - mLow + 1;
mHigh = mLow + (mRange * high_count + mRemHigh) / total - 1;
mLow = mLow + (mRange * low_count + mRemLow) / total;
mRemHigh = (mRange * high_count) % total;
mRemLow = (mRange * low_count) % total;
```

Basically, I’m trying to carry through all of the remainders of the integer division in the algorithm, but the downside is that it is going to be a lot slower than the existing “less precise” algorithms.

My implementation is limited in one aspect that could prove to be the key reason not to use it in any more complex compression algorithm. With this “more accurate” approach I have limited my self to not being able to change the “total” variable. This is probably not that big if you are using a static probabilities but if you are using adaptive probabilities then this means that you must always keep the total the same. Although this could be easy it may incur additional computation unnecessarily.

My still evolving implementation can be found here

*post scriptum* One possible optimization would be to introduce reciprocal multiplication concepts into the code thereby removing the one and only division. Given that we already must keep *total* constant in my new algorithm this would be a good optimization… However, this still wouldn’t speed it up sufficiently to compete with the popular implementation. Also it is likely that the compiler already does this and that trying to optimize it prematurely could cause us more grief than good.