Arithmetic Coding
Ever since I heard about Arithmetic Coding in a presentation I have wanted to try implementing it. On the surface it seems like a simple enough of an idea but I couldn’t figure out where to start. I looked at a few different implementations and one was the most helpful in explaining how it was done. It is the one found on Michael Dipperstein’s site and he gives a very thorough explanation of how it is done. Another really good resource was this paper (Arithmetic Coding Revealed) that I found invaluable while trying to understand what was going on.
During my research into the topic, I found a few things confusing. One was that there was a +1 offset of the high bound, but this was explained by the fact that we wanted to keep an open interval between 0 and 1 i.e. [0, 1). The other puzzled me for a much longer time, and that was that we were using integer division in the algorithm. How can we claim that we are computing, what i would call, the “correct” number when we are discarding a big part of the intermediate result? And then I understood that we are not.
None of the implementations actually compute the “correct” decimal number which represents the compressed form of our message. They get close but they actually drift away from the correct number by a small margin and I believe that the reason this is done is simply for performance.
The usual algorithm used is described on Michael’s page and I will copy it here:
lower bound = 0
upper bound = ∑ci
while there are still symbols to encode
current range = upper bound - lower bound
upper bound = lower bound + ((current range × upper bound of new symbol) ÷ ∑ci) - 1
lower bound = lower bound + (current range × lower bound of new symbol) ÷ ∑ci
end while
but you see, here he is trying to keep his implementation as accurate as possible. This becomes even more evident if you read the Arithmetic Coding Revealed that I have linked to in the first paragraph. The way they do it is by computing a step instead of the range like Michael does.
mStep = (mHigh - mLow + 1) / total;
mHigh = mLow + (mStep * high_count) - 1;
mLow = mLow + (mStep * low_count);
Here the division by total truncates our range first (losing precision) and then further magnifies the error by multiplying it by high_count. They justify this by saying that this way you can keep more bits in your register and hence get better accuracy but I would have to disagree. The larger the total the larger the error. Similarly, if you have a relatively uneven distribution of symbols where one symbol has a much higher probability than the others, every time you encode that symbol you are losing more accuracy than when you are encoding the symbols with lower probabilities. This means that the better your statistical representation of your dataset the worse your encoding will be… At least Michael’s implementation had a uniform error across all the symbols that was lower than the error of this second implementation. The reason the algorithm works is that the decoding process carries through the same error in its implementation.
Though I understand their motivation, I didn’t like the implementation and I had to go back a step and try and implement the slower but more accurate encoding method. I started thinking about how I would carry through all the remainders and came up with the following approach:
mRange = mHigh - mLow + 1;
mHigh = mLow + (mRange * high_count + mRemHigh) / total - 1;
mLow = mLow + (mRange * low_count + mRemLow) / total;
mRemHigh = (mRange * high_count) % total;
mRemLow = (mRange * low_count) % total;
Basically, I’m trying to carry through all of the remainders of the integer division in the algorithm, but the downside is that it is going to be a lot slower than the existing “less precise” algorithms.
My implementation is limited in one aspect that could prove to be the key reason not to use it in any more complex compression algorithm. With this “more accurate” approach I have limited my self to not being able to change the “total” variable. This is probably not that big if you are using a static probabilities but if you are using adaptive probabilities then this means that you must always keep the total the same. Although this could be easy it may incur additional computation unnecessarily.
My still evolving implementation can be found here
post scriptum One possible optimization would be to introduce reciprocal multiplication concepts into the code thereby removing the one and only division. Given that we already must keep total constant in my new algorithm this would be a good optimization… However, this still wouldn’t speed it up sufficiently to compete with the popular implementation. Also it is likely that the compiler already does this and that trying to optimize it prematurely could cause us more grief than good.
How an ARM Cortex M3 boots
This post is a part of my documentation on the process I went through while learning how to write a bootloader for the Atmel SAM3N processor family. This is for my own reference but is very detailed in the hope that it might help novices by explaining the thought process I went through.
Lets get to it:
For obvious reasons, writing a bootloader for any machine requires you to understand that machine and how it boots (starts up). All processors have a particular memory address from which they begin to execute code. Most examples that I’ve seen in my computer architecture class used the address 0 since it made the examples simple. There is no reason that it has to be 0, it can be any address. The best place to find out the address from which your microcontroller starts up is either in the datasheet or in the linker script that comes with the IDE from your chip vendor.
Using the datasheet for our example processor series SAM3N from Atmel, which you can find here, (note: this datasheet also includes information about the ARM Cortex M3 architecture that is also available from ARM here, or in pdf here) we can find out how the processor boots by reading the following sections:
- 7.2.4 Boot Strategies
- 7. Product Mapping (diagram)
- 10.4.3.2 Stack Pointer
- 10.4.3.4 Program Counter
- 10.6.2.1 Reset
- 10.6.4 Vector table
If you are unfamiliar with what a Stack Pointer or Program Counter are you need to learn about computer architecture. The briefest of introductions is on this site, but I’m not sure that it is detailed enough for beginners. The ARM Cortex M3 documentation tells us the following:
7.2.4 Boot StrategiesThe system always boots at address 0x0. To ensure a maximum boot possibilities the memory layout can be changed via GPNVM.A general purpose NVM (GPNVM) bit is used to boot either on the ROM (default) or from the Flash.The GPNVM bit can be cleared or set respectively through the commands “Clear General-pur-pose NVM Bit” and “Set General-purposeNVM Bit” of the EEFC User Interface.Setting the GPNVM Bit 1 selects the boot from the Flash, clearing it selects the boot from the ROM. Asserting ERASE clears the GPNVM Bit 1 and thus selects the boot from the ROM by default.10.4.3.2 Stack PointerThe Stack Pointer (SP) is register R13. In Thread mode, bit[1] of the CONTROL register indicates the stack pointer to use:
- 0 = Main Stack Pointer (MSP). This is the reset value.
- 1 = Process Stack Pointer (PSP).
On reset, the processor loads the MSP with the value from address 0x00000000.10.4.3.4 Program CounterThe Program Counter (PC) is register R15. It contains the current program address. Bit[0] is always 0 because instruction fetches must be halfword aligned. On reset, the processor loads the PC with the value of the reset vector, which is at address 0x00000004.10.6.2.1 ResetReset is invoked on power up or a warm reset. The exception model treats reset as a special form of exception. When reset is asserted, the operation of the processor stops, potentially at any point in an instruction. When reset is deasserted, execution restarts from the address provided by the reset entry in the vector table. Execution restarts as privileged execution in Thread mode.
10.6.4 Vector tableThe vector table contains the reset value of the stack pointer, and the start addresses, also called exception vectors, for all exception handlers.Figure 10-3 on page 67 shows the order of the exception vectors in the vector table. The least-significant bit of each vector must be 1, indicating that the exception handler is Thumb code.
7.2.4 Boot Strategies… the memory layout can be changed via GPNVM.…
10.6.4 Vector table… Privileged software can write to the VTOR to relocate the vector table start address to a different memory location …
You will find the vector table defined as:
typedef void (*IntFunc) (void);
/* Exception Table */
__attribute__ ((section(".vectors")))
IntFunc exception_table[] = {
/* Configure Initial Stack Pointer, using linker-generated symbols */
(IntFunc) (&_estack),
Reset_Handler,
NMI_Handler,
HardFault_Handler,
MemManage_Handler,
BusFault_Handler,
UsageFault_Handler,
0, 0, 0, 0, /* Reserved */
SVC_Handler,
DebugMon_Handler,
0, /* Reserved */
PendSV_Handler,
SysTick_Handler,
/* Configurable interrupts */
SUPC_Handler, /* 0 Supply Controller */
RSTC_Handler, /* 1 Reset Controller */
RTC_Handler, /* 2 Real Time Clock */
RTT_Handler, /* 3 Real Time Timer */
WDT_Handler, /* 4 Watchdog Timer */
PMC_Handler, /* 5 PMC */
EFC_Handler, /* 6 EEFC */
Dummy_Handler, /* 7 Reserved */
UART0_Handler, /* 8 UART0 */
UART1_Handler, /* 9 UART1 */
Dummy_Handler, /* 10 Reserved */
PIOA_Handler, /* 11 Parallel IO Controller A */
PIOB_Handler, /* 12 Parallel IO Controller B */
#ifdef ID_PIOC
PIOC_Handler, /* 13 Parallel IO Controller C */
#else
Dummy_Handler,
#endif
USART0_Handler, /* 14 USART 0 */
#ifdef ID_USART1
USART1_Handler, /* 15 USART 1 */
#else
Dummy_Handler,
#endif
Dummy_Handler, /* 16 Reserved */
Dummy_Handler, /* 17 Reserved */
Dummy_Handler, /* 18 Reserved */
TWI0_Handler, /* 19 TWI 0 */
TWI1_Handler, /* 20 TWI 1 */
SPI_Handler, /* 21 SPI */
Dummy_Handler, /* 22 Reserved */
TC0_Handler, /* 23 Timer Counter 0 */
TC1_Handler, /* 24 Timer Counter 1 */
TC2_Handler, /* 25 Timer Counter 2 */
#ifdef ID_TC3
TC3_Handler, /* 26 Timer Counter 3 */
#else
Dummy_Handler,
#endif
#ifdef ID_TC4
TC4_Handler, /* 27 Timer Counter 4 */
#else
Dummy_Handler,
#endif
#ifdef ID_TC5
TC5_Handler, /* 28 Timer Counter 5 */
#else
Dummy_Handler,
#endif
ADC_Handler, /* 29 ADC controller */
#ifdef ID_DACC
DACC_Handler, /* 30 DAC controller */
#else
Dummy_Handler,
#endif
PWM_Handler, /* 31 PWM */
Dummy_Handler /* 32 not used */
};
Ignore everything except the first two pointers:
(IntFunc) (&_estack),
Reset_Handler,
This is our Stack Pointer value and the address where the code will begin execution. So it is clear that our code starts executing from the Reset_Handler() function and that our Stack Pointer is defined as the address of the estack symbol.
But how is it ensured that the compiler always places the vector table at the address 0x00400000 as is required for the Reset_Handler() to be at address 0x00400004? Well, this is answered in the linker script. If you’re unfamiliar with linker scripts I recommend reading The GNU Linker PDF.
The linker scripts are also autogenerated and are included in your solution, in the cmsis/src/linkerScripts/ folder. There are two linker scripts of interest. The generic one for the SAM3N family sam3n_flash.ld and the specific one for your chip sam3nXX_flash, where XX denotes the specific code for your chip e.g. 1A, 1B, 1C etc.
Let’s examine the specific linker script first. Its sole purpose is to define the memory regions of your chip. It looks like this:
/* Memory Spaces Definitions */
MEMORY
{
rom (rx) : ORIGIN = 0x00400000, LENGTH = 0x00040000
ram (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00006000
}
/* The stack size used by the application. NOTE: you need to adjust according to your application. */
STACK_SIZE = DEFINED(STACK_SIZE) ? STACK_SIZE : 0x3000;
And all it says is that the memory that spans 0x00400000 to 0x00440000 will be called rom, will be read-only and we can executed code from it. ram is defined as read-write, we can execute from it and it spans 0x20000000 to 0x20006000.
Now lets have a look at the generic code. It’s quite long so I will only take the relevant bit.
/* Section Definitions */
SECTIONS
{
.text :
{
. = ALIGN(4);
_sfixed = .;
KEEP(*(.vectors .vectors.*))
*(.text .text.* .gnu.linkonce.t.*)
// There's more code here ...
} > rom
This says that no matter what the first thing that is placed into memory is the .vectors section. This brings me back to the definition of our exception_vectors or vector table. Remember it had the
__attribute__ ((section(".vectors")))
before the definition? Well, if you are unfamiliar with the GCC attribute syntax it means that this variable is placed into the .vectors section.
This is how Reset_Handler() is made to be the first function executed in our code. However, we always thought our program started with main() and if you wrote code for the SAM3N before you would have been presented with the main() function as the one that you need to implement.
The Reset_Handler() executes first because before main() can execute we need to prepare our memory for the program. I think that the Reset_Handler() code is self explanatory
void Reset_Handler(void)
{
uint32_t *pSrc, *pDest;
/* Initialize the relocate segment */
pSrc = &_etext;
pDest = &_srelocate;
if (pSrc != pDest) {
for (; pDest < &_erelocate;) {
*pDest++ = *pSrc++;
}
}
/* Clear the zero segment */
for (pDest = &_szero; pDest < &_ezero;) {
*pDest++ = 0;
}
/* Set the vector table base address */
pSrc = (uint32_t *) & _sfixed;
SCB->VTOR = ((uint32_t) pSrc & SCB_VTOR_TBLOFF_Msk);
/* Initialize the C library */
__libc_init_array();
/* Branch to main function */
main();
/* Infinite loop */
while (1);
}
The relocate segment is the part of the memory where you have defined globals that are initialized to non-zero values. For example:
int numbersArray[] = { 17, 18, 19, 20, 21 };
int main()
{
numbersArray[4] = 7;
return 0;
}
In the example above we initialised numbersArray to some values. Since we are able to modify this array at run time it can’t stay in Flash. But if we put it in RAM it will lose its value when the ARM Cortex M3 resets and on reset it must be initialized once again. This means that we need to store the initialization of the numbersArray in Flash but copy it into RAM before we start our program. This is what the Reset_Handler() does.
The zero segment (.bss section) initialization, also done in the Reset_Handler() needs to be there because when we reset we never know what state the RAM will be in (even if there is a known state of RAM when the power is off, if you reset and power stays on most often the RAM retains its values). In the zero segment live all of our global variables that we haven’t initialized or we initialized to zero. For example:
int UninitializedInt;
int ZeroInitializedInt = 0;
int main()
{
UninitializedInt = 1;
ZeroInitializedInt = 2;
return 0;
}
Both of these variables would be placed in the .bss section or the zero segment. The zero segment isn’t included in your compiled binary. There is no point to include it since we already know what it is. We simply note down where it starts, where it ends and zero the whole lot when we reset.
After this, we initialize the C library and execute main().