J7A3AP

This is a blog of stuff that I do, like or want people to know

Arithmetic Coding

Ever since I heard about Arithmetic Coding in a presentation I have wanted to try implementing it. On the surface it seems like a simple enough of an idea but I couldn’t figure out where to start. I looked at a few different implementations and one was the most helpful in explaining how it was done. It is the one found on Michael Dipperstein’s site and he gives a very thorough explanation of how it is done. Another really good resource was this paper (Arithmetic Coding Revealed) that I found invaluable while trying to understand what was going on.

During my research into the topic, I found a few things confusing. One was that there was a +1 offset of the high bound, but this was explained by the fact that we wanted to keep an open interval between 0 and 1 i.e. [0, 1). The other puzzled me for a much longer time, and that was that we were using integer division in the algorithm. How can we claim that we are computing, what i would call, the “correct” number when we are discarding a big part of the intermediate result? And then I understood that we are not.

None of the implementations actually compute the “correct” decimal number which represents the compressed form of our message. They get close but they actually drift away from the correct number by a small margin and I believe that the reason this is done is simply for performance.

The usual algorithm used is described on Michael’s page and I will copy it here:

lower bound = 0
upper bound = ∑ci

while there are still symbols to encode
    current range = upper bound - lower bound
    upper bound = lower bound + ((current range × upper bound of new symbol) ÷ ∑ci) - 1
    lower bound = lower bound + (current range × lower bound of new symbol) ÷ ∑ci
end while

but you see, here he is trying to keep his implementation as accurate as possible. This becomes even more evident if you read the Arithmetic Coding Revealed that I have linked to in the first paragraph. The way they do it is by computing a step instead of the range like Michael does.

mStep = (mHigh - mLow + 1) / total;
mHigh = mLow + (mStep * high_count) - 1;
mLow = mLow + (mStep * low_count);

Here the division by total truncates our range first (losing precision) and then further magnifies the error by multiplying it by high_count. They justify this by saying that this way you can keep more bits in your register and hence get better accuracy but I would have to disagree. The larger the total the larger the error. Similarly, if you have a relatively uneven distribution of symbols where one symbol has a much higher probability than the others, every time you encode that symbol you are losing more accuracy than when you are encoding the symbols with lower probabilities. This means that the better your statistical representation of your dataset the worse your encoding will be… At least Michael’s implementation had a uniform error across all the symbols that was lower than the error of this second implementation. The reason the algorithm works is that the decoding process carries through the same error in its implementation.

Though I understand their motivation, I didn’t like the implementation and I had to go back a step and try and implement the slower but more accurate encoding method. I started thinking about how I would carry through all the remainders and came up with the following approach:

mRange = mHigh - mLow + 1;
mHigh = mLow + (mRange * high_count + mRemHigh) / total - 1;
mLow = mLow + (mRange * low_count + mRemLow) / total;
mRemHigh = (mRange * high_count) % total;
mRemLow = (mRange * low_count) % total;

Basically, I’m trying to carry through all of the remainders of the integer division in the algorithm, but the downside is that it is going to be a lot slower than the existing “less precise” algorithms.

My implementation is limited in one aspect that could prove to be the key reason not to use it in any more complex compression algorithm. With this “more accurate” approach I have limited my self to not being able to change the “total” variable. This is probably not that big if you are using a static probabilities but if you are using adaptive probabilities then this means that you must always keep the total the same. Although this could be easy it may incur additional computation unnecessarily.

My still evolving implementation can be found here

post scriptum One possible optimization would be to introduce reciprocal multiplication concepts into the code thereby removing the one and only division. Given that we already must keep total constant in my new algorithm this would be a good optimization… However, this still wouldn’t speed it up sufficiently to compete with the popular implementation. Also it is likely that the compiler already does this and that trying to optimize it prematurely could cause us more grief than good.

How an ARM Cortex M3 boots

This post is a part of my documentation on the process I went through while learning how to write a bootloader for the Atmel SAM3N processor family. This is for my own reference but is very detailed in the hope that it might help novices by explaining the thought process I went through.

Lets get to it:

For obvious reasons, writing a bootloader for any machine requires you to understand that machine and how it boots (starts up). All processors have a particular memory address from which they begin to execute code. Most examples that I’ve seen in my computer architecture class used the address 0 since it made the examples simple. There is no reason that it has to be 0, it can be any address. The best place to find out the address from which your microcontroller starts up is either in the datasheet or in the linker script that comes with the IDE from your chip vendor.

Using the datasheet for our example processor series SAM3N from Atmel, which you can find here, (note: this datasheet also includes information about the ARM Cortex M3 architecture that is also available from ARM here, or in pdf here) we can find out how the processor boots by reading the following sections:

  1. 7.2.4 Boot Strategies
  2. 7. Product Mapping (diagram)
  3. 10.4.3.2 Stack Pointer
  4. 10.4.3.4 Program Counter
  5. 10.6.2.1 Reset
  6. 10.6.4 Vector table

If you are unfamiliar with what a Stack Pointer or Program Counter are you need to learn about computer architecture. The briefest of introductions is on this site, but I’m not sure that it is detailed enough for beginners. The ARM Cortex M3 documentation tells us the following:

7.2.4 Boot Strategies
The system always boots at address 0x0. To ensure a maximum boot possibilities the memory layout can be changed via GPNVM.
A general purpose NVM (GPNVM)  bit is used to boot either on the ROM (default) or from the Flash.
The GPNVM bit can be cleared or set respectively through the commands “Clear General-pur-pose NVM Bit” and “Set General-purposeNVM Bit” of the EEFC User Interface.
Setting the GPNVM Bit 1 selects the boot from the Flash, clearing it selects the boot from the ROM. Asserting ERASE clears the GPNVM Bit 1 and thus selects the boot  from the ROM by default.
10.4.3.2 Stack Pointer
The Stack Pointer (SP) is register R13. In Thread mode, bit[1] of the CONTROL register indicates the stack pointer to use:
  • 0 = Main Stack Pointer (MSP). This is the reset value.
  • 1 = Process Stack Pointer (PSP).
On reset, the processor loads the MSP with the value from address 0x00000000.
10.4.3.4 Program Counter
The Program Counter (PC) is register R15. It contains the current program address. Bit[0] is always 0 because instruction fetches must be halfword aligned. On reset, the processor loads the PC with the value of the reset vector, which is at address 0x00000004.
10.6.2.1 Reset
Reset is invoked on power up or a warm reset. The exception model treats reset as a special form of exception. When reset is asserted, the operation of the processor stops, potentially at any point in an instruction. When reset is deasserted, execution restarts from the address provided by the reset entry in the vector table. Execution restarts as privileged execution in Thread mode.
Now this explains things but not clearly enough for me when I first read it. What was evident, and reiterated again in section 10.6.2.9 Interrupt (IRQ) Table 10-9, Reset Vector, is that when an ARM Cortex M3 boots, the processor loads the Stack Pointer from the first 32 bit integer in the memory and loads the Program Counter from the second 32 bit integer in memory.
According to the product mapping our Flash is mapped at address 0x00400000. Since Flash is the only non-volatile non-readonly memory available that is where our code will run from. But how does the SAM3N boot OUR code? Does the built in bootloader just start executing code from 0x00400000 when we set that GPNVM Bit 1? Or does it do the same thing and load the address from which it will start executing from some special region of flash? We need to keep reading, but it seems we weren’t far from the key bit of information. In section 10.6.4 Vector table we discover the following:
10.6.4 Vector table
The vector table contains the reset value of the stack pointer, and the start addresses, also called exception vectors, for all exception handlers.
Figure 10-3 on page 67 shows the order of the exception vectors in the vector table. The least-significant bit of each vector must be 1, indicating that the exception handler is Thumb code.
On system reset, the vector table is fixed at address 0x00000000. Privileged software can write to the VTOR to relocate the vector table start address to a different memory location, in the range 0x00000080 to 0x3FFFFF80, see “Vector Table Offset Register” on page 170.
Finally we have enough information to make a reasonable assumption. There are two bits that I found crucial:
7.2.4 Boot Strategies
… the memory layout can be changed via GPNVM.
and
10.6.4 Vector table
… Privileged software can write to the VTOR to relocate the vector table start address to a different memory location …
This tells me that our code will start from Flash by having a vector table be the very first thing in memory and from that vector table the processor will initialize our stack and program counter and begin execution.
This conclusion was supported by the code that was autogenerated by Atmel Studio for my test project.
When you generate your solution the file cmsis/src/startup_sam3n.c contains your vector table definition and the reset vector.

You will find the vector table defined as:

typedef void (*IntFunc) (void);

/* Exception Table */
__attribute__ ((section(".vectors")))
IntFunc exception_table[] = {

    /* Configure Initial Stack Pointer, using linker-generated symbols */
    (IntFunc) (&_estack),
    Reset_Handler,

    NMI_Handler,
    HardFault_Handler,
    MemManage_Handler,
    BusFault_Handler,
    UsageFault_Handler,
    0, 0, 0, 0,         /* Reserved */
    SVC_Handler,
    DebugMon_Handler,
    0,                  /* Reserved  */
    PendSV_Handler,
    SysTick_Handler,

    /* Configurable interrupts  */
    SUPC_Handler,    /* 0  Supply Controller */
    RSTC_Handler,    /* 1  Reset Controller */
    RTC_Handler,     /* 2  Real Time Clock */
    RTT_Handler,     /* 3  Real Time Timer */
    WDT_Handler,     /* 4  Watchdog Timer */
    PMC_Handler,     /* 5  PMC */
    EFC_Handler,     /* 6  EEFC */
    Dummy_Handler,   /* 7  Reserved */
    UART0_Handler,   /* 8  UART0 */
    UART1_Handler,   /* 9  UART1 */
    Dummy_Handler,   /* 10 Reserved */
    PIOA_Handler,    /* 11 Parallel IO Controller A */
    PIOB_Handler,    /* 12 Parallel IO Controller B */
#ifdef ID_PIOC
    PIOC_Handler,    /* 13 Parallel IO Controller C */
#else
    Dummy_Handler,
#endif
    USART0_Handler,  /* 14 USART 0 */
#ifdef ID_USART1
    USART1_Handler,  /* 15 USART 1 */
#else
    Dummy_Handler,
#endif
    Dummy_Handler,   /* 16 Reserved */
    Dummy_Handler,   /* 17 Reserved */
    Dummy_Handler,   /* 18 Reserved */
    TWI0_Handler,    /* 19 TWI 0 */
    TWI1_Handler,    /* 20 TWI 1 */
    SPI_Handler,     /* 21 SPI */
    Dummy_Handler,   /* 22 Reserved */
    TC0_Handler,     /* 23 Timer Counter 0 */
    TC1_Handler,     /* 24 Timer Counter 1 */
    TC2_Handler,     /* 25 Timer Counter 2 */
#ifdef ID_TC3
    TC3_Handler,     /* 26 Timer Counter 3 */
#else
    Dummy_Handler,
#endif
#ifdef ID_TC4
     TC4_Handler,     /* 27 Timer Counter 4 */
#else
    Dummy_Handler,
#endif
#ifdef ID_TC5
    TC5_Handler,     /* 28 Timer Counter 5 */
#else
    Dummy_Handler,
#endif
    ADC_Handler,     /* 29 ADC controller */
#ifdef ID_DACC
    DACC_Handler,    /* 30 DAC controller */
#else
    Dummy_Handler,
#endif
    PWM_Handler,     /* 31 PWM */
    Dummy_Handler    /* 32 not used */
};

Ignore everything except the first two pointers:

(IntFunc) (&_estack),
Reset_Handler,

This is our Stack Pointer value and the address where the code will begin execution. So it is clear that our code starts executing from the Reset_Handler() function and that our Stack Pointer is defined as the address of the estack symbol.

But how is it ensured that the compiler always places the vector table at the address 0x00400000 as is required for the Reset_Handler() to be at address 0x00400004? Well, this is answered in the linker script. If you’re unfamiliar with linker scripts I recommend reading The GNU Linker PDF.

The linker scripts are also autogenerated and are included in your solution, in the cmsis/src/linkerScripts/ folder. There are two linker scripts of interest. The generic one for the SAM3N family sam3n_flash.ld and the specific one for your chip sam3nXX_flash, where XX denotes the specific code for your chip e.g. 1A, 1B, 1C etc.

Let’s examine the specific linker script first. Its sole purpose is to define the memory regions of your chip. It looks like this:

/* Memory Spaces Definitions */
MEMORY
{
  rom (rx)  : ORIGIN = 0x00400000, LENGTH = 0x00040000
  ram (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00006000
}

/* The stack size used by the application. NOTE: you need to adjust according to your application. */
STACK_SIZE = DEFINED(STACK_SIZE) ? STACK_SIZE : 0x3000;

And all it says is that the memory that spans 0x00400000 to 0x00440000 will be called rom, will be read-only and we can executed code from it. ram is defined as read-write, we can execute from it and it spans 0x20000000 to 0x20006000.

Now lets have a look at the generic code. It’s quite long so I will only take the relevant bit.

/* Section Definitions */
SECTIONS
{
    .text :
    {
        . = ALIGN(4);
        _sfixed = .;
        KEEP(*(.vectors .vectors.*))
        *(.text .text.* .gnu.linkonce.t.*)
       // There's more code here ...
    } > rom

This says that no matter what the first thing that is placed into memory is the .vectors section. This brings me back to the definition of our exception_vectors or vector table. Remember it had the

__attribute__ ((section(".vectors")))

before the definition? Well, if you are unfamiliar with the GCC attribute syntax it means that this variable is placed into the .vectors section.

This is how Reset_Handler() is made to be the first function executed in our code. However, we always thought our program started with main() and if you wrote code for the SAM3N before you would have been presented with the main() function as the one that you need to implement.

The Reset_Handler() executes first because before main() can execute we need to prepare our memory for the program. I think that the Reset_Handler() code is self explanatory

void Reset_Handler(void)
{
        uint32_t *pSrc, *pDest;

        /* Initialize the relocate segment */
        pSrc = &_etext;
        pDest = &_srelocate;

        if (pSrc != pDest) {
                for (; pDest < &_erelocate;) {
                        *pDest++ = *pSrc++;
                }
        }

        /* Clear the zero segment */
        for (pDest = &_szero; pDest < &_ezero;) {
                *pDest++ = 0;
        }

        /* Set the vector table base address */
        pSrc = (uint32_t *) & _sfixed;
        SCB->VTOR = ((uint32_t) pSrc & SCB_VTOR_TBLOFF_Msk);

        /* Initialize the C library */
        __libc_init_array();

        /* Branch to main function */
        main();

        /* Infinite loop */
        while (1);
}

The relocate segment is the part of the memory where you have defined globals that are initialized to non-zero values. For example:

int numbersArray[] = { 17, 18, 19, 20, 21 };

int main()
{
    numbersArray[4] = 7;
    return 0;
}

In the example above we initialised numbersArray to some values. Since we are able to modify this array at run time it can’t stay in Flash. But if we put it in RAM it will lose its value when the ARM Cortex M3 resets and on reset it must be initialized once again. This means that we need to store the initialization of the numbersArray in Flash but copy it into RAM before we start our program. This is what the Reset_Handler() does.

The zero segment (.bss section) initialization, also done in the Reset_Handler() needs to be there because when we reset we never know what state the RAM will be in (even if there is a known state of RAM when the power is off, if you reset and power stays on most often the RAM retains its values). In the zero segment live all of our global variables that we haven’t initialized or we initialized to zero. For example:

int UninitializedInt;
int ZeroInitializedInt = 0;

int main()
{
    UninitializedInt = 1;
    ZeroInitializedInt = 2;
    return 0;
}

Both of these variables would be placed in the .bss section or the zero segment. The zero segment isn’t included in your compiled binary. There is no point to include it since we already know what it is. We simply note down where it starts, where it ends and zero the whole lot when we reset.

After this, we initialize the C library and execute main().