Documents:
PM: https://www.st.com/resource/en/programming_manual/pm0214-stm32-cortexm4-mcus-and-mpus-programming-manual-stmicroelectronics.pdf
FreeRTOS: https://github.com/orgs/FreeRTOS/repositories?type=all
Start by understanding the Cortex-M4 architecture and its exception stack frame layout.
For instance, creating a task means building that frame by hand:
void OS_TaskCreate(TCB_t *tcb, void (*task_func)(void), uint32_t *stack_base, uint32_t stack_size)
{
    // Cortex-M uses a full descending stack: SP points at the last item pushed,
    // and every push decrements SP first.
    // stack_size is counted in 32-bit words, so sp starts one past the end of the array.
    uint32_t *sp = stack_base + stack_size;
    // AAPCS expects an 8-byte aligned stack, so round the top down.
    sp = (uint32_t *)((uintptr_t)sp & ~(uintptr_t)7u);

    // Hardware frame: pushed/popped automatically on exception entry/return
    sp--;
    *sp = 0x01000000;                 // xPSR: bit 24 (T bit, Thumb state) must be 1, otherwise the CPU faults
    sp--;
    *sp = (uint32_t)task_func & ~1u;  // PC: entry point of the task (bit 0 cleared, it is loaded on exception return)
    sp--;
    *sp = 0x00000000;                 // LR: Link Register (returning from the task would jump to address 0)
    sp--;
    *sp = 0x12121212;                 // R12
    sp--;
    *sp = 0x03030303;                 // R3
    sp--;
    *sp = 0x02020202;                 // R2
    sp--;
    *sp = 0x01010101;                 // R1
    sp--;
    *sp = 0x00000000;                 // R0
    //......
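These eight words form the hardware stack frame, which the processor pushes and pops automatically. Because the Cortex-M stack is full descending, sp starts one past the end of the array and is decremented before every store, so xPSR lands in the last element and R0 ends up at the lowest address written so far.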
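As a cross-check of that ordering, the hardware-stacked part can be described as a C struct whose first member sits at the lowest address, where SP points after stacking. This is a host-side sketch: the names hw_frame_t, XPSR_T_BIT and xpsr_is_thumb are hypothetical, and it assumes the basic 8-word frame (no FPU state).

```c
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame as laid out in memory, lowest address first.
   Hypothetical type name; layout per the Cortex-M4 programming manual. */
typedef struct {
    uint32_t r0;    /* lowest address: SP points here after stacking */
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t pc;
    uint32_t xpsr;  /* highest address */
} hw_frame_t;

#define XPSR_T_BIT (1UL << 24)   /* Thumb state bit */

_Static_assert(sizeof(hw_frame_t) == 8 * sizeof(uint32_t), "8-word frame");
_Static_assert(offsetof(hw_frame_t, pc) == 6 * sizeof(uint32_t), "PC is word 6");

/* Returns nonzero if a stacked xPSR value would resume in Thumb state. */
static int xpsr_is_thumb(uint32_t xpsr)
{
    return (xpsr & XPSR_T_BIT) != 0u;
}
```

Viewing a task's saved stack through a hw_frame_t pointer (for example in a debugger) can make the stacked PC and xPSR easy to inspect.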
Note:
xPSR: Cortex-M4 only supports Thumb state execution. When an exception return starts this task, bit 24 (T bit) of the stacked xPSR must be 1; otherwise the processor raises a UsageFault (escalated to a HardFault if UsageFault is not enabled). The hardware pushes and pops only the eight registers above. The remaining eight registers (R4-R11) form the software part of the frame, which the PendSV exception handler must save and restore itself. Let us complete the rest of the frame:
    //......
    // Software frame (R4-R11): saved and restored by PendSV_Handler itself
    sp--;
    *sp = 0x11111111; // R11
    sp--;
    *sp = 0x10101010; // R10
    sp--;
    *sp = 0x09090909; // R9
    sp--;
    *sp = 0x08080808; // R8
    sp--;
    *sp = 0x07070707; // R7
    sp--;
    *sp = 0x06060606; // R6
    sp--;
    *sp = 0x05050505; // R5
    sp--;
    *sp = 0x04040404; // R4

    tcb->sp = sp;     // Save the final stack pointer in the TCB (sp must be the first member of TCB_t)
}
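The finished frame can be checked off-target. Below is a host-side model of OS_TaskCreate under a few assumptions: TCB_t holds sp as its first member (the PendSV assembly relies on that offset), entry_addr is a made-up 32-bit stand-in for the task function's address, and the top of the stack is rounded down to 8 bytes as AAPCS expects. model_task_create is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed TCB layout: sp must be the first member, because the context-switch
   assembly reads and writes tcb->sp with LDR/STR on the TCB address itself. */
typedef struct {
    uint32_t *sp;
} TCB_t;

_Static_assert(offsetof(TCB_t, sp) == 0, "sp must be at offset 0");

/* Same fill values as OS_TaskCreate, R4 first (lowest address). */
static const uint32_t r4_to_r11[8] = {
    0x04040404u, 0x05050505u, 0x06060606u, 0x07070707u,
    0x08080808u, 0x09090909u, 0x10101010u, 0x11111111u
};

/* Host-side model of OS_TaskCreate; entry_addr stands in for the
   32-bit address of the task function on the target. */
static void model_task_create(TCB_t *tcb, uint32_t entry_addr,
                              uint32_t *stack_base, uint32_t stack_size)
{
    uint32_t *sp = stack_base + stack_size;              /* one past the end */
    sp = (uint32_t *)((uintptr_t)sp & ~(uintptr_t)7u);   /* 8-byte align */

    /* Hardware frame, highest address first */
    *--sp = 0x01000000u;          /* xPSR: T bit set */
    *--sp = entry_addr & ~1u;     /* PC: bit 0 cleared */
    *--sp = 0x00000000u;          /* LR */
    *--sp = 0x12121212u;          /* R12 */
    for (int r = 3; r >= 0; r--)  /* R3, R2, R1, R0 */
        *--sp = (uint32_t)r * 0x01010101u;

    /* Software frame, R11 first so that R4 ends up at the lowest address */
    for (int i = 7; i >= 0; i--)
        *--sp = r4_to_r11[i];

    tcb->sp = sp;
}
```

After the call, tcb.sp[0] holds R4, tcb.sp[8] holds R0 and tcb.sp[15] holds xPSR: the 16-word context sits directly below the aligned top of the stack, and its start stays 8-byte aligned.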
Next, we write “The Heartbeat”, driven by the SysTick timer.
An RTOS needs a periodic “tick” to decide when to switch to the next task. We use the SysTick timer, which is part of the Cortex-M4 core.
Start by defining the bare-metal register addresses (read more in the PM):
// System Control Space (SCS) registers
#define STK_CTRL (*((volatile uint32_t *)0xE000E010)) // SysTick Control and Status
#define STK_LOAD (*((volatile uint32_t *)0xE000E014)) // SysTick Reload Value (24-bit)
#define STK_VAL  (*((volatile uint32_t *)0xE000E018)) // SysTick Current Value
#define SCB_ICSR (*((volatile uint32_t *)0xE000ED04)) // Interrupt Control and State Register
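Shifts such as (1 << 28) are easier to review with named masks. A possible set follows, with bit positions taken from the Cortex-M4 programming manual; the macro names are hypothetical (CMSIS offers equivalents such as SysTick_CTRL_ENABLE_Msk and SCB_ICSR_PENDSVSET_Msk, which this sketch does not use).

```c
#include <stdint.h>

/* SysTick CTRL (0xE000E010) bit masks */
#define STK_CTRL_ENABLE     (1UL << 0)   /* counter enable */
#define STK_CTRL_TICKINT    (1UL << 1)   /* request SysTick exception on reaching 0 */
#define STK_CTRL_CLKSOURCE  (1UL << 2)   /* 1 = processor clock, 0 = external (AHB/8 on STM32) */
#define STK_CTRL_COUNTFLAG  (1UL << 16)  /* read-only: counted to 0 since last read */

/* ICSR (0xE000ED04) set/clear-pending bits for the two exceptions used here */
#define SCB_ICSR_PENDSTCLR  (1UL << 25)  /* write 1: clear pending SysTick */
#define SCB_ICSR_PENDSTSET  (1UL << 26)  /* write 1: pend SysTick */
#define SCB_ICSR_PENDSVCLR  (1UL << 27)  /* write 1: clear pending PendSV */
#define SCB_ICSR_PENDSVSET  (1UL << 28)  /* write 1: pend PendSV */
```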
The SysTick initialization and the SysTick handler:
void OS_InitSysTick(uint32_t ticks) {
    STK_LOAD = ticks - 1;   // A period of `ticks` cycles needs LOAD = ticks - 1 (must fit in 24 bits)
    STK_VAL = 0;            // Clear the current counter value
    // Bit 0 ENABLE: start counting, bit 1 TICKINT: raise the SysTick exception, bit 2 CLKSOURCE: processor clock
    STK_CTRL |= (1 << 0) | (1 << 1) | (1 << 2);
}
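The ticks argument is the number of core clock cycles per RTOS tick. The helper below is a hypothetical sketch that derives it from a clock frequency and a tick rate, and rejects periods that do not fit the 24-bit reload register. For example, assuming the STM32F4's 16 MHz HSI clock and a 1 kHz tick, ticks is 16000.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_RELOAD_MAX 0x00FFFFFFUL   /* STK_LOAD is a 24-bit field */

/* Computes the value to pass to OS_InitSysTick() for a desired tick rate.
   Returns false if the period cannot be programmed into SysTick. */
static bool systick_ticks_for(uint32_t core_clock_hz, uint32_t tick_rate_hz,
                              uint32_t *ticks_out)
{
    if (tick_rate_hz == 0u)
        return false;
    uint32_t ticks = core_clock_hz / tick_rate_hz;   /* cycles per tick */
    /* LOAD = ticks - 1; a LOAD of 0 never raises the exception */
    if (ticks < 2u || ticks - 1u > SYSTICK_RELOAD_MAX)
        return false;
    *ticks_out = ticks;
    return true;
}
```

On the target this would be used as `if (systick_ticks_for(16000000u, 1000u, &t)) OS_InitSysTick(t);`. At 168 MHz, a 1 Hz tick (168 million cycles) does not fit in 24 bits, which is why RTOS tick rates are usually in the hundreds of hertz or more.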
TCB_t *current_task;   // Task that is currently running (NULL until it is assigned)
TCB_t *next_task;      // Task that PendSV will switch to
extern TCB_t tcb1;     // First task control block, defined in another source file
extern TCB_t tcb2;     // Second task control block, defined in another source file
/* The rule is very simple: if the current task is tcb1, the next task is tcb2,
   and vice versa.
*/
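void SysTick_Handler(void)   // Name must match the vector table entry (SysTick_Handler in CMSIS/STM32 startup files)
{
    if (current_task == &tcb1)
    {
        next_task = &tcb2;
    }
    else
    {
        next_task = &tcb1;
    }
    // Write 1 to PENDSVSET (bit 28) to pend PendSV. A plain write is enough:
    // writing 0 to the other ICSR bits has no effect, so no read-modify-write is needed.
    SCB_ICSR = (1UL << 28);
}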
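The two-task toggle generalizes to round-robin scheduling over an array. A hypothetical sketch, assuming the TCBs live in one array rather than as separate tcb1/tcb2 variables; the tick handler would then set next_task = pick_next_task(tasks, count, current_task).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t *sp;   /* must stay the first member */
} TCB_t;

/* Picks the task after `current` in a fixed array, wrapping around.
   A NULL current (before the first switch) selects the first task. */
static TCB_t *pick_next_task(TCB_t *tasks, size_t count, const TCB_t *current)
{
    if (current == NULL)
        return &tasks[0];
    size_t index = (size_t)(current - tasks);
    return &tasks[(index + 1u) % count];
}
```

With count set to 2 this behaves exactly like the tcb1/tcb2 rule above, including falling back to the first task when current_task is still NULL.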
Note:
extern declares a variable or function that is defined in another source file, giving it external linkage so that other files can use it. Next, we write PendSV_Handler, which switches between the tasks (only two for now):
__attribute__((naked)) void PendSV_Handler(void)
{
    // Assumes tasks do not use the FPU: S16-S31 are not saved here
    __asm volatile(                      // volatile: keeps the compiler from optimizing the asm away
        "CPSID I                    \n"  // Disable interrupts
        "MRS   R0, PSP              \n"  // R0 = top of Task A's stack (PSP); hardware already pushed R0-R3, R12, LR, PC, xPSR
        /*=== SAVE TASK A ===*/
        "LDR   R1, =current_task    \n"  // R1 = &current_task
        "LDR   R2, [R1]             \n"  // R2 = current_task (TCB of Task A)
        "CBZ   R2, restore_context  \n"  // Compare and Branch on Zero: if current_task is NULL (first switch), skip saving and restore Task B
        "STMDB R0!, {R4-R11}        \n"  /* STMDB (Store Multiple, Decrement Before): stores R4-R11
                                            just below the address in R0; "!" writes the
                                            decremented address back into R0 */
        "STR   R0, [R2]             \n"  // TCB_A->sp = R0 (sp is the first member of the TCB)
        /*=== SWITCH TASK ===*/
        "restore_context:           \n"
        "LDR   R3, =next_task       \n"  // R3 = &next_task
        "LDR   R4, [R3]             \n"  // R4 = next_task (TCB of Task B)
        "STR   R4, [R1]             \n"  // current_task = Task B: from now on the system considers Task B running
        /*=== RESTORE TASK B ===*/
        "LDR   R0, [R4]             \n"  // R0 = TCB_B->sp, the top of Task B's saved stack
        "LDMIA R0!, {R4-R11}        \n"  /* LDMIA (Load Multiple, Increment After): loads R4-R11
                                            from memory into the CPU; "!" leaves R0 just above them */
        "MSR   PSP, R0              \n"  // PSP = R0: Task B's hardware frame, unstacked on exception return
        "CPSIE I                    \n"  // Enable interrupts
        "BX    LR                   \n"  // LR still holds EXC_RETURN: exception return into Task B
    );
}
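To see what the assembly does to memory, here is a host-side model of the same save/switch/restore sequence. It is a sketch under stated assumptions: cpu_model_t is a hypothetical stand-in for the real R4-R11 and PSP, memcpy plays the role of STMDB/LDMIA, and no FPU state is involved.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t *sp;                 /* first member, as the assembly expects */
} TCB_t;

/* Hypothetical model of the CPU state PendSV touches. */
typedef struct {
    uint32_t r4_r11[8];           /* callee-saved registers R4..R11 */
    uint32_t *psp;                /* process stack pointer */
} cpu_model_t;

/* Mirrors PendSV_Handler: save R4-R11 of *current (unless NULL),
   switch current to next, then restore R4-R11 and PSP of next. */
static void model_pendsv(cpu_model_t *cpu, TCB_t **current, TCB_t *next)
{
    uint32_t *r0 = cpu->psp;                       /* MRS R0, PSP */
    if (*current != NULL) {                        /* CBZ R2, restore_context */
        r0 -= 8;                                   /* STMDB R0!, {R4-R11} */
        memcpy(r0, cpu->r4_r11, sizeof cpu->r4_r11);
        (*current)->sp = r0;                       /* STR R0, [R2] */
    }
    *current = next;                               /* STR R4, [R1] */
    r0 = next->sp;                                 /* LDR R0, [R4] */
    memcpy(cpu->r4_r11, r0, sizeof cpu->r4_r11);   /* LDMIA R0!, {R4-R11} */
    r0 += 8;
    cpu->psp = r0;                                 /* MSR PSP, R0 */
}
```

Switching A to B and back restores Task A's registers exactly and leaves PSP pointing at A's hardware frame again, which is the invariant the real handler must keep.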
Note:
Why __attribute__((naked))? It tells the compiler not to generate any prologue or epilogue (no register pushes, no stack adjustment) for the function, so the assembly you write is all that runs. PendSV_Handler must be naked because it has to save and restore exactly the right registers on exactly the right stacks; compiler-generated pushes of R4-R11 or LR would corrupt the context switch.
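Next, we write the SVCall handler that starts the first task (keep in mind that it runs only once). It reads current_task, loads that task's saved stack pointer (tcb->sp), restores R4-R11 from it, points PSP at the remaining hardware frame, and returns to Thread mode using PSP.
__attribute__((naked)) void SVC_Handler(void)   // CMSIS/STM32 vector table name for the SVCall exception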
{
    __asm volatile(
        "LDR   R0, =current_task    \n"  // R0 = &current_task
        "LDR   R1, [R0]             \n"  // R1 = TCB of the first task
        "LDR   R0, [R1]             \n"  // R0 = that task's stack pointer (tcb->sp)
        "LDMIA R0!, {R4-R11}        \n"  // Load R4-R11 from the task's stack into the CPU registers
        "MSR   PSP, R0              \n"  // PSP = start of the remaining hardware frame
        "ORR   LR, LR, #0x04        \n"  // Set bit 2 of EXC_RETURN in LR: return using PSP
        "CPSIE I                    \n"  // Enable interrupts
        "BX    LR                   \n"  // Exception return: hardware pops R0-R3, R12, LR, PC, xPSR from PSP
    );
}
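Note:
ORR LR, LR, #0x04: when SVCall is entered from main(), which runs in Thread mode on the Main Stack Pointer (MSP), the processor loads LR with the EXC_RETURN value 0xFFFFFFF9. ORing it with 0x04 gives 0xFFFFFFFD, which makes the exception return go to Thread mode using the Process Stack Pointer (PSP).
Make sure current_task is initialized and assigned before triggering SVC 0 (via OS_Start(), which we will discuss later). If current_task is NULL, the handler reads address 0x00000000 and loads a garbage stack pointer, which typically ends in a HardFault.
PendSV and SVC are used because they are exceptions: switching context on exception return lets the hardware restore the task's frame. PendSV can also run at the lowest priority, so the switch happens only after every other interrupt has finished; switching inside a nested interrupt would attempt to return to Thread mode while another exception is still active, which raises a UsageFault.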
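The EXC_RETURN arithmetic can be checked with a small decoder. This is a hypothetical host-side helper (decode_exc_return and the EXC_RETURN_* names are not from the original); the bit meanings follow ARMv7-M. It also shows a caveat: if the FPU context is active when SVC is taken, LR is 0xFFFFFFE9, and ORing in 0x04 gives 0xFFFFFFED, which expects an extended FP frame that OS_TaskCreate never builds. The SVC handler above therefore assumes no floating-point instruction has run before the first task starts.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bits on ARMv7-M (Cortex-M4) */
#define EXC_RETURN_SPSEL   (1UL << 2)   /* 1 = restore from PSP, 0 = from MSP */
#define EXC_RETURN_THREAD  (1UL << 3)   /* 1 = return to Thread mode, 0 = Handler mode */
#define EXC_RETURN_NOFP    (1UL << 4)   /* 1 = basic 8-word frame, 0 = extended FP frame */

typedef struct {
    bool uses_psp;
    bool to_thread;
    bool basic_frame;
} exc_return_info_t;

static exc_return_info_t decode_exc_return(uint32_t lr)
{
    exc_return_info_t info = {
        .uses_psp    = (lr & EXC_RETURN_SPSEL) != 0u,
        .to_thread   = (lr & EXC_RETURN_THREAD) != 0u,
        .basic_frame = (lr & EXC_RETURN_NOFP) != 0u,
    };
    return info;
}
```

Decoding 0xFFFFFFF9 gives Thread mode, MSP and a basic frame; after the ORR, 0xFFFFFFFD gives Thread mode, PSP and a basic frame, which matches the 16-word context built by OS_TaskCreate.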

