11. Heap & Stack Issues

 

This tutorial is not about the FreeRTOS memory management options. Since we're using the heap_1.c model, there is no plan to free heap memory by deleting tasks or kernel objects after creation. Even without any real memory management mechanism, this approach already covers a lot of application cases. Instead, the tutorial addresses the rather esoteric question of memory sizing for both the heap and the task stacks. And there's plenty to say about that...

 

1. The Heap

 

The Heap represents the amount of RAM allocated for data placed under FreeRTOS control. It is the single reservoir of bytes that FreeRTOS uses for:

  • Kernel objects data structures (Semaphores, Queues, ...)
  • Tasks-associated data

The heap size is up to you. You set it by defining the configTOTAL_HEAP_SIZE symbol in FreeRTOSConfig.h:

#define configTOTAL_HEAP_SIZE	( ( size_t ) ( 7 * 1024 ) )

How to size the heap is a complex question. On one hand, you have a limited amount of RAM available in the device (16kB for the STM32F072RB), so there is an upper limit. Moreover, you probably need to store some data outside FreeRTOS control (including global variables, ...). And if you're using the trace recorder in snapshot mode, you need to keep some room for it. On the other hand, you need enough memory for kernel objects and tasks data...

In the above example (and so far in these tutorials), the heap is set to 7kB (exactly 7168 bytes).

Is it enough?
Is it too much?

Well, I don't know... I can only say "It has to be enough", given that it is close to the maximum, considering that:

  • I also want 4kB for trace recording (1000 events)
  • I want to keep a little amount of memory apart for non-FreeRTOS stuff
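The budget behind these figures can be written down as a few lines of host-side C. This is only a sketch: the macro and function names are mine, and the trace buffer is rounded to 4kB as stated in the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the text; all names here are illustrative. */
#define DEVICE_RAM_BYTES    (16u * 1024u)   /* STM32F072RB RAM */
#define HEAP_BYTES          (7u * 1024u)    /* configTOTAL_HEAP_SIZE */
#define TRACE_BUFFER_BYTES  (4u * 1024u)    /* snapshot trace recorder, rounded to 4kB */

/* Returns the RAM left for everything that lives outside the FreeRTOS
 * heap and the trace buffer (globals, main stack, ...). */
size_t ram_left_bytes(size_t device_ram, size_t heap, size_t trace)
{
    assert(heap + trace <= device_ram);
    return device_ram - heap - trace;
}
```

With the values above, ram_left_bytes(DEVICE_RAM_BYTES, HEAP_BYTES, TRACE_BUFFER_BYTES) gives 5120 bytes for all non-FreeRTOS data, which is why the 7kB heap is already close to what this device can afford.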

Of course, this is not an acceptable answer. The less heap you have, the more limitations you will face in terms of:

  • Number of possible kernel objects
  • Number of possible tasks
  • Amount of data processed by the tasks

 

The xPortGetFreeHeapSize() function returns the number of free heap bytes. Let us experiment with it using the case study below.

/*
 * main.c
 *
 *  Created on: 01/04/2018
 *      Author: Laurent
 */

#include "main.h"

// Static functions
static void SystemClock_Config	(void);

// FreeRTOS tasks
void vTask1 	(void *pvParameters);
void vTask2 	(void *pvParameters);
void vTaskHWM 	(void *pvParameters);

xTaskHandle	vTask1_handle;
xTaskHandle	vTask2_handle;
xTaskHandle 	vTaskHWM_handle;

// Kernel objects
xSemaphoreHandle xSem;
xSemaphoreHandle xConsoleMutex;
xQueueHandle	 xConsoleQueue;

// Define the message_t type as an array of 60 char
typedef uint8_t message_t[60];

// Trace User Events Channels
traceString ue1, ue2, ue3;

// Main program
int main()
{
	uint32_t	free_heap_size;

	// Configure System Clock
	SystemClock_Config();

	// Initialize LED pin
	BSP_LED_Init();

	// Initialize Debug Console
	BSP_Console_Init();

	// Start Trace Recording
	vTraceEnable(TRC_START);

	// Report Free Heap Size
	free_heap_size = xPortGetFreeHeapSize();
	my_printf("\r\nFree Heap Size is %d bytes\r\n", free_heap_size);

	// Create Semaphore object (this is not a 'give')
	my_printf("\r\nNow creating Binary Semaphore...\r\n");
	xSem = xSemaphoreCreateBinary();
	vTraceSetSemaphoreName(xSem, "xSEM");
	free_heap_size = xPortGetFreeHeapSize();
	my_printf("Free Heap Size is %d bytes\r\n", free_heap_size);

	// Create Queue to hold console messages
	my_printf("\r\nNow creating Message Queue...\r\n");
	xConsoleQueue = xQueueCreate(10, sizeof(message_t *));
	vTraceSetQueueName(xConsoleQueue, "Console Queue");
	free_heap_size = xPortGetFreeHeapSize();
	my_printf("Free Heap Size is %d bytes\r\n", free_heap_size);

	// Create a Mutex for accessing the console
	my_printf("\r\nNow creating Mutex...\r\n");
	xConsoleMutex = xSemaphoreCreateMutex();
	vTraceSetMutexName(xConsoleMutex, "Console Mutex");
	free_heap_size = xPortGetFreeHeapSize();
	my_printf("Free Heap Size is %d bytes\r\n", free_heap_size);

	// Register the Trace User Event Channels
	my_printf("\r\nNow registering Trace events...\r\n");
	ue1 = xTraceRegisterString("ticks");
	ue2 = xTraceRegisterString("msg");
	ue3 = xTraceRegisterString("HWM");
	free_heap_size = xPortGetFreeHeapSize();
	my_printf("Free Heap Size is %d bytes\r\n", free_heap_size);

	// Create Tasks
	my_printf("\r\nNow creating Tasks...\r\n");
	xTaskCreate(vTask1,	"Task_1",	128, NULL, 2, &vTask1_handle);
	xTaskCreate(vTask2,	"Task_2",	128, NULL, 3, &vTask2_handle);
	xTaskCreate(vTaskHWM,	"Task_HWM",	128, NULL, 1, &vTaskHWM_handle);
	free_heap_size = xPortGetFreeHeapSize();
	my_printf("Free Heap Size is %d bytes\r\n", free_heap_size);

	// Start the Scheduler
	my_printf("\r\nNow Starting Scheduler...\r\n");
	vTaskStartScheduler();

	while(1)
	{
		// The program should never be here...
	}
}

 

Then provide an ultra-minimal implementation for the three tasks:

/*
 *	Task_1
 */
void vTask1 (void *pvParameters)
{
	while(1)
	{
		// Wait for 100ms
		vTaskDelay(100);
	}
}

/*
 *	Task_2
 */
void vTask2 (void *pvParameters)
{
	while(1)
	{
		// Wait for 100ms
		vTaskDelay(100);
	}
}

/*
 * vTaskHWM
 */
void vTaskHWM (void *pvParameters)
{
	while(1)
	{
		// Wait for 100ms
		vTaskDelay(100);
	}
}

 

Now take a look at the console during initializations:

image_000.png
 
 
At the beginning of code execution, we've got 7160 bytes available. That's our 7kB of total heap, minus 2 words, i.e. 8 bytes (assume FreeRTOS has good use for these).
  • Creating the binary semaphore took 80 bytes of heap memory
  • Creating the message queue took 120 bytes of heap memory
  • Creating the mutex took 80 bytes of heap memory
  • Registering trace events took nothing in the heap
  • Creating the three tasks took 1800 bytes of heap memory (presumably 600 bytes for each of the 3 tasks)
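The figures above can be replayed on a PC with a tiny model of an allocate-only ("bump") heap, which is the general idea behind heap_1.c. This is a simplified sketch, not the FreeRTOS source: the names are mine, and the 8-byte alignment margin is an assumption chosen to match the 7160 bytes reported at startup.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ALIGNMENT 8u   /* assumption: alignment margin of 8 bytes */

typedef struct {
    size_t usable;   /* bytes the allocator can hand out */
    size_t next;     /* offset of the next free byte */
} bump_heap_t;

void bump_init(bump_heap_t *h, size_t total_bytes)
{
    assert(h != NULL && total_bytes > MODEL_ALIGNMENT);
    h->usable = total_bytes - MODEL_ALIGNMENT;   /* margin kept for alignment */
    h->next = 0;
}

/* Returns 1 on success, 0 when the request does not fit.
 * There is deliberately no free function in this model. */
int bump_alloc(bump_heap_t *h, size_t wanted)
{
    size_t rounded = (wanted + MODEL_ALIGNMENT - 1u) & ~(size_t)(MODEL_ALIGNMENT - 1u);
    if (rounded > h->usable - h->next) return 0;
    h->next += rounded;
    return 1;
}

size_t bump_free_bytes(const bump_heap_t *h)
{
    return h->usable - h->next;
}
```

Initializing the model with 7 * 1024 bytes and then requesting 80, 120 and 80 bytes, followed by three times (512 + 88) bytes, leaves 5080 free bytes, which is the last value printed before the scheduler starts.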

The memory taken from the heap for each task is made of two segments:

  • The task stack
  • The Task Control Block (TCB)

The TCB size is the same for all tasks and it is a FreeRTOS port feature. Here, we have 88 bytes for TCBs.

The stack size is a user setting, passed as third argument to the xTaskCreate() function. In our example, stack is set to 128 words (i.e. 512 bytes).

xTaskCreate(vTask1, "Task_1", 128, NULL, 2, &vTask1_handle);
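The heap cost of one task follows directly from these two numbers. The helper below is a hypothetical one (it is not a FreeRTOS function) and assumes the 4-byte stack word and the 88-byte TCB measured above for this port and configuration.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES  4u    /* stack depth unit on this 32-bit port */
#define TCB_BYTES   88u   /* TCB size measured in the text */

/* Heap bytes consumed when creating one task with the given stack depth */
size_t task_heap_cost_bytes(size_t stack_depth_words)
{
    assert(stack_depth_words > 0);
    return stack_depth_words * WORD_BYTES + TCB_BYTES;
}
```

For a depth of 128 words this gives 600 bytes per task, hence the 1800 bytes observed for the three tasks.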

Is it enough?
Is it too much?

Well again, I don't know...

It depends on what the tasks are actually doing. More precisely, it depends on the amount of data each task is manipulating (either directly using local variables, or indirectly by calling functions that manipulate variables), and what needs to be saved when (or if) the task is preempted. Remember that when a task is preempted by another task, the OS saves all the information required for further recovery into the stack.

In the above example, tasks are pretty much doing nothing. There's no local variables, and a single call to vTaskDelay(). Therefore, a small stack should be enough...

Anyway, if the heap or the task stacks are sized too small, you'll probably end up with an application crash... but nothing will tell you the reason it occurred, unless you take care of it... see later.

 

2. Stack High Water Mark

 

Stacks are used to store task local variables, and any temporary data required for recovery when a task is preempted. Therefore, the amount of stack in use fluctuates continuously during application life. FreeRTOS provides a mechanism to calculate the maximum amount of stack memory that has been required (at least once) since the application started. Internally, this mechanism relies on an initial "painting" (at creation time) of the stack RAM segment with a regular pattern. The figure below shows this painting with a uniform 0xA5 pattern (1010 0101).

image_001.png
 
When the stack is used, the painting is overwritten by meaningful data. The FreeRTOS uxTaskGetStackHighWaterMark() function implements an algorithm that searches for remaining paint within the stack and then determines the amount of unused stack. Think of it as a flood in your house. The watermark you get on the walls at a given time corresponds to the highest level the water ever reached. That explains the name High Water Mark. The uxTaskGetStackHighWaterMark() function actually returns the number of words (4-byte units) above the watermark (i.e. the space that has never been used in the stack).
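The paint-and-scan idea can be tried on a PC with an ordinary array standing in for a task stack. This is a minimal sketch of the principle, not the FreeRTOS implementation: the function names are mine, and the stack is assumed to grow downward, so the words at the highest indexes are consumed first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAINT_BYTE 0xA5u

/* Fill the whole stack area with the paint pattern (done once, at creation) */
void stack_paint(uint32_t *stack, size_t depth_words)
{
    assert(stack != NULL);
    memset(stack, PAINT_BYTE, depth_words * sizeof(uint32_t));
}

/* Simulate usage: a downward-growing stack overwrites the top words first */
void stack_simulate_use(uint32_t *stack, size_t depth_words, size_t used_words)
{
    assert(used_words <= depth_words);
    for (size_t i = depth_words - used_words; i < depth_words; i++) {
        stack[i] = 0xDEADBEEFu;   /* any data that is not the paint pattern */
    }
}

/* Count the untouched paint from the far end of the stack, in words */
size_t stack_high_water_mark_words(const uint32_t *stack, size_t depth_words)
{
    const uint8_t *p = (const uint8_t *)stack;
    size_t bytes = 0;
    while (bytes < depth_words * sizeof(uint32_t) && p[bytes] == PAINT_BYTE) {
        bytes++;
    }
    return bytes / sizeof(uint32_t);
}
```

Painting a 128-word array and then using 39 words makes the scan return 89. Note the built-in limitation of this kind of scan: stack data that happens to equal the paint pattern is indistinguishable from unused space.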
 
In order to have the uxTaskGetStackHighWaterMark() function available, you first need to activate the mechanism in FreeRTOSConfig.h:
/* Set the following definitions to 1 to include the API function, or zero
to exclude the API function. */
#define INCLUDE_vTaskPrioritySet		0
#define INCLUDE_uxTaskPriorityGet		0
#define INCLUDE_vTaskDelete			0
#define INCLUDE_vTaskCleanUpResources		0
#define INCLUDE_vTaskSuspend			0
#define INCLUDE_vTaskDelayUntil			1
#define INCLUDE_vTaskDelay			1

#define INCLUDE_uxTaskGetStackHighWaterMark 	1  // <-- Add this line

 

Let us now experiment with this. The purpose of Task_HWM is actually to periodically report the high watermarks of Task_1, Task_2 and itself into the console.
/*
 * vTaskHWM
 */
void vTaskHWM (void *pvParameters)
{
	uint32_t	count;
	uint16_t	hwm_Task1, hwm_Task2, hwm_TaskHWM;
	uint32_t	free_heap_size;

	count = 0;

	// Prepare console layout using ANSI escape sequences
	my_printf("%c[0m",   0x1B);	// Remove all text attributes
	my_printf("%c[2J",   0x1B); 	// Clear console
	my_printf("%c[1;0H", 0x1B);	// Move cursor [1:0]

	my_printf("High Water Marks console");

	my_printf("%c[3;0H", 0x1B);	// Move cursor line 3
	my_printf("Iteration");

	my_printf("%c[4;0H", 0x1B);	// Move cursor line 4
	my_printf("Task1");

	my_printf("%c[5;0H", 0x1B);	// Move cursor line 5
	my_printf("Task2");

	my_printf("%c[6;0H", 0x1B);	// Move cursor line 6
	my_printf("TaskHWM");

	my_printf("%c[7;0H", 0x1B);	// Move cursor line 7
	my_printf("Free Heap");


	while(1)
	{
	  // Gather High Water Marks
	  hwm_Task1	= uxTaskGetStackHighWaterMark(vTask1_handle);
	  hwm_Task2 	= uxTaskGetStackHighWaterMark(vTask2_handle);
	  hwm_TaskHWM	= uxTaskGetStackHighWaterMark(vTaskHWM_handle);

	  // Get free Heap size
	  free_heap_size = xPortGetFreeHeapSize();

	  // Reports watermarks into Trace Recorder
	  vTracePrintF(ue3, (char *)"1[%d] 2[%d] HWM[%d]",
                             hwm_Task1,
                             hwm_Task2,
                             hwm_TaskHWM );

	  // Display results into console
	  my_printf("%c[0;31;40m", 0x1B); 	// Red over black

	  my_printf("%c[3;12H", 0x1B);
	  my_printf("%5d", count);

	  my_printf("%c[1;33;44m", 0x1B); 	// Yellow over blue

	  my_printf("%c[4;12H", 0x1B);
	  my_printf("%5d", hwm_Task1);

	  my_printf("%c[5;12H", 0x1B);
	  my_printf("%5d", hwm_Task2);

	  my_printf("%c[6;12H", 0x1B);
	  my_printf("%5d", hwm_TaskHWM);

	  my_printf("%c[1;35;40m", 0x1B); 	// Magenta over black
	  my_printf("%c[7;12H", 0x1B);
	  my_printf("%5d", free_heap_size);

	  my_printf("%c[0m", 0x1B); 		// Remove all text attributes
	  count++;

	  // Wait for 200ms
	  vTaskDelay(200);
	}
}

I couldn't resist playing with ANSI escape sequences to bring some fun into the console display, although this is quite nonessential here.

Still, if you like it, see more there: http://ascii-table.com/ansi-escape-sequences.php
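If you use many cursor moves, the sequence can be built by a small helper instead of being typed by hand each time. The sketch below uses the standard snprintf() rather than the tutorial's my_printf(), and the function name is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes the "move cursor" sequence ESC[row;colH into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int ansi_cursor_to(char *buf, size_t len, unsigned row, unsigned col)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "\x1B[%u;%uH", row, col);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For instance, ansi_cursor_to(buf, sizeof buf, 3, 12) produces the same 7 characters as the "%c[3;12H" format used in Task_HWM.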

The console then reports:

  • The remaining amount of stack for each task (in 4-bytes words)
  • The remaining amount of heap (in bytes)

 

image_002.png
 
None of the above is zero, or even close to zero. Therefore, I can now tentatively answer the questions:
  • Do we have enough heap memory? → Yes, maybe too much actually. Yet, one can see that starting the scheduler took an additional 1344 bytes from the heap (5080-3736). That's because FreeRTOS created internal tasks (such as the IDLE task, timer task, ...) and objects.
  • Do we have enough stack memory? → Yes, but not too much, as we should stay reasonably far from zero. The stacks associated with Task_1 and Task_2 have each been used up to 39 words (128-89). Also note that 128 words is the minimal stack size defined in FreeRTOSConfig.h:
#define configMINIMAL_STACK_SIZE ( ( unsigned short ) 128 )

The above answers are only valid for the current application, which actually does nothing useful. If we want the tasks to perform something real, the answers will probably be different.

Now, let us shorten Task_1 and Task_2 wake-up period in order to increase the density of Task_HWM preemption events:

/*
 *	Task_1
 */
void vTask1 (void *pvParameters)
{
	while(1)
	{
		// Wait for 20ms
		vTaskDelay(20);
	}
}

/*
 *	Task_2
 */
void vTask2 (void *pvParameters)
{
	while(1)
	{
		// Wait for 30ms
		vTaskDelay(30);
	}
}

 

Launch the application and watch the console. Reaching iteration #9, the watermark for Task_HWM drops from 50 to 42:

image_003.png
 
As said before, the stack is used to save task information when the task is preempted. Therefore, the amount of data to save depends entirely on the very moment the preemption occurs. Considering Task_HWM, there is a strong probability that preemption occurs during a call to the my_printf() function. In that case, the current state (variables) of the my_printf() function is saved into the Task_HWM stack. If you're not lucky, you might never catch the worst situation.
 
What the above experiment demonstrates is:
  • You will never be 100% sure that the watermarks you get represent the worst case of stack usage
  • The longer you leave the application running, trying to cover as many application situations as possible, the closer you should get to the 'true' worst case

 

Now, let us add some data into Task_1 and Task_2:

/*
 *	Task_1
 */
void vTask1 (void *pvParameters)
{
	uint8_t	msg[] = "This is task_1 message"; // 22 bytes string

	while(1)
	{
		// Send message to Trace Recorder
		vTracePrint(ue2, (char *)msg);

		// Wait for 20ms
		vTaskDelay(20);
	}
}

/*
 *	Task_2
 */
void vTask2 (void *pvParameters)
{
	uint8_t	msg[] = "This is a much longer task_2 message"; // 36 bytes string

	while(1)
	{
		// Send message to trace Recorder
		vTracePrint(ue2, (char *)msg);

		// Wait for 30ms
		vTaskDelay(30);
	}
}

 

image_005.png
 
 
image_004.png
 
High watermarks dropped from the previous 89 for both Task_1 and Task_2 to 77 and 73 respectively. Stored strings are 22 and 36 bytes long, so the difference is 14 bytes, leading (once rounded up to whole words) to the observed 4-word difference between Task_1 and Task_2 watermarks. Yet, the 36-byte string only takes 9 words (36/4) in memory and the reported watermark of 73 for Task_2 is below the expected 89-9 = 80 (same thing for Task_1). Calling the vTracePrint() function has its own footprint in the stack, but that's not enough to explain the difference (actually, commenting out the vTracePrint() call would raise the watermark by 4 words only).
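The byte-to-word arithmetic is easy to check with a small helper. The sketch below is host-side C with a name of my own; it assumes that a local array occupies a whole number of 4-byte words, and it counts the terminating null character that the msg[] arrays also store.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES 4u

/* Number of 4-byte words needed to hold the given number of bytes */
size_t bytes_to_words(size_t bytes)
{
    return (bytes + WORD_BYTES - 1u) / WORD_BYTES;
}
```

Applied to sizeof("This is task_1 message") (23 bytes) and sizeof("This is a much longer task_2 message") (37 bytes), this gives 6 and 10 words, i.e. the 4-word gap seen between the two watermarks. Under that assumption, both tasks still show the same extra drop of 6 words (89-6 = 83 against 77, and 89-10 = 79 against 73) that the strings alone do not account for.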
 
 

3. Crash Test

 

3.1. Insufficient Heap

According to the above results, we need 7168-3736 = 3432 bytes of heap memory to fit all the kernel objects and tasks in our application.

What if we set the heap size below this requirement?

Well, let us experiment with 3kB of heap memory:

#define configTOTAL_HEAP_SIZE ( ( size_t ) ( 3 * 1024 ) )

Build the application and step over (dbg_step_over_btn.png) the initialization code. Everything goes well until you start the scheduler. Then, the application freezes. Hit the suspend button:

image_011.png
 
That's pretty clear...
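The step at which the heap runs out can be predicted from the sizes measured earlier. The sketch below is a host-side model with names of my own. It lumps each step into a single request (the three tasks as 1800 bytes, everything created at scheduler start as 1344 bytes), which is a simplification since FreeRTOS makes several separate allocations there.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first request that does not fit,
 * or n if every request fits in free_bytes. */
size_t first_failing_step(size_t free_bytes, const size_t *requests, size_t n)
{
    assert(requests != NULL);
    for (size_t i = 0; i < n; i++) {
        if (requests[i] > free_bytes) return i;
        free_bytes -= requests[i];
    }
    return n;
}
```

With the requests {80, 120, 80, 1800, 1344} (semaphore, queue, mutex, tasks, scheduler start) and 3 * 1024 - 8 = 3064 bytes initially free, only 984 bytes remain after the tasks are created, so the function returns index 4: the failure happens when the scheduler starts, as observed in the debugger.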
 
Consider yourself lucky if the scheduler didn't start. Way more dangerous is the case where everything starts well, but you fail to create a kernel object during application life (i.e. after the scheduler is started). As a matter of fact, nothing requires kernel objects to be created at initialization.
 
One way to address such an issue is to implement a dynamic memory allocation failure hook function. First, you must enable this feature in FreeRTOSConfig.h by setting configUSE_MALLOC_FAILED_HOOK:
#define configUSE_MALLOC_FAILED_HOOK	1

 

Then, you'll need to provide your own implementation of the function vApplicationMallocFailedHook(). Let us write something basic in main.c. This hook does nothing, but at least you'll be able to make sure that you can catch the problem.

/*
 * Malloc failed Basic Hook
 */
void vApplicationMallocFailedHook(void)
{
	while(1);
}

 

Repeat the above experiment now. After the program freezes, hit the suspend button. You should end up in your hook function:

image_012.png

 

In a real scenario, you can use such a hook to let the watchdog reset the application, and then use that knowledge to avoid further attempts to create the failing object.

 

3.2. Stack overflow

Say now we need to call a function that computes the cumulative weighted sum of an array of integers:

/*
 * sum_prod function
 *
 * Calculate y the sum of (x * coef[n])
 * x is a floating point number
 * coef[n] is an array of 120 32-bit integers
 *
 * returns y a floating point number
 */
float sum_prod(float x)
{
	uint32_t 	coef[120];
	float		y;
	uint8_t		n;

	// Initialize array
	for (n=0; n<120; n++) coef[n] = n;

	// Calculate sum of products
	y = 0;
	for (n=0; n<120; n++) y += x * coef[n];

	return y;
}

 

Now, let us implement a call to this function within Task_2, but only if the user push-button is pressed. To do that, we'll use the EXTI #13 interrupt signal to give a semaphore to Task_2:

/*
 *	Task_2
 */
void vTask2 (void *pvParameters)
{
	uint8_t		msg[] = "This is a much longer task_2 message"; // 36 bytes string
	float 		x,y;

	// Initialize the user Push-Button
	BSP_PB_Init();

	// Set maximum priority for EXTI line 4 to 15 interrupts
	NVIC_SetPriority(EXTI4_15_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY + 1);

	// Enable EXTI line 4 to 15 (user button on line 13) interrupts
	NVIC_EnableIRQ(EXTI4_15_IRQn);

	// Take the semaphore once to make sure it is cleared
	xSemaphoreTake(xSem, 0);

	// Now enter the task loop
	while(1)
	{
		// Wait here endlessly until button is pressed
		xSemaphoreTake(xSem, portMAX_DELAY);

		// Compute y
		x = 1.0f;
		y = sum_prod(x);

		// Send message to trace Recorder
		vTracePrint (ue2, (char *)msg);
		vTracePrintF(ue2, (char *)"%d", (uint32_t)y);
	}
}

 

Then, make sure that the EXTI #13 interrupt handler is correctly written in stm32f0xx_it.c:

/**
  * This function handles EXTI line 13 interrupt request.
  */

extern xSemaphoreHandle xSem;

void EXTI4_15_IRQHandler()
{
	portBASE_TYPE xHigherPriorityTaskWoken = pdFALSE;

	// Test for line 13 pending interrupt
	if ((EXTI->PR & EXTI_PR_PR13_Msk) != 0)
	{
		// Clear pending bit 13 by writing a '1'
		// (plain write: a read-modify-write would also clear other pending lines)
		EXTI->PR = EXTI_PR_PR13;

		// Release the semaphore
		xSemaphoreGiveFromISR(xSem, &xHigherPriorityTaskWoken);

		// Perform a context switch to the waiting task
		portEND_SWITCHING_ISR(xHigherPriorityTaskWoken);
	}
}

 

Leave Task_1 as something minimal:

/*
 *	Task_1
 */
void vTask1 (void *pvParameters)
{
	uint8_t	msg[] = "This is task_1 message"; 	// 22 bytes string

	while(1)
	{
		// Send message to Trace Recorder
		vTracePrint(ue2, (char *)msg);

		// Wait for 100ms
		vTaskDelay(100);
	}
}

 

Build and fire the application. Watch the console reporting watermarks (do not press the button yet). You should see your watermarks at a safe level, and everything working as expected.

image_006.png
 
 
Now press the user-button... Something bad happens!
Depending on when the button is pressed, you may experience two different behaviors.
 
  • Case #1 : Application crash

In that case, the console just freezes after you pressed the button:

image_007.png
 
Looking at the trace, you'll see that Task_2 execution was the very last event to be recorded, and then nothing... Application has died.
 
image_008.png
 
 
If you're curious enough to suspend the debugger there, you'll find out that the application seems to be stuck in the IDLE task forever...
 
 
  • Case #2 : Application keeps running but watermarks dropped to zero

In that case, Task_HWM keeps running and reports zero remaining space for both stacks:

image_009.png
 
Which is confirmed by the trace:
 
image_010.png
 
The reason the Task_2 stack overflowed is quite obvious. Calling the sum_prod() function requires the stack to hold a large array of 32-bit integers. The surprise comes from the Task_1 stack, which appears to be corrupted as well by the Task_2 stack overflow... Or maybe the watermark algorithm gets lost somewhere.
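The size of the overflow can be estimated before running anything. The helper below is a hypothetical one; it only counts the coef[] array of sum_prod() and ignores the other locals, saved registers and return address, so the real need is somewhat larger.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_BYTES 4u

/* Words missing in the stack to hold local_bytes of local data,
 * or 0 if the free space is sufficient. */
size_t stack_shortfall_words(size_t free_words, size_t local_bytes)
{
    size_t needed_words = (local_bytes + WORD_BYTES - 1u) / WORD_BYTES;
    return (needed_words > free_words) ? needed_words - free_words : 0u;
}
```

The array alone takes 120 * 4 = 480 bytes, i.e. 120 words. Assuming Task_2 still has about 73 free words as measured earlier, stack_shortfall_words(73, 480) returns 47 missing words. Even a completely empty 128-word stack would leave only 8 words for everything else.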
 
 
Both cases are bad. But again, what the above experiment teaches you is that you can miss a stack overflow if you forget to test the button case.
 
FreeRTOS can help you track stack overflows. First go to FreeRTOSConfig.h and enable the configCHECK_FOR_STACK_OVERFLOW feature:
#define configCHECK_FOR_STACK_OVERFLOW	1

Then, you'll need to provide your own implementation of the function vApplicationStackOverflowHook(). Let us write something basic in main.c:

/*
 * Stack Overflow Basic Hook
 */
void vApplicationStackOverflowHook(xTaskHandle xTask, char *pcTaskName)
{
	while(1);
}

Unfortunately, depending on the severity of the problem, the hook does not always work. You may end up in the HardFault handler as well...
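One reason the hook can be missed is the way the check works. As far as I understand it, with the option set to 1 the kernel only verifies, when a task is switched out, that its saved stack pointer is still inside its stack area. The sketch below models that test on a PC; the structure and function names are mine, not FreeRTOS code, and a downward-growing stack is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint32_t *stack_base;     /* lowest address of the stack area */
    const uint32_t *top_of_stack;   /* stack pointer saved at the last switch */
} task_model_t;

/* Returns 1 if the saved stack pointer has reached or passed the end
 * of a downward-growing stack, 0 otherwise. */
int overflow_detected_at_switch(const task_model_t *t)
{
    assert(t != NULL);
    return t->top_of_stack <= t->stack_base;
}
```

With such a test, an overflow that occurs and unwinds between two context switches (for example a function with a large local array that returns before the task is switched out) leaves the stack pointer back in range, so nothing is reported even though neighboring memory has already been overwritten.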