Memory has a significant impact on silicon cost and die size. From the hardware perspective it is therefore important to keep the memory size optimal, and from the software perspective it is crucial to be able to utilise the available memory to the fullest.
In this post we will discuss some upcoming features and commonly available configuration options (knobs) in ESP-IDF that allow the end application to utilise the various internal memory regions in the most optimal way.
Important Notes

We will be using the following patch on top of the subscribe_publish example to log dynamic memory statistics.
Note: The change in core-id during task creation (from the patch below) is for the single core configuration that we are using for this particular example.
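The patch is essentially instrumentation built on ESP-IDF's heap task tracking API. As a rough illustration (not the actual patch), the sketch below shows how per-task allocation totals and system-wide low-water marks can be collected; it assumes CONFIG_HEAP_TASK_TRACKING=y, and the helper name and array sizes are illustrative. Peak values like the ones in the tables below would come from sampling such data periodically and keeping the per-task maximum.

```c
/*
 * Minimal sketch (not the actual patch) of collecting per-task heap usage
 * with the heap task tracking API. Requires CONFIG_HEAP_TASK_TRACKING=y.
 * MAX_TASK_NUM / MAX_BLOCK_NUM are illustrative sizes.
 */
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_heap_caps.h"
#include "esp_heap_task_info.h"

#define MAX_TASK_NUM  20
#define MAX_BLOCK_NUM 20

static heap_task_totals_t s_totals_arr[MAX_TASK_NUM];
static heap_task_block_t  s_block_arr[MAX_BLOCK_NUM];
static size_t             s_num_totals;

static void dump_per_task_heap_info(void)
{
    heap_task_info_params_t heap_info = { 0 };
    heap_info.caps[0]    = MALLOC_CAP_8BIT;   // byte-accessible (DRAM) allocations
    heap_info.mask[0]    = MALLOC_CAP_8BIT;
    heap_info.caps[1]    = MALLOC_CAP_32BIT;  // word-accessible allocations
    heap_info.mask[1]    = MALLOC_CAP_32BIT;
    heap_info.tasks      = NULL;              // NULL means: report all tasks
    heap_info.num_tasks  = 0;
    heap_info.totals     = s_totals_arr;      // per-task totals land here
    heap_info.num_totals = &s_num_totals;
    heap_info.max_totals = MAX_TASK_NUM;
    heap_info.blocks     = s_block_arr;       // per-block details (unused below)
    heap_info.max_blocks = MAX_BLOCK_NUM;

    heap_caps_get_per_task_info(&heap_info);

    for (size_t i = 0; i < s_num_totals; i++) {
        printf("Task: %-16s 8BIT: %u 32BIT: %u\n",
               s_totals_arr[i].task ? pcTaskGetTaskName(s_totals_arr[i].task)
                                    : "Pre-Scheduler allocs",
               (unsigned) s_totals_arr[i].size[0],
               (unsigned) s_totals_arr[i].size[1]);
    }

    // System-wide low-water marks for byte- and word-accessible heaps
    printf("Minimum free (8BIT): %u, Minimum free (32BIT): %u\n",
           (unsigned) heap_caps_get_minimum_free_size(MALLOC_CAP_8BIT),
           (unsigned) heap_caps_get_minimum_free_size(MALLOC_CAP_32BIT));
}
```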
With the default configuration (and the heap task tracking feature enabled) we get the following heap utilisation statistics (all values in bytes):
Task Heap Utilisation Stats:

| Task | Peak DRAM | Peak IRAM |
| --- | --- | --- |
| aws_iot_task | 63124 | 0 |
| tiT | 3840 | 0 |
| wifi | 31064 | 0 |

System Heap Utilisation Stats:

| Minimum Free DRAM | Minimum Free IRAM |
| --- | --- |
| 152976 | 40276 |
Single Core Config
As mentioned earlier, we will be using the single core configuration for all our experiments. Please note that even in single core mode the ESP32 provides plenty of processing power (close to 300 DMIPS), more than sufficient for typical IoT use-cases.
Corresponding configuration to be enabled in the application:
CONFIG_FREERTOS_UNICORE=y
Updated heap utilisation statistics after re-running the application:

Task Heap Utilisation Stats:

| Task | Peak DRAM | Peak IRAM |
| --- | --- | --- |
| aws_iot_task | 63124 | 0 |
| tiT | 3892 | 0 |
| wifi | 31192 | 0 |

System Heap Utilisation Stats:

| Minimum Free DRAM | Minimum Free IRAM |
| --- | --- |
| 162980 | 76136 |
TLS Specific

Asymmetric TLS Content Length

As can be seen from above, we have gained ~10KB of DRAM, since some of the services (e.g. the idle and esp_timer tasks) for the second CPU core are no longer required. In addition, the IPC service for inter-processor communication is no longer required either, so we also gain its task stacks and dynamic memory. The increase in IRAM comes from freeing up the 32KB cache memory of the second CPU core, plus some code savings due to the disabled services mentioned above.
This feature has been part of ESP-IDF from v4.0 onwards. It allows asymmetric content lengths for the TLS IN/OUT buffers. The application can thus reduce the TLS OUT buffer from its default value of 16KB (the maximum TLS fragment length per the specification) to as small as, say, 2KB, saving 14KB of dynamic memory.
Please note that it is not possible to reduce the TLS IN buffer length from its default 16KB unless you have direct control over the server configuration, or are sure that the server will never send inbound data (during the handshake or the actual data-transfer phase) above a certain threshold.
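One practical consequence of a smaller OUT buffer is that a single mbedtls_ssl_write() call can now send at most one (smaller) TLS record, so larger application payloads come back as partial writes. The sketch below shows the standard mbedTLS write loop that handles this; it is generic mbedTLS usage rather than code taken from the subscribe_publish example.

```c
#include "mbedtls/ssl.h"

// Write `len` bytes over an established TLS session, looping over the
// partial writes that occur when len exceeds MBEDTLS_SSL_OUT_CONTENT_LEN.
static int ssl_write_all(mbedtls_ssl_context *ssl, const unsigned char *buf, size_t len)
{
    size_t written = 0;
    while (written < len) {
        int ret = mbedtls_ssl_write(ssl, buf + written, len - written);
        if (ret > 0) {
            written += ret;  // partial write: continue with the remaining data
        } else if (ret == MBEDTLS_ERR_SSL_WANT_READ || ret == MBEDTLS_ERR_SSL_WANT_WRITE) {
            continue;        // transient: retry (a real application would wait on the socket)
        } else {
            return ret;      // fatal error: propagate to the caller
        }
    }
    return (int)written;
}
```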
Corresponding configuration to be enabled in the application:
# Enable TLS asymmetric in/out content length
CONFIG_MBEDTLS_ASYMMETRIC_CONTENT_LEN=y
CONFIG_MBEDTLS_SSL_OUT_CONTENT_LEN=2048
Updated heap utilisation statistics after re-running the application:

Task Heap Utilisation Stats:

| Task | Peak DRAM | Peak IRAM |
| --- | --- | --- |
| aws_iot_task | 48784 | 0 |
| tiT | 3892 | 0 |
| wifi | 30724 | 0 |

System Heap Utilisation Stats:

| Minimum Free DRAM | Minimum Free IRAM |
| --- | --- |
| 177972 | 76136 |
Dynamic Buffer Allocation Feature

As can be seen from above, we have gained ~14KB of memory in aws_iot_task, and the minimum free DRAM number has increased accordingly.
During a TLS connection, the mbedTLS stack keeps dynamic allocations active for the entire session, starting from the initial handshake phase. These allocations include the TLS IN/OUT buffers, the peer certificate, the client certificate, private keys etc. With this feature (soon to be part of ESP-IDF), mbedTLS internal APIs are intercepted (using a shim layer) to ensure that, whenever the use of a resource (including data buffers) is complete, the corresponding dynamic memory is immediately freed.
This greatly helps reduce peak heap utilisation for a TLS connection. It does have a small performance impact due to more frequent dynamic memory operations (an on-demand resource usage strategy). Moreover, since the memory holding authentication credentials (certificates, keys etc.) has been freed, during a TLS reconnect attempt (if required) the application needs to ensure that the mbedTLS SSL context is populated again.
Corresponding configuration to be enabled in the application:
# Allow use of dynamic buffer strategy for mbedTLS
CONFIG_MBEDTLS_DYNAMIC_BUFFER=y
CONFIG_MBEDTLS_DYNAMIC_FREE_PEER_CERT=y
CONFIG_MBEDTLS_DYNAMIC_FREE_CONFIG_DATA=y
Updated heap utilisation statistics after re-running the application:

Task Heap Utilisation Stats:

| Task | Peak DRAM | Peak IRAM |
| --- | --- | --- |
| aws_iot_task | 26268 | 0 |
| tiT | 3648 | 0 |
| wifi | 30724 | 0 |

System Heap Utilisation Stats:

| Minimum Free DRAM | Minimum Free IRAM |
| --- | --- |
| 203648 | 76136 |
Networking Specific

WiFi/LwIP Configuration

As can be seen from above, we have gained ~22KB of memory in aws_iot_task, and the minimum free DRAM number has increased accordingly.
We can further optimise the WiFi and LwIP configuration to reduce memory usage, at the cost of giving away some performance. Primarily we will reduce the WiFi TX and RX buffer counts and try to balance that out by moving some critical code paths of the networking subsystem to instruction memory (IRAM).
To give a ballpark on the performance aspect: with the default networking configuration the average TCP throughput is close to ~20Mbps, whereas with the configuration below it will be close to ~4.5Mbps, which is still sufficient for typical IoT use-cases.
Corresponding configuration to be enabled in the application:
# Minimal WiFi/lwIP configuration
CONFIG_ESP32_WIFI_STATIC_RX_BUFFER_NUM=4
CONFIG_ESP32_WIFI_DYNAMIC_TX_BUFFER_NUM=16
CONFIG_ESP32_WIFI_DYNAMIC_RX_BUFFER_NUM=8
CONFIG_ESP32_WIFI_AMPDU_RX_ENABLED=n
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=16
CONFIG_LWIP_TCP_SND_BUF_DEFAULT=6144
CONFIG_LWIP_TCP_WND_DEFAULT=6144
CONFIG_LWIP_TCP_RECVMBOX_SIZE=8
CONFIG_LWIP_UDP_RECVMBOX_SIZE=8
CONFIG_ESP32_WIFI_IRAM_OPT=y
CONFIG_ESP32_WIFI_RX_IRAM_OPT=y
CONFIG_LWIP_IRAM_OPTIMIZATION=y
Updated heap utilisation statistics after re-running the application:

Task Heap Utilisation Stats:

| Task | Peak DRAM | Peak IRAM |
| --- | --- | --- |
| aws_iot_task | 26272 | 0 |
| tiT | 4108 | 0 |
| wifi | 19816 | 0 |

System Heap Utilisation Stats:

| Minimum Free DRAM | Minimum Free IRAM |
| --- | --- |
| 213712 | 62920 |
System Specific

Utilising RTC (Fast) Memory (single-core only)

As can be seen from the above log, we have gained roughly ~9KB of additional DRAM for application usage. The impact on (reduction of) total IRAM comes from moving critical code paths of the networking subsystem into this region.
As can be seen from the earlier memory breakup, there is a useful (and reasonably fast) 8KB RTC fast memory region that has been sitting idle and under-utilised. ESP-IDF will soon have a feature to make RTC fast memory available for dynamic allocation. This option exists only in the single core configuration, as RTC fast memory is accessible to the PRO CPU only.
It has been ensured that the RTC fast memory region is used as the first dynamic memory range, so that most of the startup and pre-scheduler code/services occupy this range. This avoids any performance impact on application code due to the (slightly slower) clock speed of this memory.
Since there are no access restrictions for this region, capability-wise we will refer to it as DRAM henceforth.
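A quick, generic way to confirm that this extra region has actually been registered with the allocator (not part of the patch above) is to dump the heap regions, for example with heap_caps_print_heap_info():

```c
#include "esp_heap_caps.h"

// With CONFIG_ESP32_ALLOW_RTC_FAST_MEM_AS_HEAP enabled, an additional ~8KB
// region (the RTC fast memory range) should show up in this listing of
// byte-accessible heap regions.
void dump_heap_regions(void)
{
    heap_caps_print_heap_info(MALLOC_CAP_8BIT);
}
```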
Let's re-run our application with this feature and gather memory numbers.
Corresponding configuration to be enabled in the application:
# Add RTC memory to system heap
CONFIG_ESP32_ALLOW_RTC_FAST_MEM_AS_HEAP=y
Updated heap utilisation statistics after re-running the application:

Task Heap Utilisation Stats:

| Task | Peak DRAM | Peak IRAM |
| --- | --- | --- |
| aws_iot_task | 26272 | 0 |
| tiT | 4096 | 0 |
| wifi | 19536 | 0 |

System Heap Utilisation Stats:

| Minimum Free DRAM | Minimum Free IRAM |
| --- | --- |
| 221792 | 62892 |
Utilising Instruction Memory (IRAM, single-core only)

As can be seen from the above log, we have gained 8KB of additional DRAM for application usage.
So far we have seen different configuration options that give the end application more memory from the DRAM (data memory) region. Continuing along similar lines, it should be noted that there is still a fair amount of IRAM (instruction memory) left, but it cannot be used as general-purpose memory due to its 32-bit address and size alignment restrictions.
With this particular ESP-IDF feature, the above-mentioned unaligned accesses are handled through corresponding exception handlers, resulting in correct program execution. However, these exception handlers can take up to 167 CPU cycles for each (restricted) load or store operation, so there can be a significant performance penalty (compared with DRAM access) when using this feature.
This memory region can be used in the following ways: the application can allocate from it explicitly through the heap allocator (see the sketch further below), or individual components can redirect their larger allocations to it, as the mbedTLS option shown later does.
Limitation-wise, every byte-level access to this region goes through the exception handler described above (with the corresponding performance penalty), and the option is available only in the single core configuration.
While evaluating uses of this memory region with the understood performance penalty, the TLS IN/OUT buffers (16KB/2KB per our configuration) were found to be good candidates for allocation from this region. In one experiment, the transfer time of a 1MB file over a TLS connection increased from ~3 seconds to ~5.2 seconds with the TLS IN/OUT buffers moved to IRAM.
It is also possible to redirect all TLS allocations to the IRAM region, but that may have a larger performance impact. Hence this feature redirects only buffers whose size is greater than or equal to the smaller of the TLS IN and OUT buffer sizes (in our case the threshold would be 2KB).
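For application-level allocations, the same region can also be requested explicitly through the heap allocator. The sketch below is illustrative: MALLOC_CAP_IRAM_8BIT is only available once CONFIG_ESP32_IRAM_AS_8BIT_ACCESSIBLE_MEMORY is enabled, and the fall-back to the regular heap is this sketch's choice rather than automatic behaviour.

```c
#include <stdlib.h>
#include "esp_heap_caps.h"

// Place a large, byte-accessed buffer into the byte-accessible IRAM heap,
// accepting the slower access in exchange for keeping DRAM free.
static void *alloc_large_buffer(size_t len)
{
    void *buf = heap_caps_malloc(len, MALLOC_CAP_IRAM_8BIT);
    if (buf == NULL) {
        buf = malloc(len);  // fall back to the regular (DRAM) heap
    }
    return buf;
}
```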
Let's re-run our application with this feature and gather memory numbers.
Corresponding configuration to be enabled in the application:
# Allow usage of IRAM as 8bit accessible region
CONFIG_ESP32_IRAM_AS_8BIT_ACCESSIBLE_MEMORY=y
CONFIG_MBEDTLS_IRAM_8BIT_MEM_ALLOC=y
Updated heap utilisation statistics after re-running the application:

Task Heap Utilisation Stats:

| Task | Peak DRAM | Peak IRAM |
| --- | --- | --- |
| aws_iot_task | 17960 | 21216 |
| tiT | 3640 | 0 |
| wifi | 19536 | 0 |

System Heap Utilisation Stats:

| Minimum Free DRAM | Minimum Free IRAM |
| --- | --- |
| 228252 | 40432 |
Summary

As can be seen from the above log, we have gained another ~7KB of additional DRAM for application usage. Please note that, even though we have redirected all allocations above the 2KB threshold to IRAM, there are still many smaller (and simultaneous) allocations happening from the DRAM region (another local maximum), and hence the effective gain is lower than it ideally could have been. If an additional performance impact is acceptable, then it is possible to redirect all TLS allocations to IRAM and gain a further ~10-12KB of memory from the DRAM region.