I'm developing for an ARM Cortex-M7 microcontroller, running bare-metal without an operating system, and with limited resources:
- 1 Mbyte of FLASH
- No dedicated MMU (Memory Management Unit)
- Operation frequency up to 216 MHz
I've been using C language code, but I'm interested in modern C++ features, particularly OOP for a more modular design. However, I'm concerned about the overhead and potential pitfalls of using the STL library and other core C++ mechanisms on such a resource-constrained device.
- Is it advisable to use the STL library in a resource-constrained environment with no operating system? How can I mitigate pitfalls like dynamic memory allocation, exceptions, RTTI, and templates?
- Are there specific workarounds or best practices for efficient STL usage in such limited resources (such as using ETL)?
- Can I use a different compiler like GCC or G++ with better STL support for the ARM Cortex-M7 microcontroller?
To address potential STL pitfalls, I'll use statically allocated memory pools, exercise caution with exception-related features, avoid RTTI, and rely on compiler optimizations to handle template code bloat.
I want to use STL to implement software design patterns and algorithms effectively in my embedded project. However, I'm also aware of discussions where people debate whether C++ with/without STL is suitable for such devices. I'd appreciate any advice and experiences you can share to help me decide whether I should explore STL and C++ or stick with using C-only for my embedded projects.
These are all great questions. I think that you'll find diving into C++ for embedded systems is extremely rewarding and will elevate the solutions you can come up with.
Here are some of my thoughts on the topic:
1. You need to be very careful when you use the STL. In many cases, you'll find the STL using dynamic memory allocation. Before using any STL feature, I'd make sure you really understand what is going on behind the scenes.
In some instances, you can avoid dynamic memory allocation through static allocations. Consider using fixed-size containers or custom allocators that allocate memory statically or from pre-allocated memory pools.
In general for an embedded system, it's a good idea to avoid exceptions and RTTI. Exceptions usually will use the heap, which can be problematic.
Templates are a wonderful tool for embedded developers. As you are aware, you just have to be careful and understand how their use may affect code size.
2. ETL can be a good option. As I mentioned earlier, if you are using the STL, try to statically allocate or allocate once early in the program and use throughout to avoid heap fragmentation. Know what you are using.
3. Not sure what compiler you are currently using. I use arm-none-eabi-g++ for my embedded projects and training courses. Open source versions will probably not be as efficient. If you were to use EWARM from IAR or MDK from Keil, you'll find you have smaller, more efficient binaries.
On a general note, I've been using the following flags when I compile C++ applications for embedded:
-std=c++17 -fno-rtti -fno-exceptions -lstdc++
(I probably could try -std=c++20 at this point, but I just haven't seen the need).
I hope that helps a bit.
P.S. I should probably add that like with any other embedded project, keep track of your binary size and run-time performance as you go. If you choose a feature that is too big or slow, you'll probably notice right away.
Thank you for a fast and extensive response. I'm getting more and more aware of the difficulties that the STL brings into embedded projects and am looking forward to finding appropriate and efficient solutions for them, especially in terms of dynamic allocation.
In response to compiler use, you say you use ARM GCC? Maybe some licensed, professional version? I'm using arm-none-eabi-g++ (from GNU, I think), which is open-source and free. The point where I'm currently stuck is that the specific toolchain version I'm using doesn't seem to support the STL at all (I tried std::mutex to try out a locking mechanism, but my error log is telling me there is no <mutex> file among the toolchain's system files).
Can you tell whether some other versions of ARM GCC provide STL with their toolchain tools? Also, is manual inclusion of STL files into my C++ project somewhat an alternative solution?
I have just found a MinGW arm-none-eabi-gcc variant which seems to come with the required STL library. However, I'm stuck with some CMake issues and cannot progress further with it at the moment.
Any additional information would be really helpful at this point :)
I'm currently using the open source version that can be downloaded using: apt-get install -y --no-install-recommends gcc-arm-none-eabi . It looks like it's version:
gcc version 10.3.1 20210621 (release)
You might try running: arm-none-eabi-g++ -E -x c++ - -v < /dev/null
and then examine the output. When I do this, I get:
arm-none-eabi-g++ -E -x c++ - -v < /dev/null
Using built-in specs.
COLLECT_GCC=arm-none-eabi-g++
Target: arm-none-eabi
Configured with: ../configure --build=x86_64-linux-gnu --prefix=/usr --includedir='/usr/lib/include' --mandir='/usr/lib/share/man' --infodir='/usr/lib/share/info' --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir='/usr/lib/lib/x86_64-linux-gnu' --libexecdir='/usr/lib/lib/x86_64-linux-gnu' --disable-maintainer-mode --disable-dependency-tracking --mandir=/usr/share/man --enable-languages=c,c++,lto --enable-multilib --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --enable-tls --build=x86_64-linux-gnu --target=arm-none-eabi --with-system-zlib --with-gnu-as --with-gnu-ld --with-pkgversion=15:10.3-2021.07-4 --without-included-gettext --prefix=/usr/lib --infodir=/usr/share/doc/gcc-arm-none-eabi/info --htmldir=/usr/share/doc/gcc-arm-none-eabi/html --pdfdir=/usr/share/doc/gcc-arm-none-eabi/pdf --bindir=/usr/bin --libexecdir=/usr/lib --libdir=/usr/lib --disable-libstdc++-v3 --host=x86_64-linux-gnu --with-headers=no --without-newlib --with-multilib-list=rmprofile,aprofile CFLAGS='-g -O2 -ffile-prefix-map=/build/gcc-arm-none-eabi-hYfgK4/gcc-arm-none-eabi-10.3-2021.07=. -flto=auto -ffat-lto-objects -fstack-protector-strong' CPPFLAGS='-Wdate-time -D_FORTIFY_SOURCE=2' CXXFLAGS='-g -O2 -ffile-prefix-map=/build/gcc-arm-none-eabi-hYfgK4/gcc-arm-none-eabi-10.3-2021.07=. -flto=auto -ffat-lto-objects -fstack-protector-strong' DFLAGS=-frelease FCFLAGS='-g -O2 -ffile-prefix-map=/build/gcc-arm-none-eabi-hYfgK4/gcc-arm-none-eabi-10.3-2021.07=. -flto=auto -ffat-lto-objects -fstack-protector-strong' FFLAGS='-g -O2 -ffile-prefix-map=/build/gcc-arm-none-eabi-hYfgK4/gcc-arm-none-eabi-10.3-2021.07=. -flto=auto -ffat-lto-objects -fstack-protector-strong' GCJFLAGS='-g -O2 -ffile-prefix-map=/build/gcc-arm-none-eabi-hYfgK4/gcc-arm-none-eabi-10.3-2021.07=. -fstack-protector-strong' LDFLAGS='-Wl,-Bsymbolic-functions -flto=auto -Wl,-z,relro' OBJCFLAGS='-g -O2 -ffile-prefix-map=/build/gcc-arm-none-eabi-hYfgK4/gcc-arm-none-eabi-10.3-2021.07=. -flto=auto -ffat-lto-objects -fstack-protector-strong' OBJCXXFLAGS='-g -O2 -ffile-prefix-map=/build/gcc-arm-none-eabi-hYfgK4/gcc-arm-none-eabi-10.3-2021.07=. -flto=auto -ffat-lto-objects -fstack-protector-strong' INHIBIT_LIBC_CFLAGS=-DUSE_TM_CLONE_REGISTRY=0 AR_FOR_TARGET=arm-none-eabi-ar AS_FOR_TARGET=arm-none-eabi-as LD_FOR_TARGET=arm-none-eabi-ld NM_FOR_TARGET=arm-none-eabi-nm OBJDUMP_FOR_TARGET=arm-none-eabi-objdump RANLIB_FOR_TARGET=arm-none-eabi-ranlib READELF_FOR_TARGET=arm-none-eabi-readelf STRIP_FOR_TARGET=arm-none-eabi-strip SED=/bin/sed SHELL=/bin/sh BASH=/bin/bash CONFIG_SHELL=/bin/bash
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.3.1 20210621 (release) (15:10.3-2021.07-4)
COLLECT_GCC_OPTIONS='-E' '-v' '-mcpu=arm7tdmi' '-mfloat-abi=soft' '-marm' '-mlibarch=armv4t' '-march=armv4t'
 /usr/lib/gcc/arm-none-eabi/10.3.1/cc1plus -E -quiet -v -D__USES_INITFINI__ - -mcpu=arm7tdmi -mfloat-abi=soft -marm -mlibarch=armv4t -march=armv4t
ignoring nonexistent directory "/usr/lib/gcc/arm-none-eabi/10.3.1/../../../arm-none-eabi/sys-include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/arm-none-eabi/10.3.1/../../../arm-none-eabi/include/c++/10.3.1
 /usr/lib/gcc/arm-none-eabi/10.3.1/../../../arm-none-eabi/include/c++/10.3.1/arm-none-eabi
 /usr/lib/gcc/arm-none-eabi/10.3.1/../../../arm-none-eabi/include/c++/10.3.1/backward
 /usr/lib/gcc/arm-none-eabi/10.3.1/include
 /usr/lib/gcc/arm-none-eabi/10.3.1/include-fixed
 /usr/lib/gcc/arm-none-eabi/10.3.1/../../../arm-none-eabi/include
There's a lot there to digest, but look at your include directories:
This suggests that the STL is included. We can check these directories and see what headers are there. For example:
Looking here, I can see that <mutex> is included, along with containers like <vector>, <list>, <stack>, etc.
I have seen some posts where people had issues using <mutex>, so maybe try a different STL feature first and then circle back to that one.
It might be that your compiler search paths are not set up correctly.
One option that I haven't tried, but looks interesting and is used by eCOS, is uSTL. Some details about it can be found at:
(You can manually include STL files, but I suspect you have a toolchain configuration issue rather than the support not being there.)
I have tried re-installing two ARM GCC compilers: the one that you say you use (downloaded here) and a newer version (downloaded here), although for the latter I'm not sure whether it is the same variant of "arm-none-eabi" as the former.
I have tried compiling with both, using the flags you mentioned. Other commonly seen C++ features like std::array or std::vector compile just fine; the exception is std::mutex, which does not compile. It seems that "mutex.h" is intentionally excluded from the build by either compiler options or some third-party tool (no idea, honestly).
Either way, at the time of writing this question I didn't know anything about multithreading support and how it works in C++, but it seems (as mentioned by @maxpagani) that the multithreading features of the standard library are either not supported in a freestanding implementation or shouldn't be used at all. To be honest, I was just looking for a "fancy" locking mechanism right off the bat after entering the STL domain, but I guess I can find something else or implement less fancy methods on my own :D
An additional question for you, since you seem to be very knowledgeable in this domain. I have found some Stack Overflow topics on "freestanding" and it seems that there really isn't much support from STL developers in this regard, as they see this as a "not-normal" use case. The question: is a C++ compiler really not capable of inlining all those STL function calls (for some reason)? My sources on this: here and here.
Thanks for all the info, I really appreciate it. My senior embedded firmware developers don't support the idea of C++ at all, let alone the STL, and rather despise it for multiple reasons. It's nice to find similar people with similar ideas though, like in this discussion itself :D
The only times I’ve used multiprocessing and mutex in C++ has been when developing desktop applications. I suspect that the arm compiler doesn’t support it unless there is an OS included. I’ve not really investigated though, so take it with a grain of salt.
Usually if I am doing multithreading, I use the tools provided by my RTOS.
As far as your question about the compiler inlining, I’m honestly not sure. I think the capability to do this is going to be dependent on the compiler / vendor. I see very different results when I do comparisons between what GCC provides versus an IAR or Keil compiler.
- Iterators and standard algorithms can be made to work on many types (including C-style arrays and std::array) without dynamic allocation.
- std::optional is an excellent type for signaling failures or adding a sentinel value to a type.
- <type_traits> is very useful when writing libraries or introspecting types.
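To make the first and last bullets concrete, here is a minimal sketch that runs standard algorithms over a C-style array and a std::array and uses <type_traits> to reject unsuitable element types at compile time. The helper names peak and median are made up for this illustration. Nothing here touches the heap.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical helper: largest element of a C-style array.
// std::begin/std::end turn the array into an iterator range.
template <typename T, std::size_t N>
T peak(const T (&readings)[N]) {
    static_assert(std::is_arithmetic<T>::value, "readings must be numeric");
    return *std::max_element(std::begin(readings), std::end(readings));
}

// Hypothetical helper: sorts a std::array in place and returns the middle
// element. std::sort works on the existing storage and allocates nothing.
template <typename T, std::size_t N>
T median(std::array<T, N>& readings) {
    static_assert(std::is_arithmetic<T>::value, "readings must be numeric");
    static_assert(N > 0, "buffer must not be empty");
    std::sort(readings.begin(), readings.end());
    return readings[N / 2];
}
```

Calling peak on an array of a non-numeric type fails at compile time with the static_assert message, which is usually easier to read than a template error from deep inside <algorithm>.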
There is ongoing work by Ben Craig and others to improve the specification of the freestanding (OS- and allocator-free) mode. Unfortunately, there are parts of the STL with inappropriate dependencies (e.g., <bitset> would be useful, but depends on <string>). Watching talks like this may be helpful.
Regarding library types like std::vector, you can instantiate them with your own allocator - either by passing it as a template parameter or by using a polymorphic memory resource - so it works with your memory pool. Your mileage may vary, but it's worth looking into.
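As a rough sketch of the polymorphic memory resource route (C++17), the snippet below backs a std::pmr::vector with a static buffer. The buffer size of 256 bytes and the names g_pool_storage, g_pool and make_samples are assumptions for illustration. Whether <memory_resource> is usable on a given bare-metal toolchain has to be checked separately.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Statically allocated backing store (the size is an arbitrary assumption).
alignas(std::max_align_t) static std::byte g_pool_storage[256];

// Hands out memory from g_pool_storage only. The upstream is the null
// resource, so running out of space fails loudly (std::bad_alloc, or
// termination when exceptions are disabled) instead of using the heap.
static std::pmr::monotonic_buffer_resource g_pool{
    g_pool_storage, sizeof(g_pool_storage), std::pmr::null_memory_resource()};

// Hypothetical producer: fills a vector that lives entirely in the pool.
std::pmr::vector<int> make_samples() {
    std::pmr::vector<int> v{&g_pool};
    v.reserve(16);  // one up-front allocation from the static buffer
    for (int i = 0; i < 16; ++i) {
        v.push_back(i * i);
    }
    return v;
}
```

Note that a monotonic resource never reuses released memory, so this fits the "allocate once early, then keep" pattern rather than containers that grow and shrink repeatedly.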
Others have already cautioned about heap/new/malloc, RTTI etc. I typically avoid using the new operator and prefer to statically allocate everything at compile-time since, in firmware, typically every object needed is known at compile-time. To prevent some library or template from using the 'new' operator, I re-define 'operator new' in my code (so the linker finds and uses it before searching libraries) and I either set a breakpoint or call assert so it becomes obvious, in the debugger, if/when/why/how anything calls new. When there's a useful template in STL that drags-in 'new', I sometimes need to redefine that template so that 'new' isn't called.
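A minimal sketch of that idea, assuming a hypothetical TRAP_ON_NEW build flag: the replacement operator new below either asserts (target build) or just counts calls and falls through to malloc so the sketch stays runnable on a PC. The counter and the new_call_count function are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of times anything in the program reached the global operator new.
static std::size_t g_new_calls = 0;

std::size_t new_call_count() { return g_new_calls; }

// Replacement for the global operator new. Because it is defined in the
// application, the linker uses it instead of the library version.
void* operator new(std::size_t size) {
    ++g_new_calls;  // a debugger breakpoint here shows who asked for heap memory
#ifdef TRAP_ON_NEW  // hypothetical flag for builds where any use of new is a bug
    assert(false && "unexpected heap allocation");
#endif
    void* p = std::malloc(size != 0 ? size : 1);
    if (p == nullptr) {
        std::abort();  // no exceptions available: fail loudly
    }
    return p;
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

The array forms (operator new[] and operator delete[]) forward to these by default, so they are covered as well.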
Hmm, well, but then you must practically avoid most of the libraries that come with C++? I wouldn't try to avoid the issue with "new" and "delete" by forcefully preventing it from doing anything at all and forcefully removing or modifying anything that uses it.
I'm not so knowledgeable on this topic yet, but I would say that allocating the required memory within a fixed-size memory container seems manageable. However, this calls for additional investment in providing a sufficient mechanism for handling memory allocations within a non-variable memory space.
May I ask which features of C++ you have used consistently so far, and where you had to meddle with "new" and "delete" the way you describe?
If this is just a toy project for learning, go ahead. But don't do this in a real thing.
Why not? Real-life applications are built using C++ and its features, not just hobby projects. Please provide some details on what makes you think so.
Hi, there are already great replies. I would like to add that the problems with dynamic memory (in general, but more relevant for resource-constrained systems) are fragmentation and non-deterministic allocation time.
About fragmentation - it could be a non-problem if you perform your allocation(s) at the beginning and then you don't change the allocation size. To state it better, if you have to allocate a dynamic structure that is never destroyed and never changes its size, then it is likely ok to use dynamic memory.
If that is not the case, then you'd better look into custom allocators, object pools, and so on.
Also, as properly pointed out, there are parts of the STL that don't use dynamic memory (std::array, many algorithms).
About your problem with std::mutex, if you have a freestanding implementation (https://en.cppreference.com/w/cpp/freestanding) as it can be expected in a resource-constrained system, then maybe std::mutex (and multithreading in general) is not supported.
* standing clear of RTTI is usually achievable unless you are using complex run-time polymorphism, not so frequent in embedded designs.
* standing clear of exceptions may require more effort if you are used to them. std::optional (or even std::expected, from C++23, but available as a library for earlier versions, e.g. https://github.com/TartanLlama/expected) can be used to report failures instead. (Shameless self-promotion: if you are into functional programming you may want to have a look at my ChefFun library providing Option and Either, which offer some composability advantages w.r.t. std::optional/std::expected: https://gitlab.com/libchef/chef-fun).
* Optimization, inlining, and templates may affect the binary size in counter-intuitive ways, so keeping an eye on the binary size (as already stated) is a very good idea and can be automated in the CI pipeline. You may find the bloaty tool useful - https://github.com/google/bloaty. I haven't used it, but I evaluated it a bit in the past and it looked fine.
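As an illustration of the std::optional point, here is a small sketch of a function that reports failure through its return type rather than by throwing. The function name, the 12-bit range and the 3.3 V reference are assumptions chosen only for the example.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical conversion of a raw ADC count to millivolts.
// An out-of-range count yields std::nullopt instead of an exception.
std::optional<std::uint32_t> adc_to_millivolts(std::uint32_t raw) {
    constexpr std::uint32_t kMaxCount = 4095;  // 12-bit converter (assumed)
    constexpr std::uint32_t kVrefMv = 3300;    // 3.3 V reference (assumed)
    if (raw > kMaxCount) {
        return std::nullopt;
    }
    return raw * kVrefMv / kMaxCount;
}
```

The caller is forced to decide what an invalid reading means, for example with adc_to_millivolts(raw).value_or(0) or an explicit has_value() check, and the code builds fine with -fno-exceptions as long as value() is not called on an empty optional.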
All the best
It seems that memory allocation is one of the most difficult topics (if not the most difficult) with the C++ STL and freestanding implementations. This makes me wonder whether to use existing alternatives to malloc(), since static allocation is definitely a must in this case.
I think trying to provide a custom replacement for malloc wouldn't be a wise idea and could take a considerable amount of time. Any suggestions on where I could start implementing a sufficient replacement for malloc() to provide an effective, fast, and preferably more time-deterministic mechanism?
Additionally, note that I'm currently progressing with CMSIS-RTOS2, and something like a thread-safe memory pool could be used as an appropriate solution for "new" and "delete" overloading. However, when I'm developing an application (or its submodules) from scratch I'm usually not using an RTOS (e.g. developing individual units before creating threads that use specific unit functionalities). This would mean I still need to find (possibly create) a mechanism that would allow me to use the STL in the absence of an RTOS. Therefore an alternative to something like malloc() would be the most plausible solution.
Tell me what you think ;)
Possibly a general approach to memory management either shares the same issues new/delete (or malloc/free) have or requires a very large pool from where allocations are taken.
So, your life greatly simplifies when you add domain knowledge to the equation. Maybe preallocating a static buffer is enough, or if you have a number of small objects you need to keep allocating and freeing, then an object pool could be fine. Sometimes even allocating on the stack may get the job done.
Forcing these schemes onto the STL containers may not be easy (or even always possible). In the past, I developed components with an interface similar to that of STL components - so I had a std::vector equivalent with a fixed maximum size, a queue with a fixed size, and a string component that used two object pools. Those helped me with all the needs I had in constrained-resources projects.
Keeping the interface almost compatible with STL containers allowed me to employ STL algorithms.
Using maximum fixed sizes also helps you consider the worst case, so that you can put proper behavior in place to handle it.
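To give an idea of what such a component can look like, below is a minimal sketch of a fixed-capacity vector. It is not the component described above, just an illustration under simplifying assumptions: the element type must be default-constructible and copy-assignable, and FixedVector is a made-up name. Because it exposes begin() and end(), standard algorithms work on it unchanged.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Minimal fixed-capacity vector; storage lives inside the object itself.
template <typename T, std::size_t Capacity>
class FixedVector {
public:
    using value_type = T;
    using iterator = T*;
    using const_iterator = const T*;

    // Returns false instead of growing when the worst case is exceeded,
    // so the caller has to handle the "full" condition explicitly.
    bool push_back(const T& value) {
        if (size_ == Capacity) {
            return false;
        }
        data_[size_++] = value;
        return true;
    }

    void pop_back() {
        if (size_ > 0) {
            --size_;
        }
    }

    std::size_t size() const { return size_; }
    constexpr std::size_t capacity() const { return Capacity; }
    bool full() const { return size_ == Capacity; }

    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }

    iterator begin() { return data_.data(); }
    iterator end() { return data_.data() + size_; }
    const_iterator begin() const { return data_.data(); }
    const_iterator end() const { return data_.data() + size_; }

private:
    std::array<T, Capacity> data_{};
    std::size_t size_ = 0;
};
```

With this in place, calls such as std::sort(v.begin(), v.end()) or std::find(v.begin(), v.end(), x) behave as they would on a std::vector, while the memory footprint is known at compile time.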
I'm not that familiar with CMSIS-RTOS2, so I'm not sure whether it is used in the freestanding environment to power the thread part of the standard C++ library. For previous projects, I wrote my wrappers around FreeRTOS offering thread and mutexes (I didn't need much more), which got compiled into FreeRTOS primitives in the builds for the device and into standard C++ library in the builds for the PC (which I used for testing).
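A wrapper of that kind could look roughly like the sketch below. This is not the code from those projects: the USE_FREERTOS build flag and the class names are assumptions, and the device branch requires configSUPPORT_STATIC_ALLOCATION to be enabled in the FreeRTOS configuration.

```cpp
#if defined(USE_FREERTOS)  // hypothetical flag set only in the device build
#include "FreeRTOS.h"
#include "semphr.h"

class Mutex {
public:
    Mutex() : handle_(xSemaphoreCreateMutexStatic(&storage_)) {}
    Mutex(const Mutex&) = delete;
    Mutex& operator=(const Mutex&) = delete;
    void lock() { xSemaphoreTake(handle_, portMAX_DELAY); }
    void unlock() { xSemaphoreGive(handle_); }

private:
    StaticSemaphore_t storage_;  // statically allocated, no heap involved
    SemaphoreHandle_t handle_;
};
#else
#include <mutex>

class Mutex {  // PC build used for testing: forwards to the standard library
public:
    Mutex() = default;
    Mutex(const Mutex&) = delete;
    Mutex& operator=(const Mutex&) = delete;
    void lock() { impl_.lock(); }
    void unlock() { impl_.unlock(); }

private:
    std::mutex impl_;
};
#endif

// Tiny RAII guard so shared code does not depend on <mutex> being usable.
class LockGuard {
public:
    explicit LockGuard(Mutex& m) : m_(m) { m_.lock(); }
    ~LockGuard() { m_.unlock(); }
    LockGuard(const LockGuard&) = delete;
    LockGuard& operator=(const LockGuard&) = delete;

private:
    Mutex& m_;
};

// Shared application code sees only Mutex and LockGuard.
static Mutex g_counter_mutex;
static int g_counter = 0;

int increment_counter() {
    LockGuard guard(g_counter_mutex);
    return ++g_counter;
}
```

The application code compiles unchanged for both targets; only the class behind the Mutex name differs.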
Just one final note: usually STL refers to the containers and algorithms part of the C++ standard library, while mutexes, threads (and all the rest) are not usually referred to as STL.
...your life greatly simplifies when you add domain knowledge to the equation.
Can you elaborate on this part? What did you mean by "domain knowledge"?
How do you approach overloading "new" and "delete" operators (if you're using mechanisms that do use those two)?
I still think that alternative libraries like ETL, uSTL, etc. could be used as a partial replacement for the STL. Also, one of the main reasons I'm insisting on the STL is the community support. If there is a problem with my understanding of how things work or with my code, I can pretty much rely on the rest of the C++ community. Whereas if I were using a less popular alternative, which might be commonly used only by embedded developers (and those are in the minority, AFAIK), it would be much harder to troubleshoot specific issues that could come up along the way.
I would try reserving a section in the linker script to create a fixed-size memory container for the STL. I would also try not to overuse the STL, to reduce these allocations, and also take the time performance of the code into account. If, over time, it seems that the STL still consumes too much space, I would probably migrate to something like ETL. And if even that doesn't work out for some reason, I guess I still have at least "C with classes", which is very restrictive compared to modern C++ but at least gives you more than plain C :D
Can you elaborate on this part? What did you mean by "domain knowledge"?
The idea is that a general allocator must work with any order of allocation/deallocations and any size of allocated bytes. If you can reduce this freedom, you end up with simpler and usually faster dynamic memory management algorithms.
First example - you have many objects that are allocated and deallocated in any order, but they all fit the same size. In this case, you can use an object pool, where the free items are kept in a single linked list. The time needed to allocate/deallocate an object is constant and deterministic (well, up to the cache level).
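A minimal sketch of such an object pool follows, with ObjectPool as a made-up name. Free slots are chained through the slots themselves, so no extra bookkeeping memory is needed, and both operations are a couple of pointer assignments. It is single-threaded as written; a lock or a critical section would be needed if several tasks share it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

template <typename T, std::size_t N>
class ObjectPool {
public:
    ObjectPool() {
        // Thread every slot onto the singly linked free list.
        for (std::size_t i = 0; i < N; ++i) {
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // O(1): pop a free slot and construct a T in it.
    // Returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_ == nullptr) {
            return nullptr;
        }
        Slot* slot = free_;
        free_ = slot->next;
        return ::new (static_cast<void*>(slot->storage)) T(std::forward<Args>(args)...);
    }

    // O(1): destroy the object and push its slot back onto the free list.
    void destroy(T* obj) {
        assert(obj != nullptr);
        obj->~T();
        Slot* slot = reinterpret_cast<Slot*>(obj);
        slot->next = free_;
        free_ = slot;
    }

private:
    // A slot holds either a live object or the link to the next free slot.
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

    Slot slots_[N];
    Slot* free_ = nullptr;
};
```

Since every slot has the same size, the pool cannot fragment, and the worst case (N live objects) is visible in the type.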
Second example - The allocations are always made first, and the deallocations are performed in the reverse order. You can use a stack-like memory area. Allocations and deallocations are cheap and use constant time.
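The stack-like scheme can be sketched as follows. StackArena and its mark/release interface are assumptions for illustration. The arena only hands out raw memory, so objects with non-trivial destructors would have to be destroyed by the caller before releasing.

```cpp
#include <cassert>
#include <cstddef>

template <std::size_t Size>
class StackArena {
public:
    using Marker = std::size_t;

    // Bump allocation in constant time. The alignment must be a power of two
    // no larger than alignof(std::max_align_t). Returns nullptr when full.
    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));
        const std::size_t start = (top_ + align - 1) & ~(align - 1);
        if (start > Size || bytes > Size - start) {
            return nullptr;
        }
        top_ = start + bytes;
        return buffer_ + start;
    }

    // Remember the current top of the stack.
    Marker mark() const { return top_; }

    // Release everything allocated since the marker was taken (LIFO order).
    void release(Marker m) {
        assert(m <= top_);
        top_ = m;
    }

    std::size_t used() const { return top_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[Size];
    std::size_t top_ = 0;
};
```

A typical use is to take a marker at the start of an operation (parsing a frame, handling a request), allocate scratch memory freely, and release back to the marker at the end.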
Overloading new and delete is an option based on the object type. Another option is to employ a factory that produces well-constructed objects. When you are done with the object you give it back to the factory for recycling.
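For the first option, a per-type overload might look like the sketch below. The Message class, the limit of four instances and the linear slot search are assumptions made for the example. Declaring the class-level operator new as noexcept means that an exhausted pool makes the new-expression yield nullptr instead of throwing.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Message {
public:
    explicit Message(int id) : id_(id) {}
    int id() const { return id_; }

    // Class-specific allocation: only Message objects use these.
    static void* operator new(std::size_t size) noexcept;
    static void operator delete(void* p) noexcept;
    static std::size_t free_slots();

private:
    int id_;
};

namespace {
constexpr std::size_t kMaxMessages = 4;  // assumed worst case
alignas(Message) unsigned char g_slots[kMaxMessages][sizeof(Message)];
bool g_in_use[kMaxMessages] = {};
}  // namespace

void* Message::operator new(std::size_t size) noexcept {
    assert(size == sizeof(Message));
    (void)size;
    for (std::size_t i = 0; i < kMaxMessages; ++i) {
        if (!g_in_use[i]) {
            g_in_use[i] = true;
            return g_slots[i];
        }
    }
    return nullptr;  // pool exhausted: `new Message(...)` evaluates to nullptr
}

void Message::operator delete(void* p) noexcept {
    if (p == nullptr) {
        return;
    }
    for (std::size_t i = 0; i < kMaxMessages; ++i) {
        if (p == g_slots[i]) {
            g_in_use[i] = false;
            return;
        }
    }
    assert(false && "pointer did not come from the Message pool");
}

std::size_t Message::free_slots() {
    std::size_t n = 0;
    for (std::size_t i = 0; i < kMaxMessages; ++i) {
        if (!g_in_use[i]) {
            ++n;
        }
    }
    return n;
}
```

Client code keeps writing new Message(...) and delete, but has to check the result for nullptr. The global operator new is never involved for this type.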
If you want to change the dynamic memory management inside an STL container, you need to define your own Allocator class. Things here can get a bit difficult since it is not always obvious how the container is going to use memory and constraints you place on ordering or size may be violated.
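As a rough sketch of such an Allocator class, the version below serves a std::vector from a static arena with a simple bump pointer. StaticAllocator, Arena and the 1024-byte size are assumptions for illustration, and over-aligned types are not handled. It also shows the kind of surprise mentioned above: when the vector grows, the old block is abandoned, so reserving the worst-case size up front matters.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Fixed arena shared by all StaticAllocator instances (size is assumed).
struct Arena {
    static constexpr std::size_t kSize = 1024;
    alignas(std::max_align_t) unsigned char buffer[kSize];
    std::size_t used = 0;
};
inline Arena g_arena;

template <typename T>
struct StaticAllocator {
    using value_type = T;

    StaticAllocator() = default;
    template <typename U>
    StaticAllocator(const StaticAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        const std::size_t align = alignof(T);
        const std::size_t start = (g_arena.used + align - 1) & ~(align - 1);
        const std::size_t bytes = n * sizeof(T);
        if (start > Arena::kSize || bytes > Arena::kSize - start) {
            std::abort();  // arena exhausted; no exceptions, so stop here
        }
        g_arena.used = start + bytes;
        return reinterpret_cast<T*>(g_arena.buffer + start);
    }

    // Bump allocator: released blocks are never reused.
    void deallocate(T*, std::size_t) noexcept {}
};

template <typename T, typename U>
bool operator==(const StaticAllocator<T>&, const StaticAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const StaticAllocator<T>&, const StaticAllocator<U>&) noexcept {
    return false;
}

// Convenience alias for a vector that never touches the heap.
template <typename T>
using StaticVector = std::vector<T, StaticAllocator<T>>;
```

Used as StaticVector<int> v; v.reserve(8);, the eight push_back calls that follow stay inside the block obtained by reserve. A ninth would trigger a reallocation that consumes fresh arena space.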
This was also the reason I preferred to develop my own containers, but as you correctly pointed out, the closer you stick to the standard library, the more likely you are to find help if needed.
I would try reserving a section in linker script for creating fixed-size memory container for STL
You can also define a large static array of char (or std::byte, or uint8_t) and manage that. Linker scripts can be cumbersome, because then you need to manage both the linker script and the code in order to keep them in sync.
With C++ you can tune quite precisely the cost of abstractions, so you can still use parts of the standard library with other container libraries and keep the other "zero-cost" abstractions.
I don't quite get this one. Are you saying to do allocations/deallocations in a LIFO (Last In First Out) manner? I think this wouldn't work, because you cannot assume all allocated objects will be deleted in the reverse of the order in which they were created.
I think that you can make a general, or rather a simplified, variant of a memory allocator. However, given how many memory allocation algorithms exist, it might be best to try out an existing allocator library first: most of the time it doesn't pay off to "reinvent the wheel", but sometimes, in order to optimize for a specific application, you have no choice but to do exactly that :)
Indeed this cannot be a general strategy for dynamic memory management. What I am suggesting is to leverage the pattern of allocation/deallocation of your objects, to manage dynamic memory in the most efficient way, without the burden of a general-purpose memory allocator.
Regardless of whether you are using the STL, custom libraries, or even C code, if you need dynamic memory and you can't afford the generic memory manager, analyzing the usage patterns and solving that very specific problem usually pays off.
OTOH, if you need to stick with a generic allocator, I wouldn't expect much benefit from switching away from the default allocator on a resource-constrained system, because of the need to keep fragmentation low and be fast. But the best way to assess this is to run some tests on the real device.