Floating-point is a numeric representation that encodes a value as a sign, a significand (mantissa), and an exponent, allowing the radix point to "float" so that a fixed number of bits covers a wide dynamic range, at the cost of precision that varies with magnitude. In embedded systems, floating-point arithmetic is most commonly implemented according to IEEE 754, which defines 32-bit single-precision and 64-bit double-precision formats.
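As an illustration of the sign/exponent/significand encoding, the sketch below splits a single-precision value into its three bit fields. It assumes the target's float is IEEE 754 binary32 (1 sign bit, 8 exponent bits biased by 127, 23 stored fraction bits); the names f32_fields_t and f32_decode are hypothetical and chosen only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three raw bit fields of a binary32 value. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative          */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127          */
    uint32_t fraction;  /* 23 bits; normal values have an implicit 1. */
} f32_fields_t;

/* Assumes float is IEEE 754 binary32, which holds on common MCU toolchains. */
static f32_fields_t f32_decode(float value)
{
    uint32_t bits;
    f32_fields_t fields;

    /* memcpy is the portable way to reinterpret the bits without
       violating strict-aliasing rules. */
    memcpy(&bits, &value, sizeof bits);

    fields.sign     = bits >> 31;
    fields.exponent = (bits >> 23) & 0xFFu;
    fields.fraction = bits & 0x7FFFFFu;
    return fields;
}
```

For example, 1.0f decodes to sign 0, exponent 127 (that is, 2^0 after removing the bias), and fraction 0, while 0.1f decodes to a non-zero fraction (0x4CCCCD) because 0.1 has no finite binary expansion and is stored rounded.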
In practice
On microcontrollers with a hardware floating-point unit (FPU), such as ARM Cortex-M4/M7/M33 cores found in STM32F4/F7/H7, nRF52, and SAM E/S/V families, single-precision IEEE 754 operations execute in a small number of cycles and are practical for real-time control and signal processing. On cores without an FPU, such as ARM Cortex-M0/M0+ or most 8-bit and 16-bit MCUs, the compiler falls back to a software floating-point library. Software float can be 10x to 100x slower than hardware float and also increases code size significantly, which matters on resource-constrained targets.
IEEE 754 single-precision (float) provides approximately 7 significant decimal digits and a dynamic range from roughly 1.2e-38 to 3.4e+38. Double-precision (double) provides about 15 significant digits. A common pitfall is assuming float gives exact results: values like 0.1 cannot be represented exactly in binary floating-point, and accumulated rounding error can cause subtle bugs in integrators, filters, or threshold comparisons. Equality comparisons on floating-point values (== or !=) are often unreliable when values are derived from computation; comparing against an epsilon tolerance is appropriate in those cases. However, exact equality is sometimes correct by construction, for example when comparing against a known sentinel value or checking for zero after a direct assignment.
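A minimal sketch of a tolerance-based comparison follows. The function name nearly_equal and both tolerance parameters are assumptions made for illustration; suitable tolerance values depend on the magnitude and error budget of the signals involved. The sketch combines an absolute tolerance (useful near zero) with a relative one (useful for large magnitudes).

```c
#include <stdbool.h>

/* Local absolute value so the example does not depend on the math library. */
static float abs_f32(float x)
{
    return (x < 0.0f) ? -x : x;
}

/* Hypothetical helper: true when a and b differ by no more than abs_tol,
   or by no more than rel_tol times the larger magnitude.
   Any comparison involving NaN is false, so NaN inputs return false. */
static bool nearly_equal(float a, float b, float abs_tol, float rel_tol)
{
    float diff = abs_f32(a - b);
    float mag_a = abs_f32(a);
    float mag_b = abs_f32(b);
    float largest = (mag_a > mag_b) ? mag_a : mag_b;

    if (diff <= abs_tol) {
        return true;              /* close in absolute terms */
    }
    return diff <= largest * rel_tol;  /* close relative to magnitude */
}
```

A call such as nearly_equal(0.1f + 0.2f, 0.3f, 1e-6f, 1e-5f) returns true regardless of how the sum was rounded, whereas a threshold test written with == would depend on the last bit of the result.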
Floating-point is often the most readable choice for algorithm prototyping, but the "Data Types for Control & DSP" and "Never use Float or Integer" posts on EmbeddedRelated both caution against using it uncritically in production embedded code. Fixed-point arithmetic can replace float on FPU-less cores with predictable, deterministic behavior and no dependency on a math library. The decision typically hinges on available hardware, cycle budget, required dynamic range, and whether the toolchain's soft-float ABI is acceptable for the application.
In safety-critical and hard-real-time contexts, floating-point introduces additional concerns: non-deterministic behavior around special values and IEEE 754 exceptional conditions (such as invalid operation, overflow, underflow, divide-by-zero, and inexact results, as well as subnormal/denormal operands), FPU register state that must be saved and restored in RTOS context switches, and compiler/tool qualification requirements under standards such as IEC 61508 or DO-178C. On ARM Cortex-M cores with an FPU, the lazy stacking feature can defer FPU register saves automatically on exception entry, but it must be understood and configured correctly to avoid context-corruption bugs in an RTOS.
Frequently asked
My Cortex-M4 has an FPU. Do I need to do anything to use it?
Yes. The FPU on ARMv7-M cores (Cortex-M4/M7) is disabled at reset. You must enable it by setting the CP10 and CP11 bits in the CPACR register before any floating-point instruction executes. Most MCU vendor HALs and CMSIS startup code handle this, but if you write your own startup, you need to do it explicitly. You also need to compile with the correct -mfpu and -mfloat-abi flags to tell the toolchain to emit hardware FPU instructions rather than software-float library calls.
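A sketch of the enable step for a hand-written startup file is shown below. The function name fpu_enable is hypothetical, and the register is passed in as a pointer only so the logic can be exercised off-target; on a Cortex-M4 the CPACR lives at 0xE000ED88, with CP10 in bits 20-21 and CP11 in bits 22-23. CMSIS-based projects would normally write SCB->CPACR instead.

```c
#include <stdint.h>

#define CPACR_ADDR            0xE000ED88u   /* Coprocessor Access Control Register */
#define CPACR_CP10_CP11_FULL  (0xFu << 20)  /* 0b11 (full access) for CP10 and CP11 */

/* Hypothetical helper: grants full access to the FPU coprocessors.
   Must run before the first floating-point instruction executes. */
static inline void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;

#if defined(__arm__) || defined(__thumb__)
    /* Barriers make sure the new access rights take effect before
       any following FPU instruction is executed. */
    __asm volatile ("dsb\n\tisb" ::: "memory");
#endif
}

/* On target, early in the reset handler:
       fpu_enable((volatile uint32_t *)CPACR_ADDR);                     */
```

With GCC, a typical flag set for a Cortex-M4 with FPU is -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard; the exact values depend on the core and on the ABI used by the libraries you link against.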
What is the performance cost of software floating-point on an FPU-less MCU?
It varies by operation and library, but a single-precision multiply or divide in software on a Cortex-M0 typically costs tens to over a hundred cycles. On a Cortex-M4 with the FPU enabled, a single-precision add or multiply takes about one cycle and a divide or square root about 14 cycles. Square roots and transcendental functions (sqrt, sin, exp) are even more expensive in software. On 8-bit MCUs like AVR or PIC, the cost is higher still. Profiling your specific target and compiler library before committing to float is worthwhile.
What is the difference between float and double in embedded C?
In standard C, float is typically 32-bit IEEE 754 single-precision and double is 64-bit double-precision. An unsuffixed floating-point literal such as 3.14 has type double, and mixing it with a float operand promotes the whole expression to double. In embedded code, using double on a core whose FPU only handles single-precision (such as Cortex-M4) silently forces every double operation through a software library, even when the FPU is otherwise active. Use the f suffix on literals (e.g., 3.14f) and prefer float when single-precision is sufficient.
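The promotion is easy to overlook because both versions compile without complaint. The sketch below contrasts the two; the function names are hypothetical, and the static assertions (C11) simply document the type each expression has.

```c
/* Unsuffixed literal: x is converted to double, the multiply is done in
   double precision, and the result is converted back to float. On a
   single-precision-only FPU this becomes a software library call. */
static float scale_double_literal(float x)
{
    return x * 3.14;
}

/* Suffixed literal: the whole expression stays in single precision and
   can map to one hardware FPU multiply. */
static float scale_float_literal(float x)
{
    return x * 3.14f;
}

/* The type of each expression, checked at compile time. */
_Static_assert(sizeof(1.0f * 3.14)  == sizeof(double), "promoted to double");
_Static_assert(sizeof(1.0f * 3.14f) == sizeof(float),  "stays float");
```

GCC's -Wdouble-promotion warning flags implicit float-to-double promotions like the first function, which can help catch them during code review or CI.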
When should I prefer fixed-point over floating-point?
Fixed-point is worth considering when you are targeting an FPU-less core and the cycle or code-size cost of software float is prohibitive, when you need strict determinism across platforms, or when your dynamic range is bounded and known at design time. The EmbeddedRelated posts "How to Build a Fixed-Point PI Controller That Just Works" and "Round Round Get Around: Why Fixed-Point Right-Shifts Are Just Fine" cover the practical mechanics. The trade-off is added complexity in managing scaling factors and avoiding overflow, discussed in "Understanding and Preventing Overflow".
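To give a feel for the scaling and overflow management involved, here is a minimal sketch of a Q15 multiply (values in [-1, 1) stored as int16_t scaled by 2^15). It is not taken from the posts mentioned above; the names q15_t and q15_mul are assumptions for this example, and it relies on right-shifting a negative int32_t being an arithmetic shift, which is implementation-defined in C but is the behavior of the common embedded compilers.

```c
#include <stdint.h>

typedef int16_t q15_t;  /* hypothetical alias: value = raw / 32768 */

/* Multiplies two Q15 values with round-to-nearest and saturation. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    /* 16x16 -> 32-bit product is in Q30 format and cannot overflow. */
    int32_t product = (int32_t)a * (int32_t)b;

    product += (int32_t)1 << 14;  /* add half an LSB for rounding   */
    product >>= 15;               /* rescale from Q30 back to Q15   */

    /* The upper bound is exceeded only for -1.0 * -1.0, whose exact
       result (+1.0) is not representable in Q15. */
    if (product > INT16_MAX) {
        product = INT16_MAX;
    }
    if (product < INT16_MIN) {
        product = INT16_MIN;
    }
    return (q15_t)product;
}
```

For instance, q15_mul(16384, 16384), which is 0.5 times 0.5, returns 8192 (0.25), and q15_mul(-32768, -32768) saturates to 32767 instead of wrapping around to -32768.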
Do I need to save FPU registers in an RTOS context switch?
Yes, if any task or ISR uses floating-point. On ARM Cortex-M cores with an FPU, the hardware supports lazy stacking, which defers saving FPU registers on exception entry until the FPU state is actually needed, rather than saving it unconditionally on every exception entry. Most production RTOS ports (FreeRTOS, Zephyr, ThreadX) provide FPU-aware context switch code, but depending on the RTOS it may have to be selected or enabled explicitly. If you enable the FPU but use an RTOS port that does not save FPU state, you will get hard-to-reproduce data corruption in tasks that use floating-point.
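When debugging this kind of corruption, it can help to confirm how the hardware is configured. The sketch below checks the two FPCCR bits that control automatic and lazy FPU state preservation on Cortex-M4/M7 (ASPEN, bit 31, and LSPEN, bit 30; the register is at 0xE000EF34). The function name is hypothetical, and it takes the register value as a parameter so that it can be tested off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPCCR_ADDR   0xE000EF34u  /* Floating-Point Context Control Register */
#define FPCCR_ASPEN  (1u << 31)   /* automatic FPU state preservation        */
#define FPCCR_LSPEN  (1u << 30)   /* lazy preservation of that state         */

/* Hypothetical helper: true when the core both reserves space for the FPU
   frame automatically and defers the actual register save (lazy stacking). */
static inline bool fpu_lazy_stacking_enabled(uint32_t fpccr)
{
    const uint32_t mask = FPCCR_ASPEN | FPCCR_LSPEN;
    return (fpccr & mask) == mask;
}

/* On target:
       uint32_t fpccr = *(volatile uint32_t *)FPCCR_ADDR;
       bool lazy = fpu_lazy_stacking_enabled(fpccr);                  */
```

Both bits are set at reset on these cores, so lazy stacking is active unless startup code or the RTOS port changes it. Whether a given port expects it to stay enabled is something to check in that port's documentation.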
Differentiators vs similar concepts
Floating-point is frequently contrasted with fixed-point representation. Fixed-point encodes a value as a scaled integer (e.g., Q15 or Q31 format), keeping the radix point at a statically defined bit position. Fixed-point arithmetic uses standard integer hardware, making it efficient on any core, but requires the programmer to manage scaling, prevent overflow, and accept a dynamic range limited by the chosen format. Floating-point handles a vastly wider dynamic range automatically and is more readable, but requires either FPU hardware or a slow software library. A separate contrast exists between float (32-bit, ~7 decimal digits of precision) and double (64-bit, ~15 decimal digits): on MCUs whose FPU only supports single-precision, using double reverts those operations to software execution.
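The difference in range shows up as soon as a value crosses between the two representations. The sketch below converts between float and Q15; the function names are hypothetical, and the saturating, round-to-nearest behavior is one reasonable design choice among several.

```c
#include <stdint.h>

/* Hypothetical helper: converts a float to Q15 (raw = value * 32768),
   rounding to nearest and saturating to the representable range. */
static int16_t float_to_q15(float x)
{
    float scaled;

    if (x != x) {
        return 0;  /* NaN has no meaningful fixed-point value */
    }

    scaled = x * 32768.0f;
    if (scaled >= 32767.0f) {
        return INT16_MAX;   /* +1.0 and above are not representable */
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;   /* -1.0 is the most negative Q15 value  */
    }

    /* Round half away from zero before truncating to an integer. */
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;
    return (int16_t)scaled;
}

/* Converts Q15 back to float; exact, since every int16_t divided by
   2^15 is representable in single precision. */
static float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}
```

Here float_to_q15(0.5f) gives 16384, while float_to_q15(1.0f) saturates to 32767: the fixed format simply has no code for +1.0, whereas float represents it exactly.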