EmbeddedRelated.com

volatile (keyword)

Category: Architecture

In C and C++, `volatile` is a type qualifier that tells the compiler a variable's value may change at any time outside the normal flow of the program, preventing the compiler from caching the value in a register or optimizing away repeated reads and writes. It is commonly applied to memory-mapped hardware registers, variables shared between ISRs and main code, and variables modified by DMA or other bus masters.

In practice

The most fundamental use of `volatile` in embedded code is qualifying pointers to memory-mapped peripheral registers. Without it, the compiler may legally read a register once, cache the result, and never re-read it, even inside a polling loop. Declaring the register pointer as `volatile uint32_t *` forces a real load or store instruction for every access the source code specifies. Most device header files provided by vendors (such as STM32 CMSIS headers or NXP SDK headers) already mark their register structs `volatile`, so application code inherits the qualifier automatically when using those headers.
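
The polling case above can be sketched as follows. This is a minimal illustration, not code for any real device: the `uart_regs_t` layout, the `UART_STATUS_TX_READY` bit, and the function name are all hypothetical. The function takes the register block as a pointer so the sketch does not depend on a particular base address.

```c
#include <stdint.h>

/* Hypothetical register block for an imaginary UART. Field names and
   bit positions are illustrative, not taken from a real device. */
typedef struct {
    volatile uint32_t STATUS;  /* bit 0: transmitter ready (assumed) */
    volatile uint32_t DATA;    /* writing a byte starts transmission (assumed) */
} uart_regs_t;

#define UART_STATUS_TX_READY (1u << 0)

/* Busy-wait until the peripheral reports ready, then write one byte.
   STATUS is volatile, so every loop iteration performs a real load
   instead of reusing a value read once before the loop. */
static void uart_write_byte(uart_regs_t *uart, uint8_t byte)
{
    while ((uart->STATUS & UART_STATUS_TX_READY) == 0u) {
        /* spin until the hardware sets the ready bit */
    }
    uart->DATA = byte;
}
```

On a real target the pointer would typically come from a vendor header or a cast of the peripheral's base address taken from the reference manual. If the `volatile` qualifiers were dropped from the struct, an optimizing compiler would be free to turn the loop into a single test followed by an infinite loop.
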

A second common use is flagging variables that are written inside an interrupt service routine and read in the main loop (or vice versa). Without `volatile`, the compiler has no reason to believe the variable changes between two reads in the same execution context, and may hoist the read out of a loop entirely. Marking the variable `volatile` prevents that specific optimization. Note, however, that `volatile` does not provide atomicity or memory ordering guarantees: when a variable is wider than the MCU's native word or bus width, or when it is shared between cores on a multicore SoC, a `volatile` access can still be split or observed out of order, and a separate synchronization mechanism (atomic operations, critical sections, or mutexes) is still required for correctness.
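
A minimal sketch of the ISR-to-main-loop flag pattern, assuming a single-core target and single-byte variables so that no access can be torn. The names `sample_isr_body` and `poll_sample` are hypothetical; on real hardware the ISR body would read the value from a peripheral and be registered through the vendor's vector table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the ISR and the main loop. Both are one byte wide,
   so each individual read or write is a single access here. */
static volatile bool    data_ready    = false;
static volatile uint8_t latest_sample = 0u;

/* Hypothetical ISR body: store the new sample, then raise the flag. */
static void sample_isr_body(uint8_t raw)
{
    latest_sample = raw;
    data_ready    = true;
}

/* Called from the main loop. Returns true and stores the sample in
   *out if one is pending. Because data_ready is volatile, the flag is
   re-read on every call, even if this function is inlined into a loop. */
static bool poll_sample(uint8_t *out)
{
    if (!data_ready) {
        return false;
    }
    *out = latest_sample;
    data_ready = false;
    return true;
}
```

This sketch tolerates a lost sample: if the ISR fires between the read of `latest_sample` and the clearing of `data_ready`, the newer sample is overwritten by the flag reset. Whether that is acceptable depends on the application; if not, a critical section or a queue is the usual next step.
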

A common pitfall is relying on `volatile` as a complete thread-safety solution. As discussed in "Important Programming Concepts (Even on Embedded Systems) Part III: Volatility" and "Global Variables vs. Safe Software," `volatile` prevents the compiler from optimizing accesses, but it says nothing about the order in which those accesses become visible to other cores or peripherals, and it does not prevent a preempting ISR from observing a partially updated multi-word variable. For plain (non-atomic) shared state in concurrent contexts, `volatile` is typically necessary but not sufficient for correctness.
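
The partially-updated multi-word case can be sketched with a 64-bit tick counter on a 32-bit MCU. The `irq_save_and_disable()` and `irq_restore()` helpers are hypothetical placeholders, stubbed out here so the sketch compiles on a host; on a real target they would wrap the architecture's interrupt-masking mechanism.

```c
#include <stdint.h>

/* Hypothetical port layer. These stubs do nothing; a real port would
   mask interrupts and return the previous mask state. */
static inline uint32_t irq_save_and_disable(void) { return 0u; }
static inline void     irq_restore(uint32_t state) { (void)state; }

/* 64 bits wide: on a 32-bit MCU each access takes two loads or stores. */
static volatile uint64_t tick_count = 0u;

/* Hypothetical timer ISR body. */
static void tick_isr_body(void)
{
    tick_count = tick_count + 1u;
}

/* Main-context reader. volatile alone would still allow the ISR to run
   between the two halves of the read; masking interrupts around the
   copy makes the snapshot consistent. */
static uint64_t ticks_now(void)
{
    uint32_t state    = irq_save_and_disable();
    uint64_t snapshot = tick_count;
    irq_restore(state);
    return snapshot;
}
```

The critical section is kept to a single copy so interrupts stay masked for as short a time as possible.
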

Another pitfall is applying `volatile` indiscriminately as a substitute for understanding the real concurrency problem. Every `volatile` access disables certain optimizations and may generate more or slower code. On tight inner loops or performance-critical drivers, unnecessary `volatile` qualifiers can measurably impact throughput. Apply it deliberately, at the narrowest scope that actually requires it.
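
One way to keep `volatile` at the narrowest scope is to read the shared variable once into a local and run the hot loop on the local copy. The sketch below assumes a hypothetical `gain_setting` that an ISR may update, and assumes it is acceptable for one call to use a single consistent gain value throughout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical setting, assumed to be updated from an ISR. */
static volatile uint32_t gain_setting = 1u;

/* Scale a buffer by the current gain. The volatile variable is read
   exactly once; the loop then works on an ordinary local, which the
   compiler is free to keep in a register and optimize around. */
static void scale_buffer(uint32_t *buf, size_t len)
{
    const uint32_t gain = gain_setting;  /* the only volatile access */
    for (size_t i = 0; i < len; ++i) {
        buf[i] *= gain;
    }
}
```

Reading `gain_setting` inside the loop instead would force one load per element and could apply two different gains to the same buffer if the ISR fired mid-loop.
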

Frequently asked

Does volatile guarantee atomic access?
No. `volatile` only instructs the compiler to emit an actual memory access each time the variable is read or written; it says nothing about whether that access is atomic at the hardware level. On a 32-bit MCU reading a 64-bit variable, or on any MCU reading a multi-byte variable that straddles a bus transaction, the access can still be interrupted mid-way. For atomic behavior, use compiler intrinsics, C11/C++11 `_Atomic` / `std::atomic`, or explicit critical sections.
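
As a hedged illustration of the C11 route, the sketch below counts events with `<stdatomic.h>`. It assumes the toolchain provides C11 atomics for the target; on cores without suitable hardware support the operations may fall back to library calls or locks, which may not be safe inside an ISR. The function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Static storage, so the counter starts at zero. */
static atomic_uint_fast32_t pending_events;

/* Hypothetical ISR body: the increment is a single indivisible
   read-modify-write, which a volatile counter would not guarantee. */
static void event_isr_body(void)
{
    atomic_fetch_add(&pending_events, 1u);
}

/* Main context: atomically fetch the count and reset it to zero,
   so no event is lost between the read and the clear. */
static uint_fast32_t take_events(void)
{
    return atomic_exchange(&pending_events, 0u);
}
```

`atomic_is_lock_free(&pending_events)` can be used to check whether the operations on a given target are lock-free.
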
Do I need volatile for every memory-mapped register?
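In practice, yes: every access to a memory-mapped register should go through a `volatile`-qualified lvalue, not only the accesses inside polling loops. Without it the compiler is permitted to merge, reorder, or eliminate accesses that look redundant, such as two consecutive writes to the same control register or a read whose result is discarded. Most vendor-supplied SDK headers (CMSIS for Cortex-M, NXP SDK, Microchip MCC-generated headers, etc.) already declare register structs with `volatile`, so you typically inherit the qualifier automatically. If you define your own register map, you are responsible for adding it.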
Is volatile enough to safely share a variable between an ISR and main code?
It depends on the variable's size and the MCU's bus width. For a single variable that fits in one naturally-aligned bus transaction on your target (e.g., a `uint32_t` on a 32-bit ARM Cortex-M), `volatile` prevents the compiler from caching the value and is often sufficient for single-writer/single-reader patterns where a torn read is not possible. For multi-word values, or anywhere a read-modify-write sequence must be indivisible, you also need a critical section or an atomic operation. See "Scorchers, Part 3" for a concrete double-buffering pattern that avoids needing either.
Can volatile be used in C++ the same way as in C?
The semantics are the same in C++03 and C++11/14/17 for basic register and ISR-shared variable use cases. However, C++20 deprecated several uses of `volatile` (e.g., compound assignment on volatile objects) to push developers toward `std::atomic` and explicit memory ordering. For new C++ code on platforms with atomic support, prefer `std::atomic` or `std::atomic_ref` for concurrency and reserve `volatile` for genuine hardware-register access.
Does volatile affect compiler optimizations beyond simple caching?
Yes. The compiler must preserve the exact number and order of volatile accesses as written in the source, and it cannot reorder a volatile access with respect to another volatile access. However, it may still reorder non-volatile accesses around a volatile one. This means `volatile` alone does not provide the full memory barrier semantics that multicore or DMA scenarios often require; an explicit barrier or fence instruction is needed there.
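
The reordering hazard can be sketched with a hypothetical DMA controller: the buffer is filled with ordinary stores, and the transfer is started through `volatile` registers. The `dma_regs_t` layout is invented for illustration. The fence shown is the portable C11 `atomic_thread_fence`, which mainstream compilers treat as a barrier for surrounding memory accesses. Whether it is sufficient on a given part is an assumption to check against the vendor documentation, since some targets also need a stronger barrier instruction or cache maintenance before a DMA transfer.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical DMA register block; not a real device. */
typedef struct {
    volatile uintptr_t SRC_ADDR;  /* source buffer address (assumed) */
    volatile uint32_t  LENGTH;    /* transfer length in bytes (assumed) */
    volatile uint32_t  START;     /* write 1 to start the transfer (assumed) */
} dma_regs_t;

static void dma_send(dma_regs_t *dma, uint8_t *buf, uint32_t len, uint8_t fill)
{
    /* Ordinary, non-volatile stores: without a barrier the compiler may
       move them after the volatile register writes below. */
    for (uint32_t i = 0; i < len; ++i) {
        buf[i] = fill;
    }

    /* Keep the buffer stores ahead of the register writes. */
    atomic_thread_fence(memory_order_seq_cst);

    /* The volatile writes stay in this order relative to each other. */
    dma->SRC_ADDR = (uintptr_t)buf;
    dma->LENGTH   = len;
    dma->START    = 1u;
}
```

On Cortex-M projects the same role is often played by a CMSIS barrier intrinsic rather than a C11 fence; the structure of the code stays the same.
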

Differentiators vs similar concepts

`volatile` is often confused with `const` (which prevents the program from modifying an object through that name but says nothing about hardware-driven changes; the two can be combined, as in a `const volatile` read-only status register) and with C11/C++11 `_Atomic` / `std::atomic` (which provide the atomicity and memory-ordering guarantees that `volatile` lacks, but do not promise that every access in the source is emitted exactly as written, so they are not a replacement for `volatile` on memory-mapped registers). On ARM Cortex-M and similar single-core MCUs without out-of-order execution, `volatile` often works correctly in simple ISR/main-loop sharing patterns. That success leads developers to treat it as a general concurrency primitive, which it is not, particularly on multicore or weakly-ordered architectures.