In RTOS and embedded systems contexts, latency is the elapsed time between a triggering event and the system's first observable response to that event. It is a key metric for evaluating whether a system can meet real-time deadlines.
In practice
Latency surfaces in several distinct forms in embedded work. Interrupt latency is the time from a hardware interrupt being asserted to the first instruction of the ISR executing; this includes the time for the CPU to finish the current instruction, save context, and vector to the handler. On ARM Cortex-M cores, where the NVIC performs stacking and vectoring in hardware, ARM's quoted figures are 12 cycles for Cortex-M3 and Cortex-M4 and 16 cycles for Cortex-M0, assuming zero-wait-state memory. In practice the figure is typically on the order of 12–40 cycles, and it varies with core variant, memory wait states, flash acceleration, and implementation details.
Scheduling latency (or preemption latency) is the additional time from the end of an ISR to when the RTOS actually runs the highest-priority ready task; this depends on whether a context switch is pending, how much overhead the scheduler adds, and whether a critical section is held.
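Cycle counts only become meaningful once they are converted to time at the actual core clock. The helper below is a minimal sketch of that conversion; the function name `cycles_to_ns` and the clock values used with it are assumptions for illustration, not part of any vendor API.

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds at a given core clock.
 * cpu_hz is an assumed parameter and must be non-zero; pass the real
 * core clock of your target. */
static inline uint64_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    /* Multiply in 64-bit before dividing to avoid overflow and keep
     * precision. Adding (cpu_hz - 1) rounds up, so the result never
     * understates the latency. */
    return ((uint64_t)cycles * 1000000000ULL + (cpu_hz - 1u)) / cpu_hz;
}
```

For example, `cycles_to_ns(12, 168000000)` returns 72, so a 12-cycle entry at an assumed 168 MHz clock is roughly 72 ns, while the same 12 cycles at 100 MHz is 120 ns.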
For hard real-time systems, worst-case latency matters more than average latency. An RTOS may advertise a typical interrupt latency of a few microseconds, but a poorly placed critical section, a long ISR, or a high-priority task with a burst of work can cause occasional spikes that violate timing requirements. RTOS trace recorders (e.g., Percepio Tracealyzer, Segger SystemView) are commonly used to measure and visualize latency distributions rather than relying on average figures.
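To see why an average hides spikes, it helps to summarize a set of measured latencies by minimum, maximum, and mean. The following is a minimal sketch that assumes the samples have already been captured in microseconds (for instance, exported from a trace tool); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t min_us;
    uint32_t max_us;   /* worst case observed in this sample set */
    uint32_t mean_us;  /* integer mean, truncated toward zero */
} latency_stats_t;

/* Summarize n latency samples (in microseconds). */
latency_stats_t latency_summarize(const uint32_t *samples_us, size_t n)
{
    assert(samples_us != NULL && n > 0);

    latency_stats_t s = { samples_us[0], samples_us[0], 0 };
    uint64_t sum = 0;  /* 64-bit accumulator so long captures cannot overflow */

    for (size_t i = 0; i < n; i++) {
        if (samples_us[i] < s.min_us) s.min_us = samples_us[i];
        if (samples_us[i] > s.max_us) s.max_us = samples_us[i];
        sum += samples_us[i];
    }
    s.mean_us = (uint32_t)(sum / n);
    return s;
}
```

Note that the observed maximum is only a lower bound on the true worst case: a spike that did not occur during the capture is not reflected in it.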
Latency compounds across a signal processing or control pipeline. In a data acquisition system, total end-to-end latency includes sensor settling time, ADC conversion time, DMA transfer, task wakeup latency, and processing time before output. Each stage adds to the total, and jitter at any stage accumulates.
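A simple way to reason about this is a per-stage latency budget that sums typical and worst-case figures separately. The sketch below assumes each stage has been characterized independently; the struct layout, names, and any numbers fed into it are illustrative, not measurements.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;     /* e.g. "ADC conversion" */
    uint32_t typical_us;  /* typical latency of this stage */
    uint32_t worst_us;    /* worst-case latency of this stage */
} stage_t;

typedef struct {
    uint32_t typical_us;
    uint32_t worst_us;
} budget_t;

/* Sum typical and worst-case latency over all pipeline stages. */
budget_t pipeline_budget(const stage_t *stages, size_t n)
{
    budget_t total = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        total.typical_us += stages[i].typical_us;
        total.worst_us   += stages[i].worst_us;
    }
    return total;
}
```

Summing the worst cases gives a conservative bound, since it assumes every stage hits its worst case on the same sample. The gap between the two totals is a rough indication of how much jitter the pipeline can accumulate.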
Linux-based embedded systems (e.g., i.MX 6, AM335x) running the mainline kernel introduce significant and variable scheduling latency, often hundreds of microseconds to low milliseconds. This comes from a combination of non-preemptible kernel sections, configuration choices, workload characteristics, and hardware, and the exact range depends heavily on kernel version and system configuration. PREEMPT_RT patches reduce worst-case latency substantially but do not eliminate it. For hard real-time requirements, a dedicated RTOS or a dual-core asymmetric design (one core running Linux, one running a bare-metal or RTOS workload) is often used instead.
Frequently asked
What is the difference between interrupt latency and scheduling latency?
Interrupt latency is the time from a hardware interrupt signal to the first ISR instruction executing. Scheduling latency (sometimes called dispatch latency or preemption latency) is the time from the end of an ISR -- where the RTOS is notified a higher-priority task is ready -- to when that task actually begins running. Total response latency is the sum of both, plus any time spent in critical sections that defer the scheduler.
Why is worst-case latency more important than average latency for real-time systems?
A hard real-time system must guarantee its deadline is never missed, not just usually met. A system with a 2 us average latency but a 500 us worst-case spike will miss any hard deadline tighter than that worst case, regardless of how well it performs on average. Measuring and bounding worst-case latency -- ideally under realistic load and stress conditions -- is the relevant design target.
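The point can be checked mechanically against captured data by counting how many samples exceed the deadline. This is a minimal sketch with a hypothetical function name; for a hard real-time requirement, any non-zero result is a failure.

```c
#include <stddef.h>
#include <stdint.h>

/* Count latency samples (in microseconds) that exceed the deadline. */
size_t count_deadline_misses(const uint32_t *latency_us, size_t n,
                             uint32_t deadline_us)
{
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        if (latency_us[i] > deadline_us) {
            misses++;
        }
    }
    return misses;
}
```

With samples of 2, 2, 2, 500, and 2 us and a 100 us deadline, the function reports one miss even though most responses are fast.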
What are common sources of unexpected latency spikes in RTOS-based systems?
Common sources include: long or poorly scoped critical sections (disabling interrupts or the scheduler), high-priority tasks that run longer than expected during bursts of work, ISRs that do too much work, cache misses on processors with data and instruction caches, and DMA or bus contention. On devices with an MMU, TLB misses can also add cycles. Profiling with a trace tool is usually necessary to identify the actual source.
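One common way to keep a critical section short is to copy shared data out under protection and do the slow work afterwards. The sketch below illustrates the pattern on a host; `enter_critical` and `exit_critical` are hypothetical stand-ins for whatever primitive the RTOS or CPU provides, and the buffer and its size are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: on a real target these would map to the RTOS
 * or CPU primitives that mask interrupts or lock the scheduler. */
static int critical_depth = 0;
static void enter_critical(void) { critical_depth++; }
static void exit_critical(void)  { critical_depth--; }

enum { SAMPLE_COUNT = 8 };

/* Buffer assumed to be written by an ISR. */
static volatile uint16_t shared_samples[SAMPLE_COUNT];

/* Copy the shared buffer while protected, then process the private copy
 * with interrupts and the scheduler enabled again. */
uint32_t sum_snapshot(void)
{
    uint16_t local[SAMPLE_COUNT];

    enter_critical();
    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        local[i] = shared_samples[i];  /* short, bounded copy */
    }
    exit_critical();

    /* Potentially slow work happens outside the critical section. */
    uint32_t sum = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        sum += local[i];
    }
    return sum;
}
```

The time spent with interrupts masked is now bounded by the copy loop rather than by the processing, which is what limits the latency this code can add to other interrupts and tasks.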
Can I trust an RTOS vendor's published interrupt latency figure?
Treat vendor figures as a best-case baseline measured under ideal conditions -- often on a specific CPU, with no other tasks running, no critical sections active, and caches already warm. Your actual worst-case latency will typically be higher once you add your application workload, critical sections, and any peripheral-induced bus stalls. Measure on your target hardware under realistic conditions.
How does boot latency relate to real-time requirements?
Some systems must respond to external events shortly after power-on, making boot time a real-time constraint in itself. A system that takes two seconds to initialize an RTOS, drivers, and application state may miss an early event. There is a tradeoff between keeping a system in a low-power standby state (short wakeup latency) and cold-booting on demand (potentially lower idle power but longer response time).
Differentiators vs similar concepts
Latency is often conflated with jitter and throughput. Latency is the delay for a single event to produce a response. Jitter is the variation in that latency across repeated events -- a system can have low average latency but high jitter, which is equally problematic for real-time control. Throughput is the rate at which a system processes a sustained stream of events and is a separate metric; a system can have high throughput but poor (high) latency, or low latency but limited throughput.
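The distinction between latency and jitter is easy to make concrete. The sketch below computes peak-to-peak jitter, one common definition among several (standard deviation is another), from a set of latency samples; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Peak-to-peak jitter: difference between the slowest and fastest
 * observed response, in microseconds. */
uint32_t jitter_peak_to_peak_us(const uint32_t *latency_us, size_t n)
{
    assert(latency_us != NULL && n > 0);

    uint32_t lo = latency_us[0];
    uint32_t hi = latency_us[0];
    for (size_t i = 1; i < n; i++) {
        if (latency_us[i] < lo) lo = latency_us[i];
        if (latency_us[i] > hi) hi = latency_us[i];
    }
    return hi - lo;
}
```

Samples of 10, 12, 11, and 30 us give 20 us of jitter, whereas a system that always responds in exactly 50 us has higher latency but zero jitter.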