EmbeddedRelated.com
Blogs

Mutex vs. Semaphores – Part 2: The Mutex & Mutual Exclusion Problems

Niall CoolingMay 15, 20197 comments

Part 1 of this series we looked at the history of the binary and counting semaphore, and then went on to discuss some of the associated problem areas. In this posting I aim to show how a different RTOS construct, the mutex, may overcome some, if not all, of these weaknesses.

To address the problems associated with semaphore, a new concept was developed during the late 1980’s. I have struggled to find it’s first clear definition, but the major use of the term mutex (another neologism based around MUTual EXclusion) appears to have been driven through the development of the common programming specification for UNIX based systems. In 1990 this was formalised by the IEEE as standard IEEE Std 1003.1 commonly known as POSIX.

The mutex is similar to the principles of the binary semaphore with one significant difference: the principle of ownership. Ownership is the simple concept that when a task locks (acquires) a mutex only it can unlock (release) it. If a task tries to unlock a mutex it hasn’t locked (thus doesn’t own) then an error condition is encountered and, most importantly, the mutex is not unlocked. If the mutual exclusion object doesn’t have ownership then, irrelevant of what it is called, it is not a mutex.

This article is available in PDF format for easy printing

The concept of ownership enables mutex implementations to address the problems discussed in part 1:

  1. Accidental release
  2. Recursive deadlock
  3. Task-Death deadlock
  4. Priority inversion
  5. Semaphore as a signal

Accidental Release

As already stated, ownership stops accidental release of a mutex as a check is made on the release and an error is raised if current task is not owner.

Recursive Deadlock

Due to ownership, a mutex can support relocking of the same mutex by the owning task as long as it is released the same number of times.

Priority Inversion

With ownership this problem can be addressed using one of the following priority inheritance protocols:

  • [Basic] Priority Inheritance Protocol
  • Priority Ceiling Protocol

The Basic Priority Inheritance Protocol enables a low-priority task to inherit a higher-priorities task’s priority if this higher-priority task becomes blocked waiting on a mutex currently owned by the low-priority task. The low priority task can now run and unlock the mutex – at this point it is returned back to its original priority.

The details of the Priority Inheritance Protocol and Priority Ceiling Protocol (a slight variant) are covered later in this article.

Death Detection

If a task terminates for any reason, the RTOS can detect if that task current owns a mutex and signal waiting tasks of this condition. In terms of what happens to the waiting tasks, there are various models, but two dominate:

  • All tasks readied with error condition;
  • Only one task readied; this task is responsible for ensuring integrity of critical region.

When all tasks are readied, these tasks must then assume critical region is in an undefined state. In this model no task currently has ownership of the mutex. The mutex is in an undefined state (and cannot be locked) and must be reinitialized.

When only one task is readied, ownership of the mutex is passed from the terminated task to the readied task. This task is now responsible for ensuring integrity of critical region, and can unlock the mutex as normal.

Mutual Exclusion / Synchronisation

Due to ownership a mutex cannot be used for synchronization due to lock/unlock pairing. This makes the code cleaner by not confusing the issues of mutual exclusion with synchronization.

Caveat

A specific Operating Systems mutex implementation may or may not support the following:

  • Recursion
  • Priority Inheritance
  • Death Detection

Mutual Exclusion Problems

The mutex is a significantly safer mechanism to use for implementing mutual exclusion around shared resources. Nevertheless, there are still a couple of problems that use of the mutex (in preference to the semaphore) will not solve. These are:

  • Circular deadlock
  • Non-cooperation

Circular Deadlock

Circular deadlock, often referred to as the “deadly embrace” problem is a condition where two or more tasks develop a circular dependency of mutual exclusion. Simply put, one task is blocked waiting on a mutex owned by another task. That other task is also block waiting on a mutex held by the first task.

So how can this happen? Take as an example a small control system. The system is made up of three tasks, a low priority Control task, a medium priority System Identification (SI) task and a high priority Alarm task. There is an analogue input shared by the Control and the SI tasks, which is protected by a mutex. There is also an analogue output protected by a different mutex.

The Control task waits for mutexes ADC and DAC:

mutex_lock (ADC);
mutex_lock (DAC);
/* critical section */
mutex_unlock (ADC);
mutex_unlock (DAC);

The SI Task waits for mutexes DAC and ADC:

mutex_lock (DAC); 
mutex_lock (ADC); 
/* critical section */ 
mutex_unlock (DAC); 
mutex_unlock (ADC);

Unfortunately, under certain timing conditions, this can lead to deadlock. In this example the Control task has locked the ADC, but before locking the DAC has been pre-empted by the higher priory SI task. The SI task then locks the DAC and tries to lock the ADC. The SI task is now blocked as the ADC is already owned by the Control task. The Control task now runs and tries to lock the DAC. It is blocked as the DAC is held by the SI task. Neither task can continue until the mutex is unlocked and neither mutex can be unlocked until either task runs – classic deadlock.

For circular deadlock to occur the following conditions must all be true:

  • A thread has exclusive use of resources (Mutual exclusion)
  • A thread can hold on to a resource(s) whilst waiting for another resource (Hold and wait)
  • A circular dependency of thread and resources is set up (Circular waiting)
  • A thread never releases a resource until it is completely finished with it (No resource preemption)

These conditions can be addressed in a number of ways. For example, a design policy may stipulate that if a task needs to lock more than one mutex it must either lock all or none.

Priority Ceiling Protocol

With the Priority Ceiling Protocol (PCP) method each mutex has a defined priority ceiling, set to that of the highest priority task which uses the mutex. Any task using a mutex executes at its own priority – until a second task attempts to acquire the mutex.  At this point it has its priority raised to the ceiling value, preventing suspension and thus eliminating the “hold and wait” condition.

In the deadlock example shown before, the significant point is when the SI task tries to lock the DAC. Before that succeeded and lead to cyclic deadlock. However with a PCP mutex, both the ADC and DAC mutex will have a ceiling priority equal to the SI’s task priority. When the SI task tries to lock the DAC, then the run-time system will detect that the SI’s task priority is not higher than the priority of the locked mutex ADC. The run-time system suspends the SI task without locking the DAC mutex. The control task now inherits the priority of the SI task and resumes execution.

Non-cooperation

The last, but most important aspect of mutual exclusion covered in these ramblings relies on one founding principle: we have to rely on all tasks to access critical regions using mutual exclusion primitives. Unfortunately this is dependent on the design of the software and cannot be detected by the run-time system. This final problem was addressed by Tony Hoare, called the Monitor.

The Monitor

The monitor is a mechanism  not typically supplied by the RTOS, but something the programmer tends to build (a notable exception is Ada95’s protected object mechanism). A monitor simply encapsulates the shared resource and the locking mechanism into a single construct (e.g. a C++ Object that encapsulates the mutex mechanism). Access to the shared resource, then, is through a controlled interface which cannot be bypassed (i.e. the application never explicitly calls the mutex, but calls upon access functions).

Finishing Off

This goal of these initial postings is to demonstrate that common terms used in the real-time programming community are open to ambiguity and interpretation. Hopefully you should now be clear about the core differences between the Binary Semaphore, General (counting) Semaphore and the Mutex.

The underlying difference between the Semaphores and the Mutex is the Principle of Ownership. Given the principle of ownership a particular implementation of a mutex may support RecursionPriority inheritance and Death Detection.

ENDNOTE 

An aspect of the mutex I haven’t covered here is that many operating systems support the concept of a condition variable. A condition variable allows a task to wait on a synchronization primitive within a critical region. The whole aspect Synchronization Patterns (e.g. semaphore as a signal) within the context of RTOSs will be the subject of a future posting.


[ - ]
Comment by QLMay 20, 2019

Very informative article. Many developers underestimate the risks and skills needed to use the various RTOS mechanisms safely and correctly and this article illuminates some of the problems and solutions.

I have a question regarding the use of a mutex: What do you think about thread calling a blocking mechanism (e.g., delay() or semaphore wait()) while already holding a mutex? How often do you see it in real-life projects?

[ - ]
Comment by NiallCoolingMay 20, 2019

Hi Miro, thanks for the comment.

The condition-object is there specifically to allow wait-semantics within a critical region. No thread should ever explicitly delay/wait in a critical region.

I've seen delays used but not sem-waits (I guess these get debugged pretty quickly). Actually, a more common, and more worrying, issue is that I've seen numerous companies using priority mechanisms instead of using mutual-exclusion! 

For most applications, the mutex is still too granular for most application programmers and should ideally hidden behind a monitor construct.

[ - ]
Comment by waydanMay 23, 2019

Thank you for the great article Niall.
I've read explanations of semaphores and mutexes in different textbooks, but your brief article is much clearer and easy to understand.

I also appreciate your reference to Ada and its built in protected object feature. This is one of the constructs that drew me to learn the language and one I think many application writers would benefit from.

[ - ]
Comment by NiallCoolingMay 23, 2019

Thanks you for your kind words, they are much appreciated.

I was disappointed that when the Threading APIs for C11 and C++11 were published, concepts such as protected objects and aync-pipes were not part of the model.

One of the things that attracts me to Rust (a really viable alternative to Ada, more so than C) is that concepts such as RwLock and Pipes are built in, but alas, not protected object.

[ - ]
Comment by jms_nhAugust 17, 2019
To address the problems associated with semaphore, a new concept was developed during the late 1980’s.
Try the late 1960s or early 1970s. I'm not sure of who came up with the term first, but it is present in several textbooks and papers in the early 1970's, for example Operating System Principles
[ - ]
Comment by NiallCoolingAugust 19, 2019

My understanding, but I'm always happy to be corrected, is that typically the examples used were still based around the 'type' semaphore but the variable name was 'mutex'. 

I certainly wasn't aware of Mutex as a 'type' until the late-80's (when I happened to be working for Ready Systems - remember VRTX?)

[ - ]
Comment by mr_banditNovember 21, 2019

Some commercial RTOSs have a mutex that temporarly bumps the priority of a lower task while the mutex is active. The goal is to prevent the 2-task deadlock as described in the article.

BTW - excellent article!

To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.

Please login (on the right) if you already have an account on this platform.

Otherwise, please use this form to register (free) an join one of the largest online community for Electrical/Embedded/DSP/FPGA/ML engineers: