UML Statechart tip: Handling errors when entering a state
This is my second post with advice and tips on designing software with UML statecharts. My first entry is here.
It has been nearly 20 years since I first studied UML statecharts. Since that initial exposure (thank you, Samek!), I have applied event-driven active object statechart designs to numerous projects [3]. Nothing has diminished my preference for this pattern in my firmware and embedded software projects. Over the years I have noted a handful of common challenges in creating UML statechart-based designs. This post tackles the question:
How does an engineer handle synchronous errors while entering a state?
First, some context. The UML specification ([1], 14.2.3.4.5 Entering a State) describes the behavior when a state is entered. That description covers neither error handling nor any state transition taken as an immediate result of entering the state. As Samek [6] notes, “The UML does not allow transitions in entry or exit actions.”
Let’s dive into an example.
Our hypothetical product is required to maintain an internal audit log which is stored as a file in a filesystem. Additionally, the product is required to transmit the audit log to the appropriate backend server when certain events take place. Due to various restrictions, the software is only allowed to transmit 1 KiB of the log file at a time. An initial statechart to accommodate this requirement may appear similar to the following figure:
Given the above design, a software engineer may ask: “What happens if opening the audit file fails? How does the state machine design accommodate this failure?”
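To make the question concrete, here is a rough C sketch of the TransmittingAuditFile state from the initial design. It is not taken from any framework: the AuditXfer struct, the function names, and the "ready for next chunk" event are hypothetical, and the file path is passed in only so the sketch can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define AUDIT_CHUNK_SIZE 1024u /* transmit at most 1 KiB per chunk */

/* Hypothetical context owned by the TransmittingAuditFile state. */
typedef struct {
    FILE  *file;       /* opened by the entry action */
    size_t bytes_read; /* running total, for illustration only */
} AuditXfer;

/* Entry action of the initial design. UML gives it no way to take a
   transition, so a failed fopen() only leaves file == NULL behind. */
static void audit_xfer_entry(AuditXfer *me, const char *path) {
    assert(me != NULL);
    me->file = fopen(path, "rb");
    me->bytes_read = 0;
}

/* Handler for a hypothetical "ready for the next chunk" event. Copies up to
   1 KiB into out and returns the count; 0 means end of file. A failed open
   is indistinguishable from an empty log here. */
static size_t audit_xfer_read_chunk(AuditXfer *me, uint8_t out[AUDIT_CHUNK_SIZE]) {
    assert(me != NULL && out != NULL);
    if (me->file == NULL) {
        return 0;
    }
    size_t n = fread(out, 1, AUDIT_CHUNK_SIZE, me->file);
    me->bytes_read += n;
    return n;
}

/* Exit action: release the handle if one was opened. */
static void audit_xfer_exit(AuditXfer *me) {
    assert(me != NULL);
    if (me->file != NULL) {
        fclose(me->file);
        me->file = NULL;
    }
}
```

A 2500-byte log yields chunks of 1024, 1024, and 452 bytes. With a missing file, every chunk request simply returns 0, so the state would quietly "finish" transmitting an empty log, which is exactly the ignored-error outcome the question is worried about.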
As always with software design, there are many possibilities. One option, perhaps all too commonly followed, is simply to ignore the error; for obvious reasons, we skip analysis of that option. The following design options are explored below:
- Explicit Transitions
- Failure-Event Self-Posting
- Asynchronous Service
Explicit Transitions
This option requires the developer to “explicitly code two transitions with complementary guards and with different target states” (Samek [6]). It is the preferred solution for many firmware projects with a small state space. Taking this option, our preliminary statechart is modified to the following:
Benefits of this approach include:
- Avoids error handling in the entry handler of the TransmittingAuditFile state.
- The event handling code is clear and easy to understand.
Disadvantages include:
- The need for an additional cache or intermediary to store the file handle for future use by the destination state.
- If event handlers in other states require a similar transition, the code may violate the DRY (Don't Repeat Yourself) principle as developers copy and paste the transition code into those states.
- Firmware code size may also grow if this pattern is needed in multiple states.
- In large projects with dozens if not hundreds of states and events, the likelihood of overlooking this pattern of event handling increases, especially during maintenance.
- However, this concern may be mitigated through templates, macros, or other helper functions that contain this common logic.
Despite the disadvantages, this approach is the least complicated and adheres nicely to the UML requirements. I personally use this approach primarily in smaller projects where I do not expect requirements for multiple transitions to the same destination state.
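A minimal C sketch of explicit transitions follows, assuming hypothetical state IDs, signals, and an AuditCache struct (none of these come from a specific framework). The helper audit_transition_target() is one way to apply the helper-function mitigation mentioned in the disadvantages: each source state calls it, so the two complementary guards are written once, while AuditCache is the intermediary that carries the handle to the destination state.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical states and signals for the audit statechart. */
typedef enum {
    STATE_IDLE,
    STATE_TRANSMITTING_AUDIT_FILE,
    STATE_AUDIT_FILE_ACCESS_ERROR
} StateId;

typedef enum { SIG_AUDIT_REQUESTED, SIG_NEW_EVENT, SIG_OTHER } Signal;

/* Intermediary that carries the open handle to the destination state. */
typedef struct {
    FILE *audit_file;
} AuditCache;

/* Transition action plus two complementary guards, written once and shared
   by every source state that must reach TransmittingAuditFile. */
static StateId audit_transition_target(AuditCache *cache, const char *path) {
    assert(cache != NULL && cache->audit_file == NULL);
    cache->audit_file = fopen(path, "rb");
    if (cache->audit_file != NULL) {
        return STATE_TRANSMITTING_AUDIT_FILE; /* [file opened] */
    }
    return STATE_AUDIT_FILE_ACCESS_ERROR;     /* [else] */
}

/* Event handler of a hypothetical Idle source state. */
static StateId idle_dispatch(AuditCache *cache, Signal sig, const char *path) {
    switch (sig) {
    case SIG_AUDIT_REQUESTED:
    case SIG_NEW_EVENT: /* both signals need the same guarded transition */
        return audit_transition_target(cache, path);
    default:
        return STATE_IDLE; /* ignored: no transition */
    }
}

/* Called from the destination state's exit action. */
static void audit_cache_release(AuditCache *cache) {
    assert(cache != NULL);
    if (cache->audit_file != NULL) {
        fclose(cache->audit_file);
        cache->audit_file = NULL;
    }
}
```

Every new source state still has to remember to call the helper, which is the maintenance risk noted above; the helper only removes the copy-and-paste part.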
Failure-Event Self-Posting
In this option, the entry handler checks whether the file open call succeeded. If it failed, the handler self-posts a failure event to the active object’s own event queue, so that the failure event triggers a state transition. Critically, this option should only be considered if the underlying framework or queue supports a “high priority” or LIFO (Last In, First Out) posting of an event. Examples include QActive::postLIFO(), FreeRTOS xQueueSendToFront(), and even q_urgent() from pSOS, the first major RTOS I used. A negative example would be a state machine based on Qt’s QStateMachine [5], which would not enable this concept.
Why does this option require LIFO posting on the underlying event queue? Our firmware designs typically handle many sources of asynchronous events, any of which may already be in the event queue when this state is entered. If the newly entered state processes any of those events before the self-posted error event, it handles them after its entry action has failed (here, with no open file), which is effectively an undefined state. Undefined behavior must be avoided.
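The ordering argument can be seen in a small C sketch. EventQueue is a hypothetical stand-in for a framework queue: queue_post_lifo() plays the role of QActive::postLIFO() or xQueueSendToFront(), and the signal names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

#define QUEUE_CAPACITY 8u

typedef enum { SIG_NONE, SIG_TX_READY, SIG_AUDIT_OPEN_FAILED } Signal;

/* Minimal ring-buffer event queue supporting both FIFO and LIFO posting. */
typedef struct {
    Signal   buf[QUEUE_CAPACITY];
    unsigned head;  /* index of the next event to be dispatched */
    unsigned count;
} EventQueue;

static void queue_post_fifo(EventQueue *q, Signal s) {
    assert(q->count < QUEUE_CAPACITY); /* overflow is treated as a design error */
    q->buf[(q->head + q->count) % QUEUE_CAPACITY] = s;
    q->count++;
}

/* Insert at the front: the event is dispatched before anything already queued. */
static void queue_post_lifo(EventQueue *q, Signal s) {
    assert(q->count < QUEUE_CAPACITY);
    q->head = (q->head + QUEUE_CAPACITY - 1u) % QUEUE_CAPACITY; /* step back one slot */
    q->buf[q->head] = s;
    q->count++;
}

static Signal queue_get(EventQueue *q) {
    if (q->count == 0u) {
        return SIG_NONE;
    }
    Signal s = q->buf[q->head];
    q->head = (q->head + 1u) % QUEUE_CAPACITY;
    q->count--;
    return s;
}

/* Hypothetical active object owning the queue and the audit file handle. */
typedef struct {
    EventQueue queue;
    FILE      *audit_file;
} AuditAO;

/* Entry action of TransmittingAuditFile: on failure, self-post the error to
   the FRONT of the active object's own queue. */
static void transmitting_entry(AuditAO *me, const char *path) {
    assert(me != NULL);
    me->audit_file = fopen(path, "rb");
    if (me->audit_file == NULL) {
        queue_post_lifo(&me->queue, SIG_AUDIT_OPEN_FAILED);
    }
}
```

If SIG_TX_READY is already queued when the open fails, the next dispatch delivers SIG_AUDIT_OPEN_FAILED and only then SIG_TX_READY. Had the entry action used queue_post_fifo() instead, SIG_TX_READY would be dispatched first, inside a state whose file handle is NULL.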
Applying this approach to our example state machine gives the following design:
Benefits of this approach include:
- All logic related to opening the file and handling the error is fully contained in a single state.
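- Maintenance mistakes are reduced, because source states simply transition to the destination state without repeating the open-and-check logic.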
- No intermediary API/storage is needed for the file’s handle.
- When multiple events across multiple states need to transition to this state, the code size will be smaller than the “Explicit Transitions” approach.
Disadvantages include:
- Not all statechart frameworks or underlying event queues support a LIFO event.
- This pattern would be discouraged by strict adherents to UML statechart semantics.
- The pattern falls apart as soon as the entry handler can generate more than one error condition.
This approach is probably the least preferred of the options presented. However, I have personally used it in mid-sized projects where the underlying framework supports LIFO posting, where multiple states and events need to transition to the same destination state, where I want to avoid the maintenance issues that arise as the firmware team grows, and where firmware code size is constrained.
Asynchronous Service
In this solution, the firmware implements a separate asynchronous service that transforms our synchronous file-open call into an asynchronous operation. Along with this new service, the solution requires a more complex statechart with an additional intermediate state that issues the asynchronous request and awaits its response. Some systems may already have a common thread pool that provides equivalent behavior. Modifying our example state machine for this solution, the design might now appear as shown in the following figure:
Benefits of this approach include:
- Logic is fully contained and all transitions to the required behavior use the same composite destination state.
- The asynchronous service creates clear success or failure events which may then create appropriate explicit transitions.
- The asynchronous service could be extended to other needs and could, in some systems, be the equivalent of a thread pool.
Disadvantages include:
- An additional asynchronous service must be implemented.
- More complex. Really, we are just trying to open a file! And yet, this is often the difference between naïve bug-prone software and robust commercially successful software.
Despite the increased complexity, this approach tends to be my preferred solution in larger firmware projects where multiple states and multiple events drive the state machine to the same destination state and where team size exceeds 8-10 software engineers.
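The asynchronous-service design can be sketched in C as follows. Everything here is hypothetical: OpenRequest, OpenReply, and the file_service_* functions stand in for whatever service or thread pool a real system provides, the single reply slot stands in for the requester's event queue, and the service is driven by an explicit file_service_run_once() call so the sketch stays single-threaded.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { SIG_NONE, SIG_AUDIT_OPEN_OK, SIG_AUDIT_OPEN_FAILED } Signal;

typedef enum {
    ST_OPENING_AUDIT_FILE, /* intermediate state: request issued, awaiting reply */
    ST_TRANSMITTING_AUDIT_FILE,
    ST_AUDIT_FILE_ACCESS_ERROR
} StateId;

/* Reply event posted by the service back to the requester. */
typedef struct {
    Signal sig;
    FILE  *file; /* meaningful only with SIG_AUDIT_OPEN_OK */
} OpenReply;

/* Hypothetical request slot; reply_to stands in for the requester's queue. */
typedef struct {
    const char *path;
    OpenReply  *reply_to;
    int         pending;
} OpenRequest;

static void file_service_request_open(OpenRequest *req, const char *path, OpenReply *reply_to) {
    req->path = path;
    req->reply_to = reply_to;
    req->pending = 1;
}

/* Service body. In a real system this would run in its own thread (or a
   shared thread pool), so the blocking fopen() never runs in the state machine. */
static void file_service_run_once(OpenRequest *req) {
    if (!req->pending) {
        return;
    }
    FILE *f = fopen(req->path, "rb");
    req->reply_to->file = f;
    req->reply_to->sig = (f != NULL) ? SIG_AUDIT_OPEN_OK : SIG_AUDIT_OPEN_FAILED;
    req->pending = 0;
}

typedef struct {
    StateId     state;
    FILE       *audit_file;
    OpenRequest request;
    OpenReply   mailbox;
} AuditAO;

/* Entry action of OpeningAuditFile: it only issues the request. */
static void opening_entry(AuditAO *me, const char *path) {
    me->state = ST_OPENING_AUDIT_FILE;
    me->mailbox.sig = SIG_NONE;
    file_service_request_open(&me->request, path, &me->mailbox);
}

/* The reply is an ordinary event, so each outcome is an explicit transition. */
static void opening_dispatch(AuditAO *me, const OpenReply *ev) {
    assert(me->state == ST_OPENING_AUDIT_FILE);
    switch (ev->sig) {
    case SIG_AUDIT_OPEN_OK:
        me->audit_file = ev->file;
        me->state = ST_TRANSMITTING_AUDIT_FILE;
        break;
    case SIG_AUDIT_OPEN_FAILED:
        me->state = ST_AUDIT_FILE_ACCESS_ERROR;
        break;
    default:
        break; /* other events are handled or deferred while waiting */
    }
}
```

Entering OpeningAuditFile returns immediately with nothing that can fail; only after the service runs does the reply event drive the transition to TransmittingAuditFile or AuditFileAccessError.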
I hope this post was useful. If you are interested in reading more on this topic, check out Samek’s book [6] and my first related post.
What challenges have you faced with UML statechart design? Let us know in the comments!
References
Comments
At the end of the article, you conclude with "and where team size exceeds 8-10 software engineers."
I'm curious why you add team size as a consideration.
Thank you for the question. The larger the team, the more concerned I become about maintenance issues, i.e., mistakes made as multiple people maintain, modify, and amend the code.
So, for example, if we stick with "explicit transitions", then everyone needs to remember to write code like this:
case NEW_EVENT_SIGNAL:
    //this signal requires a transition to our audit functionality
    if (fopen(..)) {
        rtn = TransitionTo( TransmittingAuditFile );
    } else {
        rtn = TransitionTo( AuditFileAccessError );
    }
    break;
Instead, the recommended approach yields code that is more resilient to maintenance:
case NEW_EVENT_SIGNAL:
    //this signal requires a transition to our audit functionality
    rtn = TransitionTo( TransmitAuditFile );
    break;
Additionally, a larger team most likely indicates a larger, more complex project, where the "Failure-Event Self-Posting" approach may not be a pattern we want to encourage.
Hope that helps!
Matthew
https://covemountainsoftware.com/consulting/
Nice articles. I have a question: in your experience, how do you handle interrupts with a state machine?
There are two typical ways:
- The interrupt creates an appropriate event, which is published or pushed onto an event queue that feeds a state machine in a separate thread context (an active object); i.e., the interrupt becomes a source of events for the active object.
- Or the interrupt creates an event that is processed immediately by the state machine in the interrupt context. In this case, the state machine in question must be owned entirely by that interrupt and must process events only in the ISR context.
I normally do the first, but have certainly implemented number two as well.
Hope that helps!
Matthew