EmbeddedRelated.com
The 2026 Embedded Online Conference

Beyond the Packet: Designing Reliable Serial Communication for Embedded Systems

Prabo Semasinghe, May 4, 2026

Serial communication between microcontrollers sounds simple. You send bytes from one side and receive them on the other side. Done, right?

It is really more than that. I've spent a good part of my career as a firmware engineer working on systems where multiple microcontrollers need to talk to each other reliably. The communication protocol is one of those things that can quietly break your entire system if you don't get the design right. Sending bytes on the wire is the easy part. The hard part is designing a reliable framework around it. How do you know where a packet starts? How do you detect corruption? What happens when an acknowledgment goes missing? How do you recover without bringing down the whole system?

These are firmware design problems, not hardware problems, which means we can prevent them with a solid upfront architecture.


Why Don't We Just Use an Existing Protocol?

This is a fair question. Serial protocols like CAN and Modbus are well established, well understood, and well tested. But they're designed for specific ecosystems and constraints, and if your system doesn't share those constraints, you can end up with overhead that adds no value.

There are solid reasons to design your own protocol. Maybe you have tight resource constraints such as limited bandwidth or limited MCU memory. Maybe you control both ends of the communication and don't need to conform to an external standard. Your system may have asymmetric communication patterns or specific timing requirements that don't fit well with an existing protocol.

To be clear, this isn't an argument that existing protocols are bad or that you should always roll your own protocol. This article is about how to understand the problem well enough to either choose or design the right solution. Because whether you pick a protocol off the shelf or build one from scratch, the same fundamental problems need to be solved.

The Five Problems Every Serial Protocol Has to Solve

Almost every serial protocol is trying to solve the following set of problems. 

  1. Framing: Where does a packet start and where does it end?
  2. Addressing: In a multipoint system, who is this packet for and who is it from?
  3. Integrity: Is the data correct or corrupted?
  4. Flow control: Who talks when? Who has priority when multiple nodes want to transmit at the same time?
  5. Recovery: How does the protocol respond when something fails?

Designing the Packet

The packet structure is the core of any communication protocol. Below are some of the design decisions that matter most.

The start delimiter is how the receiver identifies the beginning of a packet. It needs to be unique enough that payload data can't be falsely detected as a start marker. You can use a reserved single byte that you know will never appear in the payload (harder than it sounds), or a multi-byte sequence to reduce the chance of false detection. You can also use techniques like byte stuffing where you escape any occurrence of the start byte within the payload. Having a length field also helps because once the receiver knows how long the payload is, accidental matches become much less likely.
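To make the framing options concrete, here is a minimal sketch of byte stuffing combined with a length field. The delimiter value 0x7E, the escape byte 0x7D, the XOR mask 0x20, and the one-byte length field are assumptions chosen for illustration (they follow a common HDLC-style convention), and the function names are hypothetical.

```python
from typing import List

START = 0x7E    # assumed start delimiter
ESC = 0x7D      # assumed escape byte
ESC_XOR = 0x20  # assumed mask applied to an escaped byte


def encode_frame(payload: bytes) -> bytes:
    """Build START + stuffed(length byte + payload)."""
    if len(payload) > 255:
        raise ValueError("payload does not fit a one-byte length field")
    body = bytes([len(payload)]) + payload
    out = bytearray([START])
    for b in body:
        if b in (START, ESC):
            # Escape reserved values so START never appears inside a frame.
            out += bytes([ESC, b ^ ESC_XOR])
        else:
            out.append(b)
    return bytes(out)


def decode_stream(data: bytes) -> List[bytes]:
    """Extract payloads from a byte stream, resynchronizing on START."""
    frames = []
    buf = None       # None means "hunting for START"
    escaped = False
    for b in data:
        if b == START:
            # An unescaped START always begins a new frame.
            buf = bytearray()
            escaped = False
            continue
        if buf is None:
            continue  # noise before the first delimiter
        if b == ESC:
            escaped = True
            continue
        if escaped:
            b ^= ESC_XOR
            escaped = False
        buf.append(b)
        if len(buf) == buf[0] + 1:  # length byte + payload received
            frames.append(bytes(buf[1:]))
            buf = None
    return frames
```

Because every START value inside the body is escaped, the receiver can treat any raw START byte as the beginning of a new frame, which is what lets it resynchronize after garbage or a partially received packet.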

Message IDs and segmentation: Each message gets a unique message ID that increments and rolls over after a maximum value. This alone lets the receiver identify the order in which messages were sent, which matters when packets can experience different latencies (e.g., over the internet). In addition, if your system has bandwidth limitations, you may need to segment large messages, sending one piece at a time with a segment number field so the receiver can reconstruct the original message in order. In that case, it is also a good idea to add a CRC to each segment in addition to a CRC for the entire message.
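The segmentation idea can be sketched as follows. This is an illustration under stated assumptions, not a fixed layout: the `Segment` fields, the 8-bit message ID range, and the use of CRC-32 (via Python's `zlib.crc32`) for both the per-segment and whole-message checks are all hypothetical choices, and `reassemble` assumes every segment it receives belongs to the same message.

```python
import zlib
from typing import List, NamedTuple, Optional

MSG_ID_MAX = 255  # assumed 8-bit message ID


class Segment(NamedTuple):
    msg_id: int
    index: int    # segment number within the message
    count: int    # total number of segments in the message
    chunk: bytes
    crc: int      # CRC-32 of this chunk only


def next_msg_id(current: int) -> int:
    """Increment the message ID, rolling over after MSG_ID_MAX."""
    return (current + 1) % (MSG_ID_MAX + 1)


def split_message(msg_id: int, data: bytes, max_chunk: int) -> List[Segment]:
    """Cut a message into numbered segments of at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [data[i:i + max_chunk] for i in range(0, len(data), max_chunk)] or [b""]
    return [Segment(msg_id, i, len(chunks), c, zlib.crc32(c))
            for i, c in enumerate(chunks)]


def reassemble(segments: List[Segment], message_crc: int) -> Optional[bytes]:
    """Rebuild the message in order; return None if anything is missing or corrupt."""
    good = {s.index: s for s in segments if zlib.crc32(s.chunk) == s.crc}
    if not good:
        return None
    count = next(iter(good.values())).count
    if set(good) != set(range(count)):
        return None  # at least one segment is missing or failed its CRC
    data = b"".join(good[i].chunk for i in range(count))
    return data if zlib.crc32(data) == message_crc else None
```

The per-segment CRC lets the receiver reject a single bad piece early, while the whole-message CRC catches mistakes in reassembly itself, such as segments joined in the wrong order.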

Message types are important because not all data has the same characteristics. Sensor readings, event-driven alarms, infrequent transfers such as firmware upgrades, health and diagnostics data, and system metadata all have different frequencies, criticality levels, and tolerance for latency. Defining distinct message types lets you handle each one appropriately instead of treating all data the same way.
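One way to express per-type handling is a small policy table keyed by message type. The sketch below is purely illustrative: the type values, the `Policy` fields, and every number in the table are assumptions, not recommendations for any particular system.

```python
from dataclasses import dataclass
from enum import IntEnum


class MsgType(IntEnum):
    # Hypothetical type codes carried in the packet header.
    SENSOR_READING = 1
    ALARM = 2
    FIRMWARE_CHUNK = 3
    DIAGNOSTICS = 4


@dataclass(frozen=True)
class Policy:
    needs_ack: bool     # must the receiver acknowledge this type?
    max_attempts: int   # how many times the sender may transmit it
    priority: int       # lower number is sent first


# Assumed policies: alarms are critical, periodic readings are expendable.
POLICIES = {
    MsgType.SENSOR_READING: Policy(needs_ack=False, max_attempts=1, priority=2),
    MsgType.ALARM: Policy(needs_ack=True, max_attempts=5, priority=0),
    MsgType.FIRMWARE_CHUNK: Policy(needs_ack=True, max_attempts=3, priority=3),
    MsgType.DIAGNOSTICS: Policy(needs_ack=False, max_attempts=1, priority=4),
}


def policy_for(type_byte: int) -> Policy:
    """Look up the policy for a raw type byte; unknown types raise ValueError."""
    return POLICIES[MsgType(type_byte)]
```

Keeping these decisions in one table, rather than scattered through the send and receive paths, also makes it easier to add a new message type later without touching existing ones.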

Error detection is your final safeguard. Checksums, CRCs, and parity bits each have different trade-offs between detection strength and computational cost. You can place a single checksum at the end of the packet, or include separate checks for the header and payload. 
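As an illustration of the trade-off, the sketch below compares a simple 8-bit additive checksum with a bitwise CRC-16. The CRC parameters (polynomial 0x1021, initial value 0xFFFF, no reflection, commonly called CRC-16/CCITT-FALSE) are one widely used choice picked here as an assumption; the helper names are hypothetical.

```python
def checksum8(data: bytes) -> int:
    """Additive checksum: cheap, but blind to reordered bytes."""
    return sum(data) & 0xFF


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021 and initial value 0xFFFF."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def append_crc(packet: bytes) -> bytes:
    """Place the CRC at the end of the packet, big-endian."""
    return packet + crc16_ccitt(packet).to_bytes(2, "big")


def check_crc(frame: bytes) -> bool:
    """Recompute the CRC over everything but the last two bytes."""
    if len(frame) < 2:
        return False
    return crc16_ccitt(frame[:-2]) == int.from_bytes(frame[-2:], "big")
```

Swapping two bytes leaves the additive checksum unchanged (`checksum8(b"\x01\x02") == checksum8(b"\x02\x01")`), while the CRC changes. On a small MCU the bitwise loop is usually replaced by a lookup table or a hardware CRC peripheral, trading memory or silicon for speed.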

In addition to all of the above, planning for future scalability during the initial design phase is vital. You should be able to add new message types or modify the format of specific messages without breaking existing ones. Including a protocol version field that gets sent as the first message when the system starts lets the receiver know what format to expect and which reconstruction algorithm to use. This kind of backward compatibility is what keeps your system alive after version one is deployed in the field.
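A minimal sketch of version-driven parsing might look like this. Both payload layouts are invented for illustration: version 1 is assumed to carry a type byte and a 16-bit value, and version 2 is assumed to append a 32-bit timestamp. The `Receiver` class and its method names are hypothetical.

```python
import struct


def parse_v1(payload: bytes) -> dict:
    # Assumed v1 layout: type (1 byte), value (2 bytes, big-endian).
    msg_type, value = struct.unpack(">BH", payload)
    return {"type": msg_type, "value": value}


def parse_v2(payload: bytes) -> dict:
    # Assumed v2 layout: v1 fields plus a 4-byte timestamp.
    msg_type, value, timestamp = struct.unpack(">BHI", payload)
    return {"type": msg_type, "value": value, "timestamp": timestamp}


PARSERS = {1: parse_v1, 2: parse_v2}


class Receiver:
    """Selects a parser from the version announced at startup."""

    def __init__(self):
        self._parser = None

    def on_version(self, version: int) -> None:
        if version not in PARSERS:
            raise ValueError("unsupported protocol version: %d" % version)
        self._parser = PARSERS[version]

    def on_payload(self, payload: bytes) -> dict:
        if self._parser is None:
            raise RuntimeError("version message not received yet")
        return self._parser(payload)
```

Because old parsers stay in the table, a newer receiver can still talk to a fielded node that announces version 1.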

The Overall Communication Framework

Designing a good packet structure is necessary, but it's not sufficient for the reliability of the entire communication system. You also need to intentionally design the behavior of the entire communication framework.

You need to define command and response behavior: who initiates the communication, whether the receiver should respond, and if so, when it replies. On top of that, you need an acknowledgment mechanism (ACK/NAK) as your basic feedback loop. You also need to handle the case where an ACK itself goes missing, which means timeouts and retries, and the retries need a limit to prevent infinite retry loops.
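The retry logic can be sketched in a few lines. Here `send` and `wait_for_ack` are hypothetical callables standing in for the real link driver, and the default timeout and attempt limit are placeholder values, not recommendations.

```python
from typing import Callable


def send_with_retry(
    send: Callable[[bytes], None],
    wait_for_ack: Callable[[float], bool],
    packet: bytes,
    timeout_s: float = 0.05,
    max_attempts: int = 3,
) -> bool:
    """Transmit a packet until it is ACKed or the attempt limit is reached.

    wait_for_ack(timeout_s) is assumed to block for at most timeout_s
    seconds and return True only if an ACK for this packet arrived.
    """
    for _ in range(max_attempts):
        send(packet)
        if wait_for_ack(timeout_s):
            return True
    # Bounded failure: the caller decides how to recover.
    return False
```

Returning `False` instead of looping forever pushes the failure up to a layer that can decide what recovery means, for example raising a fault, resetting the link, or dropping to a safe state.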

There's a subtle edge case here: how the system handles duplicated commands. If the receiver successfully processes a command and sends an ACK, but the ACK is lost, the transmitter will retry the command. Your system needs to be safe in this scenario. The receiver needs to recognize "I've already handled this" rather than executing the same command twice.
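One simple way to get this behavior is for the receiver to remember the ID of the last command it executed. The sketch below assumes a stop-and-wait exchange with a single sender, so only one command is outstanding at a time; the class and method names are hypothetical.

```python
from typing import Callable, Optional


class CommandReceiver:
    """Executes each command once, even if the sender retries it."""

    def __init__(self, execute: Callable[[str], None]):
        self._execute = execute
        self._last_id: Optional[int] = None

    def on_command(self, msg_id: int, command: str) -> str:
        if msg_id == self._last_id:
            # Retransmission: our ACK was probably lost.
            # Acknowledge again, but do not execute a second time.
            return "ACK"
        self._execute(command)
        self._last_id = msg_id
        return "ACK"
```

With multiple senders or several commands in flight, the same idea extends to tracking recently seen IDs per sender rather than a single value.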

A heartbeat or keep-alive mechanism is also worth including, depending on the application. It's a way for nodes to know whether the link is still alive even when there is no data to send.
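On the receiving side, a heartbeat check reduces to tracking when the last valid packet arrived. In this sketch the timestamps are passed in explicitly so the logic can be exercised without a real clock; the `LinkMonitor` name and the timeout value used in the example are assumptions.

```python
class LinkMonitor:
    """Declares the link dead if nothing valid arrives within timeout_s."""

    def __init__(self, timeout_s: float, now: float):
        self._timeout_s = timeout_s
        self._last_rx = now

    def on_rx(self, now: float) -> None:
        # Call for every valid packet, heartbeat or data.
        self._last_rx = now

    def is_alive(self, now: float) -> bool:
        return (now - self._last_rx) <= self._timeout_s
```

For example, with `LinkMonitor(timeout_s=3.0, now=0.0)`, `is_alive(2.0)` is true and `is_alive(3.5)` is false until `on_rx` is called again. Counting any valid packet as proof of life means heartbeats only need to be sent when the link is otherwise idle.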

Test for the Failures, Not Just the Features

If you didn't test the failure modes, you didn't test the system. Try to break the system as much as possible before shipping it to the field. There are multiple stages of testing that need to be included. Unit testing covers individual firmware functions. Feature testing validates the end-to-end behavior of each feature, such as receiving different data types, performing a firmware upgrade over the protocol, or running file transfers back to back. Most importantly, fault injection testing is where you find out if your protocol is actually robust. You can manually inject faults to create every edge case you can think of and verify that the system either handles it correctly or fails gracefully.
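A small host-side harness is often enough to start injecting faults. The sketch below flips one random bit per trial and counts how many corrupted frames slip past the integrity check. The frame layout (payload followed by a big-endian CRC-16 computed with Python's `binascii.crc_hqx` and an initial value of 0xFFFF) and all function names are assumptions for illustration.

```python
import binascii
import random


def make_frame(payload: bytes) -> bytes:
    """Payload followed by a 16-bit CRC (assumed layout)."""
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return payload + crc.to_bytes(2, "big")


def frame_ok(frame: bytes) -> bool:
    """Receiver-side integrity check."""
    if len(frame) < 2:
        return False
    expected = int.from_bytes(frame[-2:], "big")
    return binascii.crc_hqx(frame[:-2], 0xFFFF) == expected


def flip_random_bit(frame: bytes, rng: random.Random) -> bytes:
    """Inject a single-bit error anywhere in the frame, CRC included."""
    i = rng.randrange(len(frame))
    bit = 1 << rng.randrange(8)
    return frame[:i] + bytes([frame[i] ^ bit]) + frame[i + 1:]


def run_bit_flip_campaign(payload: bytes, trials: int, seed: int) -> int:
    """Return how many corrupted frames were wrongly accepted."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    frame = make_frame(payload)
    undetected = 0
    for _ in range(trials):
        if frame_ok(flip_random_bit(frame, rng)):
            undetected += 1
    return undetected
```

A CRC detects every single-bit error, so this campaign should always report zero. The same harness shape extends to harsher faults such as dropped bytes, truncated frames, duplicated packets, or lost ACKs, where the expected outcome is a graceful rejection or retry rather than guaranteed detection.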

And long-term testing matters more than people usually expect. Keeping test systems running continuously, in-house and in the field if possible, is very important for validating the real-time behavior of the system over time. It lets you catch the kind of intermittent issues that never show up in a 30-minute test run.

I cover all of these design steps in detail in my session at the Embedded Online Conference 2026, including the specific packet field layouts, communication framework design and testing considerations that I've found most effective across multiple products.

