EmbeddedRelated.com

The Asimov Protocol

Ido Gendel, January 4, 2024

While the Internet is chock-full of explanations of basic data communication protocols, very little is said about the higher levels of packing, formatting, and exchanging information in a useful and practical way. This less-charted land is still fraught with strange problems, whose solutions may be found in strange places; in this case, a very short, 60-year-old science fiction story.

It's about time

Here's a simplified description of a system. Consider everything in it to be non-negotiable. A Windows PC runs a program written in Python. The PC is connected through an adapter to an RS485 bus, half-duplex, 115200 baud. Also connected to the bus are one hundred identical MCU-based units, each identified by a random but unique address. Every unit monitors several sensors and logs significant changes in their outputs, as well as errors. The PC program needs to get all this information, about 10 times a second.

The raw numbers are easy: if we use UART (8-N-1) over the RS485, that's 10 bits per byte of data, so 11520 B/s. At 10 cycles per second that's 1152 bytes per cycle, or 11 whole bytes per unit. Plenty of bandwidth for the kind of data we need to transfer. But real-life protocols don't work like that. Especially since our bus is half-duplex, we have to manage and time data packets to prevent bus collisions (which make the ICs hot; ask me how I know) and to identify "queries" and "answers" to and from specific addresses.
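The arithmetic above can be checked in a few lines of Python, the PC side's language. The constant names are ours, chosen for illustration:

```python
# Back-of-the-envelope bandwidth budget for the bus described above.
BAUD = 115_200          # bits per second on the RS485 bus
BITS_PER_BYTE = 10      # UART 8-N-1: start bit + 8 data bits + stop bit
CYCLES_PER_SECOND = 10  # the PC wants fresh data about 10 times a second
UNITS = 100             # number of MCU units on the bus

bytes_per_second = BAUD // BITS_PER_BYTE                 # 11520 B/s
bytes_per_cycle = bytes_per_second // CYCLES_PER_SECOND  # 1152 B per cycle
bytes_per_unit = bytes_per_cycle // UNITS                # 11 whole bytes per unit

print(bytes_per_second, bytes_per_cycle, bytes_per_unit)  # prints "11520 1152 11"
```

Those 11 bytes per unit are a hard ceiling. They assume zero gaps between transmissions, which is exactly the assumption the rest of this article takes apart.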


Here's what such a protocol might look like in action – conveniently translated to plain English:

PC: Unit A, do you have events to report?

Unit A: Yes.

PC: Unit A, Report events.

Unit A: Sensors 3 and 8 crossed the threshold.

PC: Unit A, Acknowledged.

PC: Unit A, do you have errors to report?

Unit A: No.

PC: Unit B, do you have events to report?

etc.

There are plenty of ways to make the actual messages terse, but the worst time thieves are hidden, literally, between the lines. How long does it take for the PC program to send and to receive bytes from the bus? Even if we're savvy enough to decrease the latency of the COM port adapter driver (it can be done, at least for some drivers), we still have the non-real-time OS and Python to deal with, and there's only so much we can do with them. We're looking at potentially whole milliseconds of delay, sometimes more, between query and answer. Multiply that by the number of units and you'll see we're in deep trouble. This will not be solved by cycle-counting in our MCU code. Time to hit the books!
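To see why the gaps dominate, here is a rough cycle-time model in Python. The three exchanges per unit mirror the dialogue above (events?, report, errors?). The 10 bytes per unit on the wire and the 2 ms turnaround latency per exchange are hypothetical placeholders, not measurements:

```python
BAUD = 115_200
BITS_PER_BYTE = 10  # UART 8-N-1

def cycle_time_split(units, exchanges_per_unit, bytes_per_unit, latency_s):
    """Split one polling cycle into time on the wire and time spent waiting.

    Every query/answer exchange pays a fixed turnaround latency (OS, driver,
    Python) that does not depend on how many bytes the exchange carries.
    """
    wire_s = units * bytes_per_unit * BITS_PER_BYTE / BAUD
    wait_s = units * exchanges_per_unit * latency_s
    return wire_s, wait_s

# Hypothetical figures, for illustration only.
wire_s, wait_s = cycle_time_split(units=100, exchanges_per_unit=3,
                                  bytes_per_unit=10, latency_s=0.002)
print(f"on the wire: {wire_s * 1000:.0f} ms, waiting: {wait_s * 1000:.0f} ms")
# prints "on the wire: 87 ms, waiting: 600 ms"
```

Under these assumptions the waiting alone is six times the 100 ms cycle budget. Shaving bytes off the messages only shrinks the smaller of the two numbers.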

Sci-Fi to the rescue

"My Son, the Physicist" from 1962 is, IMHO, not one of Isaac Asimov's finest stories, but that's beside the point. Spoiler ahead; here's a quick summary. Unexpectedly, there turns out to be a bunch of people on Pluto, and they may have vital information to share, but no one knows how long the radio link will last. Given the unavoidable 11-hour lag between sending a message from Earth and receiving the reply, scientists scramble to come up with a questioning protocol that minimizes misunderstanding and ambiguity. The physicist's mom, who happens to be visiting him, provides the workaround: "Just keep talking". Don't ask one thing and then wait for one answer. Instead, leave the microphone open and say whatever you know, think, assume, or want to ask. Let the other side do the same. Soon enough you'll probably learn all you need to learn, and a lot more.

While the transmission in the story is full-duplex, the fundamental bottleneck is the same as in our system: not the data packets but the forced delays between them. That's what we need to address. We can't control the duration of each delay, but we can save some back-and-forth and thereby decrease the number of delays. Just keep talking; or rather, cram as much as we can into every single transmission.

In the example above, the most obvious candidate for the new approach is the "Acknowledged" message. It doesn't have to be immediate; whatever the unit has to do with this confirmation, it can wait a little longer. Imagine, then, after a round of sensor and error polling, the PC sending this:

PC: Unit A, Acknowledged ; Unit B, Acknowledged ; […] Unit Z, Acknowledged
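A grouped acknowledgment like this could be packed into a single transmission as follows. This sketch assumes a hypothetical wire format in which every message is two bytes, [address, command], each unit address fits in one byte, and 0x06 is the "Acknowledged" command. A real protocol would also need framing and error checking, such as a CRC:

```python
ACK = 0x06  # hypothetical command code for "Acknowledged"

def build_ack_group(addresses):
    """PC side: pack one 'Acknowledged' message per unit into one transmission."""
    return b"".join(bytes([addr, ACK]) for addr in addresses)

def is_acked(unit_address, transmission):
    """Unit side: does this transmission contain an ACK addressed to me?"""
    for i in range(0, len(transmission) - 1, 2):
        if transmission[i] == unit_address and transmission[i + 1] == ACK:
            return True
    return False

frame = build_ack_group([0x11, 0x2A, 0x7F])
print(frame.hex(" "))  # prints "11 06 2a 06 7f 06" (bytes.hex with a separator needs Python 3.8+)
```

Every unit on the bus hears the whole transmission and simply scans it for its own address. One turnaround now covers all the acknowledgments.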

For N units, this alone will save N-1 delays. A similar grouping of messages can be done on the unit's side. Given that a typical delay is significantly longer than the time it takes to transmit the useful information, we might as well always send the sensor and error reports together (careful: the protocol will then need a "nothing new to report" message). Basically, anything that doesn't have to be immediate, or that can't be immediate anyway because of the communication delays, can be sent as part of a message group, saving us enough time to make the system feasible.
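On the unit side, "always send events and errors together" might look like the sketch below. The format is again hypothetical. A REPORT message carries a count-prefixed list of sensor numbers, followed by a count-prefixed list of error codes. A separate NOTHING_NEW message covers the case where both lists are empty. The command values are arbitrary:

```python
REPORT = 0x52       # hypothetical: combined events + errors report
NOTHING_NEW = 0x4E  # hypothetical: "nothing new to report"

def build_unit_reply(address, events, errors):
    """Unit side: answer with events and errors together, in one transmission."""
    if not events and not errors:
        return bytes([address, NOTHING_NEW])
    return bytes([address, REPORT, len(events), *events, len(errors), *errors])

def parse_unit_reply(reply):
    """PC side: decode a reply into (address, events, errors)."""
    address, kind = reply[0], reply[1]
    if kind == NOTHING_NEW:
        return address, [], []
    if kind != REPORT:
        raise ValueError(f"unexpected message type 0x{kind:02X}")
    n_events = reply[2]
    events = list(reply[3:3 + n_events])
    n_errors = reply[3 + n_events]
    errors = list(reply[4 + n_events:4 + n_events + n_errors])
    return address, events, errors

reply = build_unit_reply(0x11, events=[3, 8], errors=[])  # "sensors 3 and 8 crossed the threshold"
print(parse_unit_reply(reply))  # prints "(17, [3, 8], [])"
```

The PC no longer asks about events and errors separately, so every poll must get some answer. Without one, the PC could not tell a quiet unit from a dead one.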

One last detail: how will each side know when an incoming stream of messages is finished? A common solution is to use a "sentinel", a unique message which will only come at the end of a message stream. It doesn't have to be only a sentinel, though – it can carry useful information too. For example, a good candidate for a sentinel message from the PC side is the command to send data; it's there anyway, and by definition, it means that the PC is relinquishing control of the bus and is now awaiting a reply.
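Here is one way the sentinel idea could look, continuing the hypothetical two-byte [address, command] format. The PC's transmission is a run of grouped ACKs terminated by a SEND_DATA command, which serves both as the sentinel and as the poll for the next unit. Command values and function names are illustrative only:

```python
ACK = 0x06        # hypothetical "Acknowledged" command
SEND_DATA = 0x05  # hypothetical "send your data" command; doubles as the sentinel

def build_pc_turn(acked_addresses, polled_address):
    """PC side: grouped ACKs, terminated by a SEND_DATA command."""
    msgs = [bytes([a, ACK]) for a in acked_addresses]
    msgs.append(bytes([polled_address, SEND_DATA]))  # sentinel: PC releases the bus
    return b"".join(msgs)

def read_pc_turn(incoming):
    """Unit side: consume [address, command] pairs until the sentinel.

    `incoming` is any iterable of received byte values. Returns the list of
    acknowledged addresses and the address polled by the sentinel.
    """
    acked = []
    it = iter(incoming)
    for address in it:
        command = next(it, None)
        if command is None:  # odd byte count: truncated message
            break
        if command == SEND_DATA:
            return acked, address  # PC is done; the polled unit may now answer
        if command == ACK:
            acked.append(address)
    raise ValueError("transmission ended without a sentinel")

turn = build_pc_turn([0x11, 0x2A], polled_address=0x7F)
print(read_pc_turn(turn))  # prints "([17, 42], 127)"
```

Because `read_pc_turn` accepts any iterable, on real hardware it could in principle be fed from a UART receive buffer as bytes arrive. The unit knows the PC has let go of the bus the moment it sees SEND_DATA, with no extra message spent on saying so.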

Something to consider

I first became aware of these PC-born delays when I wrote software for psychology experiments, back at the end of the previous millennium. Research in perception, for instance, tends to report response time variations down to the millisecond. Yet the simple keyboards in the university's labs, coupled with the non-real-time OS, introduced delays and jitter that were probably about ten times larger than that. I sure could have done a lot better had I known embedded development back then! But whether or not we can do something about communication delays "at the source", there's always more that can be done at the higher levels of the protocol design. Increasing the baud rate, compressing data, and similar tricks certainly have their place. Yet depending on where the significant bottleneck is, sometimes all one has to do is just keep talking.

