EmbeddedRelated.com
The 2026 Embedded Online Conference

I Stopped Testing Embedded Systems by Hand. Here's What Replaced It.

Everardo Garcia, April 23, 2026

Here's a scene you'll probably recognize. You've finished a firmware build. You open minicom, or PuTTY, or whatever terminal flavor your team uses. You connect to the board. You type a command. You read the output. You copy and paste the result into a spreadsheet, or a notebook, or (if you're honest) your own head. And you do that again for every test cycle, every release, every change, minor or major.

Most embedded teams test their system-level behavior this way. The bugs it finds tend to show up late, in the QA phase, after everything has been integrated. By that point you're not dealing with one module anymore. You're dealing with the interaction between firmware, hardware initialization, cables, timing, and whatever else got merged in the last two weeks. Slow to reproduce, complex to debug, expensive to fix.

I've been doing embedded work across IoT and automotive systems for fifteen-plus years, and I've been on both sides of this. Projects where testing was solid, projects where it wasn't. A few years ago I stopped writing test procedures for humans and started writing them for Python, and the difference was bigger than I expected.


The Gap in the Pyramid

The testing pyramid is a familiar picture. Unit tests on the bottom, fast and numerous. Integration tests in the middle. System-level and functional tests above them. Exploratory and user-acceptance at the top.

The testing pyramid, with a dashed line marking where most embedded teams stop automating: between integration tests and the functional / system-level layer above.

Most embedded teams do well at the bottom. Some do integration tests too. But the layer where you run full firmware on real hardware and check that the whole thing behaves like a product is usually where automation is missing. Whatever happens there happens by hand, or only in QA, or not at all until something breaks.

That missing layer is the one I want to talk about.

What You Actually Need to Automate It

Four things:

  • A way to program the board. You already have one: JTAG, SWD, ST-Link, whatever you use.
  • A channel to talk to the firmware. UART is usually fine. CAN if that's your world.
  • A structure to define and run test cases. That's where OpenHTF comes in.
  • A way to validate responses against expected values. Also OpenHTF.

OpenHTF (Open Hardware Testing Framework) was originally built at Google for manufacturing-line testing. Factory floors, volume production, that kind of thing. But the core ideas translate almost directly to embedded development. You write the test logic: what to send, what to check, what to measure. OpenHTF manages the lifecycle of your hardware interfaces (opening and closing the serial port, for example), checks measured values against limits, handles setup and teardown, and writes structured JSON output with timestamps and a full trace. You write the what. The framework handles the how.
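To make the split between the "what" and the "how" concrete, here is a framework-free sketch of the bookkeeping a test runner takes off your hands. It is not OpenHTF's API: `PhaseResult`, `RecordingPlug`, and `run_test` are hypothetical names, and each phase is assumed to be a plain function that returns a dict of measured values. The runner checks those values against declared limits, stops on an unexpected exception, and always tears the plug down.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class PhaseResult:
    name: str
    outcome: str  # "PASS", "FAIL", or "ERROR"
    measurements: Dict[str, object] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


class RecordingPlug:
    """Hypothetical hardware interface; a real one would wrap a serial port."""

    def __init__(self) -> None:
        self.is_open = True

    def tear_down(self) -> None:
        # Release the hardware resource (close the port, and so on).
        self.is_open = False


Phase = Tuple[str, Callable[[RecordingPlug], Dict[str, object]]]


def run_test(plug: RecordingPlug, phases: List[Phase],
             limits: Dict[str, Callable[[object], bool]]) -> List[PhaseResult]:
    """Run phases in order, validate their measurements, always tear down."""
    results: List[PhaseResult] = []
    try:
        for name, phase in phases:
            try:
                values = phase(plug)  # the "what": test logic you write
            except Exception:
                # An unexpected exception ends the run early.
                results.append(PhaseResult(name, "ERROR"))
                break
            # The "how": every measurement with a declared limit is checked.
            passed = all(limits[key](value)
                         for key, value in values.items() if key in limits)
            results.append(PhaseResult(name, "PASS" if passed else "FAIL", values))
    finally:
        plug.tear_down()  # runs even if a phase or validator raises
    return results
```

OpenHTF provides the real versions of these pieces (plugs, phases, and measurements with validators) along with logging and output; the sketch only shows why it pays to let a framework own them instead of re-creating them in every test script.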

No CI server required. No dedicated test lab. If you have a laptop, a USB cable, and the board you already develop on, you have everything you need.

The Whole Setup Is Deliberately Boring

Demo architecture: a laptop running Python and OpenHTF connected by a single USB/UART cable to an STM32 Nucleo board that exposes version, voltage, and led firmware commands.

In my talk I demo this end-to-end with an STM32 Nucleo board. One cable, one laptop, one script. The cable doubles as programmer and serial console. The firmware on the board responds to a small command-line interface over UART. And the test runs like this:

  1. OpenHTF opens the serial port.
  2. It flashes a firmware binary via ST-Link.
  3. It captures the boot banner over UART and validates it against a regex.
  4. It sends a version command, waits for the response, and checks the version string against an expected contract.

Four phases, two plugs (OpenHTF's word for a hardware interface driver), around 150 lines of Python total. Every run writes a timestamped JSON report automatically; you never have to remember to save results. If the firmware version bumps and someone forgets to update the test, it fails. If the boot banner regresses on a refactor, it fails. Those are exactly the quiet regressions a human misses on the fifth consecutive day of manual verification.
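Writing a report on every run is mostly a matter of never letting any code path skip it. OpenHTF handles this through its output callbacks; the stdlib sketch below only illustrates the idea, using a hypothetical `write_report` function and an assumed report layout (test name, UTC start time, overall outcome, per-phase results) rather than OpenHTF's actual JSON schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def write_report(outdir: str, test_name: str, phases: List[Dict[str, str]]) -> Path:
    """Write one timestamped JSON report and return its path."""
    started = datetime.now(timezone.utc)
    # A run with no phases, or with any non-passing phase, counts as FAIL.
    passed = bool(phases) and all(p["outcome"] == "PASS" for p in phases)
    report = {
        "test_name": test_name,
        "start_time_utc": started.isoformat(),
        "outcome": "PASS" if passed else "FAIL",
        "phases": phases,
    }
    # Microsecond resolution makes filename collisions between runs unlikely.
    stamp = started.strftime("%Y%m%dT%H%M%S_%fZ")
    path = Path(outdir) / f"{test_name}.{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2))
    return path
```

Because the filename carries the timestamp, a directory of these files doubles as a run history you can diff or grep without any extra tooling.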

One thing worth highlighting: all the configuration that changes between machines (serial port, baud rate, path to the programmer, firmware binary location) lives in YAML. The test logic itself doesn't change when you move the setup to another engineer's desk. That separation is the whole point. It's also the thing that turns "Ever's test setup" into "the team's test setup."
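One way to keep that boundary explicit is to parse the per-machine settings into a small typed object once and let the test logic read only from it. In this sketch the field names are hypothetical, the input mapping is assumed to come from PyYAML's `yaml.safe_load`, and the flash command assumes the open-source stlink tools' `st-flash write <file> <address>` form with the STM32 flash base at `0x08000000`; a team using STM32CubeProgrammer would build a different command line.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Mapping


@dataclass(frozen=True)
class BenchConfig:
    """Per-machine settings; the test logic never hard-codes these."""
    serial_port: str
    baud_rate: int
    programmer_path: str
    firmware_path: str

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "BenchConfig":
        # Fail loudly on a half-filled config instead of mid-test.
        missing = [f.name for f in fields(cls) if f.name not in data]
        if missing:
            raise KeyError(f"missing config keys: {', '.join(missing)}")
        return cls(
            serial_port=str(data["serial_port"]),
            baud_rate=int(data["baud_rate"]),
            programmer_path=str(data["programmer_path"]),
            firmware_path=str(data["firmware_path"]),
        )


def flash_command(cfg: BenchConfig, address: int = 0x08000000) -> List[str]:
    """Build (not run) an st-flash argument list from the bench config."""
    return [cfg.programmer_path, "write", cfg.firmware_path, hex(address)]
```

Moving the setup to another desk then means editing only the YAML (say, `serial_port: COM4` instead of `/dev/ttyACM0`); the flashing phase keeps calling `flash_command(cfg)` and passes the list to `subprocess.run(..., check=True)`.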

Where AI Actually Earns Its Keep

Automated testing is great. The part that usually kills it is writing the tests. A new plug for a new hardware interface is an afternoon of work, minimum. A new test phase is an hour or two. Multiply that by fifteen features and five peripherals and you can see why automation gets deferred, and why "later" often becomes "never."

This is where I started treating GitHub Copilot as a co-author, not as an autocomplete tool. The approach that worked for me was spec-driven: I write a short Markdown document describing what the test phase or plug should do, I point Copilot at an existing plug as the reference for project conventions, and I have it generate the scaffolding. The generated code matches the existing logging style, the existing config pattern, the existing class structure, because it has real examples of all of them to work from.

During the live demo I use this to add a new "verify temperature in range" phase mid-talk, and to generate a full CAN plug from a python-can snippet. The generated code worked. It also exposed a firmware bug on the very first run: the temperature came back as minus 36 instead of somewhere in the 25-to-32 range I expected, and the test flagged the out-of-range value. That's the loop I want. The test tells me when I'm done, it tells me when I've broken something, and the AI handles the boilerplate so I can stay focused on the validator logic and the edge cases.
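A phase like "verify temperature in range" is small enough to sketch in full. This version assumes a hypothetical firmware reply of the form `temp=<value>` or `temp: <value>` and uses the 25-to-32 window from the demo; the reply format and function names are illustrative, not the code Copilot generated in the talk. Parsing is deliberately strict, so a malformed reply raises instead of being silently treated as a reading.

```python
import re
from typing import Tuple

# Hypothetical reply format, e.g. "temp=28.5" or "temp: -36".
TEMP_REPLY = re.compile(r"temp\s*[=:]\s*(-?\d+(?:\.\d+)?)")


def parse_temperature(reply: str) -> float:
    """Extract the numeric reading, rejecting anything that doesn't fit the format."""
    match = TEMP_REPLY.fullmatch(reply.strip())
    if match is None:
        raise ValueError(f"unexpected temperature reply: {reply!r}")
    return float(match.group(1))


def verify_temperature_in_range(reply: str, low: float = 25.0,
                                high: float = 32.0) -> Tuple[float, bool]:
    """Return the parsed value and whether it lies within [low, high]."""
    value = parse_temperature(reply)
    return value, low <= value <= high
```

Returning the measured value alongside the verdict matters: a failure that records what it saw (minus 36, not just "out of range") points at the firmware rather than at the test.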

The caveat is that you still review what Copilot gives you. You're still the boss, still responsible for what the thing does. But compared to writing a plug from scratch while reading documentation, the difference is not subtle.

If You Want to See It Run

I walk through the full live demo, including the code walkthrough, the failure cases, and the Copilot-assisted plug generation, in my session at the Embedded Online Conference. The repository is linked from the conference page, so you can clone it and run it against your own board.

If I had to leave you with one thing, it would be this. Start with one test. Run it daily. Make it part of your build. That is the whole workflow. The hard part is not running the tests; it's deciding to stop running them by hand.

