EmbeddedRelated.com
Forums

Why CRC with EEPROM?

Started by MaxMaxfield 3 years ago, 22 replies, latest reply 3 years ago, 814 views
On the one hand I feel guilty and embarrassed about asking all of my simplistic questions, but I always end up learning so much here.

This relates to my previous poser regarding how to initialize a struct in a union:  https://www.embeddedrelated.com/thread/13867/initi...

So, here’s the deal -- I’m in the process of building a 10-character display, where each character is a modern (tricolor LED-based) implementation of a 21-segment Victorian display from 1898. You can read more about this here: https://www.eejournal.com/article/recreating-retro...

I’m also using this as the basis for a series of Cool Beans Columns in the electronics and computing hobbyist magazine “Practical Electronics” in the UK: https://www.electronpublishing.com/

This little beauty is going to have a lot of modes and effects. This leads us to the settings, of which there will be a bunch, like the display format for the time mode (12 vs. 24 hour) and the display format for the date mode (YYYY/MM/DD vs. MM/DD/YYYY vs. DD/MM/YYYY) and which effect should be associated with which mode and…

The default settings will be stored in the program (in the DefSettings variable from my previous post) but the user can change these defaults and the updated versions will be stored in the MCU’s EEPROM.

Apart from anything else, this is providing us with a “teachable moment” about checksums in general and CRCs in particular, because we are going to generate our own CRC using an LFSR for the data stored in the EEPROM.
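For readers new to the idea, a CRC can be computed with a software LFSR: shift a register one bit at a time and XOR in the polynomial whenever a 1 falls out of the top. The following is only a minimal sketch. It assumes the common CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF), which are chosen here for illustration and are not something the project specifies; the function name crc16_lfsr is hypothetical as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 (CCITT-FALSE flavor): a 16-bit shift register whose
 * feedback taps are defined by the polynomial 0x1021. */
uint16_t crc16_lfsr(const uint8_t *data, size_t len)
{
    uint16_t reg = 0xFFFF;                         /* assumed initial value */

    for (size_t i = 0; i < len; i++) {
        /* Feed the next byte into the top of the register. */
        reg ^= (uint16_t)((uint16_t)data[i] << 8);

        for (int bit = 0; bit < 8; bit++) {
            if (reg & 0x8000) {
                /* A 1 is shifted out: apply the feedback polynomial. */
                reg = (uint16_t)((reg << 1) ^ 0x1021);
            } else {
                reg = (uint16_t)(reg << 1);
            }
        }
    }
    return reg;
}
```

With these parameters, the nine ASCII bytes "123456789" should produce 0x29B1, which is the standard check value for this CRC variant and a handy way to confirm an implementation.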

So, finally, my question… what reason should I give for having to use a CRC for the data in the EEPROM in the first place? Is EEPROM so unreliable? How can I position this to beginners?

Thanks as always for your taking the time to peruse and ponder my rambling musings -- Max

Reply by jmford94 June 28, 2021

Hi Max.

There are a few reasons that I see for it. One, erasing and writing the EEPROM takes a long time (in microcontroller time) and many operations, and if the user does something like turn off the power in the middle of a write, it could end up with corruption.

Also, there's the matter of wear-out of the EEPROM, which could cause errors down the road, or quickly if the programmer does something like write to the EEPROM every millisecond...

And then, there's the chance that the read itself might be corrupted by noise or an unfortunate cosmic ray event.

John

Reply by MaxMaxfield June 28, 2021

Hi John -- thanks so much -- this is just what I needed -- Max

Reply by tcfkat June 28, 2021

To overcome data corruption (due to cell wear, power failure, or whatever) you can always write two identical copies of the data set, one after the other. If, when reading, one CRC is bad, you still have one good copy. If both CRCs are bad, you're stuck.
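The two-copy idea might be sketched as follows. Everything named here (SettingsCopy, SETTINGS_LEN, crc16, load_settings) is hypothetical, and the sketch assumes both copies have already been read from EEPROM into RAM; the CRC is a plain bitwise CRC-16 chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SETTINGS_LEN 16u   /* hypothetical payload size in bytes */

/* One stored copy: the payload followed by its CRC. */
typedef struct {
    uint8_t  payload[SETTINGS_LEN];
    uint16_t crc;
} SettingsCopy;

/* Bitwise CRC-16 (poly 0x1021, init 0xFFFF); any CRC would do here. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t reg = 0xFFFF;
    while (n--) {
        reg ^= (uint16_t)((uint16_t)*p++ << 8);
        for (int b = 0; b < 8; b++)
            reg = (reg & 0x8000) ? (uint16_t)((reg << 1) ^ 0x1021)
                                 : (uint16_t)(reg << 1);
    }
    return reg;
}

static bool copy_is_valid(const SettingsCopy *c)
{
    return crc16(c->payload, SETTINGS_LEN) == c->crc;
}

/* Fills `out` from the first copy whose CRC checks out and returns true;
 * returns false (leaving `out` untouched) if both copies are bad. */
bool load_settings(const SettingsCopy *a, const SettingsCopy *b,
                   uint8_t out[SETTINGS_LEN])
{
    const SettingsCopy *good = copy_is_valid(a) ? a
                             : copy_is_valid(b) ? b : NULL;
    if (good == NULL)
        return false;
    memcpy(out, good->payload, SETTINGS_LEN);
    return true;
}
```

Writing the copies one after the other matters: if power fails partway through, at most one copy is torn, so the other should still pass its CRC.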

Reply by MaxMaxfield June 28, 2021

Ooh -- that is a great point -- I hadn't thought of that -- thanks for sharing.

Reply by uronrosa June 28, 2021

There could be a third one... security. We always forget it.

Reply by jms_nh June 28, 2021

What do you mean by security?

Reply by uronrosa June 28, 2021

Security is when you are on a beach on the moon, drinking a cold beer, and you know that no one will swap your beer for something else.

That is security... hehehe

Reply by jms_nh June 28, 2021

Sorry, I don't understand how security relates to CRCs.

Reply by uronrosa June 28, 2021

Sorry, it was a joke.

With a CRC you can test and repair data from the EEPROM. If a hacker changes the data in the EEPROM, you could repair it. Without one, you can't.

You could ask me how, and I'd answer like my grandfather...

The only impossible thing is that you'll be on a beach on the moon...


Thanks and sorry

Reply by tcfkat June 28, 2021

With CRC you can:

test: yes

repair: no! You need redundancy to repair data ...

Reply by uronrosa June 28, 2021

I'll look into it. I think I remember that it was possible. MaxMaxfield hasn't mentioned that point.

A review is always good. Anyway, I'll look into it.

Thank you very much

Reply by uronrosa June 28, 2021

I just took a quick look, and the solution was a Hamming code, not a CRC.

A thousand apologies.

Reply by M65C02A June 28, 2021

Another approach with similar error detection capabilities to CRCs, but which doesn't require bit-level manipulations and bit shifting, is the Fletcher's Checksum.

I've employed this approach in a couple of projects in order to avoid programmer unfamiliarity with those bit-level operations. Another reason to consider it is that certain programming languages, Visual Basic for example, don't provide the operators needed to implement CRCs easily.
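As a rough illustration of how little machinery this needs, here is a minimal sketch of the 16-bit variant (Fletcher-16). The function name fletcher16 is just a label chosen for this example, and the 16-bit width is an assumption; the post does not say which width was used.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16: two running sums, each kept modulo 255.
 * The loop uses only addition and modulo, no shifts or XORs. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0;   /* simple sum of the bytes */
    uint16_t sum2 = 0;   /* sum of the running sums (position sensitive) */

    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    /* sum2 goes in the high byte, sum1 in the low byte. */
    return (uint16_t)(sum2 * 256 + sum1);
}
```

For the five ASCII bytes "abcde" the sums work out to sum1 = 240 and sum2 = 200, giving 0xC8F0. Because sum2 depends on byte order, swapped bytes are caught, which a plain additive checksum would miss.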

Regardless, error detection should be provided for any data that is critical to the mission / application. The available devices are large enough that consideration should also be given to error correction, using Hamming codes or something as simple as Triple Modular Redundancy (TMR).
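TMR in its simplest form stores three copies of the data and takes a bitwise majority vote on read. A minimal sketch, with hypothetical function names, assuming the three copies have already been read into RAM:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise 2-of-3 majority vote: each output bit takes the value that
 * at least two of the three inputs agree on. */
uint8_t tmr_vote(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((a & b) | (a & c) | (b & c));
}

/* Apply the vote byte by byte across three stored copies. */
void tmr_vote_buf(const uint8_t *a, const uint8_t *b, const uint8_t *c,
                  uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = tmr_vote(a[i], b[i], c[i]);
}
```

This corrects any error confined to one copy at a given bit position, at the cost of three times the storage. It does not by itself report that a correction happened, so pairing it with a checksum is still worthwhile.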

Reply by MaxMaxfield June 28, 2021

"Another approach [...] is the Fletcher's Checksum" I've never heard of that -- now I'm excited to learn something new -- I will Google that as soon as I get a free moment -- thanks for sharing -- Max

Reply by mrfirmware June 28, 2021

It's not that EEPROM is unreliable; it's that you cannot guarantee that your writes will be coherent with respect to the data structures you are writing out to the EEPROM. If I have a struct like this:

#include <stdint.h>
#include <string.h>

struct Entry {
   uint32_t version;
   uint32_t length;
   uint32_t length_complement;
   uint32_t crc;
   uint8_t  data[]; // Flexible array member
};

// Header (4 x uint32_t) plus 32 KiB of payload
static uint8_t s_buffer[4 * sizeof(uint32_t) + 32 * 1024];
struct Entry *s_entry = (struct Entry *) s_buffer;

// ... then, inside some function:
s_entry->version = 0x100;
s_entry->length = 0x8000;
s_entry->length_complement = ~s_entry->length;
memset(s_entry->data, 'D', s_entry->length);
s_entry->crc = calc_crc(s_entry);

eeprom_write(<some-offset>, s_buffer, sizeof s_buffer);

And the erase page size is < 32k, then eeprom_write() will need to erase and program more than one page to complete the write coherently. What if the CPU resets due to a brown-out condition or a poorly timed user reset while writing these pages? If you use the CRC, it will not match on read-back and you'll know the entry is junk. If you assume things are fine, you could be processing junk.
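The read-back side of this could look something like the sketch below. It is only an illustration under stated assumptions: entry_is_valid and crc32_bytes are hypothetical names, crc32_bytes is a stand-in for calc_crc() that covers the payload bytes only (the original passes the whole entry), and MAX_DATA_LEN mirrors the 32 KiB payload used above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_LEN (32u * 1024u)   /* mirrors the 32 KiB payload above */

struct Entry {
    uint32_t version;
    uint32_t length;
    uint32_t length_complement;
    uint32_t crc;
    uint8_t  data[];                 /* flexible array member */
};

/* Hypothetical stand-in for calc_crc(): bitwise CRC-32 (reflected
 * polynomial 0xEDB88320) over the payload bytes only. */
static uint32_t crc32_bytes(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int b = 0; b < 8; b++)
            crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
    }
    return ~crc;
}

/* Cheap checks first: a blank or torn header usually fails the
 * length/complement test, and the bounds check keeps a junk length
 * from driving the CRC loop past the end of the buffer. */
bool entry_is_valid(const struct Entry *e)
{
    if (e->length_complement != (uint32_t)~e->length)
        return false;
    if (e->length > MAX_DATA_LEN)
        return false;
    return crc32_bytes(e->data, e->length) == e->crc;
}
```

Note that an erased region (all 0xFF) and a zeroed region both fail the complement test, since a value can never equal its own bitwise inverse.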

Reply by MaxMaxfield June 28, 2021

Hi Mr. Firmware -- this is great feedback -- thanks so much for taking the time to share. One thing I'm going to note in my column is that my project is a simple clock, but the reason for getting this right is to prepare us for creating mission-critical systems in the future.

Reply by mrfirmware June 28, 2021

You bet. Be sure to specify a C ISO standard, e.g. "C11" and then stick to it. Don't go back to the dark pre-C99 days.

Reply by MaxMaxfield June 28, 2021

"Don't go back to the dark pre-C99 days." But they were so exciting -- so much was unknown -- porting a program was full of mystery and adventure LOL

Reply by jms_nh June 28, 2021

jmford94 gave you the right answer already...

... but obligatory links:

The CRC Wild Goose Chase: PPP Does What?!?!?!

https://www.embeddedrelated.com/showarticle/669.php 


Linear Feedback Shift Registers for the Uninitiated, Part XV: Error Detection and Correction

https://www.embeddedrelated.com/showarticle/1180.php

Reply by MaxMaxfield June 28, 2021

Awesome -- thanks as always for sharing -- Max

Reply by MatthewEshleman June 28, 2021

jmford94 and others have given the primary reasons. Other items/raw notes not explicitly noted:

  • Using a CRC or other checksum (I personally use one per logical grouping of settings) also helps the software determine, during boot up, if the EEPROM or Flash is "raw" and must be initialized with default values.
  • Always create a spreadsheet for the product that estimates the lifespan. Flash/EEPROM are typically the one piece of electronics in the product where the behavior of the user and/or software will impact the life of the product. I worked at Toshiba for over a decade, and they were rather religious about this. Every product had to have a written approval for EEPROM/Flash wear, with estimates based on software usage, behavior, etc. Toshiba required at least 7 years for our standard consumer products.
  • If a particular NVM setting will be modified frequently, and is therefore negatively impacting the estimated product lifespan, it is now time to introduce banks for that setting, for wear leveling. This is why I always have each logical setting or related group of settings abstracted behind a class or API. The API may be doing nothing more than reading/writing to a single location in EEPROM. Or... it may be searching an assigned set of banks for the correct bank to read, or write, to help with wear leveling. EEPROM lifespans generally make it easier than Flash, and of course Flash typically involves banks with minimum erase sizes, etc.
  • The CRC will also prove to be useful if you need banks for wear-leveling.
  • Many microcontrollers will have a CRC HW function. I personally will generally just use whatever the HW provides for my CRC.
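The lifespan estimate from the spreadsheet bullet above boils down to one line of arithmetic, sketched here with a hypothetical helper. The parameter names and the example figures are assumptions for illustration; the real endurance rating has to come from the part's datasheet.

```c
#include <stdint.h>

/* Estimated years until the storage behind one setting wears out.
 *   endurance_cycles: rated erase/write cycles per location (datasheet)
 *   banks:            number of locations the writes are rotated across
 *   writes_per_day:   expected writes of this setting per day */
double nvm_lifespan_years(uint32_t endurance_cycles, uint32_t banks,
                          double writes_per_day)
{
    double total_writes = (double)endurance_cycles * (double)banks;
    return total_writes / (writes_per_day * 365.0);
}
```

As a made-up example, a setting written 50 times a day to a single location rated for 100,000 cycles comes out at roughly 5.5 years, short of a 7-year target. Rotating the same writes across two banks doubles that to about 11 years, which is the case for wear leveling in a nutshell.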

Hope that helps!

Best regards,

Matthew

https://covemountainsoftware.com/services/


Reply by MaxMaxfield June 28, 2021

Awesome feedback as always -- thanks so much Matthew -- Max