Datalogger for an embedded system
Started by 6 years ago●21 replies●latest reply 2 years ago●1301 viewsHello all,
I have been facing following problem. I have implemented a datalogger i.e. software running on a MCU which creates periodic snapshots of a system state (values of some quantities of the system). These snapshots are stored in a circular buffer in external flash memory. This circular buffer causes that the oldest snapshots are overwritten by the newest ones which is desired behavior. Another requirement is that the datalogger has to work correctly also in case the MCU is restarted. Due to this fact there needs to be a mechanism how to recognize where in the external flash memory the last snapshot has been stored.
I have decided to fullfill this requirement in following way. The external flash memory is divided into sectors which consist of 512 bytes. So I collect several snapshots into one block and I append so called header at the beginning of this block. At beginning I have decided to use only 8 sectors of the external flash so the header contains block number which goes through values 1,2,…9. This algoritm works pretty well but only for small sizes of circular buffer. In case I have extended size of the circular buffer into several thousands of blocks (the external flash memory contains \( 2^{23} \) sectors) I have found that the time needed for founding sector for continuing in storing of the snapshots after MCU reset is unacceptably long. So I have been looking for another solution how to find sector for continuing in snapshot storage after MCU reset in quicker manner.
Does anybody have any idea? Thanks for any suggestions.
#datalogger #flash
Hi Steven,
If you are going to use a single sector or a couple of sectors, to store a reference to where to write next, you are going to have high wear rate on those sectors. Wear is caused by erasing, not by writing. (because you cannot overwrite without erasing first.) I suggest the following approach would be of value.
The erased value of a flash byte is 0xFF. You write to it by writing bits to zero value. You can "over write" the same byte 8 times without erasing by writing one bit at a time to ZERO. You can extend this idea to a whole sector. writing just one bit at a time to zero. You can over write the same sector by sector size times 8 bits.
So, if you divide your flash size by the sector size, in your case, it sounds like 512 bytes * 8 bits = 4096 bits. In other words, divide your flash size by 4096 and only write one bit to zero every time you progress past this incremental size. Then, on restart you look up where you are in the "bit index sector", and you only have to search from that point onwards for your write point as you are doing now. The benefit of this aproach is that you only erase the bit lookup sector once for every cycle the flash fills up. This means that the wear level of this bit lookup sector is exactly the same as the rest of the flash. This approach is also safe against power failure during the process of sector writes and updating the bit lookup sector. Just update the sector write first. If you power fail between the two. You are just going to search a bit longer.
Regards
Johan Hartman
Hello Johan,
thank you for your answer. I am not sure whether I understand your interesting idea. Do you mean that I should allocate several sectors which will create a large bit array? Each bit in this large bit array will correspond to one sector in the main buffer for the datalogs. Whenever I write into the main buffer I will zero one bit in the bit array. Next free sector in the main buffer can be determined based on the number of zero bits in the large bits array. Is my understanding correct?
Hi Steven,
I think you get the gist of it.
The idea is to reduce your search space at startup with the zero bits from the bit array from a single sector, or a couple of sectors.
Lets say your flash is 2^24 bytes or 16 megabytes.
A 512 byte sector is 2^9 and 8 bits per byte is 2^3, giving you 2^12 or 4096 bits. 4096 bits representing 512byte sectors means: you can divide 2^24 = 16 777 215 / 4096 = 4096 or 2^12. This repesents 8 * 512byte sectors. So, in this case using a bit array in a single sector means you can look up the offset and only have to search 8 sectors for your current write position at startup.
OR, to do it your way. Have one bit represent each sector of 512 bytes. This means you have 2^15 sectors, or 32767 sectors. With 4096 bits available in a sector this means 8 sectors gives you enough bits to do a direct lookup in the bit table for which sector you are currently at.
So, you can trade sectors for start up time.
The trick is to realise that you do not have to erase a sector in flash if you only want to write bits from state "1" to state "0". You can basically overwrite and get what you want.
Hope this helps to explain things better.
Regards
Johan.
Hello Johan,
my English is not very good. So I have attempted to express my current understanding via the below given picture.
As far as I understand correctly your clever idea it assumes the flash organization according to the picture i.e. one sector for the bits array, eight sectors for auxiliary circular buffer and the rest for the main circular buffer for the datalogger.
The bits array sector and sectors creating the auxiliary buffer contains all 0xFF before first usage. As time goes and datalogger starts working the sectors in the main buffer are gradually filled. As soon as one sector in the main buffer is filled then one bit in the first sector in the aux. buffer is cleared. Whenever whole first sector in the aux. buffer is cleared one bit in the bits array is cleared.
Is my understanding correct?
Another trick.
Have a ram copy of the main sector header and the current sector header + data (ie a buffer for the entire sector).
On the fresh init (never been booted) write your sector 0 map and your sector 1 header to the ram version. Once everything is written to the ram copy, flush the RAM to the FLASH.
When you boot, read sector 0 into its ram buffer to assure it was inited && find the starting sector to write to. Read the starting sector into its ram buffer.
Make the changes as needed to the RAM versions, then flush to the flash. This method is called a "write-thru cache". It minimizes the writes and the time to write.
Depending on the type of flash (eg can you write a byte at a time, or the entire sector), you can treat your sector buffer as a byte array. Run thru the ram buffer as a byte array, reading to corresponding byte out of the flash. If not ==, write the ram byte to the flash byte.
Note: it may be faster to just write the entire ram buffer to the flash. Look at the write times. Also instrument and actually measure the timing.
Keep a "dirty bit". If you write something to the ram buffer, set the dirty bit (really just a binary flag). If you are using a command loop, at he end of the loop you know if you need to write the ram to flash.
Perhaps an example would help.
Sector 0 is your "head sector map". sectors 1-8 are your data sectors.
Init is writing 0xFE (1111 1110) to sector 0, and writing a header of some sort to sector 1. (If you are going to store more than one data packet per sector, the sector header might also have the same type of bit field for the next free (or last filled) data block)
You are ready to write a data block. sector 0 == 0xFE, which maps to sector 1 (this is a bit field, not a count). Read the header in sector 1, determine where you want to write the data block, write it and update the sector header.
You fill sector 1. Write 0xFC (1111 1100) to sector 0.
Next data block, you read sector 0 == 0xFC == use sector 2. Write a "fresh" sector header to sector 2. Rinse, repeat.
If it helps, do a 1's complement on the bit fields when you want to use them. In this case, the sector bitmap == 0xFE (1111 1110) => 0x01 (0000 0001).
Now, you wrap the entire flash. You need to re-init the sector 0 bitmap, and the sector 1 header.
When you read the flash (dump the log), you know if a sector has data because the sector header looks correct - use some majic number as the header id (something like 0xDEAD or xBEEF or xABBA (depending on musical taste)). Also, init the sector header with some sort of timestamp to help keep things in order (you need this anyway for a data logger).
Now, as pointed out be jeghartman, you can use a byte array as the map. It is fast to scan that array (which should have a mirror copy in ram) to find the starting sector.
Hi Steven.
English is not my mother tongue either, so no problem.
You understnad correctly: You can do it the way you describe in you last paragrph.
But you really just need the either the bits array of size one sector. In this case you write one additional bit to zero for every 8 sectors in the main circular buffer that fills up. In this case on restart you use the number of zero bits times 8 sectors = 4K as the start of your search for the write pointer. And you need to search up to the following 8 sectors to find it.
OR, you use a bits array of 8 sectors in size. In which case the number of zero bits gives the exact sector in which you need to search your write pointer.
For your implementation. Remember that you can compare 8bits at once on an 8 bit processor, or 32 bits at once on a 32bit processor. Remember that divide by two is the same as "shift right by one" and multiply by two is the same as "shift left by one".
To search the bits array for the current point or byte where the zero bits end and the one bits begin. You do not have to do a linear search. You can take "total bit array size" (in bytes) and divide it by two to get an index into the array. Compare the byte at this point as follows:
if the value at this index is == to 0x00, then the write point lay above this index.
if the value at this index is == to 0xFF, then the write point lay below this index.
else, you are at the byte containing the 0bit to 1 bit change.
Then take the answer of the above comparisons, the the new half size block and do the process again, find the middle byte of this new reduced size block and compare its value as above.
Repeat as necessary until to arrive at the "else" condition.
The above is similar to the Newton Rapson method of finding the roots (zero crossing) of a function. See
https://en.wikipedia.org/wiki/Newton%27s_method
if you do not understand my description.
This is the fastest way of scanning the bits array at startup to find your write index.
Regards
Johan Hartman
Hi, It's an old topic but interesting.
I planned to implement something similar on the internal flash of the LPC5516 MCU, but can't fully agree with this statement: Wear is caused by erasing, not by writing.
Both write and erase operations can contribute to flash memory wear.
During a write operation, electrical charges are injected into the floating gate of the memory cell, and this process can cause damage to the oxide layer that isolates the floating gate from the rest of the circuit. Over time, repeated write operations can cause the oxide layer to break down, leading to data errors and eventually cell failure.
However, erase operations generally cause more wear than write operations because erasing involves applying higher voltages and can result in more stress on the memory cell. As a result, most flash memory wear is caused by erase operations.
So with this approach, the bit map sector should still wear quicker compared to others.
hello Steven,
i don't know if i have understood your current design correctly..
so could be that what i describe is exactly as you do it -
just out of my head i would do the following -
the layout in your flash:
use_next | slot_1 | slot_2 | slot_3 | slot_4 | slot_5 | ... | slot_n |
slot_3 | data_1 | data_2 | data_3 | data_4 | data_5 | ... | data_n |
in the first position of your flash you store where you want to write the next time.
this way you just need to look at exactly one known address in the flash to know where to write your data... i can't imaging a faster way..
the sizes of your 'use_next' field has to fit your highest possible 'slot count'.
the sizes of your 'slot_n' fields has to fit your content..
if you can use the sectors or blocks or something else your flash is divided into - or a size you can read / write in one go that makes it faster..
if what i have described is your current system i don't know what should take longer with higher slot counts...
i imagine you have to 'write' a command to your flash to set the pointer for the next data write..
but this should not really be a timing problem for later addresses...
eventually it would make sense to define what means 'unacceptably long' - are you speaking of minutes / seconds / milliseconds / microseconds ?? ;-)
sunny greetings
stefan
Hello stefan,
thank you for your answer. My current implementation does not directly store the next used sector. The next used sector is determined after MCU reset via going through the whole circular buffer in the external flash and looking for unexpected "jump" in header numbers.
As far as I understand your idea correctly you suggest to use one sector in the external flash memory for storage of the next used sector. So whenever I need to store another block of data I will read the next sector from this location. Is it correct?
Hi Steven,
I wrote essentially the same flash logger as you describe not that long ago. The total number of circular buffer slots was not large, so a linear search for max sequence number coupled with checksum validation was sufficient to figure out where to start writing on power-up.
For a large number of buffer slots, a binary search style algorithm may help you locate the jump position in an acceptably small amount of time. Consider the following example with 16 slots in the circular buffer and these header numbers in slots 0 through 15:
16 17 18 19 20 21 22 23 24 25 26 27 28 13 14 15
By looking at first/last header numbers in slots 0 and 7 (16, 23) you can easily determine that there is no unexpected jump in header numbers, so you look at first/last header numbers in slots 8 and 15 (24, 15) and you know there is a jump in the second half of the circular buffer.
Now look at header numbers in slots 8 and 11 (24, 27) and there is no jump, so look at header numbers in slots 12 and 15 (28, 15) and there is a jump.
Continue until you get down to two slots, or to an acceptably small number of slots and linearly scan those slots for the jump. This should give you close to O(logn) performance instead of O(n). For 4096 blocks I think you'll get worst case performance similar to a linear scan of 44 blocks: 2^12 = 4096, so 11 splits to get down to 2 slots, you'll have to look at two slot headers (first/last) in each split, and worst case you have to look at both halves of every split. If there is no jump in either split, then you either have a jump between the splits or you have perfectly ordered header numbers (start at slot 0).
This may be easier to implement if the number of buffer slots is a power of two, but that is not a requirement. This has the advantage of being generally compatible with what you are doing now, involving only a change to the start-up search algorithm. You don't have to add storage and handling for a separate master index, and you don't have to update an index with each data update.
If you are going to go with a master index approach, Johan Hartman's scheme is very clever in that it avoids wear issues with the index information and is not sensitive to unexpected loss of power. This master index scheme is no doubt going to give you faster start-up search performance than what I have suggested, particularly if you binary search the bit index data for the 1/0 boundary.
Hello Mathew,
thank you very much for your answer. I like your idea. I have prepared a series of header numbers for case my buffer consists of 8 sectors (header numbers go through 1,2,...8,9) and filling starts from erased flash:
0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1 2 0 0 0 0 0 0
1 2 3 0 0 0 0 0
1 2 3 4 0 0 0 0
1 2 3 4 5 0 0 0
1 2 3 4 5 6 0 0
1 2 3 4 5 6 7 0
1 2 3 4 5 6 7 8
9 2 3 4 5 6 7 8
9 1 3 4 5 6 7 8
9 1 2 4 5 6 7 8
9 1 2 3 5 6 7 8
9 1 2 3 4 6 7 8
9 1 2 3 4 5 7 8
9 1 2 3 4 5 6 8
9 1 2 3 4 5 6 7
8 1 2 3 4 5 6 7
8 9 2 3 4 5 6 7
8 9 1 3 4 5 6 7
8 9 1 2 4 5 6 7
8 9 1 2 3 5 6 7
8 9 1 2 3 4 6 7
In this case the jump from 9 to 1 in the last row is a "misleading" jump caused by correct wraparound. The correct jump which I have been searching for is the jump from 4 to 6. In my opinion I am not able to distinguish between the misleading and the correct jump by the algorithm you have described or I have missed some important idea?
Interesting. Not knowing any better, I had assumed that you would be looking for the wrap-around boundary associated with the most recently written header so you could continue writing from there.
I think the algorithm can be made to work with the numbering scheme you have but please see my separate follow-up comment below, I think there is a tweak to your numbering scheme that eliminates the skip, leaving only the wrap-around boundary.
With the existing numbering scheme you will find both the wrap-around boundary and the skip that you care about, and will have to modify the algorithm to ignore the wrap-around unless it coincides with the skip. When you look at the first/last header number and you have N slots with last == first + N - 1, you know you have a sequential block of header numbers. If that test fails, then I think you have two cases. 1) first < last which should indicate the skip you care about, particularly if last == first + N. 2) first > last in which case you definitely have a wrap-around boundary and you may also have a skip, so you'll have to split and analyze. If you have cases other than wrap-around and (N, N+2) skip, then you'll have to handle them in your first/last test and analyze logic.
This will add some overhead to the search, but you should still be O(logn) and run much faster in the worst case than brute force linear search.
If you know for a fact that there can be 1 and only 1 skip and the skip always appears as (N, N+2), then it seems that when you see the first/last relationship last == first + N you know this contains the skip with no wrap-around and can ignore the other half of the split.
There are going to be other cases as well, for example:
8 9 2 3 4 5 6 7
Here the wrap-around is the skip of interest. When you see first/last = 8/3 you get two splits with first/last 8/9 and 2/3, and the skip is between the last two splits.
You also have this case:
9 1 3 4 5 6 7 8
When you see first/last = 9/4 you get two splits with first/last 9/1 and 3/4. Again, the skip is between the last two splits. I think these special cases are to a certain extent an artifact of the way you are handling header numbers.
One thought on header numbers, I think there is a way to eliminate your "skip" case so you only have to detect a non-sequential header number boundary. Lets assume you have N total slots where N is a power of two. Let your header numbers run from 0 to 2N-1, and let your header numbering resume where it left off instead of starting again at header number 1. Suppose N=8 and header numbers run from 0 to 15. You might have:
9 10 11 4 5 6 7 8
You find the 11/4 boundary and after the next update you have:
9 10 11 12 5 6 7 8
Lets suppose you then have 6 more updates:
1 2 11 12 13 14 15 0
You now have two boundaries, 2/11 and 15/0. Here you would recognize that the 15/0 boundary is a normal sequence and ignore it. If you have a split of m headers (say first/last is 13/0 with m=4) and last < first, then you can test to see if last+2N == first+m-1. This basically adjusts for the modulo 2N wrap-around when testing for sequential header numbers, and should work for any split containing a 15/0 boundary. So you have two cases for sequential header numbers test:
1) first < last: last == first + m - 1
2) first > last: last + 2N == first + m - 1
If you don't see those cases, you have a non-sequential boundary that you care about. Note that you still have to detect and handle the case where the boundary lies between two splits, for example between 1/2 and 11/12 in the "after 6 updates" case above.
Since header numbering starts at 0, you might consider using an all-1s header number as an initial value for empty slots. I think the written/empty slot boundary may naturally fall out of the search algorithm.
I have been still thinking about the binary search you have suggested. One idea that I have is to use the binary search several times. During first usage I will determine the wrap-around position and based on this information I will divide the sequence of headers into two subsequences for another usage of the binary search. In another usage of the binary search I will determine position of the jump which I am looking for. What do you think about this idea?
I think something like that would work. The issue I see looking at your "skip" examples in an earlier post is that you've got many special cases to consider due to the relative skip vs. wrap-around proximity and last header number relative to fixed re-start value of 1. These will complicate the search algorithm and affect performance. It will take longer to code and test, and adds the possibility of introducing an obscure bug due to an unanticipated special case.
The solution becomes much easier if you adjust your header numbering to re-start from where it left off rather than re-starting at 1, using 2N header numbers (0 to 2N-1) for N buffer slots with wrap-around from 2N-1 back to 0. This eliminates the header number skip entirely, you find the non-sequential header number boundary and resume numbering from there. This boundary is unambiguous and is easily distinguished from the wrap-around boundary.
The search algorithm is a LOT cleaner. If N is a power of 2 you will wind up with one of three results:
1) the non-sequential boundary is at end/start (start writing at slot 0, continuing from the header number in the last slot)
4 5 6 7 8 9 10 11 (last == first + m - 1, with m = N = 8)
11 12 13 14 15 0 1 2 (last + 2N == first + m - 1, with m = N = 8)
2) you work down to a 2-slot split with a non-sequential boundary
9 10 11 4 5 6 7 8
3) you work down to two splits with no non-sequential boundary in either one, implying the boundary is between the two splits
1 2 11 12 13 14 15 0
11 12 13 14 7 8 9 10
If N is a power of 2 the search algorithm is dead easy. Test the slot list initially for case 1) before running the search. Then run the search until you're left with a 2-slot split containing the boundary or a split pair with no boundary in either.
If your total slot count N is not a power of two, then things work pretty much the same way except that you may wind up with a 2-slot or 3-slot split containing the non-sequential boundary.
This header numbering scheme feels to me like a much better answer. It eliminates the skip, and it gives you an unambiguous non-sequential boundary that can be located with a fairly straightforward binary search algorithm.
Have you got EEPROM area in the MCU or an external RTC with NVRAM? You could use them to store the actual sector address periodically and speed up searching.
Peter
I'll agree with S-Light but with one tweak: instead of having one "use_next" location always used for next data storage address, one sector is used as a rolling buffer of them. I'll call the buffers "index" and "data", where "index" is a buffer of pointer addresses into "data".
The data block gets a header added. It's the ordinal of the data entry - if 16 bit it'd be 0-65535. On startup, an initialization routine scans the "index" buffer looking for the highest number - the address associated is "use_next". You save "index", "use_next" , and the address of "index".
Storing the next block is: get "use_next", put data (incrementing "use_next"), get & increment "index", get & increment address of "index", put new "index", append new "use_next"
At design time, determine the "index" limit. It'll be based on data block size and available storage. At "index" limit, do a sector clear of (most flash have a "sector erase"), reset "index" and address of "index", and store the next block.
"index" is the only one to need clear and reset. "data" can just chase its tail forever. It does need sector address management for rollovers.
The only routine that takes time is the initial scan to find the last entry (highest number). Since the result of initialization is current buffer addresses, the management of the buffer is solely for startup, but it distributes the write cycles of the address.
-----------------------------
These can be combined for fixed blocks by appending the ordinal as data block header. Variable block lengths "can" do this, where the block is [ordinal], [block length], [data blocks], but the initialization scan is trickier.
P.S. hope this makes sense - STUPID text fields throw out anything bracketed by Greater Than and Less Than symbols - my standard for denoting labels
Hello CustomSarge,
thank you for your answer. As far as I understand your idea correctly you suggest to use a combination of my approach and approach suggested by stefan. So use an auxiliary circular buffer (of smaller size so the initial searching after MCU reset will not take a long time) containing the next used sector and main buffer containing the datalogs.
The auxiliary circular buffer will be filled in the same mechanism which I use for filling the datalog circular buffer i.e. header and data. The last inserted record (containing the next used sector) into this buffer will be determined also based on unexpected "jump" in header value.
The main buffer will be filled based on next used sector read from the auxiliary buffer after MCU reset which is then only incremented during program runtime. Please can you tell me whether my understanding of your idea is correct?
Verbally describing this kind of stuff is tough. I'll reapproach, this presumes a fixed length data block:
It can be done with separate buffers or all in one. I'll espouse all in one as easier to write / debug.
Block Put: Append the data block with the ordinal of itself (53rd or whatever). Get [DataBuffer] address from ram, put block to buffer (increments address as puts), save new [DataBuffer] address in ram, increment and save new ordinal in ram.
Initialization: get ordinal of 1st buffer entry, get ordinal of next buffer entry, if next .GT. (greater than) 1st, put next as previous, get new next and compare. Loop until next is .LT. previous = you're at the latest.
Address management can go several ways: Easiest is to use a binary integer block size (8, 16, 32 etc). That way sector end and rollover address management is straightforward. Worst case initialization runtime is loop time across buffer size / block size (ie buffer=1024, block=8: 128 loops).
If / when the ordinal is close to rollover, clear the buffer and reset the index to 0. You could also take the latest entry at initialization, stash it, clear buffer, reset index and move the latest block to the buffer base. The best way is task specific: buffer & block size, frequency of block puts and data uploads, etc. Hope that helps <<<)))