Paranoia is not such a bad thing, and there are definitely challenges
to 365 (or 366) days of continuous operation on battery. We develop
implanted medical devices, which have analogous demands. Basically, it
HAS to work (and at a minimum, it has to fail in a predictable, reliable
way).
We use a combination of the following strategies:
1. All code, calibration data, log data, etc. are stored with CRC
values. This doesn't keep you from failing, but lets you know when
you do fail. You could extend this to storing data redundantly, so if
one copy fails you can fall back on another. Or you could embed
error-correcting codes (e.g. Hamming codes) in your data to correct
single-bit errors. These are well described in the literature.
2. A watchdog is employed. It must be serviced by all tasks, or else
it times out and resets the device.
3. Accelerated life and stress tests are performed to validate the
flash memory (and other components).
4. Data are stored in RAM and written to flash memory once per day.
This is a compromise among saving battery power (flash erases and writes
cost a lot of energy), saving processing time, and safely writing the
memory. If there is a problem at run-time, we lose at most 24 hours
of data.
5. Lots of voltages on the board are self-tested by the microcontroller.
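
To illustrate item 2 (a watchdog that must be serviced by all tasks),
here is a minimal C sketch. Each task sets its own bit, and the hardware
watchdog is only kicked once every bit is set. The task names and
helper functions are hypothetical, not taken from any real code base,
and the actual register write is left as a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task IDs: one bit per task that must check in. */
enum {
    TASK_SENSOR = 1u << 0,
    TASK_LOGGER = 1u << 1,
    TASK_COMMS  = 1u << 2,
    ALL_TASKS   = TASK_SENSOR | TASK_LOGGER | TASK_COMMS
};

/* Bits set by tasks since the last watchdog kick. */
static uint8_t checked_in;

/* Called by each task after it completes a healthy pass. */
void wdt_task_checkin(uint8_t task_bit)
{
    checked_in |= task_bit;
}

/*
 * Called periodically from the main loop. Returns true only when every
 * task has checked in since the last kick, and clears the flags so all
 * tasks must check in again. If any task hangs, this keeps returning
 * false, the hardware watchdog is never serviced, and the device resets.
 */
bool wdt_should_kick(void)
{
    if ((checked_in & ALL_TASKS) != ALL_TASKS) {
        return false;
    }
    checked_in = 0;
    /* Assumption: on an MSP430 the caller would now restart the counter,
     * e.g. WDTCTL = WDTPW | WDTCNTCL | <interval bits already in use>. */
    return true;
}
```

This only works if every task is expected to run within one watchdog
period; sporadic tasks need a different scheme.
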
Our experience, even under high-stress environments (high temperature,
lots of erases / writes), is that we've NEVER flipped a bit in flash!
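
Should a bit ever flip, the CRC from item 1 is what catches it. Below is
a minimal C sketch, assuming CRC-16/CCITT (polynomial 0x1021, initial
value 0xFFFF) and a hypothetical 8-byte record kept in two copies. The
post does not say which CRC or record layout is actually used, so treat
all names and sizes here as placeholders.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT: polynomial 0x1021, initial value 0xFFFF. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Hypothetical record layout: payload followed by its CRC. */
typedef struct {
    uint8_t  payload[8];
    uint16_t crc;
} record_t;

/* Compute and store the CRC before the record is written out. */
void record_seal(record_t *r)
{
    r->crc = crc16_ccitt(r->payload, sizeof r->payload);
}

/* Recompute the CRC after reading and compare with the stored one. */
bool record_valid(const record_t *r)
{
    return r->crc == crc16_ccitt(r->payload, sizeof r->payload);
}

/* Redundant storage: use the first copy that passes its CRC check.
 * Returns NULL if both copies are corrupt, so the caller can fail
 * in a predictable way. */
const record_t *record_pick(const record_t *primary, const record_t *backup)
{
    if (record_valid(primary)) {
        return primary;
    }
    if (record_valid(backup)) {
        return backup;
    }
    return NULL;
}
```

The CRC only detects the error; it is the second copy that lets the
device carry on.
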
If you really REALLY need a reliable system, you can employ
aerospace / NASA methods of using multiple redundant processors. For
example, have three devices making calculations and if any differ, use
the majority vote.
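
The voting step itself is small. Here is a minimal C sketch of a
two-out-of-three vote over three independently computed results; the
function name and the 16-bit result type are assumptions for
illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Two-out-of-three majority vote.
 * Writes the agreed value to *out and returns true if at least two
 * inputs match. Returns false (leaving *out untouched) if all three
 * disagree, so the caller can enter a defined failure state.
 */
bool vote_2oo3(uint16_t a, uint16_t b, uint16_t c, uint16_t *out)
{
    if (a == b || a == c) {
        *out = a;
        return true;
    }
    if (b == c) {
        *out = b;
        return true;
    }
    return false;
}
```

The case where all three disagree has no majority, so it is reported
separately rather than silently picking one value.
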
Good luck.
Stuart
--- In m...@yahoogroups.com, Hardy Griech
wrote:
>
> Stuart_Rubin wrote:
> :
> > I think there has been a lot of good discussion / hints about this
> > matter. Now, I think everyone's interest has been piqued: what
> > exactly is the application and in what environment will it run?
> :
>
> Hey, if you are talking about my application: it is just a boring field
> device which has to run 24h a day, 365 days a year (+1 for leap
> years), ... An additional difficulty is that it is battery driven and
> also that I'm perhaps a little bit paranoid ;-)
>
> Hardy
>
------------------------------------

(You need to be a member of msp430 -- send a blank email to msp430-subscribe@yahoogroups.com )

Hi Stuart,
many thanks for your detailed explanation. I intend to implement
some of your measures:
1. CRC for checksumming. I will also implement marginal read
(MSP430F2xx)
2. watchdog just checks that a task does not take too long; because
the tasks are acyclic, it is not possible to watch all of them,
which would be the better approach
3. -
4. that's my intention too; I will also have some pre-power-fail
detection in case the battery is disconnected
5. some voltages are monitored to get an end-of-life detection of the
battery
6. fortunately no aerospace/NASA methods required!
Are you additionally doing cyclic resets of your µC (keeping the
internal state, of course)?
Hardy