So here's my problem. I've got a 16-bit SRAM connected to a microprocessor through an FPGA. This is for a manned lunar mission, so it needs to work right 100% of the time. The processor has a built-in EDAC scheme that uses 6 additional bits. This scheme detects and corrects all single-bit errors, detects all two-bit errors, and detects all odd numbers of bit errors. The problem is that it misses about 3% of 4-bit, 6-bit, and other even-numbered multi-bit errors.
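For reference, a 16-data-bit SEC-DED (single-error-correct, double-error-detect) scheme with 6 check bits can be modeled as an extended Hamming (22,16) code. The sketch below is illustrative only - the bit positions and parity layout are my assumptions, not the processor's actual circuit, and a real implementation would be FPGA logic rather than Python:

```python
# Illustrative (22,16) extended-Hamming SEC-DED model (NOT the processor's
# actual EDAC). Positions 1..21 hold the Hamming code, with parity bits at
# the power-of-two positions; position 0 is an overall parity bit.

def encode(data16):
    """Return a 22-bit codeword (list of bits) for a 16-bit data word."""
    word = [0] * 22
    dpos = [p for p in range(1, 22) if p & (p - 1)]  # non-power-of-two slots
    for i, p in enumerate(dpos):
        word[p] = (data16 >> i) & 1
    for p in (1, 2, 4, 8, 16):  # even parity over positions with bit p set
        word[p] = sum(word[q] for q in range(1, 22) if q & p) & 1
    word[0] = sum(word) & 1     # overall parity over the whole codeword
    return word

def decode(word):
    """Return (status, data16). status: 'ok', 'corrected', or 'double'."""
    syndrome = 0
    for p in (1, 2, 4, 8, 16):
        if sum(word[q] for q in range(1, 22) if q & p) & 1:
            syndrome |= p
    overall = sum(word) & 1
    word = word[:]
    status = 'ok'
    if syndrome and overall:    # odd error count, nonzero syndrome: fix bit
        word[syndrome] ^= 1
        status = 'corrected'
    elif syndrome:              # even error count: detect, cannot correct
        status = 'double'
    elif overall:               # only the overall parity bit flipped
        status = 'corrected'
    dpos = [p for p in range(1, 22) if p & (p - 1)]
    return status, sum(word[p] << i for i, p in enumerate(dpos))
```

Note that an even number of errors can alias: some 4-bit patterns produce a syndrome that looks like a correctable single-bit error, which is exactly the ~3% escape rate described above.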
Now, some will say that the odds of a 4 bit error are very small - I'm talking about errors induced by radiation - a single event upset (SEU). As a charged particle passes through the SRAM, it sucks charge along with it, and if it hits just right, it's possible to affect multiple bits. The probability of this is actually very low, BUT - I'm not a fan of statistics when it comes to the life of the crew. So I'd like to get 100% coverage up to as many bits as possible.
Now, in order to accommodate the extra 6 bits in the EDAC scheme, I use a 32 bit wide SRAM. 16 + 6 = 22, so I have 10 bits left over to play with. I'm looking at implementing some sort of secondary EDAC scheme in the FPGA to catch the errors that the first scheme misses.
I only have a clock or two to do the calculations, so it can't be any fancy convolutional matrix operation, but I can do any kind of parity checking.
So - 16 bits of data, 10 bits of check code. Who knows a good algorithm for multiple bit error correction? Or better, who can point me to some basic principles that I can use to create my own?
Are you able to use a redundant storage scheme that ensures that the redundant image(s) are located in different physical locations on the silicon? That way, any single event that whacked multiple bits along a straight line through one of the storage locations would be likely to miss the copy stored at a different location on the chip. It might require knowledge of, and planning around, the memory array construction, but it seems plausible. Alternately, if you can't afford the redundancy, can you come up with a scheme, possibly using multiple memory chips, whereby you distribute the bits comprising one word among several chips - and maybe not even in the same place in each chip - so that the data word is effectively spread out in a two- or three-dimensional matrix? If any of this works for you, please let me know!
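The bit-spreading idea can be sketched like this: interleave the bits of several logical words so that physically adjacent cells belong to different words. A burst of adjacent upsets then hits at most one bit per word, which a SEC-DED scheme can correct. The word count, width, and burst pattern here are assumptions for illustration; whether cells are actually physically adjacent depends on the real array layout:

```python
# Sketch of bit interleaving: cell index b*n + w holds bit b of word w,
# so n consecutive cells each belong to a different logical word.
# Sizes are illustrative, not tied to any particular SRAM.

def interleave(words, width=16):
    """Spread n words bit-by-bit across n*width cells."""
    n = len(words)
    cells = [0] * (n * width)
    for w, word in enumerate(words):
        for b in range(width):
            cells[b * n + w] = (word >> b) & 1
    return cells

def deinterleave(cells, n, width=16):
    """Reassemble the n original words from the interleaved cells."""
    words = [0] * n
    for i, bit in enumerate(cells):
        words[i % n] |= bit << (i // n)
    return words
```

With four words interleaved, a burst that flips four adjacent cells lands exactly one bit in each word - each word's damage stays within single-bit-correct range.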
The hardest thing to overcome is not knowing that you don't know.
That was plan A - storing three (or four) copies of each 16-bit word across two locations in my 32-bit SRAM. Unfortunately, the timing for two SRAM accesses doesn't work out with our processor. It's still an option, though, if I can't come up with anything else.
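For what it's worth, the read side of plan A reduces to a bitwise 2-of-3 majority vote, which is tiny in FPGA logic (one LUT level per bit). A minimal sketch, with the three copies as plain integers:

```python
# Bitwise 2-of-3 majority vote over three stored copies of a word.
# Any error pattern confined to a single copy is outvoted; the cost
# is the extra storage and the extra access time mentioned above.

def vote3(a, b, c):
    """Return the bitwise majority of three copies."""
    return (a & b) | (a & c) | (b & c)
```

A fourth copy only adds value if you also compare and flag disagreements, since four-way voting can tie; three copies is the usual sweet spot for this trade.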
Multiple chips is an option that other projects have used - even using 16 different x1 chips, but we just don't have the room for multiple chips. That's why I'm keen to make use of the capability (the extra 10 bits) that I already have in my SRAM chip.
Wiki Hamming codes to get you started. It's helped me before.
Since you have double the width of the original word, you can scale up the existing scheme greatly - construct a Hamming method capable of detecting up to 15-bit errors.
It could take up a good lump of the FPGA, though, and will not detect a 16-bit error.
Taken (stolen, borrowed) from the wiki page on EDAC:
Hamming distance based checks
If we want to detect d bit errors in an n bit word, we can map every n bit word into a bigger (n+d+1) bit word so that the minimum Hamming distance between valid mappings is d+1. This way, if one receives an (n+d+1) bit word that doesn't match any word in the mapping, it can successfully be detected as an erroneous word. Even more, d or fewer errors will never transform one valid word into another, because the Hamming distance between valid words is at least d+1; such errors only lead to invalid words, which are detected correctly. Given a stream of m*n bits, we can detect x <= d bit errors successfully using the above method on every n bit word. In fact, we can detect a maximum of m*d errors if every n bit word is transmitted with at most d errors.
So with a 16-bit word you get a d of 15 and one heap of error detection. This would require removing the original 6-bit scheme and replacing it with this one, though.
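The quoted distance rule can be checked in miniature. This sketch uses a toy 6-bit code with minimum Hamming distance 3 (so d = 2) - a made-up example, not a code anyone would deploy - and verifies that no pattern of d or fewer bit flips turns one valid word into another:

```python
# Toy demonstration of the Hamming-distance rule: with minimum
# distance d+1 between codewords, any d or fewer bit errors always
# land on an invalid word, so they are detectable.
from itertools import combinations

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count('1')

codewords = [0b000000, 0b000111, 0b111000, 0b111111]  # min distance 3
d = min(hamming(a, b) for a, b in combinations(codewords, 2)) - 1

width = 6
for c in codewords:
    for k in range(1, d + 1):                 # every pattern of <= d flips
        for bits in combinations(range(width), k):
            err = c
            for b in bits:
                err ^= 1 << b
            assert err not in codewords       # never lands on a valid word
```

Note the formula as quoted only says such a mapping can exist in principle; whether an actual binary (32,16) code achieves a given distance is a separate (and much harder) question, so treat the "d of 15" figure with caution.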
PS - I'm a tech, not a mathematician... I'm probably wrong. Don't blame me if astronauts explode.