Why Teslas Are Dying, And How To Design A Product To Last Decades
Some of the most expensive computers you can buy today are Teslas, and they are showing signs of degradation. While the cars themselves can last well over a decade, the computer inside cannot. After years of use, the memory cells in the main computer can no longer reliably store data, killing the car.
Several independent Tesla repair shops have reported a severe problem with the MCU in older Teslas, the computer that serves as the brains of the entire car. These computers run Linux, and by default a lot of data is logged to the internal Flash. This data isn't really needed outside a development environment and is rarely downloaded by Tesla. However, because so much of it is written, the Flash chips in these computers are simply wearing out.
The chip in question, an 8GB e-MMC Flash chip on an NVIDIA Tegra single-board computer, stores all of the car's firmware as well as all of the logging data. At release, the firmware didn't fill the whole chip, but with new features constantly being added, the free space is shrinking. That leaves less room for the logs, so the remaining free memory cells are written and erased more often. Given enough time, those cells wear out, eventually resulting in a broken chip.
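To see why shrinking free space matters, here is a rough sketch in Python. It assumes the worst case, where log writes are only spread across the free blocks and the static firmware is never moved (controllers with static wear leveling relocate cold data, which softens this effect). The daily log volume and free-space figures are made up for illustration, not measured from a Tesla.

```python
GIB = 1024 ** 3


def years_until_worn(free_bytes: float, daily_write_bytes: float,
                     pe_cycles: int = 3000) -> float:
    """Years until the free blocks reach their rated write/erase count.

    Assumes writes are spread perfectly evenly over the free space only
    (no static wear leveling) and ignores write amplification.
    """
    erase_cycles_per_year = daily_write_bytes * 365 / free_bytes
    return pe_cycles / erase_cycles_per_year


# Illustrative figures only: 1 GiB of logs per day, shrinking free space.
daily_logs = 1 * GIB
for free in (4 * GIB, 2 * GIB, 1 * GIB):
    years = years_until_worn(free, daily_logs)
    print(f"{free // GIB} GiB free -> {years:.1f} years")
```

Under these assumptions, halving the free space halves the time until the free cells wear out, which is the trend that growing firmware images push a chip toward.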
What the datasheets actually say
According to the datasheet of a typical e-MMC Flash chip, each memory cell is rated for about 3,000 write/erase cycles. In other words, the chip can be filled, erased, and written again about three thousand times before data can no longer be reliably written to it. Other e-MMC chips have similar ratings, and very few offer more than 10,000 write/erase cycles.
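That rating translates into a total amount of data the chip can absorb over its life. The sketch below assumes ideal wear leveling across the entire 8 GB chip and the 3,000-cycle figure above; the `write_amplification` parameter is an assumption standing in for the extra internal writes a controller performs for garbage collection and metadata.

```python
GB = 1e9  # decimal gigabyte, as used in chip capacities


def endurance_bytes(capacity_bytes: float, pe_cycles: int,
                    write_amplification: float = 1.0) -> float:
    """Host data that can be written before every cell reaches its rated
    write/erase count, assuming ideal wear leveling over the whole chip.

    write_amplification > 1 models the controller writing more to the
    Flash than the host asked for.
    """
    return capacity_bytes * pe_cycles / write_amplification


print(f"ideal: {endurance_bytes(8 * GB, 3000) / 1e12:.0f} TB")  # ideal: 24 TB
print(f"WA of 2: {endurance_bytes(8 * GB, 3000, 2.0) / 1e12:.0f} TB")  # WA of 2: 12 TB
```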
That shouldn't be an immediate problem for a 10-year-old car that only stores firmware and firmware updates, but Tesla is famous for collecting an incredible amount of logging data from every car. With the first Model S cars introduced in the second half of 2012, we're quickly reaching the point where the write/erase cycles on these e-MMC chips exceed the manufacturer's ratings.
How every other device deals with Flash wear
All Flash memory has a limited number of write/erase cycles. From the microSD card in your phone and your USB thumb drive to the SSD in your desktop, it will all go bad eventually; the only question is when.
A single worn-out memory cell doesn't have to take the whole chip with it, and there are techniques that alleviate the problem. Modern Flash memory uses wear leveling: the memory controller spreads writes across all of the available blocks so that no single block wears out early. Alongside this, bad-block management keeps track of blocks that have failed and stops writing to them. The controller is constantly shuffling data around, and wear leveling greatly increases the longevity of the Flash. There's always a limit, though, and Tesla's computers hit it.
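A toy model makes the two ideas concrete. This is a hypothetical sketch, not how any particular e-MMC controller works: each write goes to the least-worn good block (wear leveling), and a block that reaches its rated endurance is retired (bad-block management). Real controllers retire a block when an erase or program operation actually fails, so the fixed cutoff is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Block:
    erase_count: int = 0
    bad: bool = False


class WearLevelingController:
    """Toy Flash controller with wear leveling and bad-block retirement."""

    def __init__(self, num_blocks: int, endurance: int) -> None:
        self.blocks = [Block() for _ in range(num_blocks)]
        self.endurance = endurance

    def good_blocks(self) -> int:
        return sum(1 for b in self.blocks if not b.bad)

    def write(self) -> int:
        """Erase and program one block; return the index that was used."""
        candidates = [i for i, b in enumerate(self.blocks) if not b.bad]
        if not candidates:
            raise RuntimeError("no good blocks left: device is worn out")
        # Wear leveling: pick the good block with the fewest erases.
        idx = min(candidates, key=lambda i: self.blocks[i].erase_count)
        block = self.blocks[idx]
        block.erase_count += 1
        if block.erase_count >= self.endurance:
            block.bad = True  # bad-block management: never use it again
        return idx


def writes_until_worn_out(ctrl: WearLevelingController) -> int:
    """Keep writing until every block has been retired."""
    count = 0
    while ctrl.good_blocks() > 0:
        ctrl.write()
        count += 1
    return count


# Hypothetical 8-block device rated for 3,000 cycles per block.
total = writes_until_worn_out(WearLevelingController(num_blocks=8, endurance=3000))
print(total)  # 24000
```

A controller that kept rewriting the same block would lose it after 3,000 writes; spreading the load stretches that to 24,000 here, but once every block is retired the device is done.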
Why this is vital to your hardware project
When designing any embedded project, you must consider the entire lifetime of the product. Will it still be in use in ten or twenty years? If so, you need to choose parts with that lifetime in mind. Flash memory only has so many write/erase cycles, a button can only be pressed so many times, and one reason we don't use Mini-USB connectors anymore is their relatively short lifetime.
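One way to design with lifetime in mind is to turn the endurance rating into a write budget up front. This sketch multiplies capacity by rated cycles, divides by write amplification, keeps only a fraction as a safety margin, and spreads the result over the target service life. The default write amplification of 2 and the 50% margin are assumptions chosen for illustration, not values from any datasheet.

```python
def daily_write_budget(capacity_bytes: float, pe_cycles: int,
                       service_years: float,
                       write_amplification: float = 2.0,
                       margin: float = 0.5) -> float:
    """Average bytes per day the host may write so the Flash outlasts
    service_years. margin < 1 keeps headroom for uneven wear and for
    cells that fail before their rated cycle count (assumed values)."""
    lifetime_bytes = capacity_bytes * pe_cycles / write_amplification
    return lifetime_bytes * margin / (service_years * 365)


budget = daily_write_budget(8e9, 3000, service_years=20)
print(f"{budget / 1e6:.0f} MB/day")  # 822 MB/day
```

For an 8 GB, 3,000-cycle chip and a 20-year target, that works out to roughly 822 MB per day under these assumptions; if your logging needs more, log less, buffer in RAM, or pick a larger or higher-endurance part.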
Tesla's MCU problem may not be much of an issue anymore, but it is a fantastic example of why you need to think about the lifetime of your components.