For most of us, software bugs are the annoying little things that we encounter in the form of small errors, like misaligned text or a clipped image. However, in rare situations, these small bugs can have massive ramifications. In this post, I will be discussing three such events: the Therac-25, the Mars Spirit Rover, and the June 2021 Fastly malfunction.
Therac-25
The Therac-25 was a radiation therapy machine built in the 1980s. A later generation of the Therac machines, the Therac-25 was unique in that it was the first to rely only on software, not hardware, for its safety controls. While this seemed like an innovative idea at conception, the engineers later learned that it could result in catastrophic errors.
The Therac-25 operated in two modes: low power mode and high power mode. Low power mode used an electron beam that didn’t penetrate too deeply, making it perfect for treating skin cancers. High power mode used x-rays to penetrate more deeply and treat bone or lung cancers. The low and high power modes were beneficial for hospitals because doctors could now use one machine to treat multiple cancers, rather than needing to buy a new machine for each treatment type.
However, between 1985 and 1987, a malfunction involving the low and high power modes, coupled with the lack of hardware safety controls, caused at least six patients to receive massive radiation overdoses, several of them fatal.
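To make the risk concrete, here is a minimal sketch of the kind of consistency check a software-only safety control has to get right every single time. It is not the Therac-25's actual code; the `Mode`, `MachineState`, and `safe_to_fire` names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ELECTRON = "electron"  # low power mode, shallow treatment
    XRAY = "xray"          # high power mode, deep treatment


@dataclass
class MachineState:
    # All three fields are hypothetical stand-ins for real machine state.
    selected_mode: Mode      # mode the operator asked for
    beam_power: Mode         # power level the beam is currently set to
    hardware_position: Mode  # mode the physical components are positioned for


def safe_to_fire(state: MachineState) -> bool:
    """Allow the beam only when operator, beam power, and hardware all agree."""
    return state.selected_mode == state.beam_power == state.hardware_position


# A consistent state is allowed; a mismatched one is refused.
ok = MachineState(Mode.XRAY, Mode.XRAY, Mode.XRAY)
mismatch = MachineState(Mode.ELECTRON, Mode.XRAY, Mode.ELECTRON)
print(safe_to_fire(ok), safe_to_fire(mismatch))
```

A hardware safety control enforces the same rule physically, so it still holds when the software reads stale or inconsistent state. With software alone, any bug in or around a check like this removes the only safeguard.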
Causes
The radiation overdoses were caused by a myriad of factors. On the software development side, there were three main issues that, when combined, resulted in the malfunction:
Six people suffered from radiation overdose before the problems with the Therac-25 were caught. In addition to the three software issues identified above, the commission to investigate the deaths found the following problems:
The Therac-25 is an extreme example of how small errors, when combined, can cause devastating events.
Mars Spirit Rover
Software bugs seem to be commonplace in space exploration. For example, the Mars Curiosity Rover also had a software malfunction during its time on Mars. However, for this post, I am going to focus on the software bug experienced by the engineers of the Mars Spirit Rover.
The Mars Spirit Rover landed on Mars on January 4, 2004. On January 21, 2004, the engineers discovered that they could no longer communicate with the rover because it was no longer entering sleep mode. Instead, it was stuck in a loop, rebooting itself again and again.
The engineers received intermittent pings from the rover, so they knew it was still alive, but they also knew it was in danger of draining its precious battery reserves and overheating. They knew there was an issue in the software or hardware, but they were unsure what the issue might be.
Crunched for time, the engineers hypothesized that the rover was suffering from a problem with the flash memory. Without finding the root cause, they bypassed the flash memory during a reboot and were able to solve the issue temporarily.
Causes
The engineers performed a more intensive investigation once the rover was stable and discovered the root cause of the issue. The issue was created by three smaller compounding components:
The engineers eventually deleted some unused files (e.g., files from the landing sequence) to reclaim space in the memory. Once the space was reclaimed, they were able to remotely install a file monitoring system that permanently resolved the memory issues. These fixes lasted until the rover's last communication in 2010.
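As a rough illustration of what monitoring storage can look like, here is a minimal sketch that checks how full a filesystem is and flags when it crosses a threshold. It is an assumption about the general idea, not the rover's actual flight software; the function names and the 90% threshold are hypothetical.

```python
import shutil


def flash_usage_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def needs_cleanup(used_fraction: float, threshold: float = 0.9) -> bool:
    """Flag storage as needing cleanup once usage reaches the (assumed) threshold."""
    return used_fraction >= threshold


# Check the current directory's filesystem and report whether it is nearly full.
fraction = flash_usage_fraction(".")
print(f"{fraction:.0%} used, cleanup needed: {needs_cleanup(fraction)}")
```

The point of a check like this is to surface a storage problem while there is still room to act, rather than discovering it through a reboot loop.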
Fastly
Fastly is a content delivery network (CDN) that enables companies to cache content on servers closer to the users requesting it. While Fastly is not a company that many people know, it is nonetheless utilized by many large companies to enhance their users’ web experiences.
In June 2021, a large section of the internet went down thanks to a software bug at Fastly. In mid-May, Fastly deployed software containing a bug that went undetected. In June, a customer uploaded a valid configuration change that inadvertently triggered the bug. The bug, though small enough to go undetected for weeks, caused 85% of the Fastly network to return errors.
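Fastly has not published the code involved, so the following is a purely hypothetical sketch of how a bug can sit dormant until a perfectly valid configuration reaches it. The `PRESETS` table, the `"off"` preset, and both function names are invented for illustration.

```python
# Hypothetical lookup table shipped in a deployment. The "off" preset was
# added to the validator below but never added here: that is the latent bug.
PRESETS = {"short": 60, "long": 3600}


def is_valid(config: dict) -> bool:
    """Validator: accepts every preset the product officially supports."""
    return config.get("ttl", "short") in ("short", "long", "off")


def ttl_seconds(config: dict) -> int:
    """Handler: works for every config seen so far, fails on the untested preset."""
    return PRESETS[config.get("ttl", "short")]


# Configurations used during testing work fine, so the bug goes unnoticed.
print(ttl_seconds({"ttl": "long"}))

# Weeks later, a customer submits a valid config that hits the untested path.
customer_config = {"ttl": "off"}
print(is_valid(customer_config))
try:
    ttl_seconds(customer_config)
except KeyError as err:
    print(f"latent bug triggered by preset {err}")
```

Nothing about the customer's change is wrong here, which is what makes this class of bug hard to catch: the failure depends on an input that no one happened to try before.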
Though the cause was detected within an hour and patched later that day, the damage to the internet was already done. While not much more is known about the Fastly software bug, this is nonetheless an example of how much the internet can be impacted by relatively unknown companies.