Avoiding software defects in safety-critical systems
In embedded systems, safety relies on the integrity of code. In our monthly safety and security interview with Andrew Girson, Co-Founder and CEO of embedded consulting firm Barr Group, we discuss the potential ramifications of employing poorly written software and how to avoid them using testing methodologies and coding standards.
1. The most common vulnerabilities in embedded devices are the result of poorly written or buggy firmware or application code. Why is this such a problem, and what types of vulnerabilities can it expose?
GIRSON: Buggy software is volatile software. Such software can behave in unpredictable ways, and testing may not reveal latent defects that will only appear under a very narrow set of circumstances. So, herein lies the issue: as the developer, you need to plug all the holes, but the rogue hacker only needs to find one. Furthermore, because many embedded software systems are developed in unmanaged programming languages such as C, which leave memory management and bounds checking to the developer, the potential for defects is greater.
Consider the case of buffer overflow errors. Software that incorrectly manages a data memory buffer can overwrite data past the defined buffer boundaries. Even without a malicious actor, this is a bug that can cause system errors. For a hacker, this is also an opportunity to overwrite data (and perhaps even code). Widely available debugging and reverse-engineering tools allow engineers to examine and experiment with the values placed in certain memory locations. A knowledgeable hacker who understands what kinds of information are stored past a poorly managed buffer’s boundary can arrange for specific values to be written there, causing the device to behave in a chosen way. For example, hackers could directly or indirectly facilitate the execution of their own malicious code.
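To make the buffer overflow case concrete, here is a minimal C sketch of a bounded copy that rejects oversized input instead of writing past the end of the buffer. The function name `copy_name`, the constant `NAME_BUF_SIZE`, and the choice to reject rather than truncate are illustrative assumptions, not something taken from the interview.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAME_BUF_SIZE 16u   /* hypothetical buffer size, for illustration */

/* Unsafe pattern, shown only in a comment for contrast:
 *     char name[NAME_BUF_SIZE];
 *     strcpy(name, input);   // writes past name[] if input is too long
 */

/* Copy src into dst only if it fits, including the terminating NUL.
 * Returns true on success. On oversized input, dst is set to an empty
 * string and false is returned. */
bool copy_name(char *dst, size_t dst_size, const char *src)
{
    size_t len = 0u;

    if ((dst == NULL) || (src == NULL) || (dst_size == 0u))
    {
        return false;
    }

    /* Bounded scan: examines at most dst_size bytes of src. */
    while ((len < dst_size) && (src[len] != '\0'))
    {
        len++;
    }

    if (len >= dst_size)
    {
        dst[0] = '\0';   /* reject oversized input rather than truncate */
        return false;
    }

    memcpy(dst, src, len + 1u);   /* len + 1 also copies the NUL */
    return true;
}
```

A caller would pass `sizeof` the destination array as `dst_size` and treat a `false` return as an input error to be handled, not ignored.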
Now, there are techniques that embedded systems developers can use to mitigate the possibility of buffer overflow errors being exploited. However, these techniques often require a more highly managed memory system (such as virtual memory and operating system task/memory management). Yet, how many lower-end embedded systems based on 8- and 16-bit processors can afford the cost and power consumption involved in multi-tasking and managed memory systems? Extrapolate this out to the many types of bugs that can occur and one can see that embedded systems developers have unique problems due to the limited environments in which their designs operate. These unique circumstances heighten the importance of catching bugs early.
2. For safety-critical systems, what test methodologies must be used, and which aren't being utilized enough (static/dynamic analysis, peer review, etc.)? What are the benefits each provides?
GIRSON: As we have discussed previously in this space, too many software development teams are not using static analysis and code reviews in a meaningful manner, especially when it comes to safety-critical embedded devices that can kill or injure. While no process can eliminate all potential defects, these techniques have been well documented to improve code quality and reduce defect rates. There are many tools available at many price points that facilitate the use of these techniques.
Regression testing is one other area that Barr Group’s recent Embedded Systems Safety & Security Survey noted as needing more attention. Regression testing is very important, especially for safety-critical systems. Such testing generally identifies a broad array of misbehaviors, including old bugs that have presumably already been fixed. This helps ensure that as new features are added, previously discovered defects do not come back. Yet, survey results showed that of those developing safety-critical devices, 41% were not utilizing regression testing.
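As a rough illustration of the idea, a regression suite can be as simple as a table of inputs and expected outputs that grows by one entry each time a defect is fixed, and that is rerun on every build. The sketch below is in C; the function `clamp_duty_percent`, the test cases, and the "past defects" they describe are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function under test: limits a requested duty cycle
 * to the range 0..100 percent. */
uint8_t clamp_duty_percent(int32_t requested)
{
    if (requested < 0)
    {
        return 0u;
    }
    if (requested > 100)
    {
        return 100u;
    }
    return (uint8_t)requested;
}

typedef struct
{
    int32_t     input;
    uint8_t     expected;
    const char *note;   /* why this case exists */
} regression_case_t;

/* Each fixed defect gets a permanent entry so it cannot silently return. */
static const regression_case_t regression_cases[] =
{
    {   0,   0u, "lower bound" },
    {  50,  50u, "nominal value" },
    { 100, 100u, "upper bound" },
    {  -1,   0u, "hypothetical past defect: negative request wrapped to 255" },
    { 101, 100u, "hypothetical past defect: over-range request not clamped" }
};

/* Runs every case and returns the number of failures (0 means all pass). */
size_t run_regression_tests(void)
{
    size_t failures = 0u;
    size_t count = sizeof regression_cases / sizeof regression_cases[0];

    for (size_t i = 0u; i < count; i++)
    {
        if (clamp_duty_percent(regression_cases[i].input)
            != regression_cases[i].expected)
        {
            failures++;
        }
    }
    return failures;
}
```

In practice `run_regression_tests()` would be called from a host-side or on-target test harness, with a nonzero return failing the build.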
3. Can coding standards help? Which ones, and what sort of investment do these require across a development project/team to ensure compliance?
GIRSON: Coding standards are a valuable part of a well-implemented software development process. However, coding standards must be enforced. Some organizations document the use of coding standards in their process without implementing code reviews and static analysis to check and confirm their usage. It is important to both require the use of coding standards for your code and enforce their use.
Many coding standards exist. The most popular types of coding standards appear to be proprietary standards developed by in-house software development teams. Such standards are fine if created with inputs from multiple engineers and others associated with quality/reliability. As for publicly available standards, CERT has standards for C and other languages that focus on creating secure software. MISRA has a standard originally developed for automotive applications that is broadly applicable and was recently updated for security considerations. Barr Group's Embedded C Coding Standard is a bug-reducing set of guidelines that is easy to follow and enforce, and is also compatible with MISRA's subset of the language.
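For a sense of what such rules look like in code, the C sketch below follows a few conventions of the kind embedded coding standards commonly contain: fixed-width integer types, a named constant instead of a magic number, and braces around every conditional body. The specific rules, the function `over_temperature`, and the limit value are illustrative assumptions, not quotations from the CERT, MISRA, or Barr Group documents.

```c
#include <stdbool.h>
#include <stdint.h>

/* Named constant instead of a magic number.
 * Hypothetical limit: 85.0 degrees C, expressed in tenths of a degree. */
#define TEMP_LIMIT_DECI_C  ((int16_t)850)

/* Fixed-width type makes the range of the reading explicit on any target. */
bool over_temperature(int16_t temp_deci_c)
{
    bool is_over = false;

    /* Braces are used even for a single statement, so that a line added
     * later cannot accidentally fall outside the conditional. */
    if (temp_deci_c > TEMP_LIMIT_DECI_C)
    {
        is_over = true;
    }

    return is_over;
}
```

Rules like these are mechanical enough for a static analysis tool or a reviewer to check, which is what makes enforcement practical.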
At the end of the day, any reasonably designed coding standard will provide value in terms of quality, reliability, and maintainability for a code base if it is used and enforced through other processes.