Making static analysis a part of code review

Embedded Computing Design — June 16, 2009

As static analysis tools have become more sophisticated, their role in the software development process has become a subject of debate. Can a project team use a static analysis tool instead of other, presumably more labor-intensive steps in the normal process of coding, testing, verifying, validating, and ultimately certifying critical software? Not entirely; but such a tool can make one of the most labor-intensive parts of that process, source code review, substantially more productive.

Source code review is a labor-intensive component of disciplined software development, from initial coding through certification. Developers must review code according to high-assurance software development standards such as DO-178B[1] for safety-critical systems or the Common Criteria[2] for high-security systems as well as identify security vulnerabilities such as those identified within the Common Weakness Enumeration (CWE)[3]. 

Code review is good for you, or so they say. The statistics are impressive. There is almost no activity that can find as many defects. Formal code reviews or inspections can find as many as 60 percent of all defects ultimately found in a system[4], whereas more conventional testing regimes often find only 30 percent[5]. Furthermore, these conventional testing regimes inevitably involve a review of the associated code, so they are not independent of code review.

Even informal code reviews, code walkthroughs, and peer code critiques have been measured to detect as many as 50 percent of all defects[6]. These defects are being found earlier, when they are less expensive to fix. The code review process also can help build a collaborative atmosphere and improve the structural integrity of the software by tapping into the collective experience of the reviewers.

Despite all of its recognized benefits, systematic code review is more the exception than the rule at many software organizations. Part of the problem is that the code review process can feel painfully slow and unproductive, and sometimes it is. Increasingly available advanced tools can automate some of the more tedious parts of code review, making the process both feel and actually be more productive.

The big picture

When faced with a relatively unfamiliar piece of code, one of the first things a reviewer needs to do is get a handle on its inputs and outputs (including global data and indirectly accessed objects), any assumptions the code makes about the state of the world, and what side effects are intended or expected. Unfortunately, getting the big picture may require reviewing the code itself, which creates a catch-22 wherein developers cannot complete productive reviews until they have reviewed the code. At best, this creates a need for an iterative process, and at worst it renders the process nearly impossible.

Ideally, the comments, design documents, general knowledge, and naming conventions would allow the reviewer to glean the big picture before delving into the code, but part of a reviewer’s job is checking if the code does what it is supposed to do. If it does not, the big picture may throw the reviewer off track and make the process even slower.

It is difficult for developers to make the review process productive when the existing documentation may be missing or inaccurate. However, an advanced static analysis tool can provide the big picture by directly analyzing the code as written and identifying the program’s inputs, outputs, assumptions about the environment, and side effects.

Reviewing inputs and outputs

Inputs and outputs are the most fundamental characteristics of any algorithm and, at a deeper level, of the software components implementing the algorithm. It is typically complex work for a reviewer to identify all the globals and indirectly referenced data used or updated by a given piece of code. But an advanced static analysis tool can do this readily and annotate a program listing accordingly, as demonstrated in the following example (note that examples are given using Ada syntax):

  function Random return Integer;
    --#input:  Seed
    --#output: Seed
    --#output: Random’Result

This indicates that the function has some inputs and outputs not mentioned in its parameter profile, which is empty here. In particular, the Random function both reads and writes a global variable named Seed. Unsurprisingly, it also has the function result as an output.

A function named Random generally needs some internal state to be able to produce a sequence of pseudorandom numbers, but this information is rarely clear when reviewing a function with a less familiar name and potentially dozens or hundreds of lines of code. This job is made even harder when the function calls other code, some of which may be in a different module, potentially in a completely separate subsystem. On the other hand, this is relatively straightforward work for a static analysis tool, and presuming the tool produces annotated listings, the reviewer’s job is already simplified by having this information readily available.
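To make the hidden state concrete, here is a hypothetical arrangement (not taken from the article’s tool output) in which the state is even one module removed from the function under review; only the tool’s input/output annotations would surface the dependence on Seed:

  package Entropy is
     function Next_State return Integer;
  end Entropy;

  package body Entropy is
     Seed : Integer := 1;
     --  Hidden state: every call to Next_State reads and updates Seed.

     function Next_State return Integer is
     begin
        Seed := (Seed * 75) mod 65_537;  -- stand-in update, for illustration only
        return Seed;
     end Next_State;
  end Entropy;

  package Random_Numbers is
     function Random return Integer;  -- parameter profile is empty
  end Random_Numbers;

  with Entropy;
  package body Random_Numbers is
     function Random return Integer is
     begin
        --  Nothing here mentions Seed, yet Seed is both an input and an
        --  output of Random by way of the call into the other module.
        return Entropy.Next_State mod 32_768;
     end Random;
  end Random_Numbers;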

Reviewing object allocations and memory management

When reviewing a resource-constrained system, developers must understand where memory is allocated or deallocated. Some critical systems are structured to avoid all dynamic memory allocation, but most systems of any complexity, and all systems built with languages such as Java or C#, will inevitably make significant use of dynamic memory allocation. Although garbage collection can relieve the programmer of worrying about dangling references, it does not alleviate the issue of excess memory allocation or ongoing memory growth. Complex data structures can end up growing unintentionally, or large numbers of objects can be repeatedly and wastefully allocated and reclaimed.

Programmers might assume that garbage collection removes the concern over storage leaks. It is true that type-accurate garbage collection will not leave inaccessible objects, but it can leave logical garbage: objects that are accessible in principle, but which the program does not and will not need to reference in the future. These memory leaks are often more difficult to find than leaks in a language with user-controlled memory allocation.

No tool can guarantee to find all memory allocation problems, but an advanced static analysis tool can help find storage leaks or inappropriate object allocations during code review. One approach is for the tool to annotate a routine indicating where new objects are created and the number of times such an allocation might occur per invocation of the routine. For example:

  type List is access List_Node;
    -- A variable of type List is a pointer to a
    -- dynamically allocated object of type List_Node
  ...
  function Copy_List(L : List) return List;
    --#input:   L
    --#input:   L...Val
    --#input:   L...Next
    --#new obj: new List_Node(#1), num obj >= 0
    --#output:  Copy_List’Result
    --#output:  new List_Node(#1).Val
    --#output:  new List_Node(#1).Next

In addition to identifying the inputs and outputs, the tool provides a new obj annotation, indicating the type of object being allocated as well as the number of times this allocation might occur in a given invocation of Copy_List. In this case, it merely shows that the first allocator for a List_Node may be executed zero or more times, presumably depending on the length of the incoming list L. Note that it also indicates which fields of the newly constructed object(s) will be initialized and which fields of the incoming list L are referenced to perform the copy. The notation L...Val indicates a possible reference to the Val field of any List_Node reachable starting from L.
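As an illustration, a body consistent with these annotations might look like the following sketch; the Integer element type and the particular loop structure are assumptions, since the article shows only the spec:

  type List_Node;
  type List is access List_Node;
  type List_Node is record
     Val  : Integer;
     Next : List;
  end record;

  function Copy_List (L : List) return List is
     Result : List := null;
     Last   : List := null;
     Cursor : List := L;
  begin
     while Cursor /= null loop
        declare
           --  The allocator below runs once per node of L, hence the
           --  "num obj >= 0" allocations per call.
           Node : constant List :=
             new List_Node'(Val => Cursor.Val, Next => null);
        begin
           if Result = null then
              Result := Node;
           else
              Last.Next := Node;
           end if;
           Last := Node;
        end;
        Cursor := Cursor.Next;
     end loop;
     return Result;
  end Copy_List;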

Reviewing preconditions and postconditions

After understanding the set of inputs and outputs of a routine and the number of new objects being created, a reviewer will be interested in other issues:

What assumptions are being made about the state of the inputs

What updates are being performed on the outputs

Whether these assumptions and updates correspond to the expected behavior

This is difficult to determine by simply staring at the code, particularly if the review proceeds in a top-down order. In a top-down review, summary information is needed for any routines called from the routine under review to be able to understand the logic of the given routine without having to repeatedly jump over to the called routine and study its code. Human push-down stacks tend to be very small, so trying to review a piece of code that requires repeatedly reviewing called routines quickly overflows the reviewer’s mental stack.

The reviewer thus needs preconditions on the inputs and postconditions on the outputs for each routine being reviewed and for each routine called from a routine being reviewed. An advanced static analyzer can extract pre- and postconditions from the code itself:

  function Random return Integer;
    --#pre:  Seed in {0..2**16-1}
    --#post: Seed = (Seed’old*25_173 + 13_849) mod 2**16
    --#post: Seed in {0..2**16-1}
    --#post: Random’Result = Seed mod 2**15
    --#post: Random’Result in {0..2**15-1}

The tool has decorated the listing with pre and post annotations, which indicate the expectations placed on each input and, for each output, its final value as a function of the inputs and/or as a set of possible values. In this case, the Random function expects Seed to be a 16-bit unsigned value on entry, and the new Seed value on exit is a linear function of its old value, reduced modulo 2^16, with the unsurprising effect that its value is again in the range of a 16-bit unsigned value. Similarly, Random’s result is computed from the (final) value of Seed, reduced further to be in the range of a 15-bit unsigned value.
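For reference, a body consistent with these extracted contracts, reconstructed here from the annotations rather than taken from the article’s source, would be roughly:

  Seed : Integer := 0;
  --  Global state; assumed to remain in 0 .. 2**16-1, per the precondition.

  function Random return Integer is
  begin
     --  Linear congruential update; the intermediate product stays within
     --  a 32-bit Integer because Seed < 2**16 and 25_173 < 2**15.
     Seed := (Seed * 25_173 + 13_849) mod 2**16;
     return Seed mod 2**15;
  end Random;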

The tool extracted this information from the code itself, giving reviewers an accurate representation of the as-built behavior rather than the behavior that the implementer might have hoped for. They can then compare this with their expectations to determine immediately if the function is doing the right thing. Furthermore, if this function is not under review, but is instead being called from a function that is under review, they do not need to look further at the code of Random to know its external assumptions and effects, allowing them to continue reviewing the original function that calls Random without overstressing their mental push-down stack.

On another level, one of the concerns with a random number generator is whether it uses an appropriate algorithm. Too many random number generators rely on the mistaken assumption that an algorithm chosen haphazardly will produce sufficiently random results. In this case, reviewers can analyze the validity of the algorithm at a fundamental level simply by looking at the preconditions and postconditions, even before they look at the body or attempt to understand the specific code involved.

There are other cases in which detailed examination of the extracted pre- and postconditions can be revealing. For example, the following postcondition was extracted by the tool from sample user code:

  function Days_In_Month(Month : Month_Num) return Integer is
    --#post: Days_In_Month’Result in
    --  {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334}
  begin
    return Integer (Number_Of_Days_Since (Month));
  end Days_In_Month;

Looking at the post annotation and comparing it against the name of the function, a reviewer can immediately suspect that something is amiss. The possible values of the result seem to differ from what one might expect of a function whose name suggests that it returns the number of days in a month. Instead, the function seems to be computing the cumulative count of days preceding the month in a (non-leap) year. Without this post annotation, a reviewer might pass over this function without recognizing the nature of the flaw.

Even if further review determined that the purpose of this routine was indeed to compute the number of days in the year preceding the month, the reviewer has still identified a flaw: the poor choice of name. Ill-chosen names, though not directly a cause of runtime failures, can cause problems during maintenance. One of the purposes of code review is to find and correct this type of flaw as well as runtime defects in the program.
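If the review concludes that the name reflects the real intent, one plausible repair is sketched below; Month_Num is assumed to range over 1 .. 12, and non-leap-year lengths are shown (February would need separate leap-year handling). If instead the cumulative count is what was wanted, renaming the function is the appropriate fix.

  Month_Lengths : constant array (Month_Num) of Integer :=
    (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
  --  Non-leap-year month lengths, indexed by month number.

  function Days_In_Month (Month : Month_Num) return Integer is
  begin
     return Month_Lengths (Month);
  end Days_In_Month;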

Reviewing external calls and associated presumptions

One of the liabilities of developing and reviewing code top-down is that some of the routines called might not have been written yet. For a static analysis tool to be usable in such an environment, it needs to operate under those conditions as well. Furthermore, even in the absence of code, reviewers have some expectation as to how the lower-level routines will operate once available, and the static analysis tool should help with checking the existing code’s use of those as-yet-unavailable lower-level routines. The tool also can help with calls into unanalyzed code – be they unwritten, written in some other language, or simply unavailable – by identifying where such unanalyzed calls appear and what presumptions are being made about the results returned based on the contexts in which the calls are made:

  procedure DOS_Get_Immediate
    (Item : out Character; Avail : out Boolean);
    --#unanalyzed:  bioskey@6
    --#presumption: bioskey@6’Result in {0..255}

The tool provides annotations on the DOS_Get_Immediate procedure indicating that there is an unanalyzed call on bioskey. Furthermore, it indicates that the calling code is making the presumption that the result of this call is in the range 0..255. At this point, the reviewer can compare these presumptions against what is known about this external routine and determine whether the presumptions are reasonable. Without this kind of annotation, it might be difficult for the reviewer to locate and review all of the invocations of external or unavailable code.
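A hypothetical body that would give rise to exactly this presumption is sketched below; the way bioskey is imported and the command code passed to it are illustrative assumptions, since the article does not show the calling code:

  package Console is
     procedure DOS_Get_Immediate (Item : out Character; Avail : out Boolean);
  end Console;

  with Interfaces.C;  use type Interfaces.C.int;
  package body Console is

     --  External C routine; no Ada source is available, so any call to it
     --  is unanalyzed from the tool's point of view.
     function Bioskey (Cmd : Interfaces.C.int) return Interfaces.C.int;
     pragma Import (C, Bioskey, "bioskey");

     procedure DOS_Get_Immediate (Item : out Character; Avail : out Boolean) is
        Raw : constant Interfaces.C.int := Bioskey (0);  -- command code illustrative only
     begin
        Avail := Raw /= 0;
        --  Character'Val is defined only for 0 .. 255, so this conversion is
        --  what makes the caller presume the external result lies in that range.
        Item := Character'Val (Integer (Raw));
     end DOS_Get_Immediate;

  end Console;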

Reviewing test coverage and software test vectors

In some cases, a code review may also include a review of the plan for testing a unit. Unit testing can entail significant effort, given that certain certification standards require developers to demonstrate extensive code coverage. For example, DO-178B Level A requires Modified Condition/Decision Coverage (MC/DC)[7]. With MC/DC, the developer must exercise every control flow decision and show that each individual Boolean condition independently affects the outcome of its decision. Achieving this level of coverage can involve a huge effort and is a concern from the earliest stages of development. By analyzing the control flow, an advanced static analysis tool can identify the interesting subranges of the inputs, which in combination will exercise every condition and decision with fewer tests:

  procedure Histogram(Val_Array: Int_Array; Low, High: Integer);
    -- Creates a histogram of the values in Val_Array, using
    -- Low and High as the range of values histogrammed.
    -- All values less than Low will use histogram element "Low-1" and
    -- all values greater than High will use histogram element "High+1".

    --#test_vector: High - Low:         {-2,-1}, {0..2**31-4}
    --#test_vector: Val_Array[..]-High: {-2**31+4..0}, {1..2**31-1}
    --#test_vector: Val_Array[..]-Low:  {-2**32+1..-1}, {0..2**31-4}

The tool has added test_vector annotations to the listing. In this case, the annotations identify certain computations over the inputs and the subranges of those computations’ values that lead the algorithm along different paths. Different algorithms might result in more or fewer test vectors, each with more or fewer distinct subranges. In this case, each of the three computations has two distinct subranges. To achieve full condition/decision coverage of this algorithm, the cross product of these three test vectors would suffice, leading to a total of 8 (2^3) test cases. For example, one test case might take a value from the first subrange of each test vector, say High-Low = -1, Val_Array(..)-High = -7, and Val_Array(..)-Low = -8. The other 7 test cases would draw values from the remaining combinations of subranges across the three test vectors.
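For concreteness, the following is a hypothetical body consistent with the spec comments and with these test vectors; the Int_Array element type, the bucket layout, and what is ultimately done with the counts are all assumptions:

  type Int_Array   is array (Positive range <>) of Integer;
  type Count_Array is array (Integer range <>) of Natural;

  procedure Histogram (Val_Array : Int_Array; Low, High : Integer) is
     Counts : Count_Array (Low - 1 .. High + 1) := (others => 0);
     --  The bounds involve High - Low, which is one reason that
     --  computation shows up as a test vector.
  begin
     for I in Val_Array'Range loop
        if Val_Array (I) < Low then          -- decision governed by Val_Array(I) - Low
           Counts (Low - 1) := Counts (Low - 1) + 1;
        elsif Val_Array (I) > High then      -- decision governed by Val_Array(I) - High
           Counts (High + 1) := Counts (High + 1) + 1;
        else
           Counts (Val_Array (I)) := Counts (Val_Array (I)) + 1;
        end if;
     end loop;
     --  In the real code the counts would presumably be stored or displayed.
  end Histogram;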

Although not necessarily concerned with the particular values in each test vector subrange, the reviewer might assess whether the complexity of the algorithm could be reduced so that the number of test vectors or the number of subranges for a given test vector would be reduced. The test vector annotations provided by the tool can rapidly indicate the level of complexity and the possible places where the developer should focus the simplification effort.

It should be noted that the proper procedure for creating certification tests is to derive the set of tests from the requirements, not from the code. In practice, however, the set of tests thus derived will be incomplete, and examining the extracted test vectors can point out where additional tests are needed to achieve the required coverage.

For example, if a routine takes an integer input, and the requirement is that the routine return its square root, it is not feasible to test all possible inputs. However, the code might use different algorithms for different ranges, and so to thoroughly test the code, the developer needs sample data in specific ranges to check these different computational methods. The reviewer might not know that this is necessary from the requirements alone, and in this case, the test vectors can point out what is necessary to complete the set of tests.
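A purely illustrative example of such range-dependent behavior, not drawn from the article, is an integer square root that answers small inputs directly and uses Newton's method otherwise; requirements-based tests could easily exercise only one of the two paths:

  function Sqrt (N : Natural) return Natural is
     X : Natural := N;
     Y : Natural;
  begin
     if N < 2 then
        return N;                -- small inputs answered directly
     end if;
     Y := (X + 1) / 2;
     while Y < X loop            -- Newton's method for larger inputs
        X := Y;
        Y := (X + N / X) / 2;
     end loop;
     return X;
  end Sqrt;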

Reviewing assertions and runtime checks

After focusing on understanding the routine as a whole, the reviewer can confidently begin to review the code on a line-by-line level. The line-by-line review is the typical place where static analysis tools are used. Many tools focus exclusively on identifying individual low-level defects without necessarily providing information that helps with the big picture. The combination of the higher-level, unit-oriented annotations with the lower-level, line-by-line warnings provides significant support for productive code review. The line-by-line messages help identify all places where a user-written assertion or a language-based runtime rule might be violated, thereby highlighting those parts of the code that might deserve extra scrutiny. For example:

  66 ...
  67  Screen.MoveCursor(Column => 30 + Bx*2,
  68                    Row    => 2 + By);

    precondition check   Medium Prob.   check that (30 + Bx*2) <= 80
    precondition check   Medium Prob.   check that (2 + By) <= 24

Here the tool highlights the possibility that the computed value for Column is greater than the 80 column limit, which is apparently a precondition imposed by MoveCursor, and also highlights the possibility that the computed Row is too great. The tool analyzed what it could on its own and was unable to prove that these checks would always succeed based on other parts of the algorithm or on preconditions propagated to the inputs of the given routine. Therefore, it assessed the probability of the precondition check failing to be Medium. It would indicate High if it could determine that the checks would always fail and give no message at all if it could determine the checks would always succeed. Low is reserved for messages that involve analyses where a high false-positive rate is seen empirically.
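One common response to such a warning, sketched here with assumed names and ranges, is to tighten the declared ranges of the operands so that the tool can prove the checks; Screen.MoveCursor is the routine from the listing above, while Board_X and Board_Y are hypothetical subtypes:

  subtype Board_X is Integer range 1 .. 25;   -- 30 + 25*2 = 80, the column limit
  subtype Board_Y is Integer range 1 .. 22;   -- 2 + 22 = 24, the row limit

  Bx : Board_X := 1;
  By : Board_Y := 1;
  ...
  Screen.MoveCursor(Column => 30 + Bx*2,
                    Row    => 2 + By);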

Reviewing threading and race conditions

After going through all the possible sequential code runtime failures, the reviewer then needs to attend to concurrent code issues. Reviewing code for concurrency problems is particularly difficult, and conventional testing for such problems is inherently problematic, as they are often load dependent and difficult to reproduce. Thus, any help a static analysis tool can provide is extremely valuable when it comes to problems with multithreaded code. Tools vary greatly in how they can address concurrency issues. One especially useful kind of analysis involves assessing the program’s threading structure and identifying any potential race conditions, where data visible to multiple threads might be concurrently accessed without sufficient synchronization.

Figure 1 illustrates the kind of race-condition report that an advanced static analysis tool can provide. The left-hand column shows the association between objects and locks, and the right-hand column contains an annotated source listing.

Figure 1: Race-condition report

In this report, the tool identifies an object, Arrival:Delay_Time, which is updated from multiple thread entry points (Timer and Speeder) without sufficient synchronization. It identifies each code location where the object is either R(ead) or W(ritten) and which locks are held at that point, presumably as a result of an Ada 95 protected action or a Java synchronized method call. In this case, no locks are held, so unless other unconventional synchronization methods are being used, the potential for a race condition exists. With this kind of information, a code reviewer can quickly identify the places where additional scrutiny of the threading logic might be valuable.
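In Ada, the usual repair for such a report is to move the shared data into a protected object so that every read and write occurs inside a protected action; the sketch below assumes the shared object holds a single Duration value, since the article shows only the report:

  protected Arrival is
     procedure Set_Delay (D : Duration);
     function  Get_Delay return Duration;
  private
     Delay_Time : Duration := 0.0;
  end Arrival;

  protected body Arrival is
     procedure Set_Delay (D : Duration) is
     begin
        Delay_Time := D;
     end Set_Delay;

     function Get_Delay return Duration is
     begin
        return Delay_Time;
     end Get_Delay;
  end Arrival;
  --  The Timer and Speeder tasks now call Arrival.Set_Delay and
  --  Arrival.Get_Delay instead of touching Delay_Time directly.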

Reinventing code review

The benefits of code review are well known in almost all of its forms – formal or informal, highly structured or ad hoc, periodic or sporadic. Nevertheless, code review remains a hard sell in many organizations, as the actual process can be time-consuming and can feel unproductive. One of the reasons is the inherent difficulty of reading and understanding code that is unfamiliar, with potentially inadequate or out-of-date documentation, hidden assumptions, or undocumented dependencies.

An advanced static analysis tool has the potential to reduce the difficulty of reviewing unfamiliar code by providing annotations that summarize the high-level attributes of the code along with warnings that highlight areas deserving extra attention. This tool can identify all of the inputs, outputs, object creations, and any external calls; document the expectations about these inputs, outputs, and external calls in the form of preconditions, postconditions, and presumptions; and indicate the number and nature of the test vectors sufficient to achieve full coverage.

The tool can also provide lower-level indications of potential sequential and concurrent logic flaws. With this information, a reviewer can immediately and systematically compare requirements against the as-built characteristics, verify that test plans are adequate, and understand and evaluate the sequential and concurrent structure. This level of assistance makes the process of code review more productive, thereby increasing the likelihood that it will actually get done.

Furthermore, every time the code is modified, it is easy to rerun the tool to find out what has changed. Ideally, the tool would provide an incremental list of only the things that have changed. The tool is always ready to re-review the entire application after even a small change, something that is generally not true of a team of people doing a purely manual code review. If the tool indicates that nothing significant has changed, then the impact (and the additional review effort) is minor. On the other hand, if the tool indicates that one of the higher-level annotations is now significantly different, then reviewers can focus their time on just that significant change.

The kinds of static analysis discussed in this article have been implemented in the context of a professional product named CodePeer, developed jointly by AdaCore and SofCheck, Inc. All of the annotated listings shown were generated by CodePeer from Ada source code input.

Of course, the use of a static analysis tool cannot entirely substitute for careful human review of the code. For example, existing technology for natural language comprehension falls far short of being able to review English language comments to see whether they are consistent with the code. Nevertheless, a well-designed tool can make a substantial contribution to the process of code review, both in finding problems that might otherwise go unnoticed and in reducing the overall manual effort required for this kind of detailed review.

References

[1] RTCA/DO-178B Software Considerations in Airborne Systems and Equipment Certification, RTCA, 1992.

[2] Common Criteria v3.1, CCRA, 2007; www.commoncriteriaportal.org/thecc.html

[3] Common Weakness Enumeration, MITRE, 2009; cwe.mitre.org

[4] Boehm, B., and Basili, V.R. 2001. “Software Defect Reduction Top 10 List,” Computer 34, 1 (Jan. 2001), 135-137; www.cs.umd.edu/projects/SoftEng/ESEG/papers/82.78.pdf

[5] Jones, C. “Measuring Defect Potentials and Defect Removal Efficiency,” CrossTalk, June 2008; www.stsc.hill.af.mil/crosstalk/2008/06/0806jones.html

[6] Cohen, J. Best Kept Secrets of Peer Code Review, Smart Bear Software, 2006; codereviewbook.com

[7] Chilenski, J.J., and Miller, S.P. “Applicability of Modified Condition/Decision Coverage to Software Testing,” Software Engineering Journal, September 1994, Vol. 9, No. 5, 193-200.

S. Tucker Taft is founder and CTO of SofCheck, Inc., based in Burlington, Massachusetts. Tucker founded SofCheck in 2002 to provide tools and technology for enhancing software development quality and productivity. Prior to that, he was a Chief Scientist at Intermetrics, Inc., where he led the Ada 95 language revision effort. Tucker holds a BS in Chemistry from Harvard College.

Robert B.K. Dewar is cofounder, president, and CEO of AdaCore, based in New York City. He has had an established career as Professor of Computer Science at the Courant Institute of New York University. Having worked with programming language design and implementation throughout his career, Robert is a principal architect of AdaCore’s GNAT Ada technology. Robert holds a PhD in Chemistry from the University of Chicago.

SofCheck, Inc.
781-750-8068
info@sofcheck.com
www.sofcheck.com

AdaCore
212-620-7300
sales@adacore.com
www.adacore.com
