Computer Programming/Coding Style/Minimize nesting

< Computer Programming < Coding Style

Deeply nested code is a common feature of structured programming. While it has some advantages, discussed in that section, it is frequently considered hard to read and an anti-pattern: “Flat is better than nested”.[1]

Specifically, nested control flow – conditional blocks (if) or loops (for, while) – is hard to understand beyond three levels of nesting,[2][3] and has high cyclomatic complexity. This is known as “Dangerously Deep Nesting”[3] or, in the case of nested if statements, the “Arrow Anti Pattern”, due to the following shape:

 if
   if
     if
       if
         do something
       endif
     endif
   endif
 endif

This has a number of problems:

Other than refactoring or avoiding this code, one technique to handle deeply nested code is code folding in editors – this allows you to collapse a block, yielding abstraction and allowing you to see the surrounding code easily without the intervening code (so resource acquisition and cleanup are both visible).

Solutions

Solutions include the following.[4]

Refactor blocks into separate functions.

This is particularly common for bodies of loops.

Combine tests

If several if clauses are just tests (without any intervening code), these can be combined into a single test. Compare:

if a:
    if b:
        ...

to:

if a and b:
    ...
Inline function calls with boolean short-circuiting

If the only body of the if clause is a function call and assignment to perform a test, followed by another if clause, in language such as C where assignments are expressions (have a value) and boolean expressions are short-circuited, these can be combined:

if (a) {
    int b = f();
    if (b) {
        ...
    }
}
if (a && int b = f()) {
    ...
}
Auxiliary variables or functions

Auxiliary variables are useful when a complex expression is inlined in code, notably a boolean expression or anonymous functions.[5] Using an auxiliary expression both reduces the nesting, since it is no longer included within another expression, and the variable name documents the meaning of the expression. For complex boolean expressions, another alternative is a separate function which is called, rather than an auxiliary variable.

Early return

The most significant solution is early return, which has several forms, notably a guard clause.[4] Avoiding nested control flow is a fundamental reason for non-local control, notably: return (value), raise (exception), continue, and break. A common pattern is to replace an if-then or nested if ifs by if not/return-continue (return/raise if a function, continue/break if a loop body).

Compare:

if a:
    ...
    if b:
      ...
      ...

to:

if not a:
    return
...
if not b:
    return
...
...

Similarly, compare:

for i in l:
    if a:
        ...
        if b:
            ...
            ...

to:

for i in l:
    if not a:
        continue
    ...
    if not b:
        continue
    ...
    ...

This reduces the nesting and makes the flow more linear – either go further down the block, or return/continue.

This pattern is called a “guard clause” when the checks appear at the start of the code and check preconditions. However, it is used more generally to finish processing and return immediately once work is complete or a value has been computed:[3]

“Use a return when it enhances readability: In certain routines, once you know the answer, you want to return it to the calling routine immediately.”

However, early returns are potentially confusing and error-prone, notably due to the issues of cleanup, and go against a central tenant of structured programming, namely a single exit point per routine.[3]

“Minimize the number of returns in each routine: It’s harder to understand a routine when, reading it at the bottom, you’re unaware of the possibility that it returned somewhere above. For that reason, use returns judiciously–only when they improve readability.”

In the absence of cleanup – when a function is just computing a value or producing side effects – early returns have fewer potential problems. In the presence of cleanup, some languages have facilities that facilitate cleanup even with returns (such as “finally” clauses, “atexit” in Unix or Python, or “defer” in Go). Another option is to keep a single exit point at the end, following a cleanup clause, and instead of early returns, jump (goto) the cleanup clause.

In cases of complex nesting – nested if/then/else statements or multiple if statements at a given level – often the logic is simply multiple exclusive conditions, which can be handled by testing for each condition in turn, executing code if relevant and then returning or using an elif, allowing a flat structure and making the complete condition clear.

Compare:

if a:
    if b:
        f()
    else:
        g()
else:
    if b:
        h()
    else:
        i()

to:

if a and b:
    f()
    return
if a and not b:
    g()
    return
if not a and b:
    h()
    return
if not a and not b:
    i()
    return

or:

if a and b:
    f()
elif a and not b:
    g()
elif not a and b:
    h()
elif not a and not b:
    i()

Alternative control structures

Early return has a number of stylistic variants with other control structures, notably in eliminating else statements.

Omit else following return[6]

Compare:

if a:
    return ...
else:
    return ...

to:

if a:
    return ...
return ...

Compare:

if a:
    return ...
elif b:
    return ...
else:
    return ...

to:

if a:
    return ...
if b:
    return ...
return ...
Use switch

In languages with a switch statement, this can replace multi-way conditionals.

switch (x) {
case a:    
    return ...
case b:
    return ...
default:
    return ...
}
Use elseif

Some languages have elseif statements (elsif, elif) to reduce nesting in if clauses within else clauses, functioning similarly to a switch: Compare:

if a:
    return ...
else:
    if b:
        return ...
    else:
        return ...

to:

if a:
    return ...
elif b:
    return ...
else:
    return ...

Nested loops

Nested loops are natural for multidimensional data, but for sequential processing of single-dimensional data, nested loops are often unnatural and can be replaced by flatter structures.

Sequential loops

A subtler issue occurs when processing a sequence of data by first doing one thing on some of the data, and then switching to a different state and process the rest of the data. One can do this by nested loops, but more natural is to break the loop and then continue in a separate loop.

In many languages, such as C, this is done by having an auxiliary index variable that is shared between the two loops.

Compare:

for (int i = 0; i < n; i++) {
    foo(a[i]);
    if (...) {
        for (int j = i; j < n; j++) {
            bar(a[j])
        }
    }
}

with:

int i = 0;
for (; i < n; i++) {
    foo(a[i]);
    if (...)
        break;
}
for (; i < n; i++) {
    bar(a[i])
}

This shows the sequential flow more clearly, and avoids the nesting.

In languages such as Python that implement iterators, this can be done without an auxiliary variable, since the index state is contained in the iterator:

l = iter(a)
for x in l:
    foo(x)
    if ...:
        break
for x in l:
    bar(x)

Switching between loops

A more complex example occurs when you want to switch back and forth between two ways of processing data, such as iterating through a string overall vs. within words. In general the most elegant solution is via mutually recursive coroutines (with tail calls), operating on a shared iterator (or index variable), though in languages without coroutines this is instead done via a state machine, or sometimes mutually recursive subroutines.

In simpler cases where there is a main loop and a secondary loop (such as looping through a string, secondarily operating on its words), there is a natural nested structure. In this case simply factoring the secondary loop into a separate function is sufficient to remove the nesting. Compare:

for (int i = 0; i < n; i++) {
    foo(a[i]);
    while (...) {
        ...
    }
}

to:

for (int i = 0; i < n; i++) {
    foo(a[i]);
    other(a, &i, n);
}

void other(char *a, int *i, int n) {
    ...
}

Module complexity

While deeply nested code within a single function is undesirable, having separate modules, functions, and nested functions is an important form of modularity, particularly due to restricting scope. As a rule, it’s better to have all functions in a module be related to each other (for cohesion), which favors a high level of factoring and separate modules. That said, this can increase module complexity, particularly in extreme cases such as Java, with its restriction of one public top-level class per file.

Nested functions

Helper functions used only in one place are often nested within the function that calls them, in languages that support nested functions – this gives them access to the enclosing scope (reducing the need to pass parameters back and forth), and restricts their own scope.

However, in the case of long helper functions, it is generally clearer to have a separate, private, top-level function, particularly if they do not need access to the state of the containing function – nested functions are primarily useful for having access to the enclosing state, not because their own scope is restricted. This shortens the main function, and reduces the indentation and complexity of the helper function, because the state of the enclosing function is no longer accessible.

Compare:

def foo():
    def foo_helper():
        ...

    ...
    foo_helper()
    ...

to (note the explicit parameter passing):

def foo():
    ...
    _foo_helper(x)
    ...
 
def _foo_helper(x):
    ...

References

  1. Zen of Python
  2. Noam Chomsky and Gerald Weinberg (1986)
  3. 1 2 3 4 Code Complete, Steve McConnell
  4. 1 2 Flattening Arrow Code”, Coding Horror: programming and human factors, by Jeff Atwood, January 10, 2006
  5. Reducing Code Nesting”, Eric Florenzano’s Blog, Jan 02, 2012
  6. Refactoring, Martin Fowler
This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.