How to refactor the “arrow type” code

10 min readJun 23, 2011

The article is a bit long, I will give relevant thoughts and summary comments at the end of the article, you can jump to the end.

The so-called arrow type code is basically the case shown in the picture below.

So, what is the problem with such “arrow-shaped” code? It looks pretty good and has a symmetrical beauty. but……

The following questions about the arrow type code are as follows:

1) My monitor is not wide enough, the arrow type code is too indented, I need to pull the horizontal scroll bar back and forth, which makes me quite uncomfortable while reading the code.

2) In addition to the width as well as length, and some code if-elsewhere if-elsein the if-elsecode too much, you do not know the middle of the middle read code is the result of what the layers of checks came here.

In short, if the “arrow type code” is too nested and the code is too long, it will be quite easy for the person who maintains the code (including myself) to get lost in the code, because when you see the innermost code, you don’t know the front. The condition of the layer is judged, how the code is run here, so the arrow code is very difficult to maintain and debug .

Cases with Guard Clauses

OK, let’s take a look at the example. If the code size is a little bigger and nested a little more, you will easily get lost in the condition (the following example is just a small one under the “big arrow”.

FOREACH(Ptr<WfExpression>, argument, node->arguments) {
    int index = manager->expressionResolvings.Keys().IndexOf(argument.Obj());
    if (index != -1) {
        auto type = manager->expressionResolvings.Values()[index].type;
        if (! types.Contains(type.Obj())) {
            types.Add(type.Obj());
            if (auto group = type->GetTypeDescriptor()->GetMethodGroupByName(L"CastResult", true)) {
                int count = group->GetMethodCount();
                for (int i = 0; i < count; i++) { auto method = group->GetMethod(i);
                    if (method->IsStatic()) {
                        if (method->GetParameterCount() == 1 &&
                            method->GetParameter(0)->GetType()->GetTypeDescriptor() == description::GetTypeDescriptor<DescriptableObject>() &&
                            method->GetReturn()->GetTypeDescriptor() != description::GetTypeDescriptor<void>() ) {
                            symbol->typeInfo = CopyTypeInfo(method->GetReturn());
                            break;
                        }
                    }
                }
            }
        }
    }
}

The above code, you can write the conditions in reverse, and then you can solve the arrow type code, the reconstructed code is as follows:

FOREACH(Ptr<WfExpression>, argument, node->arguments) {
    int index = manager->expressionResolvings.Keys().IndexOf(argument.Obj());
    if (index == -1)  continue;
     
    auto type = manager->expressionResolvings.Values()[index].type;
    if ( types.Contains(type.Obj()))  continue;
     
    types.Add(type.Obj());
 
    auto group = type->GetTypeDescriptor()->GetMethodGroupByName(L"CastResult", true);
    if  ( ! group ) continue;
  
    int count = group->GetMethodCount();
    for (int i = 0; i < count; i++) { auto method = group->GetMethod(i);
        if (! method->IsStatic()) continue;
        
        if ( method->GetParameterCount() == 1 &&
               method->GetParameter(0)->GetType()->GetTypeDescriptor() == description::GetTypeDescriptor<DescriptableObject>() &&
               method->GetReturn()->GetTypeDescriptor() != description::GetTypeDescriptor<void>() ) {
            symbol->typeInfo = CopyTypeInfo(method->GetReturn());
            break;
        }
    }
}

This code is refactored in a way called Guard Clauses.

Martin Fowler ‘s Refactoring website has a description of Replace Nested Conditional with Guard Clauses .
There is also an article on Coding Horror about this way of refactoring — “ Flattening Arrow Code “
There are also related questions on StackOverflow that say this way — “ Refactor nested IF statement for clarity “

The idea here is to let the error code return first, all the error judgments are judged in the front, and then the rest is the normal code .

Extract into a function

Some people said that the continue statement breaks the fluency of reading code. I think they must not read the code inside. In fact, we can see that all if statements are in the judgment of whether something goes wrong, so When maintaining the code, you can completely ignore these if statements, because they are all error-handling, and the rest of the code is normal function code, but it is easier to read. Of course, there must be such a situation in the above code, then, without continue, can we still refactor?

Of course, the pumping function:

bool CopyMethodTypeInfo(auto &method, auto &group, auto &symbol) 
{
    if (! method->IsStatic()) {
        return true;
    }
    if ( method->GetParameterCount() == 1 &&
           method->GetParameter(0)->GetType()->GetTypeDescriptor() == description::GetTypeDescriptor<DescriptableObject>() &&
           method->GetReturn()->GetTypeDescriptor() != description::GetTypeDescriptor<void>() ) {
        symbol->typeInfo = CopyTypeInfo(method->GetReturn());
        return false;
    }
    return true;
}
 
void ExpressionResolvings(auto &manager, auto &argument, auto &symbol) 
{
    int index = manager->expressionResolvings.Keys().IndexOf(argument.Obj());
    if (index == -1) return;
     
    auto type = manager->expressionResolvings.Values()[index].type;
    if ( types.Contains(type.Obj())) return;
 
    types.Add(type.Obj());
    auto group = type->GetTypeDescriptor()->GetMethodGroupByName(L"CastResult", true);
    if  ( ! group ) return;
 
    int count = group->GetMethodCount();
    for (int i = 0; i < count; i++) { auto method = group->GetMethod(i);
        if ( ! CopyMethodTypeInfo(method, group, symbol) ) break;
    }
}
 
...
...
FOREACH(Ptr<WfExpression>, argument, node->arguments) {
    ExpressionResolvings(manager, arguments, symbol)
}
...
...

When you appear, after the function is drawn, the code becomes easier to read and easier to maintain than before. No?

Some people say: “If the code is not shared, don’t extract it into a function!”, the person holding this view is too dead to study. A function is a wrapper or abstraction of code. It is not necessarily used for code sharing. Functions are used to shield details and allow other code to be coupled to interfaces rather than details. This will make our code simpler and simpler. It is easy to read and easy to maintain. This is what the function does.

Nested code outside of if

There are still people asking how the original code should be refactored if there is still code to execute after each if statement. For example, the following code.

for(....) {
    do_before_cond1()
    if (cond1) {
        do_before_cond2();
        if (cond2) {
            do_before_cond3();
            if (cond3) {
                do_something();
            }
            do_after_cond3();
        }
        do_after_cond2();
    }
    do_after_cond1();
}

The code above those do_after_condX()are the conditions, whether successful or not be executed. So, the code we flattened is as follows:

REFACTORING THE FIRST EDITION

for(....) {
    do_before_cond1();
    if ( !cond1 ) {
        do_after_cond1();
        continue
    } 
    do_after_cond1();
 
    do_before_cond2();
    if ( !cond2 ) { 
        do_after_cond2();
        continue;
    }
    do_after_cond2();
 
    do_before_cond3();
    if ( !cond3 ) {
        do_after_cond3();
        continue;
    }
    do_after_cond3();
 
    do_something();  
}

You will find that the above do_after_condXappeared in duplicate. This is the final version if the code in the if statement block changes the do_after_condXstate of some dependencies.

However, if they have no previous dependencies, we can keep only one copy according to the DRY principle, so it is good to fall directly to the if condition, as shown below:

REFACTORING THE SECOND EDITION

for(....) {
    do_before_cond1();
    do_after_cond1();
    if ( !cond1 ) continue;
  
    do_before_cond2();
    do_after_cond2();
    if ( !cond2 ) continue;
 
    do_before_cond3();
    do_after_cond3();
    if ( !cond3 ) continue;
 
    do_something();  
}

At this point, you would say that I rely on, actually, changed the order of execution, the conditions put do_after_condX()behind. Is this going to be a problem?

In fact, if you analyze the previous code, you will find that cond1 is to judge whether do_before_cond1() is faulty. If it succeeds, it will be executed. And do_after_cond1() is executed anyway. Logically, do_after_cond1() is actually independent of the execution result of do_before_cond1(), and cond1 is related to whether or not to execute do_before_cond2(). If I turn the line break into the following, the code logic is clearer.

REFACTORING THE THIRD EDITION

for(....) {
 
     do_before_cond1();
     do_after_cond1();
 
 
     if ( !cond1 ) continue; // <-- cond1 becomes the condition for the second statement block
     do_before_cond2();
     do_after_cond2();
 
     if ( !cond2 ) continue; // <-- cond2 becomes the condition for the third statement block
     do_before_cond3();
     do_after_cond3();
 
     if ( !cond3 ) continue; //<-- cond3 becomes the condition of whether to do the fourth statement block
     do_something();
  
}

As a result, when the code is maintained in the future, the maintainer can see at a glance where the code will be executed. At this time, you will find that by drawing these blocks into functions, the code will be cleaner and refactored:

REFACTORING THE FOURTH EDITION

bool do_func3() {
   do_before_cond2();
   do_after_cond2();
   return cond3;
}
 
bool do_func2() {
   do_before_cond2();
   do_after_cond2();
   return cond2;
}
 
bool do_func1() {
   do_before_cond1();
   do_after_cond1();
   return cond1;
}
 
// you can refactor like this for-loop
for (...) {
    bool cond = do_func1();
    if (cond) cond = do_func2();
    if (cond) cond = do_func3();
    if (cond) do_something();
}
 
// for-loop can also be refactored like this
for (...) {
    if ( ! do_func1() ) continue;
    if ( ! do_func2() ) continue;
    if ( ! do_func3() ) continue;
    do_something();
}

Above, I have given two versions of for-loop, which one do you like? I like the second one. At this time, because the code in the for-loop is very simple, even if you don’t like continue, the cost of reading such code is already very low.

State check nesting

Next, let’s look at another example. The following code forges a scene — pulling two people into a one-to-one chat room, because the status of both sides is checked, so the code may be written as “arrow type”.

int ConnectPeer2Peer(Conn *pA, Conn* pB, Manager *manager)
{
    if ( pA->isConnected() ) {
        manager->Prepare(pA);
        if ( pB->isConnected() ) {
            manager->Prepare(pB);
            if ( manager->ConnectTogther(pA, pB) ) {
                pA->Write("connected");
                pB->Write("connected");
                return S_OK;
            }else{
                return S_ERROR;
            }
 
        }else {
            pA->Write("Peer is not Ready, waiting...");
            return S_RETRY;
        }
    }else{
        if ( pB->isConnected() ) {
            manager->Prepare();
            pB->Write("Peer is not Ready, waiting...");
            return S_RETRY;
        }else{
            pA->Close();
            pB->Close();
            return S_ERROR;
        }
    }
    //Shouldn't be here!
    return S_ERROR;
}

Refactoring the above code, we can first analyze the above code, indicating that the above code is the “state” of the two states of PeerA and PeerB, “not connected” to do the combination “state” (Note: in practice The state should be more complicated than this, there may be “disconnection”, “error”, etc.), so we can write the code as follows, merge the above nesting conditions, for each combination Make judgments. In this way, the logic will be very clean and clear.

int ConnectPeer2Peer(Conn *pA, Conn* pB, Manager *manager)
{
    if ( pA->isConnected() ) {
        manager->Prepare(pA);
    }
 
    if ( pB->isConnected() ) {
        manager->Prepare(pB);
    }
 
    // pA = YES && pB = NO
    if (pA->isConnected() && ! pB->isConnected()  ) {
        pA->Write("Peer is not Ready, waiting");
        return S_RETRY;
    // pA = NO && pB = YES
    }else if ( !pA->isConnected() && pB->isConnected() ) {
        pB->Write("Peer is not Ready, waiting");
        return S_RETRY;
    // pA = YES && pB = YES
    }else if (pA->isConnected() && pB->isConnected()  ) {
        if ( ! manager->ConnectTogther(pA, pB) ) {
            return S_ERROR;
        }
        pA->Write("connected");
        pB->Write("connected");
        return S_OK;
    }
 
    // pA = NO, pB = NO
    pA->Close();
    pB->Close();
    return S_ERROR;
}

Extended thinking

For if-elsestatement it is, in general, is to check two things: errors and status .

check for errors

For checking errors, using Guard Clauses is a standard solution, but we also need to pay attention to the following things:

1) Of course, when there is an error, there will be situations where resources need to be released. You can use goto fail;this way, but the most elegant way should be the object-oriented C ++ RAII-style way.

2) Returning with error code is a relatively simple way. This method has some problems. For example, if there are too many error codes, the code that determines the error will be very complicated. In addition, the normal code and the wrong code will be mixed. , affecting readability. So, in language more high group, the use of try-catchan exception to capture the way, makes the code more readable number.

Check status

For the inspection status, there must be more complicated situations in practice, such as the following:

1) State changes like the two ends in the TCP protocol.

2) Various combinations of command options like shell commands.

3) Like a state change in the game (a very complex state tree).

4) State changes like syntax analysis.

For these complex state changes, you need to define a state machine, a query table for the combined state of the child states, or a state query analysis tree.

When writing code, the control state or business state of the code is an important reason that can confuse your code flow. A very important job of refactoring the “arrow” code is to reorganize and describe these states. Change relationship .

To sum up

Ok, here’s a summary: Here are a few ways to refactor the “arrow” code:

1) Use Guard Clauses . Try to get the error back first, so that you will get clean code later.

2) Extract the block of statements in the condition into a function . Some people say: “If the code is not shared, don’t extract it into a function!”, the person holding this view is too dead to study. A function is a wrapper or abstraction of code. It is not necessarily used for code sharing. Functions are used to shield details and allow other code to be coupled to interfaces rather than details. This will make our code simpler and simpler. It is easy to read and easy to maintain, and it is the original intention of refactoring code to write code that is easy to read and maintain !

3) For error handling, use try-catch exception handling and RAII mechanism . There are many problems with the error handling of the return code, such as: A) the return code can be ignored, B) the error handling code is mixed with the normally processed code, C) causes the function interface to be contaminated, such as the error code like atoi() A bad function that returns a value to share.

4) For the judgment and combination of multiple states, if it is complicated, you can use the “combined state table” or the design mode of the state machine plus Observer state subscription . Such code is decoupled, clean and simple, and also very scalable.

5) Refactoring the “arrow type” code is actually helping you to reorganize all the code and logic. This process is worth paying for . The process of refining the idea and trying everything to simplify the code itself can make people grow.