The Ultimate Question of Programming, Refactoring, and Everything. Part 2
Author: Andrey Karpov
Part 1 read here: https://medium.com/@Coder_HarryLee/the-ultimate-question-of-programming-refactoring-and-everything-part-1-36ccf2423d8b
Full PDF version: https://yadi.sk/i/zKHIOS84r87nk
Table of contents
22. Do not use #pragma warning(default:X)
23. Evaluate the string literal length automatically
24. Override and final specifiers should become your new friends.
25. Do not compare ‘this’ to nullptr anymore
26. Insidious VARIANT_BOOL
27. Guileful BSTR strings
28. Avoid using a macro if you can use a simple function
29. Use a prefix increment operator (++i) in iterators instead of a postfix (i++) operator
30. Visual C++ and wprintf() function
31. In C and C++ arrays are not passed by value
32. Dangerous printf
33. Never dereference null pointers
34. Undefined behavior is closer than you think
35. Adding a new constant to enum don’t forget to correct switch operators
36. If something strange is happening to your PC, check its memory.
37. Beware of the ‘continue’ operator inside do {…} while (…)
38. Use nullptr instead of NULL from now on
39. Why incorrect code works
40. Start using static code analysis
41. Avoid adding a new library to the project.
42. Don’t use function names with “empty”
Conclusion
22. Do not use #pragma warning(default:X)
The fragment is taken from TortoiseGIT project. The error is detected by the following PVS-Studio diagnostic: V665 Possibly, the usage of ‘#pragma warning(default: X)’ is incorrect in this context. The ‘#pragma warning(push/pop)’ should be used instead.
#pragma warning(disable:4996)
LONG result = regKey.QueryValue(buf, _T(""), &buf_size);
#pragma warning(default:4996)
Explanation
Programmers often assume that warnings disabled with the “pragma warning(disable: X)” directive earlier will start working again after using the “pragma warning(default : X)” directive. But it is not so. The ‘pragma warning(default : X)’ directive sets the ‘X’ warning to the DEFAULT state, which is quite not the same thing.
Suppose that a file is compiled with the /Wall switch used. The C4061 warning must be generated in this case. If you add the “#pragma warning(default : 4061)” directive, this warning will not be displayed, as it is turned off by default.
Correct code
#pragma warning(push)
#pragma warning(disable:4996)
LONG result = regKey.QueryValue(buf, _T(""), &buf_size);
#pragma warning(pop)
Recommendation
The correct way to return the previous state of a warning is to use directives “#pragma warning(push[ ,n ])” and “#pragma warning(pop)”. See the Visual C++ documentation for descriptions of these directives:Pragma Directives. Warnings.
Library developers should pay special attention to the V665 warning. Careless warning customization may cause a whole lot of trouble on the library users’ side.
A good article on this topic: So, You Want to Suppress This Warning in Visual C++
23. Evaluate the string literal length automatically
The fragment is taken from the OpenSSL library. The error is detected by the following PVS-Studio diagnostic: V666 Consider inspecting the third argument of the function ‘strncmp’. It is possible that the value does not correspond with the length of a string which was passed with the second argument.
if (!strncmp(vstart, "ASCII", 5))
arg->format = ASN1_GEN_FORMAT_ASCII;
else if (!strncmp(vstart, "UTF8", 4))
arg->format = ASN1_GEN_FORMAT_UTF8;
else if (!strncmp(vstart, "HEX", 3))
arg->format = ASN1_GEN_FORMAT_HEX;
else if (!strncmp(vstart, "BITLIST", 3))
arg->format = ASN1_GEN_FORMAT_BITLIST;
else
....
Explanation
It’s very hard to stop using magic numbers. Also, it would be very unreasonable to get rid of such constants as 0, 1, -1, 10. It’s rather difficult to come up with names for such constants, and often they will make reading of the code more complicated.
However, it’s very useful to reduce the number of magic numbers. For example, it would be helpful to get rid of magic numbers which define the length of string literals.
Let’s have a look at the code given earlier. The code was most likely written using the Copy-Paste method. A programmer copied the line:
else if (!strncmp(vstart, "HEX", 3))
After that “HEX” was replaced by “BITLIST”, but the programmer forgot to change 3 to 7. As a result, the string is not compared with “BITLIST”, only with “BIT”. This error might not be a crucial one, but still it is an error.
It’s really bad that the code was written using Copy-Paste. What’s worse is that the string length was defined by a magic constant. From time to time we come across such errors, where the string length does not correspond with the indicated number of symbols because of a typo or carelessness of a programmer. So it’s quite a typical error, and we have to do something about it. Let’s look closely at the question of how to avoid such errors.
Correct code
First it may seem that it’s enough to replace strncmp() call with strcmp(). Then the magic constant will disappear.
else if (!strcmp(vstart, "HEX"))
Too bad-we have changed the logic of the code work. The strncmp() function checks if the string starts with “HEX”, and the function strcmp() checks if the strings are equal. There are different checks.
The easiest way to fix this is to change the constant:
else if (!strncmp(vstart, "BITLIST", 7))
arg->format = ASN1_GEN_FORMAT_BITLIST;
This code is correct, but it is very bad because the magic 7 is still there. That’s why I would recommend a different method.
Recommendation
Such an error can be prevented if we explicitly evaluate the string length in the code. The easiest option is to use the strlen() function.
else if (!strncmp(vstart, "BITLIST", strlen("BITLIST")))In this case it will be much easier to detect a mismatch if you forget to fix one of the strings:
else if (!strncmp(vstart, "BITLIST", strlen("HEX")))But the suggested variant has two disadvantages:
- There is no guarantee that the compiler will optimize the strlen() call and replace it with a constant.
- You have to duplicate the string literal. It does not look graceful, and can be the subject of a possible error.
The first issue can be dealt with by using special structures for literal length evaluation during the compilation phase. For instance, you can use a macro such as:
#define StrLiteralLen(arg) ((sizeof(arg) / sizeof(arg[0])) - 1)
....
else if (!strncmp(vstart, "BITLIST", StrLiteralLen("BITLIST")))
But this macros can be dangerous. The following code can appear during the refactoring process:
const char *StringA = "BITLIST";
if (!strncmp(vstart, StringA, StrLiteralLen(StringA)))
In this case StrLiteralLen macro will return some nonsense. Depending on the pointer size (4 or 8 byte) we will get the value 3 or 7. But we can protect ourselves from this unpleasant case in C++ language, by using a more complicated trick:
template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define StrLiteralLen(str) (sizeof(ArraySizeHelper(str)) - 1)
Now, if the argument of the StrLiteralLen macro is a simple pointer, we won’t be able to compile the code.
Let’s have a look at the second issue (duplicating of the string literal). I have no idea what to say to C programmers. You can write a special macro for it, but personally I don’t like this variant. I am not a fan of macros. That’s why I don’t know what to suggest.
In C++ everything is fabulously awesome. Moreover, we solve the first problem in a really smart way. The template function will be of a great help to us. You can write it in different ways, but in general it will look like this:
template<typename T, size_t N>
int mystrncmp(const T *a, const T (&b)[N])
{
return _tcsnccmp(a, b, N - 1);
}
Now the string literal is used only once. The string literal length is evaluated during the compilation phase. You cannot accidentally pass a simple pointer to the function and incorrectly evaluate the string length. Presto!
Summary: try to avoid magic numbers when working with strings. Use macros or template functions; the code will become not only safer, but more beautiful and shorter.
As an example, you can look at the declaration of a function strcpy_s ():
errno_t strcpy_s(
char *strDestination,
size_t numberOfElements,
const char *strSource
);
template <size_t size>
errno_t strcpy_s(
char (&strDestination)[size],
const char *strSource
); // C++ only
The first variant is intended for the C language, or in the case of a buffer size not being known in advance. If we work with the buffer, created on the stack, then we can use the second variant in C++:
char str[BUF_SIZE];
strcpy_s(str, "foo");
There are no magic numbers, there is no evaluation of the buffer size at all. It’s short and sweet.
24. Override and final specifiers should become your new friends
The fragment is taken from the MFC library. The error is detected by the following PVS-Studio diagnostic: V301 Unexpected function overloading behavior. See first argument of function ‘WinHelpW’ in derived class ‘CFrameWndEx’ and base class ‘CWnd’.
class CWnd : public CCmdTarget {
....
virtual void WinHelp(DWORD_PTR dwData,
UINT nCmd = HELP_CONTEXT);
....
};
class CFrameWnd : public CWnd {
....
};
class CFrameWndEx : public CFrameWnd {
....
virtual void WinHelp(DWORD dwData,
UINT nCmd = HELP_CONTEXT);
....
};Explanation
When you override a virtual function it’s quite easy to make an error in the signature and to define a new function, which won’t be in any way connected with the function in the base class. There can be various errors in this case.
- Another type is used in the parameter of the overridden function.
- The overridden function has a different number of parameters, this can be especially crucial when there are many parameters.
- The overridden function differs in const modifier.
- The base class function is not a virtual one. It was assumed that the function in the derived class would override it in the base class, but in reality it hides it.
The same error can occur during the change of types or parameter quantity in the existing code, when the programmer changed the virtual function signature in almost the entire hierarchy, but forgot to do it in some derived class.
This error can appear particularly often during the porting process to the 64-bit platform when replacing the DWORD type with DWORD_PTR, LONG with LONG_PTR and so on. Details. This is exactly our case.
Even in the case of such an error the 32-bit system will work correctly, as both DWORD andDWORD_PTR are synonyms of unsigned long; but in 64-bit version there will be an error becauseDWORD_PTR is a synonym of unsigned __int64 there.
Correct code
class CFrameWndEx : public CFrameWnd {
....
virtual void WinHelp(DWORD_PTR dwData,
UINT nCmd = HELP_CONTEXT) override;
....
};Recommendation
Now we have a way to protect ourselves from the error we described above. Two new specifiers were added in C++11:
- Override — to indicate that the method is overriding a virtual method in a base class
- Final — to indicate that derived classes do not need to override this virtual method.
We are interested in the override specifier. This is an indication for the compiler to check if the virtual function is really overriding the base class function, and to issue an error if it isn’t.
If override was used when determining the function WinHelp in the CFrameWndEx class, we would have an error of compilation on a 64-bit version of an application. Thus the error could have been prevented at an early stage.
Always use the override specifier (or final), when overriding virtual functions. More details about overrideand final can be seen here:
- Cppreference.com. override specifier (since C++11)
- Cppreference.com. final specifier (since C++11)
- Wikipedia.org. Explicit overrides and final.
- Stackoverflow.com. ‘override’ in c++11.
25. Do not compare ‘this’ to nullptr anymore
The fragment is taken from CoreCLR project. This dangerous code is detected by the following PVS-Studio diagnostic: V704 ‘this == nullptr’ expression should be avoided — this expression is always false on newer compilers, because ‘this’ pointer can never be NULL.
bool FieldSeqNode::IsFirstElemFieldSeq()
{
if (this == nullptr)
return false;
return m_fieldHnd == FieldSeqStore::FirstElemPseudoField;
}
Explanation
People used to compare this pointer with 0 / NULL / nullptr. It was a common situation when C++ was only in the beginning of its development. We have found such fragments doing “archaeological” research. I suggest reading about them in an article about checking Cfront. Moreover, in those days the value of this pointer could be changed, but it was so long ago that it was forgotten.
Let’s go back to the comparison of this with nullptr.
Now it is illegal. According to modern C++ standards, this can NEVER be equal to nullptr.
Formally the call of the IsFirstElemFieldSeq() method for a null-pointer this according to C++ standard leads to undefined behavior.
It seems that if this==0, then there is no access to the fields of this class while the method is executed. But in reality there are two possible unfavorable ways of such code implementation. According to C++ standards, this pointer can never be null, so the compiler can optimize the method call, by simplifying it to:
bool FieldSeqNode::IsFirstElemFieldSeq()
{
return m_fieldHnd == FieldSeqStore::FirstElemPseudoField;
}
There is one more pitfall, by the way. Suppose there is the following inheritance hierarchy.
class X: public Y, public FieldSeqNode { .... };
....
X * nullX = NULL;
X->IsFirstElemFieldSeq();Suppose that the Y class size is 8 bytes. Then the source pointer NULL (0x00000000) will be corrected in such a way, so that it points to the beginning of FieldSeqNode sub object. Then you have to offset it to sizeof(Y) byte. So this in the IsFirstElemFieldSeq() function will be 0x00000008. The “this == 0” check has completely lost its sense.
Correct code
It’s really hard to give an example of correct code. It won’t be enough to just remove this condition from the function. You have to do the code refactoring in such a way that you will never call the function, using the null pointer.
Recommendation
So, now the “if (this == nullptr)” is outlawed. However, you can see this code in many applications and libraries quite often (MFC library for instance). That’s why Visual C++ is still diligently comparing this to 0. I guess the compiler developers are not so crazy as to remove code that has been working properly for a dozen years.
But the law was enacted. So for a start let’s avoid comparing this to null. And once you have some free time, it will be really useful to check out all the illegal comparisons, and rewrite the code.
Most likely the compilers will act in the following way. First they will give us comparison warnings. Perhaps they are already giving them, I haven’t studied this question. And then at some point they’ll fully support the new standard, and your code will cease working altogether. So I strongly recommend that you start obeying the law, it will be helpful later on.
P.S. When refactoring you may need the Null object pattern.
Additional links on the topic:
26. Insidious VARIANT_BOOL
The fragment is taken from NAME project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V721 The VARIANT_BOOL type is utilized incorrectly. The true value (VARIANT_TRUE) is defined as -1. Inspect the first argument.
virtual HRESULT __stdcall
put_HandleKeyboard (VARIANT_BOOL pVal) = 0;
....
pController->put_HandleKeyboard(true);
Explanation:
There is quite a witty quote:
We all truck around a kind of original sin from having learned Basic at an impressionable age. (C) P.J. Plauger
And this hint is exactly on the topic of evil. VARIANT_BOOL type came to us from Visual Basic. Some of our present day programming troubles are connected with this type. The thing is that “true” is coded as -1 in it.
Let’s see the declaration of the type and the constants denoting true/false:
typedef short VARIANT_BOOL;
#define VARIANT_TRUE ((VARIANT_BOOL)-1)
#define VARIANT_FALSE ((VARIANT_BOOL)0)
It seems like there is nothing terrible in it. False is 0, and truth is not 0. So, -1 is quite a suitable constant. But it’s very easy to make an error by using true or TRUE instead of VARIANT_TRUE.
Correct code
pController->put_HandleKeyboard(VARIANT_TRUE);
Recommendation
If you see an unknown type, it’s better not to hurry, and to look up in the documentation. Even if the type name has a word BOOL, it doesn’t mean that you can place 1 into the variable of this type.
In the same way programmers sometimes make mistakes, when they use HRESULT type, trying to compare it with FALSE or TRUE and forgetting that:
#define S_OK ((HRESULT)0L)
#define S_FALSE ((HRESULT)1L)
So I really ask you to be very careful with any types which are new to you, and not to hasten when programming.
27. Guileful BSTR strings
Let’s talk about one more nasty data type — BSTR (Basic string or binary string).
The fragment is taken from VirtualBox project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V745 A ‘wchar_t *’ type string is incorrectly converted to ‘BSTR’ type string. Consider using ‘SysAllocString’ function.
....
HRESULT EventClassID(BSTR bstrEventClassID);
....
hr = pIEventSubscription->put_EventClassID(
L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}");
Explanation
Here’s how a BSTR type is declared:
typedef wchar_t OLECHAR;
typedef OLECHAR * BSTR;
At first glance it seems that “wchar_t *” and BSTR are one and the same things. But this is not so, and this brings a lot of confusion and errors.
Let’s talk about BSTR type to get a better idea of this case.
Here is the information from MSDN site. Reading MSDN documentation isn’t much fun, but we have to do it.
A BSTR (Basic string or binary string) is a string data type that is used by COM, Automation, and Interop functions. Use the BSTR data type in all interfaces that will be accessed from script. BSTR description:
- Length prefix. A four-byte integer that contains the number of bytes in the following data string. It appears immediately before the first character of the data string. This value does not include the terminating null character.
- Data string. A string of Unicode characters. May contain multiple embedded null characters.
- Terminator. Two null characters.
A BSTR is a pointer. The pointer points to the first character of the data string, not to the length prefix.BSTRs are allocated using COM memory allocation functions, so they can be returned from methods without concern for memory allocation. The following code is incorrect:
BSTR MyBstr = L"I am a happy BSTR";
This code builds (compiles and links) correctly, but it will not function properly because the string does not have a length prefix. If you use a debugger to examine the memory location of this variable, you will not see a four-byte length prefix preceding the data string. Instead, use the following code:
BSTR MyBstr = SysAllocString(L"I am a happy BSTR");
A debugger that examines the memory location of this variable will now reveal a length prefix containing the value 34. This is the expected value for a 17-byte single-character string that is converted to a wide-character string through the inclusion of the “L” string modifier. The debugger will also show a two-byte terminating null character (0x0000) that appears after the data string.
If you pass a simple Unicode string as an argument to a COM function that is expecting a BSTR, the COM function will fail.
I hope this is enough to understand why we should separate the BSTR and simple strings of “wchar_t *”type.
Additional links:
- MSDN. BSTR.
- StackOverfow. Static code analysis for detecting passing a wchar_t* to BSTR.
- StackOverfow. BSTR to std::string (std::wstring) and vice versa.
- Robert Pittenger. Guide to BSTR and CString Conversions.
- Eric Lippert. Eric’s Complete Guide To BSTR Semantics.
Correct code
hr = pIEventSubscription->put_EventClassID(
SysAllocString(L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}"));
Recommendation
The tip resembles the previous one. If you see an unknown type, it’s better not to hurry, and to look it up in the documentation. This is important to remember, so it’s not a big deal that this tip was repeated once again.
28. Avoid using a macro if you can use a simple function
The fragment is taken from ReactOS project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V640 The code’s operational logic does not correspond with its formatting. The second statement will always be executed. It is possible that curly brackets are missing.
#define stat64_to_stat(buf64, buf) \
buf->st_dev = (buf64)->st_dev; \
buf->st_ino = (buf64)->st_ino; \
buf->st_mode = (buf64)->st_mode; \
buf->st_nlink = (buf64)->st_nlink; \
buf->st_uid = (buf64)->st_uid; \
buf->st_gid = (buf64)->st_gid; \
buf->st_rdev = (buf64)->st_rdev; \
buf->st_size = (_off_t)(buf64)->st_size; \
buf->st_atime = (time_t)(buf64)->st_atime; \
buf->st_mtime = (time_t)(buf64)->st_mtime; \
buf->st_ctime = (time_t)(buf64)->st_ctime; \
int CDECL _tstat(const _TCHAR* path, struct _stat * buf)
{
int ret;
struct __stat64 buf64;
ret = _tstat64(path, &buf64);
if (!ret)
stat64_to_stat(&buf64, buf);
return ret;
}
Explanation
This time the code example will be quite lengthy. Fortunately it’s rather easy, so it shouldn’t be hard to understand.
There was the following idea. If you manage to get file information by means of _tstat64() function, then put these data into the structure of _stat type. We use a stat64_to_stat macro to save data.
The macro is incorrectly implemented. The operations it executes are not grouped in blocks with curly brackets { }. As a result the conditional operator body is only the first string of the macro. If you expand the macro, you’ll get the following:
if (!ret)
buf->st_dev = (&buf64)->st_dev;
buf->st_ino = (&buf64)->st_ino;
buf->st_mode = (&buf64)->st_mode;
Consequently the majority of the structure members are copied regardless of the whether the information was successfully received or not.
This is certainly an error, but in practice it’s not a fatal one. The uninitialized memory cells are just copied in vain. We had a bit of luck here. But I’ve come across more serious errors, connected with such poorly written macros.
Correct code
The easiest variant is just to add curly brackets to the macro. To add do { …. } while (0) is a slightly better variant. Then after the macro and the function you can put a semicolon ‘;’.
#define stat64_to_stat(buf64, buf) \
do { \
buf->st_dev = (buf64)->st_dev; \
buf->st_ino = (buf64)->st_ino; \
buf->st_mode = (buf64)->st_mode; \
buf->st_nlink = (buf64)->st_nlink; \
buf->st_uid = (buf64)->st_uid; \
buf->st_gid = (buf64)->st_gid; \
buf->st_rdev = (buf64)->st_rdev; \
buf->st_size = (_off_t)(buf64)->st_size; \
buf->st_atime = (time_t)(buf64)->st_atime; \
buf->st_mtime = (time_t)(buf64)->st_mtime; \
buf->st_ctime = (time_t)(buf64)->st_ctime; \
} while (0)
Recommendation
I cannot say that macros are my favorite. I know there is no way to code without them, especially in C. Nevertheless I try to avoid them if possible, and would like to appeal to you not to overuse them. My macro hostility has three reasons:
- It’s hard to debug the code.
- It’s much easier to make an error.
- The code gets hard to understand especially when some macros use another macros.
A lot of other errors are connected with macros. The one I’ve given as an example shows very clearly that sometimes we don’t need macros at all. I really cannot grasp the idea of why the authors didn’t use a simple function instead. Advantages of a function over a macro:
- The code is simpler. You don’t have to spend additional time writing it and, aligning some wacky symbols \.
- The code is more reliable (the error given as an example won’t be possible in the code at all)
Concerning the disadvantages, I can only think of optimization. Yes, the function is called but it’s not that serious at all.
However, let’s suppose that it’s a crucial thing to us, and meditate on the topic of optimization. First of all, there is a nice keyword inline which you can use. Secondly, it would be appropriate to declare the function as static. I reckon it can be enough for the compiler to build in this function and not to make a separate body for it.
In point of fact you don’t have to worry about it at all, as the compilers have become really smart. Even if you write a function without any inline/static, the compiler will build it in; if it considers that it’s worth doing it. But don’t really bother going into such details. It’s much better to write a simple and understandable code, it’ll bring more benefit.
To my mind, the code should be written like this:
static void stat64_to_stat(const struct __stat64 *buf64,
struct _stat *buf)
{
buf->st_dev = buf64->st_dev;
buf->st_ino = buf64->st_ino;
buf->st_mode = buf64->st_mode;
buf->st_nlink = buf64->st_nlink;
buf->st_uid = buf64->st_uid;
buf->st_gid = buf64->st_gid;
buf->st_rdev = buf64->st_rdev;
buf->st_size = (_off_t)buf64->st_size;
buf->st_atime = (time_t)buf64->st_atime;
buf->st_mtime = (time_t)buf64->st_mtime;
buf->st_ctime = (time_t)buf64->st_ctime;
}
Actually we can make even more improvements here. In C++ for example, it’s better to pass not the pointer, but a reference. The usage of pointers without the preliminary check doesn’t really look graceful. But this is a different story, I won’t talk about it in a section on macros.
29. Use a prefix increment operator (++i) in iterators instead of a postfix (i++) operator
The fragment is taken from the Unreal Engine 4 project. Ineffective code is detected by the following PVS-Studio diagnostic: V803 Decreased performance. In case ‘itr’ is iterator it’s more effective to use prefix form of increment. Replace iterator++ with ++iterator.
void FSlateNotificationManager::GetWindows(....) const
{
for( auto Iter(NotificationLists.CreateConstIterator());
Iter; Iter++ )
{
TSharedPtr<SNotificationList> NotificationList = *Iter;
....
}
}
Explanation
If you hadn’t read the title of the article, I think it would’ve been quite hard to notice an issue in the code. At first sight, it looks like the code is quite correct, but it’s not perfect. Yes, I am talking about the postfix increment — ‘Iter++’. Instead of a postfix form of the increment iterator, you should rather use a prefix analogue, i.e. to substitute ‘Iter++’ for ‘++Iter’. Why should we do it, and what’s the practical value of it? Here is the story.
Effective code:
for( auto Iter(NotificationLists.CreateConstIterator());
Iter; ++Iter)
Recommendation
The difference between a prefix and a postfix form is well known to everybody. I hope that the internal structure distinctions (which show us the operational principles) are not a secret as well. If you have ever done the operator overloading, then you must be aware of it. If not — I’ll give a brief explanation. (All the others can skip this paragraph and go to the one, which follows the code examples with operator overloading)
The prefix increment operator changes an object’s state, and returns itself in the changed form. No temporary objects required. Then the prefix increment operator may look like this:
MyOwnClass& operator++()
{
++meOwnField;
return (*this);
}
A postfix operator also changes the object’s state but returns the previous state of the object. It does so by creating a temporary object, then the postfix increment operator overloading code will look like this:
MyOwnClass operator++(int)
{
MyOWnCLass tmp = *this;
++(*this);
return tmp;
}
Looking at these code fragments, you can see that an additional operation of creating a temporary object is used. How crucial is it in practice?
Today’s compilers are smart enough to do the optimization, and to not create temporary objects if they are of no use. That’s why in the Release version it’s really hard to see the difference between ‘it++’ and ‘++it’.
But it is a completely different story when debugging the program in the Debug-mode. In this case the difference in the performance can be really significant.
For example, in this article there are some examples of estimation of the code running time using prefix and postfix forms of increment operators in the Debug-version. We see that is almost 4 times longer to use the postfix forms.
Those, who will say, “And? In the Release version it’s all the same!” will be right and wrong at the same time. As a rule we spend more time working on the Debug-version while doing the Unit-tests, and debugging the program. So quite a good deal of time is spent working with the Debug version of software, which means that we don’t want to waste time waiting.
In general I think we’ve managed to answer the question — “Should we use the prefix increment operator (++i) instead a of postfix operator (i++) for iterators”. Yes, you really should. You’ll get a nice speed-up in the Debug version. And if the iterators are quite “heavy”, then the benefit will be even more appreciable.
References (reading recommendation):
- Is it reasonable to use the prefix increment operator ++it instead of postfix operator it++ for iterators?
- Pre vs. post increment operator — benchmark
30. Visual C++ and wprintf() function
The fragment is taken from Energy Checker SDK. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V576 Incorrect format. Consider checking the second actual argument of the ‘wprintf’ function. The pointer to string of wchar_t type symbols is expected.
int main(void) {
...
char *p = NULL;
...
wprintf(
_T("Using power link directory: %s\n"),
p
);
...
}Explanation
Note: The first error is in the usage of _T for specifying a string in wide-character format. To use L prefix will be the correct variant here. However this mistake is not a crucial one and is not of a big interest to us. The code simply won’t be compiled if we don’t use a wide-character format and _T will expand into nothing.
If you want a wprintf() function to print a char* type string, you should use “%S” in the format string.
Many Linux programmers don’t see where the pitfall is. The thing is that Microsoft quite strangely implemented such functions as wsprintf. If we work in Visual C++ with the wsprintf function, then we should use “%s” to print wide-character strings, at the same time to print char* strings we need “%S”. So it’s just a weird case. Those who develop cross platform applications quite often fall into this trap.
Correct code
The code I give here as a way to correct the issue is really not the most graceful one, but I still want to show the main point of corrections to make.
char *p = NULL;
...
#ifdef defined(_WIN32)
wprintf(L"Using power link directory: %S\n"), p);
#else
wprintf(L"Using power link directory: %s\n"), p);
#endif
Recommendation
I don’t have any particular recommendation here. I just wanted to warn you about some surprises you may get if you use functions such as wprintf().
Starting from Visual Studio 2015 there was a solution suggested for writing a portable code. For compatibility with ISO C (C99), you should point out to the preprocessor a _CRT_STDIO_ISO_WIDE_SPECIFIERS macro.
In this case the code:
const wchar_t *p = L"abcdef";
const char *x = "xyz";
wprintf(L"%S %s", p, x);
is correct.
The analyzer knows about _CRT_STDIO_ISO_WIDE_SPECIFIERS and takes it into account when doing the analysis.
By the way, if you turn on the compatibility mode with ISO C (the _CRT_STDIO_ISO_WIDE_SPECIFIERS macro is declared), you can get the old behavior, using the specifier of “%Ts” format.
In general the story about the wide — character symbols is quite intricate, and goes beyond the frames of one short article. To investigate the topic more thoroughly, I recommend doing some reading on the topic:
- Bug 1121290 — distinguish specifier s and ls in the printf family of functions
- MBCS to Unicode conversion in swprintf
- Visual Studio swprintf is making all my %s formatters want wchar_t * instead of char *
31. In C and C++ arrays are not passed by value
The fragment is taken from the game ‘Wolf’. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V511 The sizeof() operator returns size of the pointer, and not of the array, in ‘sizeof (src)’ expression.
ID_INLINE mat3_t::mat3_t( float src[ 3 ][ 3 ] ) {
memcpy( mat, src, sizeof( src ) );
}Explanation
Sometimes programmers forget that in C/C++ you cannot pass an array to a function by value. This is because a pointer to an array is passed as an argument. Numbers in square brackets mean nothing, they only serve as a kind of hint to the programmer, which array size is supposed to be passed. In fact, you can pass an array of a completely different size. For example, the following code will be successfully compiled:
void F(int p[10]) { }
void G()
{
int p[3];
F(p);
}Correspondingly, the sizeof(src) operator evaluates not the array size, but the size of the pointer. As a result, memcpy() will only copy part of the array. Namely, 4 or 8 bytes, depending on the size of the pointer (exotic architectures don’t count).
Correct code
The simplest variant of such code can be like this:
ID_INLINE mat3_t::mat3_t( float src[ 3 ][ 3 ] ) {
memcpy(mat, src, sizeof(float) * 3 * 3);
}Recommendation
There are several ways of making your code more secure.
The array size is known. You can make the function take the reference to an array. But not everyone knows that you can do this, and even fewer people are aware of how to write it. So I hope that this example will be interesting and useful:
ID_INLINE mat3_t::mat3_t( float (&src)[3][3] )
{
memcpy( mat, src, sizeof( src ) );
}
Now, it will be possible to pass to the function an array only of the right size. And most importantly, thesizeof() operator will evaluate the size of the array, not a pointer.
Yet another way of solving this problem is to start using std::array class.
The array size is not known. Some authors of books on programming advise to use std::vectorclass, and other similar classes, but in practice it’s not always convenient.
Sometimes you want to work with a simple pointer. In this case you should pass two arguments to the function: a pointer, and the number of elements. However, in general this is bad practice, and it can lead to a lot of bugs.
In such cases, some thoughts given in “C++ Core Guidelines” can be useful to read. I suggest reading “Do not pass an array as a single pointer”. All in all it would be a good thing to read the “C++ Core Guidelines” whenever you have free time. It contains a lot of useful ideas.
32. Dangerous printf
The fragment is taken from TortoiseSVN project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V618 It’s dangerous to call the ‘printf’ function in such a manner, as the line being passed could contain format specification. The example of the safe code: printf(“%s”, str);
BOOL CPOFile::ParseFile(....)
{
....
printf(File.getloc().name().c_str());
....
}
Explanation
When you want to print or, for example, to write a string to the file, many programmers write code that resembles the following:
printf(str);
fprintf(file, str);
A good programmer should always remember that these are extremely unsafe constructions. The thing is, that if a formatting specifier somehow gets inside the string, it will lead to unpredictable consequences.
Let’s go back to the original example. If the file name is “file%s%i%s.txt”, then the program may crash or print some rubbish. But that’s only a half of the trouble. In fact, such a function call is a real vulnerability. One can attack programs with its help. Having prepared strings in a special way, one can print private data stored in the memory.
More information about these vulnerabilities can be found in this article. Take some time to look through it; I’m sure it will be interesting. You’ll find not only theoretical basis, but practical examples as well.
Correct code
printf("%s", File.getloc().name().c_str());Recommendation
Printf()-like functions can cause a lot of security related issues. It is better not to use them at all, but switch to something more modern. For example, you may find boost::format or std::stringstream quite useful.
In general, sloppy usage of the functions printf(), sprintf(), fprintf(), and so on, not only can lead to incorrect work of the program, but cause potential vulnerabilities, that someone can take advantage of.
33. Never dereference null pointers
This bug was found in GIT’s source code. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V595 The ‘tree’ pointer was utilized before it was verified against nullptr. Check lines: 134, 136.
void mark_tree_uninteresting(struct tree *tree)
{
struct object *obj = &tree->object;
if (!tree)
return;
....
}
Explanation
There is no doubt that it’s bad practice to dereference a null pointer, because the result of such dereferencing is undefined behavior. We all agree about the theoretical basis behind this.
But when it comes to practice, programmers start debating. There are always people who claim that this particular code will work correctly. They even bet their life for it — it has always worked for them! And then I have to give more reasons to prove my point. That’s why this article topic is another attempt to change their mind.
I have deliberately chosen such an example that will provoke more discussion. After the tree pointer is dereferenced, the class member isn’t just using, but evaluating, the address of this member. Then if (tree == nullptr), the address of the member isn’t used in any way, and the function is exited. Many consider this code to be correct.
But it is not so. You shouldn’t code in such a way. Undefined behavior is not necessarily a program crash when the value is written at a null address, and things like that. Undefined behavior can be anything. As soon as you have dereferenced a pointer which is equal to null, you get an undefined behavior. There is no point in further discussion about the way the program will operate. It can do whatever it wants.
One of the signs of undefined behavior is that the compiler can totally remove the “if (!tree) return;” — the compiler sees that the pointer has already been dereferenced, so the pointer isn’t null and the compiler concludes that the check can be removed. This is just one of a great many scenarios, which can cause the program to crash.
I recommend having a look at the article where everything is explained in more details:http://www.viva64.com/en/b/0306/
Correct code
void mark_tree_uninteresting(struct tree *tree)
{
if (!tree)
return;
struct object *obj = &tree->object;
....
}
Recommendation
Beware of undefined behavior, even if it seems as if everything is working fine. There is no need to risk that much. As I have already written, it’s hard to imagine how it may show its worth. Just try avoiding undefined behavior, even if it seems like everything works fine.
One may think that he knows exactly how undefined behavior works. And, he may think that this means that he is allowed to do something that others can’t, and everything will work. But it is not so. The next section is to underline the fact that undefined behavior is really dangerous.
34. Undefined behavior is closer than you think
This time it’s hard to give an example from a real application. Nevertheless, I quite often see suspicious code fragments which can lead to the problems described below. This error is possible when working with large array sizes, so I don’t know exactly which project might have arrays of this size. We don’t really collect 64-bit errors, so today’s example is simply contrived.
Let’s have a look at a synthetic code example:
size_t Count = 1024*1024*1024; // 1 Gb
if (is64bit)
Count *= 5; // 5 Gb
char *array = (char *)malloc(Count);
memset(array, 0, Count);
int index = 0;
for (size_t i = 0; i != Count; i++)
array[index++] = char(i) | 1;
if (array[Count - 1] == 0)
printf("The last array element contains 0.\n");
free(array);
Explanation
This code works correctly if you build a 32-bit version of the program; if we compile the 64-bit version, the situation will be more complicated.
A 64-bit program allocates a 5 GB buffer and initially fills it with zeros. The loop then modifies it, filling it with non-zero values: we use “| 1” to ensure this.
And now try to guess how the code will run if it is compiled in x64 mode using Visual Studio 2015? Have you got the answer? If yes, then let’s continue.
If you run a debug version of this program, it’ll crash because it’ll index out of bounds. At some point the index variable will overflow, and its value will become −2147483648 (INT_MIN).
Sounds logical, right? Nothing of the kind! This is an undefined behavior, and anything can happen.
To get more in-depth information, I suggest the following links:
- Integer overflow
- Understanding Integer Overflow in C/C++
- Is signed integer overflow still undefined behavior in C++?
An interesting thing — when I or somebody else says that this is an example of undefined behavior, people start grumbling. I don’t know why, but it feels like they assume that they know absolutely everything about C++, and how compilers work.
But in fact they aren’t really aware of it. If they knew, they would’t say something like this (group opinion):
This is some theoretical nonsense. Well, yes, formally the ‘int’ overflow leads to an undefined behavior. But it’s nothing more but some jabbering. In practice, we can always tell what we will get. If you add 1 to INT_MAX then we’ll have INT_MIN. Maybe somewhere in the universe there are some exotic architectures, but my Visual C++ / GCC compiler gives an incorrect result.
And now without any magic, I will give a demonstration of UB using a simple example, and not on some fairy architecture either, but a Win64-program.
It would be enough to build the example given above in the Release mode and run it. The program will cease crashing, and the warning “the last array element contains 0” won’t be issued.
The undefined behavior reveals itself in the following way. The array will be completely filled, in spite of the fact that the index variable of int type isn’t wide enough to index all the array elements. Those who still don’t believe me, should have a look at the assembly code:
int index = 0;
for (size_t i = 0; i != Count; i++)
000000013F6D102D xor ecx,ecx
000000013F6D102F nop
array[index++] = char(i) | 1;
000000013F6D1030 movzx edx,cl
000000013F6D1033 or dl,1
000000013F6D1036 mov byte ptr [rcx+rbx],dl
000000013F6D1039 inc rcx
000000013F6D103C cmp rcx,rdi
000000013F6D103F jne main+30h (013F6D1030h)
Here is the UB! And no exotic compilers were used, it’s just VS2015.
If you replace int with unsigned, the undefined behavior will disappear. The array will only be partially filled, and at the end we will have a message — “the last array element contains 0”.
Assembly code with the unsigned:
unsigned index = 0;
000000013F07102D xor r9d,r9d
for (size_t i = 0; i != Count; i++)
000000013F071030 mov ecx,r9d
000000013F071033 nop dword ptr [rax]
000000013F071037 nop word ptr [rax+rax]
array[index++] = char(i) | 1;
000000013F071040 movzx r8d,cl
000000013F071044 mov edx,r9d
000000013F071047 or r8b,1
000000013F07104B inc r9d
000000013F07104E inc rcx
000000013F071051 mov byte ptr [rdx+rbx],r8b
000000013F071055 cmp rcx,rdi
000000013F071058 jne main+40h (013F071040h)
Correct code
You must use proper data types for your programs to run properly. If you are going to work with large-size arrays, forget about int and unsigned. So the proper types are ptrdiff_t, intptr_t, size_t,DWORD_PTR, std::vector::size_type and so on. In this case it is size_t:
size_t index = 0;
for (size_t i = 0; i != Count; i++)
array[index++] = char(i) | 1;
Recommendation
If the C/C++ language rules result in undefined behavior, don’t argue with them or try to predict the way they’ll behave in the future. Just don’t write such dangerous code.
There are a whole lot of stubborn programmers who don’t want to see anything suspicious in shifting negative numbers, comparing this with null or signed types overflowing.
Don’t be like that. The fact that the program is working now doesn’t mean that everything is fine. The way UB will reveal itself is impossible to predict. Expected program behavior is one of the variants of UB.
35. Adding a new constant to enum don’t forget to correct switch operators
The fragment is taken from the Appleseed project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V719 The switch statement does not cover all values of the ‘InputFormat’ enum: InputFormatEntity.
enum InputFormat
{
InputFormatScalar,
InputFormatSpectralReflectance,
InputFormatSpectralIlluminance,
InputFormatSpectralReflectanceWithAlpha,
InputFormatSpectralIlluminanceWithAlpha,
InputFormatEntity
};
switch (m_format)
{
case InputFormatScalar:
....
case InputFormatSpectralReflectance:
case InputFormatSpectralIlluminance:
....
case InputFormatSpectralReflectanceWithAlpha:
case InputFormatSpectralIlluminanceWithAlpha:
....
}
Explanation
Sometimes we need to add a new item to an existing enumeration (enum), and when we do, we also need to proceed with caution — as we will have to check where we have referenced the enum throughout all of our code, e.g., in every switch statement and if chain. A situation like this can be seen in the code given above.
InputFormatEntity was added to the InputFormat — I’m making that assumption based on the fact that the constant has been added to the end. Often, programmers add new constants to the end of enum, but then forget to check their code to make sure that they’ve dealt with the new constant properly throughout, and corrected the switch operator.
As a result we have a case when “m_format==InputFormatEntity” isn’t handled in any way.
Correct code
switch (m_format)
{
case InputFormatScalar:
....
case InputFormatSpectralReflectance:
case InputFormatSpectralIlluminance:
....
case InputFormatSpectralReflectanceWithAlpha:
case InputFormatSpectralIlluminanceWithAlpha:
....
case InputFormatEntity:
....
}
Recommendation
Let’s think, how can we reduce such errors through code refactoring? The easiest, but not a very effective solution is to add a “default:”, that will cause a message to appear, e.g.:
switch (m_format)
{
case InputFormatScalar:
....
....
default:
assert(false);
throw "Not all variants are considered"
}
Now if the m_format variable is InputFormatEntity, we’ll see an exception. Such an approach has two big faults:
1. As there is the chance that this error won’t show up during testing (if during the test runs, m_format is not equal to InputFormatEntity), then this error will make its way into the Release build and would only show up later — during runtime at a customer’s site. It’s bad if customers have to report such problems!
2. If we consider getting into default as an error, then you have to write a case for all of the enum’s possible values. This is very inconvenient, especially if there are a lot of these constants in the enumeration. Sometimes it’s very convenient to handle different cases in the default section.
I suggest solving this problem in the following way; I can’t say that it’s perfect, but at least it’s something.
When you define an enum, make sure you also add a special comment. You can also use a keyword and an enumeration name.
Example:
enum InputFormat
{
InputFormatScalar,
....
InputFormatEntity
//If you want to add a new constant, find all ENUM:InputFormat.
};
switch (m_format) //ENUM:InputFormat
{
....
}
In the code above, when you change the InputFormat enum, you are directed to look for “ENUM:InputFormat” in the source code of the project.
If you are in a team of developers, you would make this convention known to everybody, and also add it to your coding standards and style guide. If somebody fails to follow this rule, it will be very sad.
36. If something strange is happening to your PC, check its memory
I think you got pretty tired looking at numerous error patterns. So this time, let’s take a break from looking at code.
A typical situation — your program is not working properly. But you have no idea what’s going on. In such situations I recommend not rushing to blame someone, but focus on your code. In 99.99% of cases, the root of the evil is a bug that was brought by someone from your development team. Very often this bug is really stupid and banal. So go ahead and spend some time looking for it!
The fact that the bug occurs from time to time means nothing. You may just have a Heisenbug
Blaming the compiler would be an even worse idea. It may do something wrong, of course, but very rarely. It will be very awkward if you find out that it was an incorrect use of sizeof(), for example. I have a post about that in my blog: The compiler is to blame for everything
But to set the record straight, I should say that there are exceptions. Very seldom the bug has nothing to do with the code. But we should be aware that such a possibility exists. This will help us to stay sane.
I’ll demonstrate this using an example of a case that once happened with me. Fortunately, I have the necessary screenshots.
I was making a simple test project that was intended to demonstrate the abilities of the Viva64 analyzer (the predecessor of PVS-Studio), and this project was refusing to work correctly.
After long and tiresome investigations, I saw that one memory slot is causing all this trouble. One bit, to be exact. You can see on the picture that I am in debug mode, writing the value “3” in this memory cell.

After the memory is changed, the debugger reads the values to display in the window, and shows number 2: See, there is 0x02. Although I’ve set the “3” value. The low-order bit is always zero.

A memory test program confirmed the problem. It’s strange that the computer was working normally without any problems. Replacement of the memory bank finally let my program work correctly.
I was very lucky. I had to deal with a simple test program. And still I spent a lot of time trying to understand what was happening. I was reviewing the assembler listing for more than two hours, trying to find the cause of the strange behavior. Yes, I was blaming the compiler for it.
I can’t imagine how much more effort it would take, if it were a real program. Thank God I didn’t have to debug anything else at that moment.
Recommendation
Always look for the error in your code. Do not try to shift responsibility.
However, if the bug reoccurs only on your computer for more than a week, it may be a sign that it’s not because of your code.
Keep looking for the bug. But before going home, run an overnight RAM test. Perhaps, this simple step will save your nerves.
37. Beware of the ‘continue’ operator inside do {…} while (…)
Fragment taken from the Haiku project (inheritor of BeOS). The code contains an error that PVS-Studio analyzer diagnoses in the following way: V696 The ‘continue’ operator will terminate ‘do { … } while (FALSE)’ loop because the condition is always false.
do {
....
if (appType.InitCheck() == B_OK
&& appType.GetAppHint(&hintRef) == B_OK
&& appRef == hintRef)
{
appType.SetAppHint(NULL);
// try again
continue;
}
....
} while (false);Explanation
The way continue works inside the do-while loop, is not the way some programmers expect it to. Whencontinue is encountered, there will always be a check of loop termination condition. I’ll try to explain this in more details. Suppose the programmer writes code like this:
for (int i = 0; i < n; i++)
{
if (blabla(i))
continue;
foo();
}
Or like this:
while (i < n)
{
if (blabla(i++))
continue;
foo();
}
Most programmers by intuition understand that when continue is encountered, the controlling condition (i < n) will be (re)evaluated, and that the next loop iteration will only start if the evaluation is true. But when a programmer writes code:
do
{
if (blabla(i++))
continue;
foo();
} while (i < n);
the intuition often fails, as they don’t see a condition above the continue, and it seems to them that thecontinue will immediately trigger another loop iteration. This is not the case, and continue does as it always does — causes the controlling condition to be re-evaluated.
It depends on sheer luck if this lack of understanding of continue will lead to an error. However, the error will definitely occur if the loop condition is always false, as it is in the code snippet given above, where the programmer planned to carry out certain actions through subsequent iterations. A comment in the code “//try again” clearly shows their intention to do so. There will of course be no “again”, as the condition is always false, and so once continue is encountered, the loop will terminate.
In other words, it turns out that in the construction of this do {…} while (false), the continue is equivalent to using break.
Correct code
There are many options to write correct code. For example, create an infinite loop, and use continue to loop, and break to exit.
for (;;) {
....
if (appType.InitCheck() == B_OK
&& appType.GetAppHint(&hintRef) == B_OK
&& appRef == hintRef)
{
appType.SetAppHint(NULL);
// try again
continue;
}
....
break;
};Recommendation
Try to avoid continue inside do { … } while (…). Even if you really know how it all works. The thing is that you could slip and make this error, and/or that your colleagues might read the code incorrectly, and then modify it incorrectly. I will never stop saying it: a good programmer is not the one who knows and uses different language tricks, but the one who writes clear understandable code, that even a newbie can comprehend.
38. Use nullptr instead of NULL from now on
New C++ standards brought quite a lot of useful changes. There are things which I would not rush into using straight away, but there are some changes which need to be applied immediately, as they will bring with them, significant benefits.
One such modernization is the keyword nullptr, which is intended to replace the NULL macro.
Let me remind you that in C++ the definition of NULL is 0, nothing more.
Of course, it may seem that this is just some syntactic sugar. And what’s the difference, if we write nullptror NULL? But there is a difference! Using nullptr helps to avoid a large variety of errors. I’ll show this using examples.
Suppose there are two overloaded functions:
void Foo(int x, int y, const char *name);
void Foo(int x, int y, int ResourceID);
A programmer might write the following call:
Foo(1, 2, NULL);
And that same programmer might be sure that he is in fact calling the first function by doing this. It is not so. As NULL is nothing more than 0, and zero is known to have int type, the second function will be called instead of the first.
However, if the programmer had used nullptr no such error would occur and the first function would have been called. Another common enough use of NULL is to write code like this:
if (unknownError)
throw NULL;
To my mind, it is suspicious to generate an exception passing the pointer. Nevertheless sometimes people do so. Apparently, the developer needed to write the code in this way. However, discussions on whether it is good or bad practice to do so, go beyond the scope of this note.
What is important, is that the programmer decided to generate an exception in the case of an unknown error and “send” a null pointer into the outer world.
In fact it is not a pointer but int. As a result the exception handling will happen in a way that the programmer didn’t expect.
“throw nullptr;” code saves us from misfortune, but this does not mean that I believe this code to be totally acceptable.
In some cases, if you use nullptr, the incorrect code will not compile.
Suppose that some WinApi function returns a HRESULT type. The HRESULT type has nothing to do with the pointer. However, it is quite possible to write nonsensical code like this:
if (WinApiFoo(a, b, c) != NULL)
This code will compile, because NULL is 0 and of int type, and HRESULT is a long type. It is quite possible to compare values of int and long type. If you use nullptr, then the following code will not compile:
if (WinApiFoo(a, b, c) != nullptr)
Because of the compiler error, the programmer will notice and fix the code.
I think you get the idea. There are plenty such examples. But these are mostly synthetic examples. And it is always not very convincing. So are there any real examples? Yes, there are. Here is one of them. The only thing — it’s not very graceful or short.
This code is taken from the MTASA project.
So, there exists RtlFillMemory(). This can be a real function or a macro. It doesn’t matter. It is similar to the memset() function, but the 2nd and 3rd argument switched their places. Here’s how this macro can be declared:
#define RtlFillMemory(Destination,Length,Fill) \
memset((Destination),(Fill),(Length))
There is also FillMemory(), which is nothing more than RtlFillMemory():
#define FillMemory RtlFillMemory
Yes, everything is long and complicated. But at least it is an example of real erroneous code.
And here’s the code that uses the FillMemory macro.
LPCTSTR __stdcall GetFaultReason ( EXCEPTION_POINTERS * pExPtrs )
{
....
PIMAGEHLP_SYMBOL pSym = (PIMAGEHLP_SYMBOL)&g_stSymbol ;
FillMemory ( pSym , NULL , SYM_BUFF_SIZE ) ;
....
}
This code fragment has even more bugs. We can clearly see that at least the 2 and 3 arguments are confused here. That’s why the analyzer issues 2 warnings V575:
- V575 The ‘memset’ function processes value ‘512’. Inspect the second argument. crashhandler.cpp 499
- V575 The ‘memset’ function processes ‘0’ elements. Inspect the third argument. crashhandler.cpp 499
The code compiled because NULL is 0. As a result, 0 array elements get filled. But in fact the error is not only about this. NULL is in general not appropriate here. The memset() function works with bytes, so there’s no point in trying to make it fill the memory with NULL values. This is absurd. Correct code should look like this:
FillMemory(pSym, SYM_BUFF_SIZE, 0);
Or like this:
ZeroMemory(pSym, SYM_BUFF_SIZE);
But it’s not the main point, which is that this meaningless code compiles successfully. However, if the programmer had gotten into the habit of using nullptr instead of NULL and written this instead:
FillMemory(pSym, nullptr, SYM_BUFF_SIZE);
the complier would have emitted a error message, and the programmer would realize that they did something wrong, and would pay more attention to the way they code.
Note. I understand that in this case NULL is not to blame. However, it is because of NULL that the incorrect code compiles without any warnings.
Recommendation
Start using nullptr. Right now. And make necessary changes in the coding standard of your company.
Using nullptr will help to avoid stupid errors, and thus will slightly speed up the development process.
39. Why incorrect code works
This bug was found in Miranda NG’s project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V502 Perhaps the ‘?:’ operator works in a different way than was expected. The ‘?:’ operator has a lower priority than the ‘|’ operator..
#define MF_BYCOMMAND 0x00000000L
void CMenuBar::updateState(const HMENU hMenu) const
{
....
::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
MF_BYCOMMAND | dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED);
....
}
Explanation
We have seen a lot of cases that lead to incorrect working of the program, this time I would like to raise a different thought-provoking topic for discussion. Sometimes we see that totally incorrect code happens, against all odds, to work just fine! Now, for experienced programmers this really comes as no surprise (another story), but for those that have recently started learning C/C++, well, it might be a little baffling. So today, we’ll have a look at just such an example.
In the code shown above, we need to call CheckMenuItem() with certain flags set; and, on first glance we see that if bShowAvatar is true, then we need to bitwise OR MF_BYCOMMAND with MF_CHECKED — and conversely, with MF_UNCHECKED if it’s false. Simple!
In the code above the programmers have chosen the very natural ternary operator to express this (the operator is a convenient short version of if-then-else):
MF_BYCOMMAND | dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED
The thing is that the priority of |operator is higher than of ?: operator. (see Operation priorities in C/C++). As a result, there are two errors at once.
The first error is that the condition has changed. It is no longer — as one might read it — “dat->bShowAvatar”, but “MF_BYCOMMAND | dat->bShowAvatar”.
The second error — only one flag gets chosen — either MF_CHECKED or MF_UNCHECKED. The flag MF_BYCOMMAND is lost.
But despite these errors the code works correctly! Reason — sheer stroke of luck. The programmer was just lucky that the MF_BYCOMMAND flag is equal to 0x00000000L. As the MF_BYCOMMAND flag is equal to 0, then it doesn’t affect the code in any way. Probably some experienced programmers have already gotten the idea, but I’ll still give some comments in case there are beginners here.
First let’s have a look at a correct expression with additional parenthesis:
MF_BYCOMMAND | (dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED)
Replace macros with numeric values:
0x00000000L | (dat->bShowAvatar ? 0x00000008L : 0x00000000L)
If one of the operator operands | is 0, then we can simplify the expression:
dat->bShowAvatar ? 0x00000008L : 0x00000000L
Now let’s have a closer look at an incorrect code variant:
MF_BYCOMMAND | dat->bShowAvatar ? MF_CHECKED : MF_UNCHECKED
Replace macros with numeric values:
0x00000000L | dat->bShowAvatar ? 0x00000008L : 0x00000000L
In the subexpression “0x00000000L | dat->bShowAvatar” one of the operator operands | is 0. Let’s simplify the expression:
dat->bShowAvatar ? 0x00000008L : 0x00000000L
As a result we have the same expression, this is why the erroneous code works correctly; another programming miracle has occurred.
Correct code
There are various ways to correct the code. One of them is to add parentheses, another — to add an intermediate variable. A good old if operator could also be of help here:
if (dat->bShowAvatar)
::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
MF_BYCOMMAND | MF_CHECKED);
else
::CheckMenuItem(hMenu, ID_VIEW_SHOWAVATAR,
MF_BYCOMMAND | MF_UNCHECKED);
I really don’t insist on using this exact way to correct the code. It might be easier to read it, but it’s slightly lengthy, so it’s more a matter of preferences.
Recommendation
My recommendation is simple — try to avoid complex expressions, especially with ternary operators. Also don’t forget about parentheses.
As it was stated before in chapter N4, the ?: is very dangerous. Sometimes it just slips your mind that it has a very low priority and it’s easy to write an incorrect expression. People tend to use it when they want to clog up a string, so try not to do that.
40. Start using static code analysis
It is strange to read such big pieces of text, written by a developer of a static code analyzer, and not to hear recommendations about the usage of it. So here it is.
Fragment taken from the Haiku project (inheritor of BeOS). The code contains an error that PVS-Studio analyzer diagnoses in the following way: V501 There are identical sub-expressions to the left and to the right of the ‘<’ operator: lJack->m_jackType < lJack->m_jackType
int compareTypeAndID(....)
{
....
if (lJack && rJack)
{
if (lJack->m_jackType < lJack->m_jackType)
{
return -1;
}
....
}
Explanation
It’s just a usual typo. Instead of rJack it was accidentally written lJack in the right part of the expression.
This typo is a simple one indeed, but the situation is quite complicated. The thing is that the programming style, or other methods, are of no help here. People just make mistakes while typing and there is nothing you can do about it.
It’s important to emphasize that it’s not a problem of some particular people or projects. No doubt, all people can be mistaken, and even professionals involved in serious projects can be. Here is the proof of my words. You can see the simplest misprints like A == A, in such projects as: Notepad++, WinMerge, Chromium, Qt, Clang, OpenCV, TortoiseSVN, LibreOffice, CoreCLR, Unreal Engine 4 and so on.
So the problem is really there and it’s not about students’ lab works. When somebody tells me that experienced programmers don’t make such mistakes, I usually send them this link.
Correct code
if (lJack->m_jackType < rJack->m_jackType)
Recommendation
First of all, let’s speak about some useless tips.
- Be careful while programming, and don’t let errors sneak into your code (Nice words, but nothing more)
- Use a good coding style (There isn’t s a programming style which can help to avoid errors in the variable name)
What can really be effective?
- Code review
- Unit tests (TDD)
- Static code analysis
I should say right away, that every strategy has its strong and weak sides. That’s why the best way to get the most efficient and reliable, code is to use all of them together.
Code reviews can help us to find a great deal of different errors, and on top of this, they help us to improve readability of the code. Unfortunately shared reading of the text is quite expensive, tiresome and doesn’t give a full validity guarantee. It’s quite hard to remain alert, and find a typo looking at this kind of code:
qreal l = (orig->x1 - orig->x2)*(orig->x1 - orig->x2) +
(orig->y1 - orig->y2)*(orig->y1 - orig->y1) *
(orig->x3 - orig->x4)*(orig->x3 - orig->x4) +
(orig->y3 - orig->y4)*(orig->y3 - orig->y4);
Theoretically, unit tests can save us. But it’s only in theory. In practice, it’s unreal to check all the possible execution paths; besides that, a test itself can have some errors too :)
Static code analyzers are mere programs, and not artificial intelligence. An analyzer can skip some errors and, on the contrary, display an error message for code which in actuality, is correct. But despite all these faults, it is a really useful tool. It can detect a whole lot of errors at an early stage.
A static code analyzer can be used as a cheaper version of Code Review. The program examines the code instead of a programmer doing it, and suggests checking certain code fragments more thoroughly.
Of course I would recommend using PVS-Studio code analyzer, which we are developing. But it’s not the only one in the world; there are plenty of other free and paid tools to use. For example you can start with having a look at a free open Cppcheck analyzer. A good number of tools is given on Wikipedia: List of tools for static code analysis.
Attention:
- A static analyzer can hurt your brain if not used correctly. One of the typical mistakes is to “get the maximum from the check mode options, and drown in the stream of warnings messages”. That’s one of many recommendations I could give, so to get a bigger list, could be useful to go to A, B.
- A static analyzer should be used on a regular basis, not just from time to time, or when everything gets really bad. Some explanations: C, D.
Really, try using static code analyzers, you’ll like them. It’s a very nice sanitary tool.
Finally I would recommend reading an article by John Camrack: Static Code Analysis.
41. Avoid adding a new library to the project
Suppose you need to implement an X functionality in your project. Theorists of software development will say that you have to take the already existing library Y, and use it to implement the things you need. In fact, it is a classic approach in software development — reusing your own or others’ previously created libraries (third-party libraries). And most programmers use this way.
However, those theorists in various articles and books, forget to mention what hell it will become to support several dozen third-party libraries in about 10 years.
I strongly recommend avoiding adding a new library to a project. Please don’t get me wrong. I am not saying that you shouldn’t use libraries at all, and write everything yourself. This would be insufficient, of course. But sometimes a new library is added to the project at the whim of some developer, intending to add a little cool small “feature” to the project. It’s not hard to add a new library to the project, but then the whole team will have to carry the load of its support for many years.
Tracking the evolution of several large projects, I have seen quite a lot of problems caused by a large number of third-party libraries. I will probably enumerate only some of the issues, but this list should already provoke some thoughts:
- Adding new libraries promptly increases the project size. In our era of fast Internet and large SSD drives, this is not a big problem, of course. But, it’s rather unpleasant when the download time from the version control system turns into 10 minutes instead of 1.
- Even if you use just 1% of the library capabilities, it is usually included in the project as a whole. As a result, if the libraries are used in the form of compiled modules (for example, DLL), the distribution size grows very fast. If you use the library as source code, then the compile time significantly increases.
- Infrastructure connected with the compilation of the project becomes more complicated. Some libraries require additional components. A simple example: we need Python for building. As a result, in some time you’ll need to have a lot of additional programs to build a project. So the probability that something will fail increases. It’s hard to explain, you need to experience it. In big projects something fails all the time, and you have to put a lot of effort into making everything work and compile.
- If you care about vulnerabilities, you must regularly update third-party libraries. It would be of interest to violators, to study the code libraries to search for vulnerabilities. Firstly, many libraries are open-source, and secondly, having found a weak point in one of the libraries, you can get a master exploit to many applications where the library is used.
- One the libraries may suddenly change the license type. Firstly, you have to keep that in mind, and track the changes. Secondly, it’s unclear what to do if that happens. For example, once, a very widely used library softfloat moved to BSD from a personal agreement.
- You will have troubles upgrading to a new version of the compiler. There will definitely be a few libraries that won’t be ready to adapt for a new compiler, you’ll have to wait, or make your own corrections in the library.
- You will have problems when moving to a different compiler. For example, you are using Visual C++, and want to use Intel C++. There will surely be a couple of libraries where something is wrong.
- You will have problems moving to a different platform. Not necessarily even a totally different platform. Let’s say, you’ll decide to port a Win32 application to Win64. You will have the same problems. Most likely, several libraries won’t be ready for this, and you’ll wonder what to do with them. It is especially unpleasant when the library is lying dormant somewhere, and is no longer developing.
- Sooner or later, if you use lots of C libraries, where the types aren’t stored in namespace, you’ll start having name clashes. This causes compilation errors, or hidden errors. For example, a wrong enumconstant can be used instead of the one you’ve intended to use.
- If your project uses a lot of libraries, adding another one won’t seem harmful. We can draw an analogy with the broken windows theory. But consequently, the growth of the project turns into uncontrolled chaos.
- And there could be a lot of other downsides in adding new libraries, which I’m probably not aware of. But in any case, additional libraries increase the complexity of project support. Some issues can occur in a fragment where they were least expected to.
Again, I should emphasize; I don’t say that we should stop using third-party libraries at all. If we have to work with images in PNG format in the program, we’ll take the LibPNG library, and not reinvent the wheel.
But even working with PNG we need to stop and think. Do we really need a library? What do we want to do with the images? If the task is just to save an image in *.png file, you can get by with system functions. For example, if you have a Windows application, you could use WIC. And if you’re already using an MFC library, there is no need to make the code more sophisticated, because there’s a CImage class (see the discussion on StackOverflow). Minus one library — great!
Let me give you an example from my own practice. In the process of developing the PVS-Studio analyzer, we needed to use simple regular expressions in a couple of diagnostics. In general, I am convinced that static analysis isn’t the right place for regular expressions. This is an extremely inefficient approach. I even wrote an article regarding this topic. But sometimes you just need to find something in a string with the help of a regular expression.
It was possible to add existing libraries, but it was clear that all of them would be redundant. At the same time we still needed regular expressions, and we had to come up with something.
Absolutely coincidentally, exactly at that moment I was reading a book “Beautiful Code” (ISBN 9780596510046). This book is about simple and elegant solutions. And there I came across an extremely simple implementation of regular expressions. Just a few dozen strings. And that’s it!
I decided to use that implementation in PVS-Studio. And you know what? The abilities of this implementation are still enough for us; complex regular expressions are just not necessary for us.
Conclusion: Instead of adding a new library, we spent half an hour writing a needed functionality. We suppressed the desire to use one more library. And it turned out to be a great decision; the time showed that we really didn’t need that library. And I am not talking about several months, we have happily used it for more than five years.
This case really convinced me that the simpler solution, the better. By avoiding adding new libraries (if possible), you make your project simpler.
Readers may be interested to know what the code for searching regular expressions was. We’ll type it here from the book. See how graceful it is. This code was slightly changed when integrating to PVS-Studio, but its main idea remains unchanged. So, the code from the book:
// regular expression format
// c Matches any "c" letter
//.(dot) Matches any (singular) symbol
//^ Matches the beginning of the input string
//$ Matches the end of the input string
# Match the appearance of the preceding character zero or
// several times
int matchhere(char *regexp, char *text);
int matchstar(int c, char *regexp, char *text);
// match: search for regular expression anywhere in text
int match(char *regexp, char *text)
{
if (regexp[0] == '^')
return matchhere(regexp+1, text);
do { /* must look even if string is empty */
if (matchhere(regexp, text))
return 1;
} while (*text++ != '\0');
return 0;
}
// matchhere: search for regexp at beginning of text
int matchhere(char *regexp, char *text)
{
if (regexp[0] == '\0')
return 1;
if (regexp[1] == '*')
return matchstar(regexp[0], regexp+2, text);
if (regexp[0] == '$' && regexp[1] == '\0')
return *text == '\0';
if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
return matchhere(regexp+1, text+1);
return 0;
}
// matchstar: search for c*regexp at beginning of text
int matchstar(int c, char *regexp, char *text)
{
do { /* * a * matches zero or more instances */
more instances */
if (matchhere(regexp, text))
return 1;
} while (*text != '\0' && (*text++ == c || c == '.'));
return 0;
}
Yes, this version is extremely simple, but for several years there was need to use more complex solutions. It really has got limited functionality, but there was no need to add anything more complicated, and I don’t think there will be. This is a good example of where a simple solution turned out to be better than a complex one.
Recommendation
Don’t hurry to add new libraries to the project; add one only when there is no other way to manage without a library.
Here are the possible workarounds:
- Have a look if the API of your system, or one of the already used libraries has a required functionality. It’s a good idea to investigate this question.
- If you plan to use a small piece of functionality from the library, then it makes sense to implement it yourself. The argument to add a library “just in case” is no good. Almost certainly, this library won’t be used much in the future. Programmers sometimes want to have universality that is actually not needed.
- If there are several libraries to resolve your task, choose the simplest one, which meets your needs. As I have stated before, get rid of the idea “it’s a cool library — let’s take it just in case”
- Before adding a new library, sit back and think. Maybe even take a break, get some coffee, discuss it with your colleagues. Perhaps you’ll realsie that you can solve the problem in a completely different way, without using third-party libraries.
P.S. The things I speak about here may not be completely acceptable to everyone. For example, the fact that I’m recommending the use of WinAPI, instead of a universal portable library. There may arise objections based on the idea that going this way “binds” this project to one operating system. And then it will be very difficult to make a program portable. But I do not agree with this. Quite often the idea “and then we’ll port it to a different operating system” exists only in the programmer’s mind. Such a task may even be unnecessary for managers. Another option — the project will kick the bucket due to the complexity and universality of it before gaining popularity and having the necessity to port. Also don’t forget about point (8) in the list of problems, given above.
42. Don’t use function names with “empty”
The fragment is taken from WinMerge project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V530 The return value of function ‘empty’ is required to be utilized.
void CDirView::GetItemFileNames(
int sel, String& strLeft, String& strRight) const
{
UINT_PTR diffpos = GetItemKey(sel);
if (diffpos == (UINT_PTR)SPECIAL_ITEM_POS)
{
strLeft.empty();
strRight.empty();
}
....
}
Explanation
A programmer wanted to clean the strLeft and strRight strings. They have String type, which is nothing else than std::wstring.
For this purpose he called the empty() function. And this is not correct. The empty() function doesn’t change the object, but returns the information if the string is empty or not.
Correct code
To correct this error you should replace the empty() function with clear() or erase (). WinMerge developers preferred erase() and now the code looks like this:
if (diffpos == (UINT_PTR)SPECIAL_ITEM_POS)
{
strLeft.erase();
strRight.erase();
}
Recommendation
In this case the name “empty()” is really inappropriate. The thing is that in different libraries, this function can mean two different actions.
In some libraries the emply() function clears the object. In other ones, it returns the information if the object is empty or not.
I would say that the word “empty” is lame in general, because everybody understands it differently. Some think it’s an “action”, others that it’s “information inquiry”. That’s the reason for the mess we can see.
There is just one way out. Do not use “empty” in the class names.
- Name the function for cleaning as “erase” or “clear”. I would rather use “erase”, because “clear” can be quite ambiguous.
- Choose another name for the function which gets information, “isEmpty” for instance.
If you for some reason think that it’s not a big deal, then have a look here. It’s quite a widespread error pattern. Of course it’s slightly late to change such classes as std::string, but at least let’s try not to spread the evil any longer.
Conclusion
I hope you enjoyed this collection of tips. Of course, it is impossible to write about all the ways to write a program incorrectly, and there is probably no point in doing this. My aim was to warn a programmer, and to develop a sense of danger. Perhaps, next time when a programmer encounters something odd, he will remember my tips and won’t haste. Sometimes several minutes of studying the documentation or writing simple/clear code can help to avoid a hidden error that would make the life of your colleagues and users miserable for several years.
I also invite everybody to follow me on Twitter @Code_Analysis
Bugless coding!
Sincerely, Andrey Karpov.