Modern C++ In-Depth — Is string_view Worth It?
std::string_view makes it easier to write generic code that can accept read-only references to character sequences, regardless of the underlying container that holds that data. String parsing and tokenization workflows may improve performance by avoiding unnecessary copies. However, like normal references, there is potential for misuse. This post will examine what string_view is, and more importantly what it is not, so you can make the best choices for your programs.
A brief history of strings at FactSet
The history of string data types in C++ is long and winding. Prior to C++98, there was no standardized string class included in the language. In those days, C++ relied on the null-terminated char arrays it inherited from C. Compiler vendors and early C++ adopters filled in the gap with their own proprietary string classes to provide automatic memory management and richer interfaces for operating on string data.
At FactSet, we adopted C++ in the early 1990s during the pre-ISO-standard era. We initially used the string classes included with the compilers supplied by our OS vendors. Unfortunately, by the time a mature implementation of std::string became available to us, these string classes were heavily embedded throughout our multi-million line code base. Code that deals with more than one string type must either incur the performance penalty that comes with converting between them, or regress to C-style const char* as the lingua franca.
Enter string_view
Introduced in C++17, std::string_view provides a better way to pass references to string data around in your program. Conceptually, a string_view is a reference to read-only string data. Just like a reference, a string_view does not take ownership of the data it refers to. The lifetime of that memory must be managed external to the string_view. It can be thought of as a const char* and length, or as a pair of begin/end const char* pointers.
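To make the "pointer and length" picture concrete, here is a minimal sketch. The names count_spaces and view_basics_demo are hypothetical and exist only for illustration. It shows one function accepting three different sources of characters, and checks that a view points into the original storage rather than a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts spaces in any read-only character sequence.
std::size_t count_spaces(std::string_view text)
{
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}

// Hypothetical demo: three different sources, one function, no copies.
void view_basics_demo()
{
    const std::string owned = "one two three";
    const char* c_str = "four five";
    const char buffer[] = {'a', ' ', 'b', ' ', 'c'};  // not null-terminated

    assert(count_spaces(owned) == 2);                    // from std::string
    assert(count_spaces(c_str) == 1);                    // from const char* (length is computed once)
    assert(count_spaces({buffer, sizeof buffer}) == 2);  // from pointer + length

    // The view refers to the string's own storage; nothing was copied.
    const std::string_view v = owned;
    assert(v.data() == owned.data());
    assert(v.size() == owned.size());
}
```

Because the view is only a pointer and a length, constructing it from a std::string or a pointer/length pair takes constant time. Constructing it from a bare const char* still has to scan for the terminator once.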
Usage
Use string_view anywhere you would have previously used a non-owning pointer or reference to const string data. For example, string_view is often a good replacement for:
- const char* and a length
- const std::string&
- begin/end pair of const_iterator from a string-like class (requires C++20)
As an illustration, consider a function that tests if a string begins with a
given prefix. We might write it like this:
#include <cstring>

bool has_prefix(const char* str, const char* prefix)
{
    // Compare only the first strlen(prefix) characters of str.
    return std::strncmp(str, prefix, std::strlen(prefix)) == 0;
}
This code will work with most string types but has a few drawbacks:
- the original strings must be converted to const char*
- only works for null-terminated strings
- slower than necessary for string types that store their length (strlen recomputes it)
To get around these issues, we might be tempted to write multiple overloads of this function for each string type. Or we might further complicate the interface by introducing a prefix_size parameter. With string_view, this function becomes simpler:
#include <string_view>

bool has_prefix(std::string_view str, std::string_view prefix)
{
    // substr clamps the count to the end of str, so a long prefix is safe.
    return str.substr(0, prefix.size()) == prefix;
}
The substr function creates a new string_view referring to a subset of the original string in constant time. The operator== then compares the contents referred to by the two views. Note: in C++20, this entire function could be replaced with string_view::starts_with().
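As a quick usage sketch, the string_view version can be called with a mix of string types and with fragments that have no terminator of their own. The function has_prefix is repeated from the text so the block stands alone. The has_prefix_demo wrapper and the sample strings are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Same function as in the text above.
bool has_prefix(std::string_view str, std::string_view prefix)
{
    return str.substr(0, prefix.size()) == prefix;
}

// Hypothetical demo of calling has_prefix with different argument types.
void has_prefix_demo()
{
    const std::string url = "https://example.com";

    assert(has_prefix(url, "https://"));     // std::string and a string literal
    assert(!has_prefix("ftp://host", url));  // prefix longer than str: substr clamps, no throw
    assert(has_prefix(url, ""));             // an empty prefix always matches

    // A fragment of a larger buffer: the five characters "https", no '\0' after them.
    const std::string_view scheme(url.data(), 5);
    assert(has_prefix(scheme, "http"));
    assert(!has_prefix(scheme, "https://"));
}
```

No overloads were needed, and neither argument had to be converted to a null-terminated const char* first.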
When using string_view as a function parameter or return value, prefer to pass it by value rather than by reference. It is small, and is designed to mimic a reference:
void inspect_string(std::string_view s);        // DO THIS
void inspect_string(const std::string_view& s); // NOT THIS
Gotchas
There are a few gotchas to be aware of when using string_view.
Lifetime
std::string_view models a non-owning reference, so we must ensure the string data outlives the string_view object. All the usual safety rules for references apply equally to string_view. For example, be careful not to return a string_view from a function if it refers to a function-local string object.
Also, be aware that string_view will not extend the lifetime of a temporary object like a normal reference to const will. Suppose we had a function get_name() that returned a string object:
// SAFE - lifetime of the temporary is extended to the scope of 'longer_name'
const std::string& longer_name = get_name() + " foo";
// ERROR (compiles, but dangles) - 'bad_name' refers to a temporary that is destroyed at the end of this statement
const std::string_view bad_name = get_name() + " foo";
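One way to stay on the safe side is to give the owning string a name and take views of that. The sketch below assumes a stand-in get_name() that returns a fixed value. The helpers first_word and lifetime_demo are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for the get_name() mentioned in the text.
std::string get_name()
{
    return "Ada";
}

// Returning a view is fine when it refers to the caller's data, not to a local.
std::string_view first_word(std::string_view text)
{
    return text.substr(0, text.find(' '));  // find() == npos yields the whole text
}

// Hypothetical demo: the owner is a named variable that outlives its views.
void lifetime_demo()
{
    const std::string full_name = get_name() + " foo";  // owning object, lives to end of scope
    const std::string_view name_view = full_name;       // view into full_name
    assert(name_view == "Ada foo");
    assert(first_word(name_view) == "Ada");             // still backed by full_name
}
```

Note that first_word is only as safe as its argument. Storing the result of first_word(get_name()) would dangle for the same reason as bad_name above.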
Nulls
std::string_view is not guaranteed to be null-terminated. It is not a generic wrapper around a proper string object. This gives it the flexibility to refer to a fragment of a larger string, enables efficient slicing operations (e.g., substr, remove_prefix, remove_suffix), and also allows for embedded null characters (which std::string also supports).
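The slicing operations are what make copy-free tokenization possible. The following is a minimal sketch of a delimiter splitter, not a production parser. The names split and slicing_demo are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical tokenizer: splits on a delimiter without copying any characters.
// Each token is a view into 'input', so the underlying data must outlive the result.
std::vector<std::string_view> split(std::string_view input, char delim)
{
    std::vector<std::string_view> tokens;
    while (true) {
        const std::size_t pos = input.find(delim);
        tokens.push_back(input.substr(0, pos));  // pos == npos takes the remainder
        if (pos == std::string_view::npos) break;
        input.remove_prefix(pos + 1);            // drop the token and its delimiter
    }
    return tokens;
}

// Hypothetical demo of remove_suffix and of embedded null characters.
void slicing_demo()
{
    std::string_view line = "value\r\n";
    line.remove_suffix(2);                       // trim the line ending in place
    assert(line == "value");

    const std::string_view with_null("ab\0cd", 5);  // length given explicitly
    assert(with_null.size() == 5);
    assert(with_null[2] == '\0');
}
```

Calling split("a,b,,c", ',') yields four views, the third of which is empty. All of them point into the original character data.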
This also explains why there is no c_str() function, only data() and size(). Any call to data() without a corresponding call to size() is likely a coding error.
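A short sketch of the two usual ways to hand a view to a C-style API: pass the length along with the pointer, or make an owning, terminated copy. The helper names describe, c_length, and data_size_demo are hypothetical. The 64-byte buffer is an arbitrary choice for the example, and longer input would be truncated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical helper: formats a non-empty view with a C API, passing the length explicitly.
std::string describe(std::string_view sv)
{
    char out[64];
    // "%.*s" reads the maximum length from an int argument, so no terminator is needed.
    std::snprintf(out, sizeof out, "[%.*s]", static_cast<int>(sv.size()), sv.data());
    return out;
}

// Hypothetical helper: an API that needs a terminator gets an owning copy.
std::size_t c_length(std::string_view sv)
{
    const std::string terminated(sv);  // explicit copy; std::string adds the '\0'
    return std::strlen(terminated.c_str());
}

void data_size_demo()
{
    const std::string_view sentence = "hello world";
    const std::string_view word = sentence.substr(0, 5);  // "hello"

    assert(describe(word) == "[hello]");
    assert(c_length(word) == 5);

    // data() alone ignores the view's length: strlen runs on to the end of the literal.
    assert(std::strlen(word.data()) == 11);
}
```

The last assertion only has a defined result because the underlying literal happens to be null-terminated. With a buffer that is not, the same call would read out of bounds.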
Guidance
How should you get started using string_view? What if you’re modernizing an existing code base that already uses std::string references everywhere?
- For functions that accept a const char* or const string& (of any string type) parameter, consider replacing it with string_view unless:
  - you’re passing the argument to a function requiring const string& or other null-terminated string (e.g., fopen or printf)
  - you’re copying the data to a new string object (see below)
- string_view knows how to print itself with operator<<
- Standard associative containers using strings as keys will accept string_view in their lookup functions, provided the container is declared with a transparent comparator such as std::less<>. Equivalent support for unordered containers (using a transparent hash and equality) was added in C++20.
- A string_view can be stored in a container, where a normal reference cannot. Be aware of the lifetime of the underlying character sequence.
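The container and printing points can be sketched as follows. With the default comparator, std::map<std::string, T>::find only takes a std::string, so this sketch opts in to heterogeneous lookup with std::less<>. The names PriceMap, price_or, to_text, and the sample keys are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <string_view>

// std::less<> is a transparent comparator (C++14). It lets find() accept
// key-like types such as string_view without building a temporary std::string.
using PriceMap = std::map<std::string, double, std::less<>>;

// Hypothetical helper: returns the value stored for 'key', or 'fallback' if absent.
double price_or(const PriceMap& prices, std::string_view key, double fallback)
{
    const auto it = prices.find(key);  // heterogeneous lookup
    return it != prices.end() ? it->second : fallback;
}

// Hypothetical helper: string_view streams through operator<< like a string does.
std::string to_text(std::string_view sv)
{
    std::ostringstream os;
    os << "<" << sv << ">";
    return os.str();
}
```

The keys stay owned by the map as std::string objects. Only the lookup argument is a view, so no lifetime concerns are introduced.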
The bit about copying string data requires some explanation. Yes, you’re only reading from it. But some string types (including std::string on certain platforms) have copy-on-write semantics. Or perhaps the caller already has a temporary object of the needed type and could have moved from it. Forcing an explicit copy operation in those cases would prevent such optimizations.
Instead, consider accepting the destination string type by value. This allows the caller access to the full set of constructors to efficiently perform the copy. You can then efficiently move it into place.
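#include <string>
#include <utility>

struct Person
{
    std::string m_name;

    // Take the string by value so callers can copy or move into it.
    void set_name(std::string name)
    {
        m_name = std::move(name);
    }
};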
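From the caller's side, the by-value parameter works out roughly as sketched below. Person is repeated from the text so the block stands alone. The set_name_demo function and the sample names are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Same struct as in the text above.
struct Person
{
    std::string m_name;
    void set_name(std::string name) { m_name = std::move(name); }
};

// Hypothetical demo of the three ways a caller might supply the name.
void set_name_demo()
{
    Person p;

    std::string temp = "Grace";
    p.set_name(std::move(temp));    // moved into the parameter, then moved into m_name
    assert(p.m_name == "Grace");

    const std::string kept = "Alan";
    p.set_name(kept);               // one copy into the parameter, then a move
    assert(p.m_name == "Alan");
    assert(kept == "Alan");         // the caller's string is untouched

    const std::string_view view = "Edsger";
    p.set_name(std::string(view));  // string_view to string conversion must be explicit
    assert(p.m_name == "Edsger");
}
```

The explicit std::string(view) in the last call is deliberate on the standard library's part: the copy is visible at the call site instead of happening silently.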
Other posts in this series
The Modern C++ In-Depth series has explored some of the more technically challenging features of C++11 and beyond. Other topics we have covered previously:
- Move semantics, part 1 and part 2
- Perfect forwarding
- Variadic templates
- Lambda expressions, part 1, part 2, and part 3
- User-defined literals
Acknowledgments
Special thanks to all who contributed to this blog post:
Authors: Jim Arena and Michael Kristofik
Reviewers: James Abbatiello, Jennifer Ma, Jens Maurer, and Jason Wang