
Variable Expansion in Strings

Ben Key: Ben.Key@YekNeb.com

December 6, 2013; November 7, 2018


A common task in C and C++ is to build a string out of a template string containing variable placeholders, often called format specifiers, and additional data. This article describes several options that are available for solving this problem and introduces two versions of an ExpandVars function as an alternative to those solutions.

Problem Description

The problem is easiest to describe by extending the classic Hello World program, which is often the first program one writes when learning a new programming language.

The simplest implementation of Hello World in C++ is as follows.


#include <iostream>

int main()
{
    std::cout << "Hello, world!\n";
    return 0;
}

Source: Variable Expansion in Strings — Example 1


What if you wanted to modify this program to first ask you for your name and then display a more personal greeting? One way to do this is as follows.


#include <iostream>
#include <string>

int main()
{
    std::cout << "What is your name?\n";
    std::string name;
    std::getline(std::cin, name);
    std::cout << "Hello, " << name << "!\n";
    return 0;
}

Source: Variable Expansion in Strings — Example 2


The problem with this approach is that it is not very extensible. It can also be very unwieldy when you have multiple variables that you need to print out.

For example, imagine you have the following person structure and you want to display a message containing all of the fields in the person structure.


struct person
{
    std::string firstName;
    std::string middleName;
    std::string lastName;
    std::string streetAddress1;
    std::string streetAddress2;
    std::string city;
    std::string state;
    std::string zip;
};

Source: Variable Expansion in Strings — Example 3


One possible implementation of a function to display a message containing all of the fields in the person structure is as follows.


void PrintPersonWithStream(const person& p)
{
    std::cout
        << "First Name: " << p.firstName << "\n"
        << "Middle Name: " << p.middleName << "\n"
        << "Last Name: " << p.lastName << "\n"
        << "Street Address 1: " << p.streetAddress1 << "\n"
        << "Street Address 2: " << p.streetAddress2 << "\n"
        << "City: " << p.city << "\n"
        << "State: " << p.state << "\n"
        << "Zip: " << p.zip << "\n";
}

Source: Variable Expansion in Strings — Example 3


As you can see, this function is rather unwieldy. It would be far simpler to be able to write something like the following.


void PrintPerson(const person& p)
{
    const char FormatString[] = R"(
First Name: {FormatSpecifier}
Middle Name: {FormatSpecifier}
Last Name: {FormatSpecifier}
Street Address 1: {FormatSpecifier}
Street Address 2: {FormatSpecifier}
City: {FormatSpecifier}
State: {FormatSpecifier}
Zip: {FormatSpecifier}
)";
    std::string message = SomeFormatFunction(
        FormatString,
        p.firstName, p.middleName, p.lastName,
        p.streetAddress1, p.streetAddress2,
        p.city, p.state, p.zip);
    std::cout << message << "\n";
}

In the above example {FormatSpecifier} will be replaced with a bit of text that causes the text of the appropriate variable to be inserted at the appropriate place in the final string. It will vary depending on the solution you use.

The benefit of this type of solution is that it is far less verbose and it is far less work to change the order of variables in the final output and to add a variable.

You might ask why being able to change the order of variables matters. A simple answer is that if your program supports several languages, you may need to change the order of items such as dates to match the conventions of a given language.

Using the Standard C Library Functions printf or sprintf

One option is to simply make use of the Standard C library functions printf, or if you need to store the output in a string, sprintf. The printf function writes formatted data to stdout. The sprintf function writes formatted data to a string. These functions can be used as follows.

First, add the following function to the person structure.


static std::string GetPrintfFormatString()
{
    static const char FormatString[] = R"(
First Name: %s
Middle Name: %s
Last Name: %s
Street Address 1: %s
Street Address 2: %s
City: %s
State: %s
Zip: %s
)";
    return FormatString;
}

Source: Variable Expansion in Strings — Example 4


Then the functions can be defined as follows.


#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

void PrintPersonWithPrintf(const person& p)
{
    ::printf(
        p.GetPrintfFormatString().c_str(),
        p.firstName.c_str(), p.middleName.c_str(),
        p.lastName.c_str(), p.streetAddress1.c_str(),
        p.streetAddress2.c_str(), p.city.c_str(),
        p.state.c_str(), p.zip.c_str());
}

void PrintPersonWithSPrintf(const person& p)
{
    std::string formatString = p.GetPrintfFormatString();
    // The caller must size the buffer: the format string plus every
    // field, with some slack for the terminating null character.
    size_t outputLen = formatString.length() + p.firstName.length()
        + p.middleName.length() + p.lastName.length()
        + p.streetAddress1.length() + p.streetAddress2.length()
        + p.city.length() + p.state.length() + p.zip.length()
        + 20;
    std::vector<char> buffer(outputLen, 0);
    ::sprintf(
        buffer.data(), formatString.c_str(),
        p.firstName.c_str(), p.middleName.c_str(),
        p.lastName.c_str(), p.streetAddress1.c_str(),
        p.streetAddress2.c_str(), p.city.c_str(),
        p.state.c_str(), p.zip.c_str());
    std::string out = buffer.data();
    std::cout << out;
}

Source: Variable Expansion in Strings — Example 4


Limitations of sprintf

One limitation of using the sprintf function is that it is not very flexible for international applications. Often the order of words differs from one language to another. One often-discussed example is a time and date string.

For example, in the United States date strings are written as {Month}/{Day}/{Year}, while in France date strings are written as {Day}/{Month}/{Year} and in Japan date strings are written as {Year}/{Month}/{Day}. There are many other instances in which word order varies from language to language. For more information refer to the Word order article on Wikipedia, The origin and evolution of word order, and The Typology of the Word Order of Languages.

One problem with the sprintf function is that it is not possible to change the order of words in the final output by simply changing the order of words in the format string. That is due to the fact that the order of parameters in the code would need to be changed as well.
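To make the limitation concrete, here is a minimal sketch. The helper names FormatDateUS and FormatDateFR are hypothetical and exist only for this illustration. Without positional specifiers, the n-th conversion always consumes the n-th argument, so the two functions share the same format string and differ only in the order of the arguments at the call site.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helpers for illustration; they are not part of the
// original article's code.
std::string FormatDateUS(int month, int day, int year)
{
    char buffer[32] = {0};
    // Month first: the arguments are passed in month, day, year order.
    std::snprintf(buffer, sizeof(buffer), "%d/%d/%d", month, day, year);
    return buffer;
}

std::string FormatDateFR(int month, int day, int year)
{
    char buffer[32] = {0};
    // Same format string; the code itself has to swap the arguments.
    std::snprintf(buffer, sizeof(buffer), "%d/%d/%d", day, month, year);
    return buffer;
}
```

Because the ordering lives in the code rather than in the format string, a translator who can only edit format strings has no way to reorder the output.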

One solution to this problem is the use of positional specifiers for format strings.

Positional Specifiers for Format Strings

POSIX-compatible systems implement an extension to the printf family of functions that adds support for positional specifiers in format strings. With this extension, the conversion specifier character % can be replaced by the sequence “%n$”, where n is a decimal integer in the range [1, {NL_ARGMAX}], giving the position of the argument in the argument list. For more information, see the POSIX specification of the printf family of functions.

The problem with this solution is that it is not universally supported. For example, on Microsoft Windows, the printf family of functions does not support positional specifiers for format strings. Instead, this functionality is provided by the _printf_p family of functions: see printf_p Positional Parameters and _sprintf_p, _sprintf_p_l, _swprintf_p, _swprintf_p_l. This makes writing cross-platform code unnecessarily difficult.

The following code demonstrates the use of positional specifiers for format strings to write a function that will properly format a date string for the United States, France, and Japan.


#include <array>
#include <cstdio>
#include <string>

// Arguments are always passed as (month, day, year), so
// %1$ is the month, %2$ is the day, and %3$ is the year.
std::string GetDateFormatString(const std::string& langCode)
{
    if (
        0 == langCode.compare(0, 2, "en")
        || 0 == langCode.compare(0, 2, "EN")
        )
    {
        return std::string("%1$i/%2$i/%3$i");
    }
    else if (
        0 == langCode.compare(0, 2, "fr")
        || 0 == langCode.compare(0, 2, "FR")
        )
    {
        return std::string("%2$i/%1$i/%3$i");
    }
    else if (
        0 == langCode.compare(0, 2, "ja")
        || 0 == langCode.compare(0, 2, "JA")
        || 0 == langCode.compare(0, 2, "jp")
        || 0 == langCode.compare(0, 2, "JP")
        )
    {
        // Japan: year/month/day.
        return std::string("%3$i/%1$i/%2$i");
    }
    return std::string("%1$i/%2$i/%3$i");
}

std::string GetDateString(
    const std::string& langCode,
    int month, int day, int year)
{
    std::string fmt = GetDateFormatString(langCode);
    std::array<char, 32> buffer;
    buffer.fill(0);
#if defined(_WIN32)
    ::_sprintf_p(buffer.data(), 32, fmt.c_str(), month, day, year);
#else
    ::sprintf(buffer.data(), fmt.c_str(), month, day, year);
#endif
    std::string ret = buffer.data();
    return ret;
}

Source: Variable Expansion in Strings — Example 5


Note that there is one major drawback of the above GetDateString function: the presence of that nasty #if/#else/#endif block. This is far from ideal. Unfortunately, because the _sprintf_p function expects an additional sizeOfBuffer parameter, you cannot simply do the following.


#if !defined(_WIN32)
# define _sprintf_p sprintf
#endif
std::string GetDateString(
    const std::string &langCode,
    int month, int day, int year)
{
    std::string fmt = GetDateFormatString(langCode);
    std::array<char, 32> buffer;
    buffer.fill(0);
    // This will not compile on non-Windows systems due to the
    // extra parameter.
    _sprintf_p(buffer.data(), 32, fmt.c_str(), month, day, year);
    std::string ret = buffer.data();
    return ret;
}

The following will work as an acceptable alternative, however.


#if defined(_WIN32)
# define sprintfp _sprintf_p
#else
# define sprintfp snprintf
#endif
std::string GetDateString(
    const std::string &langCode,
    int month, int day, int year)
{
    std::string fmt = GetDateFormatString(langCode);
    std::array<char, 32> buffer;
    buffer.fill(0);
    sprintfp(buffer.data(), 32, fmt.c_str(), month, day, year);
    std::string ret = buffer.data();
    return ret;
}

Source: Variable Expansion in Strings — Example 6


This leaves one problem that none of the solutions discussed so far solves: these functions use C-style strings. That is, the first parameter of _sprintf_p is expected to be a pre-allocated char array. It does not natively make use of the C++ basic_string class.
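One common workaround is to wrap the C function so that the caller only ever sees a std::string. The following is a minimal sketch of that idea, assuming a C++11 conforming vsnprintf that reports the required length when given a null buffer. The name FormatToString is hypothetical and is not part of any library discussed here.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper for illustration: formats into a std::string by
// calling vsnprintf twice, first to measure and then to write.
std::string FormatToString(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    va_list argsCopy;
    va_copy(argsCopy, args);
    // With a null buffer and a size of 0, vsnprintf writes nothing and
    // returns the length the output would need, excluding the null.
    const int len = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    if (len < 0)
    {
        // Encoding error: give up and return an empty string.
        va_end(argsCopy);
        return std::string();
    }
    std::vector<char> buffer(static_cast<std::size_t>(len) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), fmt, argsCopy);
    va_end(argsCopy);
    return std::string(buffer.data(), static_cast<std::size_t>(len));
}
```

A wrapper like this removes the manual buffer sizing, but it is still a variadic C-style function: the compiler cannot check that the arguments match the format string, and user-defined types cannot be passed to it.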

The Boost Format library

The Boost C++ Libraries are a collection of free peer-reviewed portable C++ source libraries that work well with the C++ Standard Library and enhance the capabilities of the C++ Standard Library. In fact, some of the features of the C++ Standard Library were first implemented in the Boost C++ Libraries and the Boost C++ Libraries are designed so that they are suitable for eventual standardization.

One of the components of Boost is The Boost Format library. The Boost home page describes The Boost Format library as follows.

The format library provides a class for formatting arguments according to a format-string, as does printf, but with two major differences:
* format sends the arguments to an internal stream, and so is entirely type-safe and naturally supports all user-defined types.
* The ellipsis (…) can not be used correctly in the strongly typed context of format, and thus the function call with arbitrary arguments is replaced by successive calls to an argument feeding operator%

The format specification strings used by the Boost Format library use the Unix98 Open-group printf precise syntax. Further information on the format specification strings used by the Boost Format library can be found in the Boost printf format specifications section of the Boost Format library documentation. Note that these are essentially the same format specification strings that are used by the _sprintf_p function. As a result, the GetDateFormatString function can be used with The Boost Format library.

The following function shows how this can be done.


#include <boost/format.hpp>
#include <string>

std::string GetDateStringBoost(
    const std::string &langCode, int month, int day, int year)
{
    std::string fmt = GetDateFormatString(langCode);
    // Arguments are fed one at a time with operator%.
    std::string ret = boost::str(
        boost::format(fmt) % month % day % year);
    return ret;
}

Source: Variable Expansion in Strings — Example 7


Self Documenting Format Specification Strings

One problem with the printf style format specification strings is that they require some form of supporting documentation to indicate which part of the format specification string corresponds to which variable. For example, in order for the GetDateFormatString function to be considered complete, a comment should be added to specify that the %1$ component corresponds to the month, the %2$ component corresponds to the day, and the %3$ component corresponds to the year.

It would be ideal if this documentation were an inherent part of the format specification string. Consider the following syntax for a format string: “$(month)/$(day)/$(year)”. In this string there is no need for supporting documentation to indicate the meaning of each component of the format specification string.

This technique is commonly referred to as string interpolation, variable interpolation, variable substitution, or variable expansion. Some programming languages have this functionality built in.

For example, the Python programming language has supported the Literal String Interpolation feature (f-strings) since Python 3.6. This makes the following possible.


apples = 4
print(f"I have {apples} apples")

Another example is in the C# programming language. C# 6 added the interpolated string feature.


string name = "Mark";
var date = DateTime.Now;
Console.WriteLine(
    $"Hello, {name}! Today is {date.DayOfWeek}, it's {date:HH:mm} now.");

The ExpandVars Function

The YekNeb C++ Code snippets library provides two versions of the ExpandVars function, which provides string interpolation functionality for C++. One version of the function uses nothing beyond the STL. Another version of the function uses the Boost Xpressive library. Both versions of the function return a string in which the variables are expanded based on the values specified in either an environment map or the environment variables. The following formats are supported for variable names.

  • %VarName%
  • %(VarName)
  • %[VarName]
  • %{VarName}
  • $(VarName)
  • $[VarName]
  • ${VarName}
  • #(VarName)
  • #[VarName]
  • #{VarName}

Bash style variable names in the form of $VarName are not supported.

The variable names used by the ExpandVars function may contain word characters, space characters, the ( character, and the ) character. Note that if the variable name includes either the ( character or the ) character, you should not use the %(VarName), $(VarName), or #(VarName) syntax.
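The reason for this restriction can be sketched as follows, assuming that the end of a variable name is located by searching for the first closing delimiter, as the simplified STL-only implementation in this article does. The helper NameUpToFirstClose is hypothetical and exists only to illustrate the point.

```cpp
#include <string>

// Hypothetical helper for illustration only. It returns the text
// between position 2 (just past a two-character opener such as "$("
// or "${") and the first occurrence of closeChar.
std::string NameUpToFirstClose(
    const std::string &varString, char closeChar)
{
    const std::string::size_type begin = 2;
    const std::string::size_type end = varString.find(closeChar, begin);
    if (std::string::npos == end)
    {
        return std::string();
    }
    return varString.substr(begin, end - begin);
}
```

For a variable name such as ProgramFiles(x86), calling NameUpToFirstClose("$(ProgramFiles(x86))", ')') stops at the parenthesis inside the name and yields the truncated name "ProgramFiles(x86", while NameUpToFirstClose("${ProgramFiles(x86)}", '}') yields the full name.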

The following is a simplified version of the STL only ExpandVars function.


#include <cstdlib>
#include <map>
#include <string>

// Finds the first variable string at or after pos. On success, the
// out parameters hold the inclusive start and end positions of the
// whole variable string and of the variable name inside it.
bool FindVariableString(
    const std::string &str,
    const std::string::size_type pos,
    std::string::size_type &beginVarStringPos,
    std::string::size_type &endVarStringPos,
    std::string::size_type &beginVarNamePos,
    std::string::size_type &endVarNamePos)
{
    const char *TestString = "%$#";
    const char PercentSign = '%';
    const char LeftParenthesis = '(';
    const char LeftSquareBracket = '[';
    const char LeftCurlyBracket = '{';
    const char RightParenthesis = ')';
    const char RightSquareBracket = ']';
    const char RightCurlyBracket = '}';
    beginVarStringPos = std::string::npos;
    endVarStringPos = std::string::npos;
    beginVarNamePos = std::string::npos;
    endVarNamePos = std::string::npos;
    if (str.empty())
    {
        return false;
    }
    beginVarStringPos = str.find_first_of(TestString, pos);
    if (std::string::npos == beginVarStringPos)
    {
        return false;
    }
    if (beginVarStringPos >= str.length() - 1)
    {
        return false;
    }
    char ch = str[beginVarStringPos];
    char ch1 = str[beginVarStringPos + 1];
    if (
        PercentSign == ch
        && LeftParenthesis != ch1 && LeftSquareBracket != ch1
        && LeftCurlyBracket != ch1
        )
    {
        // %VarName% form: the name ends at the next percent sign.
        beginVarNamePos = beginVarStringPos + 1;
        endVarStringPos = str.find(PercentSign, beginVarNamePos);
        if (std::string::npos == endVarStringPos)
        {
            return false;
        }
    }
    else if (
        LeftParenthesis != ch1 && LeftSquareBracket != ch1
        && LeftCurlyBracket != ch1
        )
    {
        return false;
    }
    else
    {
        // Bracketed form: the name ends at the matching close character.
        beginVarNamePos = beginVarStringPos + 2;
        char closeChar = 0;
        if (LeftParenthesis == ch1)
        {
            closeChar = RightParenthesis;
        }
        else if (LeftSquareBracket == ch1)
        {
            closeChar = RightSquareBracket;
        }
        else if (LeftCurlyBracket == ch1)
        {
            closeChar = RightCurlyBracket;
        }
        endVarStringPos = str.find(closeChar, beginVarNamePos);
        if (std::string::npos == endVarStringPos)
        {
            return false;
        }
    }
    endVarNamePos = endVarStringPos - 1;
    return true;
}

bool StringContainsVariableStrings(const std::string &str)
{
    std::string::size_type beginVarStringPos = 0;
    std::string::size_type endVarStringPos = 0;
    std::string::size_type beginVarNamePos = 0;
    std::string::size_type endVarNamePos = 0;
    bool ret = FindVariableString(
        str, 0, beginVarStringPos, endVarStringPos,
        beginVarNamePos, endVarNamePos);
    return ret;
}

std::string GetVariableValue(
    const std::string &varName,
    const std::map<std::string, std::string> &env,
    bool &fromEnvMap, bool &valueContainsVariableStrings)
{
    typedef std::map<std::string, std::string> my_map;
    fromEnvMap = false;
    valueContainsVariableStrings = false;
    std::string ret;
    my_map::const_iterator itFind = env.find(varName);
    if (itFind != env.end())
    {
        ret = (*itFind).second;
        if (!ret.empty())
        {
            fromEnvMap = true;
            valueContainsVariableStrings =
                StringContainsVariableStrings(ret);
        }
    }
    if (ret.empty())
    {
        // getenv returns a null pointer when the variable is not set;
        // assigning a null pointer to std::string is undefined behavior.
        const char *envValue = ::getenv(varName.c_str());
        if (envValue != nullptr)
        {
            ret = envValue;
        }
    }
    return ret;
}

std::string ExpandVars(
    const std::string &original,
    const std::map<std::string, std::string> &env)
{
    std::string ret = original;
    if (original.empty())
    {
        return ret;
    }
    bool foundVar = false;
    std::string::size_type pos = 0;
    do
    {
        std::string::size_type beginVarStringPos = 0;
        std::string::size_type endVarStringPos = 0;
        std::string::size_type beginVarNamePos = 0;
        std::string::size_type endVarNamePos = 0;
        foundVar = FindVariableString(
            ret, pos, beginVarStringPos, endVarStringPos,
            beginVarNamePos, endVarNamePos);
        if (foundVar)
        {
            std::string::size_type varStringLen =
                endVarStringPos - beginVarStringPos + 1;
            std::string varString = ret.substr(
                beginVarStringPos, varStringLen);
            std::string::size_type varNameLen =
                endVarNamePos - beginVarNamePos + 1;
            std::string varName = ret.substr(
                beginVarNamePos, varNameLen);
            bool fromEnvMap;
            bool valueContainsVariableStrings;
            std::string value = GetVariableValue(
                varName, env, fromEnvMap,
                valueContainsVariableStrings);
            if (!value.empty())
            {
                ret = ret.replace(
                    beginVarStringPos, varStringLen, value);
                pos = beginVarStringPos;
            }
            else
            {
                pos = endVarStringPos + 1;
            }
        }
    } while (foundVar);
    return ret;
}

Source: Variable Expansion in Strings — Example 8

The following code demonstrates the use of the ExpandVars function.

std::string GetDateFormatStringExpandVars(
    const std::string& langCode)
{
    if (
        0 == langCode.compare(0, 2, "en")
        || 0 == langCode.compare(0, 2, "EN")
        )
    {
        return std::string("${month}/${day}/${year}");
    }
    else if (
        0 == langCode.compare(0, 2, "fr")
        || 0 == langCode.compare(0, 2, "FR")
        )
    {
        return std::string("${day}/${month}/${year}");
    }
    else if (
        0 == langCode.compare(0, 2, "ja")
        || 0 == langCode.compare(0, 2, "JA")
        || 0 == langCode.compare(0, 2, "jp")
        || 0 == langCode.compare(0, 2, "JP")
        )
    {
        // Japan: year/month/day.
        return std::string("${year}/${month}/${day}");
    }
    return std::string("${month}/${day}/${year}");
}

std::string GetDateStringExpandVars(
    const std::string &langCode, int month, int day, int year)
{
    std::string fmt = GetDateFormatStringExpandVars(langCode);
    std::map<std::string,std::string> env{
        {"month", std::to_string(month)},
        {"day", std::to_string(day)},
        {"year", std::to_string(year)}
    };
    std::string ret = ExpandVars(fmt, env);
    return ret;
}

Source: Variable Expansion in Strings — Example 8


The following is a simplified version of the ExpandVars function that uses the Boost Xpressive regex_replace function.


#include <algorithm>
#include <map>
#include <string>
#include <boost/xpressive/xpressive.hpp>

// Builds a static regex with one alternative per supported syntax.
// In every alternative the variable name is captured in s1.
::boost::xpressive::sregex GetRegex()
{
    namespace xpr = ::boost::xpressive;
    xpr::sregex ret =
        "%" >> (xpr::s1 = +(xpr::_w | xpr::_s | "(" | ")")) >> '%'
        | "%(" >> (xpr::s1 = +(xpr::_w | xpr::_s)) >> ')'
        | "%[" >> (xpr::s1 = +(xpr::_w | xpr::_s | "(" | ")"))
            >> ']'
        | "%{" >> (xpr::s1 = +(xpr::_w | xpr::_s | "(" | ")"))
            >> '}'
        | "$(" >> (xpr::s1 = +(xpr::_w | xpr::_s)) >> ')'
        | "$[" >> (xpr::s1 = +(xpr::_w | xpr::_s | "(" | ")"))
            >> ']'
        | "${" >> (xpr::s1 = +(xpr::_w | xpr::_s | "(" | ")"))
            >> '}'
        | "#(" >> (xpr::s1 = +(xpr::_w | xpr::_s)) >> ')'
        | "#[" >> (xpr::s1 = +(xpr::_w | xpr::_s | "(" | ")"))
            >> ']'
        | "#{" >> (xpr::s1 = +(xpr::_w | xpr::_s | "(" | ")"))
            >> '}';
    return ret;
}

struct string_formatter
{
    typedef std::map<std::string, std::string> env_map;
    env_map env;
    mutable bool valueContainsVariables;
    string_formatter()
    {
        valueContainsVariables = false;
    }
    template<typename Out>
    Out operator()(
        ::boost::xpressive::smatch const& what, Out out) const
    {
        bool fromEnvMap;
        bool valueContainsVariableStrings;
        std::string value = GetVariableValue(
            what.str(1), env, fromEnvMap,
            valueContainsVariableStrings);
        if (
            fromEnvMap && !value.empty()
            && valueContainsVariableStrings
            )
        {
            valueContainsVariables = true;
        }
        if (value.empty())
        {
            value = what[0];
        }
        if (!value.empty())
        {
            out = std::copy(value.begin(), value.end(), out);
        }
        return out;
    }
};

std::string ExpandVarsR(
    const std::string &original,
    const std::map<std::string, std::string> &env)
{
    std::string ret = original;
    if (original.empty())
    {
        return ret;
    }
    string_formatter fmt;
    fmt.env = env;
    fmt.valueContainsVariables = false;
    ::boost::xpressive::sregex envar = GetRegex();
    ret = ::boost::xpressive::regex_replace(original, envar, fmt);
    if (fmt.valueContainsVariables)
    {
        std::string newValue;
        std::string prevValue = ret;
        do
        {
            fmt.valueContainsVariables = false;
            newValue = ::boost::xpressive::regex_replace(
                prevValue, envar, fmt);
            if (0 == prevValue.compare(newValue))
            {
                break;
            }
            prevValue.erase();
            prevValue = newValue;
        }
        while (fmt.valueContainsVariables);
        if (0 != ret.compare(newValue))
        {
            ret = newValue;
        }
    }
    return ret;
}

Source: Variable Expansion in Strings — Example 9
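If Boost is not available, a similar regex-driven approach can be sketched with std::regex from the C++11 standard library. The following is only an illustration under simplifying assumptions and is not the YekNeb implementation: the function name ExpandVarsStdRegex is hypothetical, values are looked up in the map only (environment variables are not consulted), and expansion is done in a single pass, so variables inside substituted values are not expanded again. Because std::regex_replace does not accept a callback, the sketch walks the matches with std::sregex_iterator instead.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical single-pass, map-only sketch using std::regex.
std::string ExpandVarsStdRegex(
    const std::string &original,
    const std::map<std::string, std::string> &env)
{
    // One alternative per delimiter style, each with its own capture
    // group: %Name%, then (Name), [Name], and {Name} after %, $, or #.
    static const std::regex varRegex(
        R"(%([\w\s()]+)%)"
        R"(|[%$#]\(([\w\s]+)\))"
        R"(|[%$#]\[([\w\s()]+)\])"
        R"(|[%$#]\{([\w\s()]+)\})");
    std::string result;
    std::string::const_iterator last = original.begin();
    for (std::sregex_iterator it(
             original.begin(), original.end(), varRegex), end;
         it != end; ++it)
    {
        const std::smatch &m = *it;
        // Copy the literal text between the previous match and this one.
        result.append(last, m[0].first);
        // Only the capture group of the alternative that matched is set.
        std::string name;
        for (std::size_t i = 1; i < m.size(); ++i)
        {
            if (m[i].matched)
            {
                name = m[i].str();
                break;
            }
        }
        const std::map<std::string, std::string>::const_iterator found =
            env.find(name);
        if (found != env.end())
        {
            result += found->second;
        }
        else
        {
            // Unknown variable: keep the original text unchanged.
            result += m.str();
        }
        last = m[0].second;
    }
    // Copy whatever follows the final match.
    result.append(last, original.end());
    return result;
}
```

With a map containing month, day, and year, calling ExpandVarsStdRegex with the format strings returned by GetDateFormatStringExpandVars would produce the localized date strings, and a variable that is missing from the map is left in the output as written.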


The source code of the full version of the STL only implementation of the ExpandVars function can be found in ExpandVars.h and ExpandVars.cpp. The source code of the Boost implementation of the ExpandVars function can be found in boost/ExpandVars.h and boost/ExpandVars.cpp.

Originally published at yekneb.com on November 7, 2018.