In recent years, C++ developers have become very interested in heterogenous containers -- containers that can contain an value which might be of several different types.
One way to approach the problem is to use a union
type. union is a language-level feature inherited from
C. [1]
Prior to C++11, a union could only contain POD types. Since C++11, this restriction was relaxed, but the compiler still does not generate useful special member functions, nor can it, really. A union is ultimately a very low level construct -- usually in C++, it is there for compatibility with C, or it is used as a building block in creating a more user-friendly type, rather than as a solution itself.
For purposes of C++, if we want to have a union
type which is extended to act more like a standard container, a natural approach
is to use a discriminated union. This means, a union
, together with an int
,
or an enum
, which keeps track
of which type is currently engaged within the union
.
When we have this extra information, we can access the union in a type-safe
way, and give it a copy constructor and so on. This idea was originally described
by Andrei Alexandrescu [2].
boost::variant
is a well-known implementation
of these ideas, and it became a very influential library. Besides providing
a generic, type-safe, discriminated union, it also provided a concept of
static visitor. A static visitor is function object
which the programmer creates in order to "visit" the variant type
in a systematic fashion, rather than using a sequence of if
... else
or similar .
The function object's implementation may use templates, and so take advantage of C++'s pattern matching abilities when visiting the variant type, which is very powerful and convenient. This is perhaps the closest thing in current versions of C++ to the pattern matching features found in languages like ML, Rust, and Haskell.
However, one of boost::variant
's constraints is that it doesn't
require C++11 -- its interface and design work in a way that accomodates
C++ sans move semantics and noexcept
annotations, which gives up certain optimization opportunities. Some of these
gains can be realized by using the most recent versions of boost::variant
when it enables C++11 features,
but some cannot, as we will see.
The core issue in implementing a variant
is how to handle a throwing, type-changing assignment
[3].
Fundamentally, there is a chicken-and-egg problem: When the variant is assigned to a value different from the currently contained type, we must first destroy the current value in order to vacate the storage, then we can construct the new value in that storage. Clearly, we can't construct the new value without destroying the old value first. But on the other hand, if constructing the new value fails by throwing an exception, we would like to retain the old value, so that we can have roll-back semantics and strong exception safety. Once it's destroyed, we can't exactly do that anymore. So somehow, neither of these steps should happen first.
There are a number of possible ways to resolve this issue.
One simple solution is to allocate double the storage for the variant. Then, the new object need not be allocated in the same memory as the old object, so it can always be constructed first and the issue is resolved. [4]
But in some applications, the overhead of double memory usage isn't tolerable.
boost::variant
approach
boost::variant
resolves this tension as follows.
First, it assumes that the contained object is copyable, and then, before
any type-changing assignment, makes a backup copy of your object on the heap.
Then the object in storage is destroyed and the new one is created. If construction
fails, the variant holds onto the heap pointer, and subsequent accesses go
via that pointer, until a new assignment takes place. [5]
In many scenarios, this is a great solution -- the storage overhead is at most the size of a pointer. However, if your object is expensive or impossible to copy, then this won't work so well for you.
std::variant
approach
After much discussion, the std::variant
of C++17 took the following approach: relax the "never-empty" guarantee
to a "rarely-empty" guarantee, by introducing an empty state which
occurs when construction of the new object fails. This empty state can be
queried, and if a variant is visited which was made empty by an earlier exception,
then a bad_visit
is thrown.
This is the so-called "Kona Compromise". [6]
This solution has several obvious advantages -- we don't have to assume that the user type is copyable, and we don't have to make a backup copy at all, which is significantly more efficient in some cases. In terms of performance, we basically get ideal behavior, for assignments and for the storage space used by the variant.
However, it also complicates the interface to the variant
,
as now the programmer needs to think about and sometimes query the empty
state.
In many projects, the empty state isn't a meaningful concern and this solution works wonderfully.
Indeed, many other modern C++ variant types targetting C++11 and C++14 have
followed suit and adopted the std::variant
interface and behavior [7] [8].
However, in some projects, when the exception caused by object is thrown, we want to recover and proceed with what we were doing. Depending on the lifetimes of the containers in the project, a variant with only a basic exception safety guarantee may be very inconvenient, as now we may have to go find that variant and put it in a good state before attempting to proceed, or risk it throwing a new exception when we try to use it. In these projects, it would be much nicer if the exception only invalidates the object that threw it, and not also the surrounding container.
strict_variant
approach
Since most standard and boost containers provide strong-exception safety, it seems desirable to have at our disposal an efficient, modern C++ variant type which also provides this guarantee. And indeed, there have been some standardization proposals like this [9] [10].
strict_variant
achieves this
by making a different implementation decision with a different trade-off.
Instead of compromising on strong-exception safety, we compromise somewhat
on the stack-based goal -- we simply say that any type that is not no-throw
move-constructible needs to be allocated on the heap.
Specifically, objects with throwing moves get put in a wrapper which allocates them on the heap. The wrapper only holds a pointer, and moving the pointer can't fail, even if the object has a throwing move.
For example, this solves the copy-assignment problem as follows:
When using the easy_variant
template alias, the throwing move is detected and the wrapper is handled
transparently for the programmer, giving a general, easy-to-use, and performant
variant.
From the user's point of view, this design results in a very familiar feel:
strict_variant
is either nothrow-moveable itself or is put in a nothrow-moveable
wrapper.
boost::variant
.
You don't have to worry about an empty state.
strict_variant
doesn't
actually come with any of its own exception types, and doesn't do any
explicit try
, catch
, or throw
.
You don't have to add new exception types to your mental checklist when
you use it, and it is very easy to use in applications that are compiled
with -fno-exceptions
and / or -fno-rtti
.
Of course, this implementation pays a cost when we try to access one of the types with a throwing move, since it is on the heap. However, this also is somewhat consistent with the rest of the standard library -- since C++11, most programmers have learned that they should try hard to make their objects no-throw moveable if possible, since many standard library containers such as vector will be able to use a "fast path" in that case and be much more efficient.
While it's been well-established that throwing moves are sometimes hard to avoid and can't reasonably be banned from general-purpose containers, [11], it seems that supporting them by putting them on the heap is a reasonable solution to the problem. If such objects are large and complex enough, it may be favorable to allocate them on the heap anyways.
Another nice aspect of our design is that we make it easy for the programmer to control the overhead that they pay to support the throwing move situation, to the degree that they want to.
The "real" implementation of the variant type,
strict_variant
, only fully
supports types which are no-throw move constructible.
If it is constructed with a type that is not no-throw move constructible, then the variant cannot generate assignment operators -- a compiler error results.
In some projects, this is more desirable than a container which silently
begins to make extra heap allocations. When a strict_variant
cannot be assigned and needs to be, the programmer resolves the situation
either by factoring out the throwing move, or putting the type in recursive_wrapper
within the variant declaration.
(Or using the emplace
member
function, rather than assignment.)
Note | |
---|---|
Recall that
A
In |
Since our recursive_wrapper
is no-throw move constructible, even if T
is not, this effectively works around the issue, and the rest of your code
using the variant can compile without changes after substituting recursive_wrapper<T>
for
T
.
When using strict_variant
,
you thus control directly exactly what extra heap allocations are made in
support of strong exception-safety, and it happens in a transparent way --
you don't pay for anything you don't use.
If you can manage to make all your types no-throw move constructible, then you get ideal variant performance -- no storage overhead, no extra copies or dynamic allocations, and no empty state to worry about.
We provide a template alias easy_variant
which takes care of these details if you don't care to be bothered by the
compiler about a throwing move / dynamic allocation.
(Some programmers would prefer that the compiler not start making dynamic
allocations without a warning, just because some noexcept
annotation was not deduced the way they expected. But programmer convenience
is a good thing too.)
Specifically, any type that you put in the easy_variant
which has a throwing move will be wrapped in recursive_wrapper
implicitly.
namespace strict_variant { template <typename ... Ts> using easy_variant = variant<wrap_if_thowing_move_t<Ts>...>; }
where wrap_if_throwing_move_t
is
namespace strict_variant { struct <typename T, typename = mpl::enable_if_t<std::is_nothrow_destructible<T>::value && !std::is_reference<T>::value>> struct wrap_if_throwing_move { using type = typename std::conditional<std::is_nothrow_move_constructible<T>::value, T, recursive_wrapper<T>>::type; }; template <typename T> using wrap_if_throwing_move_t = typename wrap_if_throwing_move<T>::type; }
So, easy_variant
means that
if something starts throwing in the future, you won't have to go back and
change your variant
declaration.
But, if you can manage to make all the types involved no-throw move constructible,
then you still don't pay anything for what you don't use.
A secondary motivation of strict_variant
is to reduce the errors caused by implicit conversions when using a variant
, and help achieve consistent cross-platform
behavior.
For instance, some programmers consider it very undesirable that the following code compiles without a warning or error.
boost::variant<std::string, int> v; v = true;
When an assignment like this takes place, boost::variant
determines what type will be used via overload resolution. That is, a function
object f
is created, which
has overloads for std::string
and int
arguments. If overload resolution succeeds, then it determines which type
will result inside of the variant.
A drawback of this is that it means that any C++ implicit conversions are permitted to take place when assigning a variant type.
Overload resolution is a core C++ language feature, and in 95% of cases it works very well and does the right thing.
However, in the 5% of cases where overload resolution does the wrong thing, it can be quite difficult to work around it. This includes the scenarios in which overload resolution is ambiguous, and those in an overload is selected which the user did not intend due to a subtle conversion.
Because integral types have so many permitted conversions, these problems are particularly obvious when you have a variant with several integral types.
This happens commonly when using variant types to interface with some scripting language for instance. The typical dynamically-typed scripting language will permit a variety of primitive values, so when binding to it, you may naturally end up with something like
boost::variant<bool, int, float, std::string, ...>
To make it easier to get good results when manipulating a variant like this,
strict_variant
prevents many
implicit conversions from occurring when two arithmetic or pointer types
types are involved.
When assigning or constructing a variant, some conversions which may be "surprising" or which are not totally portable are not permitted.
short
,
int
, long
, long
long
, and unsigned variations)
float
,
double
, long double
)
char
,
char16_t
, char32_t
, and unsigned variations)
bool
wchar_t
long
long
cannot be converted to long
and long
cannot be converted to int
,
even if you are on a 32-bit machine and they have the same size for you,
because it could be narrowing on a 64-bit machine.)
So for instance,
strict_variant<std::string, int> v; v = true;
is an error because bool
cannot
be used to construct int
within
the variant, and
strict_variant<long, float> v; v = 100;
is not ambiguous -- int
to
float
is not allowed by our
rules, so it is eliminated from the overload resolution using SFINAE, and
long
is unambiguously selected.
With boost::variant
, code like this is okay:
boost::variant<const void *, bool> v; const char * str = "foo"; v = str;
With strict_variant
, it isn't
-- neither conversion const char *
to
const void
*
or to bool
is permitted to occur implicitly here.
Note | |
---|---|
When the type contained is a user-defined type, all the normal conversions
take effect -- if your type is constructible from |
The actual test is performed by a
type trait, strict_variant::safely_constructible
,
which refines std::is_constructible
. You can make tests against
that if you want to e.g. make static assertions that certain conversions
are or aren't permitted.
A second aspect of strict_variant
is that we adopt a rank-based priority policy to resolve some overload ambiguities
involving arithmetic types.
Consider the following overloaded function:
void f(long); void f(long long); f(100);
Under standard C++ rules, this function call is ambiguous: The argument
f
is an int
,
and both long
and long long
can be obtained from int
by
conversion. Since neither has a worse conversion, there is no best conversion.
When assigning to a variant
however, by far the most common situation is that the user wants to use the
"smallest" type that will fit, since "widening" represents
a loss of information in some sense. (After the widening conversion, our
knowledge of the bounds of the value are worse than they were before.) If
later when they are visiting that value, they want to convert it to a larger
type, that is easy to arrange.
Therefore, when an integral value is assigned to strict_variant
,
if there are multiple integral value types which permit a safe conversion,
it prioritizes them by rank and sign. If for one of the candidates, there
is another candidate which has strictly lower rank / sign, it is "dominated"
by the lower rank candidate, and does not participate in overload resolution.
For instance, int
is safely
convertible to both long
and
long long
.
But long
dominates long long
when making arithmetic conversions, so long
long
is eliminated, and long
is selected unambigously when assigning
an int
to strict_variant<long,
long long>
.
There is a second way to achieve a type-changing assignment: the emplace
member function.
v.emplace<long long>(100); v.emplace<unsigned long long>(100);
emplace
takes a single template
parameter, the type of the desired object to emplace. All other parameters
are forwarded to its constructor. This can be used to put objects in the
variant that are neither copyable nor moveable, so long as they are somehow
no-throw constructible.
This can be used to specify explicitly what value type shall result, regardless of overload resolution or the strict_variant rules.
[2] Andrei Alexandrescu. "Generic<Programming>: Discriminated Unions" series: Part 1, Part 2, Part 3. C/C++ Users Journal. 2002.
[3] C.f. Axel Naumann "The Variant Saga: A happy ending?" https://isocpp.org/blog/2015/11/the-variant-saga-a-happy-ending
[4] C.f. Anthony Williams. Double-Storage Proposal. 2002.
[5]
C.f. Eric Friedman, Itay Maman. Online boost::variant
documentation boost:doc/html/variant/design.html#variant.design.never-empty
[6] C.f. Axel Naumann. "Variant: a type-safe union (v6)" Proposal P0088r1 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0088r1.html
[7]
C.f. Agustín Bergé. eggs::variant
online documentation. https://github.com/eggs-cpp/variant
[8]
C.f. Artem Pavlenko. mapbox::variant
online documentation. https://github.com/mapbox/variant
[9] C.f. David Sanke "Simply a Strong Variant". P0093R0 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0093r0.html
[10] C.f. Anthony Williams "P0110R0: Implementing the strong guarantee for variant<> assignment" P0110R0 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0110r0.html
[11] C.f. Ville Voutilainen Proposal P0129R0 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0129r0.html