C++ move semantics from scratch

Written on 2022-01-14

Today I'd like to propose a narrative for building an intuition for move semantics in C++ by describing a world in which they don't exist and how they naturally appear as a solution for pre-C++11 problems. My goal is to provide a clear understanding of what move semantics are, without any hand-waviness. We'll start with the state of affairs in C++98, what the limitations were, what C++11 fundamentally brought to the table, how it can be used, how the language helps us use new "move semantics" idioms and finally what the potential alternatives are. I am not aiming for historical accuracy, but you should be left with a good sense of what is available nowadays and why/how we got there.

There are plenty of resources on this topic, many of which are terrible. They are either nebulous, based on unfitting metaphors or just plain wrong. Some resources are good at providing rules of thumb and high-level explanations, but they fail to provide a fundamental understanding of the mechanisms at work. Moreover, as we'll see, I strongly believe that the use of std::move should be dramatically de-emphasized in teaching material. I also believe that move semantics are best explained bottom-up rather than top-down.

So let's get started, from scratch.

Colored references

The key to understanding move semantics is understanding "rvalue references", those && you might have seen in type declarations. Those rvalue references are exactly the same thing as "regular" & references (called lvalue references), except that they are incompatible. You can think of them as "colored" references, as if lvalue references were blue and rvalue references were green. Besides that, they just work the same way:

int x = 0;

int& lvalueRef = (int&)x;
int&& rvalueRef = (int&&)x;

print(lvalueRef); // 0
print(rvalueRef); // 0

lvalueRef++;

print(lvalueRef); // 1
print(rvalueRef); // 1

rvalueRef++;

print(lvalueRef); // 2
print(rvalueRef); // 2

As you can see, you can read and set a variable via both kinds of references; there is no difference and no reason to make a fuss about them. However, as noted earlier, those references are incompatible, even if they behave the same. This means that you can't pass a reference to a function expecting a reference of the other color (e.g. you can't pass an int& to a function expecting an int&&). Since they behave identically, though, we can cast from one to the other:

int x = 2;
int& lvalueRef = x;
// wouldn't compile, those aren't compatible types:
// int&& rvalueRef = lvalueRef;
// this is fine:
int&& rvalueRef = (int&&) lvalueRef;
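If you want to check those claims against a real compiler, here is a small sketch that asks the standard <type_traits> header whether each reference type can be initialized from the other kind of expression, then increments a variable through both colors. The helper name bumpThroughBothColors is made up for this illustration, and static_cast is used where the snippets above use C-style casts.

```cpp
#include <cassert>
#include <type_traits>

// The two colors are distinct types...
static_assert(!std::is_same<int&, int&&>::value, "distinct types");
// ...and neither binds to the other's kind of expression without a cast.
static_assert(!std::is_constructible<int&&, int&>::value,
              "an lvalue does not bind to int&&");
static_assert(!std::is_constructible<int&, int&&>::value,
              "an rvalue does not bind to int&");

// Hypothetical helper: increments x once through each color of reference.
int bumpThroughBothColors(int x) {
    int& lvalueRef = x;
    int&& rvalueRef = static_cast<int&&>(lvalueRef); // the cast is required
    lvalueRef++;
    rvalueRef++;
    return x; // both references alias the same int
}
```

Calling bumpThroughBothColors(2) returns 4: both increments went to the same int.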

I strongly suggest that you play a little with those. Try to write a small program that uses rvalue references where you would usually use lvalue references, and notice that while the compiler might require you to constantly cast your references to && again, even though that's how they were already declared, your code will work just as if you were using regular old lvalue references. Here is a small example:

struct RefHolder {
    RefHolder(int&& x) : m_x((int&&)x) {}
    // builds a RefHolder from another one, through an rvalue reference;
    // the casts keep const so that none of them silently strips it
    RefHolder(const RefHolder&& other) : m_x((const int&&)other.m_x) {}
    const int&& m_x;
};

int addOne(const int&& ref) {
    return ref + 1;
}

int main() {
    int x = 2;
    const RefHolder refHolder((int&&)x);
    const RefHolder otherRefHolder((const RefHolder&&)refHolder);
    return addOne((const int&&)refHolder.m_x);
}

Everything works as expected, including constructing one RefHolder from another, and the program returns 3.

Function overloading

Hopefully I've convinced you that rvalue references are neither scary nor that special. The key insight at this point is to realize that we can use the fact that our two reference colors are incompatible to discriminate between them and get different behavior for each. More concretely, you can have an overloaded function that takes either an rvalue reference or an lvalue reference and execute different logic in each case.

Hopefully that sounded like a terrible design idea, because it generally is. You wouldn't want your handleUserRequest function to behave differently when taking a request pointer or a request reference, the same thing should be true for different kinds/colors of references! On the other hand, why introduce a new kind of reference if it's not to do something different with it?
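Here is what such an overload pair can look like in practice. The function name color is hypothetical, and the string it returns simply reports which overload the compiler picked:

```cpp
#include <cassert>
#include <string>

// Two overloads that differ only in the color of their reference parameter.
std::string color(int&)  { return "lvalue"; }
std::string color(int&&) { return "rvalue"; }
```

Calling color(x) on an int variable x returns "lvalue", while color(static_cast<int&&>(x)) returns "rvalue". A literal such as 42 also selects the rvalue overload without any cast, a first hint of the language rules discussed further down.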

History and better algorithms

Let's forget about references for a second and think about something seemingly unrelated: algorithms over containers. In this context, we'll loosely define containers as "data structures which contain a bunch of heap allocated data". Examples would include std::vector, std::map and friends. You can think of those containers as "a small class which holds a pointer to lots of data".

One problem that people can face with such containers is that they are expensive to copy: not only do you need to copy the few members of the class itself, you also need to copy all of the heap allocated data that it points to. In the case of a std::vector holding millions of elements, that makes auto v2 = v1; quite costly.

There exists a simple, much more efficient algorithm for transferring data from one container to another: copy the metadata (size and capacity) and point to the same heap allocated data. This turns an O(n) operation into O(1).

At this point, you're faced with a big issue: since two containers point to the same memory, your program just got a lot more complicated. Now, every time one of the instances gets updated, the shared data might be modified, but the other instance's metadata isn't! Worse, both instances will try to free the same memory when they are destroyed. This is quite error-prone.

Something that you could do is set the data pointer of the original vector to nullptr and its size and capacity to 0. Now the original vector doesn't have access to the data anymore, but you've successfully transferred it to the new vector, as if it had been stolen. This trade-off is the best you can get without getting into more complicated things like copy-on-write, which is out of scope. Note also that this concept could similarly apply to a linked list, tree, hash map, and many other kinds of containers.

So in conclusion, we haven't found a way to magically copy data in O(1) from one container to another, but we've managed to transfer instead, as if the data had been... moved over? We're getting somewhere.
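To make the two algorithms concrete, here is a minimal sketch of copying versus stealing on a hypothetical IntBuffer struct (the type and the copyInto/stealInto names are made up here). It deliberately uses free functions rather than constructors, since we haven't decided how to expose the stealing algorithm yet, and it leaves freeing the buffers to the caller:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal container: some metadata plus a pointer to heap data.
struct IntBuffer {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;
};

// O(n): allocate fresh storage and copy every element; src is untouched.
void copyInto(IntBuffer& dst, const IntBuffer& src) {
    if (&dst == &src) return;
    delete[] dst.data;
    dst.data = src.capacity ? new int[src.capacity] : nullptr;
    for (std::size_t i = 0; i < src.size; ++i) dst.data[i] = src.data[i];
    dst.size = src.size;
    dst.capacity = src.capacity;
}

// O(1): take over src's storage and metadata, then reset src so that only
// one buffer ever points at (and eventually frees) that memory.
void stealInto(IntBuffer& dst, IntBuffer& src) {
    if (&dst == &src) return;
    delete[] dst.data;
    dst = src;          // copies the pointer and the metadata, not the elements
    src.data = nullptr;
    src.size = 0;
    src.capacity = 0;
}
```

After stealInto(c, a), a holds a null pointer with size and capacity 0 and c owns the original elements, without a single element having been copied.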

Tying it all together

We've outlined a category of very simple algorithms: those that can steal/move data over from one container to another. We could probably just implement it as a method on all the containers that could support it. In fact, we could implement it on all containers and fall back to just copying wherever no stealing implementation really makes sense: we'd still have the data at the destination in the end, it just might not be particularly efficient. Note that this isn't a deal-breaker: we already have things like std::find which can perform drastically differently depending on what container is searched.

A simple method would feel like a second-class citizen compared to the existing constructors and operator= though, and we'd rather make it as easy as possible to use our stealing algorithm since it has the potential to be much faster than plain old copying. We'd also prefer not to introduce brand new syntax in the language if possible.

I'll let you think of the various ways to expose our algorithm, but let's orient our thoughts a little: how about treating it as a special case of copying? This way we could probably reuse our constructor and assignment syntax. So how do we express whether we'd like to steal/move the values or just copy them? We could have two overloads for constructors and operator=, one that takes a reference and the other that takes a reference to const:

class Container {
public:
    Container() = default;
    Container(Container const&) {
        // we can't do anything to the other container since it's const
        // let's just copy its data
    }
    Container(Container&) {
        // we *can* do anything to the other container since it's *not* const
        // let's steal its data!
    }
private:
    int* m_data;
    // ...
};

This is quite neat, now we can do something like the following:

const Container c1 = makeContainer();
Container copy{c1}; // c1 is const, we can only copy it
Container c2 = makeContainer();
Container steal{c2}; // c2 is not const, we can steal its data!

In practice though, this becomes complicated quite quickly. First off, you might need your container to not be const to put some things in it, before wanting to copy it, so you'll need boilerplate to make sure that you pass it as const. More importantly, if you fail to do so, the data will be moved, which is likely going to lead to bugs later on (this is called use-after-move). Finally, lots of the code that you're already using/that you've already written will not conform to that convention, making your life a lot harder.
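Here is a small sketch of that convention and of its main pitfall, using a hypothetical StealyBox type that owns a single heap-allocated int. It is written in the spirit of C++98, with only C++11 conveniences such as nullptr and = delete. The real std::auto_ptr, deprecated in C++11 and removed in C++17, worked along these lines: its "copy" constructor took a non-const reference and transferred ownership.

```cpp
#include <cassert>

// Hypothetical type: the const "copy" constructor copies, while the
// non-const one steals. No rvalue references are involved.
class StealyBox {
public:
    explicit StealyBox(int value) : m_ptr(new int(value)) {}

    // const source: we can only copy
    StealyBox(const StealyBox& other)
        : m_ptr(other.m_ptr ? new int(*other.m_ptr) : nullptr) {}

    // non-const source: steal the pointer and leave the source empty
    StealyBox(StealyBox& other) : m_ptr(other.m_ptr) { other.m_ptr = nullptr; }

    StealyBox& operator=(const StealyBox&) = delete; // left out of this sketch

    ~StealyBox() { delete m_ptr; }

    bool empty() const { return m_ptr == nullptr; }
    int value() const { return *m_ptr; }

private:
    int* m_ptr;
};
```

With StealyBox a(7); StealyBox b{a};, the non-const overload is the better match, so a is silently emptied even though the line reads like a copy; only a const source such as const StealyBox c(9); gets deep-copied.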

What we'd like instead is something that behaves like a reference, but that we can easily discriminate to make sure that we know if we're going to move or copy, which will not interfere with our existing code and which will not compile if we use it in the wrong place. Wait, that sounds familiar...

Indeed, this is precisely where rvalue references come in. We use them in the same places where we would otherwise use regular lvalue references, except that by convention, we try to steal data where we'd otherwise copy it:

class Container {
public:
    Container() = default;
    Container(Container const&) {
        // this is a regular lvalue reference
        // let's just copy its data
    }
    Container(Container&&) {
        // this is one of those rvalue references
        // let's steal its data!
    }
private:
    int* m_data;
    // ...
};

Again, I'd like to emphasize that there is nothing different between those kinds of references: they just have a different color, they are simply incompatible, you just cannot use one instead of the other without casting. As such, it would be perfectly reasonable to swap their semantics: you could copy from rvalue references and move out of lvalue references. We simply don't do that because we've been copying out of lvalue references for decades and if we want to establish a solid convention, it better be compatible with all of the code that already exists out there.
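For completeness, here is one way the bodies of such a Container could be filled in. This is a sketch: the size-taking constructor, the accessors and the decision to delete assignment (to keep the example short) are assumptions, and a real container would also implement copy and move assignment:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical container following the copy/move convention.
class Container {
public:
    explicit Container(std::size_t size)
        : m_data(size ? new int[size]() : nullptr), m_size(size) {}

    // lvalue reference: copy, leaving the source intact (O(n))
    Container(const Container& other)
        : m_data(other.m_size ? new int[other.m_size] : nullptr),
          m_size(other.m_size) {
        std::copy(other.m_data, other.m_data + other.m_size, m_data);
    }

    // rvalue reference: steal the buffer and reset the source (O(1))
    Container(Container&& other) noexcept
        : m_data(other.m_data), m_size(other.m_size) {
        other.m_data = nullptr;
        other.m_size = 0;
    }

    Container& operator=(const Container&) = delete; // omitted in this sketch

    ~Container() { delete[] m_data; }

    std::size_t size() const { return m_size; }
    const int* data() const { return m_data; }

private:
    int* m_data;
    std::size_t m_size;
};
```

Constructing from a plain variable picks the const& overload and allocates a new buffer, while constructing from static_cast<Container&&>(original) takes over original's buffer and leaves it with size 0 and a null pointer.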

Resource management

We've mostly spoken about containers so far, but those are not the only classes that can benefit from moving their data rather than copying it. Take a reference-counted pointer like std::shared_ptr for example. It essentially consists of a pointer and an integer counting how many references there are to that pointer to avoid freeing it if someone is still using it[1]. When we copy it (i.e. create one by passing an lvalue reference to the constructor/operator=), the count gets incremented. Once again, note that the only reason constructing a shared_ptr from a const shared_ptr& creates a copy is because of conventions. Nothing prevents an implementer from discarding the argument, printing an insult to stdout and segfaulting instead. "Copy constructors" and "copy assignments" are just shortcut terms which express intent and work well enough in practice because those conventions are usually well followed.

Back to our shared_ptr: what can we do if we're done using our handle and want to pass it to another function that takes it by value? Passing it as an lvalue will copy-construct the parameter, which bumps the reference count (costly if our implementation is thread-safe, since that means an atomic operation), and later drops it again when our own shared_ptr reaches the end of its scope (again, possibly costly).

We can take inspiration from our Container above: rather than doing that expensive work, we could leave the reference counter alone and simply pass the pointer over[2]. Again, we need to discriminate between the two cases, and again we can use our two kinds of references to do so:

class MySharedPtr;
void f(MySharedPtr);

MySharedPtr ptr = makeSharedPtr();

// f needs a MySharedPtr, it will be created with the const& overload
f((const MySharedPtr&)ptr);
// f needs a MySharedPtr, it will be created with the && overload
f((MySharedPtr&&)ptr);

This way, we can control how we want to pass our shared pointer to f. This idiom can apply to many resources where copying them is sometimes desirable but also expensive and avoidable by stealing the data instead.
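Here is a deliberately simplified MySharedPtr to go with the snippet above. It is not thread-safe, it omits assignment and weak references, and its f returns the count it observes instead of void so that the effect is visible; none of this reflects how std::shared_ptr is actually implemented:

```cpp
#include <cassert>

// Hypothetical, non-thread-safe reference-counted pointer to an int.
class MySharedPtr {
public:
    explicit MySharedPtr(int* ptr) : m_ptr(ptr), m_count(new long(1)) {}

    // const& overload: share ownership by bumping the count
    MySharedPtr(const MySharedPtr& other)
        : m_ptr(other.m_ptr), m_count(other.m_count) {
        if (m_count) ++*m_count;
    }

    // && overload: take over the handle without touching the count, and
    // null the source so its destructor releases nothing (footnote [2])
    MySharedPtr(MySharedPtr&& other) noexcept
        : m_ptr(other.m_ptr), m_count(other.m_count) {
        other.m_ptr = nullptr;
        other.m_count = nullptr;
    }

    MySharedPtr& operator=(const MySharedPtr&) = delete; // omitted here

    ~MySharedPtr() {
        if (m_count && --*m_count == 0) {
            delete m_ptr;
            delete m_count;
        }
    }

    long useCount() const { return m_count ? *m_count : 0; }

private:
    int* m_ptr;
    long* m_count;
};

// Takes its argument by value, like the text's f, but reports the count.
long f(MySharedPtr p) { return p.useCount(); }
```

While f runs with a copy, the count is 2, and it drops back to 1 afterwards; passing static_cast<MySharedPtr&&>(ptr) leaves the count at 1 and transfers the handle, so ptr ends up owning nothing. The real std::shared_ptr behaves the same way from the outside: copying it increments use_count(), moving it does not.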

We can also push this a little further, as does std::unique_ptr. This class is a small wrapper around a pointer with interesting semantics: you can never copy it, but you can forward/move/steal it to another instance, so that only one instance ever holds that pointer. By convention, we delete its const& overloads to signal that it is not copyable and instead provide && overloads which do what we expect: set the new pointer and null the original so that only one of the instances holds it. I might be beating a dead horse at this point, but once again: nothing would prevent you from using a regular lvalue (&) reference to "move" the pointer, we're only following conventions.
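The standard library lets us check those conventions directly. This sketch relies only on std::unique_ptr and <type_traits>; the helper name transferLeavesSourceEmpty is made up, and the cast is spelled out explicitly because std::move hasn't been introduced yet:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// std::unique_ptr deletes its copy operations...
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr is not copyable");
// ...but provides && overloads that transfer ownership.
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value,
              "unique_ptr is movable");

// Hypothetical helper: hands ownership over and checks who holds the pointer.
bool transferLeavesSourceEmpty() {
    std::unique_ptr<int> source(new int(5));
    std::unique_ptr<int> destination(static_cast<std::unique_ptr<int>&&>(source));
    return source == nullptr && *destination == 5;
}
```

transferLeavesSourceEmpty() returns true: the standard guarantees that a unique_ptr which has been moved from holds a null pointer.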

std::move

So we have a nice little useful idiom on our hands that is enabled by the two colors of references that C++ provides, but two things make it a little annoying to use in practice, which you will have quickly noticed by playing around with those concepts:

1. If we don't explicitly cast our reference, it will always be treated as an lvalue reference;
2. Even if an rvalue reference is the only possible overload, the compiler still forces us to cast it explicitly.

To alleviate that in the previous examples (and to make them clearer), I've always explicitly cast all the references in all of their usages, but this is terrible in practice, so let's improve the ergonomics. Because of 1, we can simply drop all the casts when we want a copy/to use the lvalue reference overload. Because of 2, we always need a cast, but we'd probably want to use a static_cast over a C-style cast to be stricter. While we're at it, we might as well wrap that cast in a helper function to express our intention more clearly, to save us some typing and to facilitate refactoring by not needing to explicitly state the type:

template <typename T>
T&& move(T& a) {
    return static_cast<T&&>(a);
}

This is essentially what std::move is and does: the real one takes a forwarding reference (T&&) so that it also accepts rvalues, and uses std::remove_reference to compute its return type, which makes it look a bit worse, like all standard library code does. Using std::move is exactly and only equivalent to casting to an rvalue reference, which in turn only selects a specific overload of a function or constructor. There is no magic and all the behavior is dictated by how those functions are implemented, which conventionally is a copy for the lvalue overload and "some kind of stealy algorithm" for the rvalue overload, when applicable, usually with a fallback to the "copy" behavior otherwise.
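Here is a sketch that is closer to the standard declaration, under the hypothetical name my_move, together with a check that it yields the same type as std::move. It also shows the copy fallback: a hypothetical CopyOnly type declares a copy constructor, which suppresses the implicitly generated move constructor, so "moving" it quietly copies it:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical name. A forwarding reference accepts both lvalues and rvalues,
// and std::remove_reference strips any deduced reference before adding &&.
template <typename T>
typename std::remove_reference<T>::type&& my_move(T&& value) noexcept {
    return static_cast<typename std::remove_reference<T>::type&&>(value);
}

// Like std::move, it is nothing but a cast: same result type.
static_assert(std::is_same<decltype(my_move(std::declval<int&>())),
                           decltype(std::move(std::declval<int&>()))>::value,
              "same type as std::move");

// Hypothetical type with a user-declared copy constructor and therefore no
// implicit move constructor; copies records how many copies led to it.
struct CopyOnly {
    CopyOnly() = default;
    CopyOnly(const CopyOnly& other) : copies(other.copies + 1) {}
    int copies = 0;
};
```

CopyOnly b = my_move(a); compiles and calls the copy constructor, because a const lvalue reference can bind to the result of the cast and no rvalue overload exists.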

Now is the time to stop for a minute and think about all this:

You now know what lvalue and rvalue references are, how they are just two colors for the same thing and how they are not magic;
You know that you can leverage that difference to define overloads with different (move) behaviors that can enable better performance in some situations, although what "moving" concretely means is left to each type;
You know that there are conventions for which overload should provide the "copy" behavior vs the "move" behavior;
You know that by default, the lvalue overload will be used, but you can opt-in to the rvalue overload by casting, possibly by using the std::move helper function to clearly express intent and save yourself some typing;
There is no magic to rvalue references, std::move, copy/move constructors/assignments nor moving in general.

Take the time to absorb that, think of the implications, play with simple examples and see how this applies to real world applications, as this is all you need to know.

Conventions

Now that we've built move semantics from scratch, let's dive a bit deeper into why the code that we write and use looks and behaves the way it does, starting with something that you might have noticed:

lvalue constructors/operator= always have const parameters
rvalue constructors/operator= never have const parameters

This all boils down to the convention that lvalue constructors are meant to copy and rvalue constructors are meant to move:

If you're copying data from an object, you should not have to modify it: the constness signals and guarantees to the caller that whatever it passes in will stay intact;
If you're moving data out of an object, you may want to modify it so that it stops pointing to the data that the new object now references.

If we don't care about the copy/move convention (aka move semantics), then none of this matters: it is perfectly valid to have & or const&& constructors as well. If we do care about move semantics though, making all lvalue parameters const makes sense: we should never need mutation to copy a structure. However, making rvalue parameters non-const is more debatable: it enables us to clear out the original object to avoid accidentally using the data through it afterwards, but:

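C++ doesn't mandate it: a moved-from object must still be safe to destroy, and the standard library promises a "valid but unspecified" state for its own types, but nothing requires any kind of "empty" state;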
It could be argued that clearing out the original object is a waste of CPU and should therefore be skipped;
If we don't want to clear out the original object, making the parameter const might enable more optimizations.

In practice though, it's interesting to note that even though C++ touts its high performance, both in technical potential and culture, most move constructors actually clear out the original object, so the convention is that rvalue reference parameters are taken non-const. This is important for code to play nicely with other code that expects that, but keep in mind that it's not technically required. Either way, you generally shouldn't use a value after it has been moved from, unless the code that you're using specifically guarantees some post-move state.
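As an illustration of the last point, here is a sketch using std::string, whose moved-from state the standard library only describes as valid but unspecified. The helper name reuseAfterMove is hypothetical:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: after the move, source is valid but unspecified, so
// the safe things to do with it are destroying it or giving it a new value.
std::string reuseAfterMove() {
    std::string source = "a string long enough to live on the heap";
    std::string destination = std::move(source);
    // Reading source here would compile but has no portable answer.
    source = "reassigned";  // assigning a new value is always fine
    return source + " / " + destination;
}
```

Whether the moved-from string is empty is up to the implementation, which is exactly why the sketch only assigns to it before reading it again.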

In the case of regular functions, here is what you might have noticed:

lvalue parameters are sometimes const, sometimes non-const
rvalue references are always non-const

This is because we can use references to either read from or write to an object without having to take it by value (which requires a constructor call, through either the lvalue or the rvalue overload). We just make that reference const or not based on our usage of it.

If a function takes an rvalue reference, it's probably to call the rvalue reference constructor of that object down the line. Since we've established that it makes sense for constructors to take non-const rvalue references, it makes sense for our functions to also take them non-const: if they were const we couldn't do much with them down the line.

But let's pause for a second, think about the intent and consider whether that's the right way to go about it: we want to "move" some object into a function, so that it can have it as efficiently as possible, under the assumption that we won't use it anymore.

Could we do it by taking an rvalue reference? Sure, this would be cheap (just passing a reference), the caller would expect the move and we wouldn't need to do much. Note however that if we do not move the object in the body of that function, the caller will still have the original object, which might be unexpected (imagine moving a huge vector into a function, wanting to reuse it expecting it to now be empty but still having all the data because that function did nothing with the rvalue reference).

Could we do it by taking a value? Sure, then we can decide at the call site if we want to copy or move it, but note that we'll need to construct that new value either way, which might be more expensive than passing a reference. At least, if you've moved into it, you're sure to now have a moved-out-of (usually empty) object since you did move it into the function (otherwise the new value couldn't have been constructed), unlike in the previous case. We'll later see how things can be slightly more complicated, but this is a decent mental model to get started.

Could we do it by taking an lvalue reference? Unfortunately yes: because lvalue and rvalue references can easily be converted from one to another via casting (aka std::move), someone taking an lvalue reference could just move the data out of it. On one hand, this is very unexpected because if you wanted move behavior, you'd probably want to pass an rvalue reference instead, by convention. On the other hand, if you passed a non-const lvalue reference, you agreed that anything could happen to it, so one could argue that this is to be expected. I'll let you be the judge of that.
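The difference between the first two options can be observed with std::vector. In this sketch, peekAt and sink are hypothetical callees: one takes an rvalue reference and never moves from it, the other takes its parameter by value:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical callee: takes an rvalue reference but never moves from it.
std::size_t peekAt(std::vector<int>&& v) { return v.size(); }

// Hypothetical callee: takes its parameter by value, so the parameter is
// constructed at the call site (with the move constructor for an rvalue).
std::size_t sink(std::vector<int> v) { return v.size(); }
```

After peekAt(std::move(data)), data still holds all its elements, since binding a reference moves nothing. After sink(std::move(data)), the parameter was built with the move constructor and, as cppreference documents for std::vector's move constructor, data is now empty.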

As you can see, because there are many ways to pass values in C++ and many different use cases, the number of combinations gets quite high. Unfortunately, there are no right answers, silver bullets or rules of thumb that can tell you exactly what to do: depending on the size of a class, whether it is copy-only, move-only or both, whether we want a function or its caller to decide how to deal with that class, whether we follow typical move conventions or want more speed, and more, we might want to do things differently. The fundamentals remain very simple: two colors of references and function overloading, but the combinatorial explosion of cases prevents us from designing simple guidelines. When in doubt, try to desugar the copy and move abstractions, go back to the types and functions you are dealing with and think about what code is actually being called and executed.

Language support

So far I've asserted that lvalue and rvalue references were identical but incompatible, but you might have noticed some odd differences anyway. For example, you never need to cast to an lvalue reference, but you always need to cast to an rvalue reference. Did I lie to you? Not really: they behave similarly, but the language picks which one to use when both are valid. This should not sound unreasonable if you've worked with overloads before, consider:

void f(int);
void f(int&);

Those two overloads are valid, but if you call f with some int variable, the compiler will complain that the call is ambiguous, because both overloads are valid but none is obviously a better pick than the other.

In the case of references, the language defines which one should be picked in certain contexts. It basically goes like this: pick the rvalue overload if the value is temporary and the lvalue overload otherwise. A temporary value simply means that this value is about to die, to be destructed. Examples include values returned from a function and literals: those aren't referenced anywhere else so sure, why not just move them? It will be less expensive. Examples of values that are not temporary include anything that is bound to a variable. Let's have a look at some examples:

// the return value of g isn't referenced anywhere, it is temporary and
// therefore passed as an rvalue reference to f
f(g());
// same here
f(Foo{});

// foo is not temporary, it will still be alive after we've called f and
// could be used long after: it is passed by lvalue reference
Foo foo;
f(foo);

// even though scopedFoo would be destroyed right after the call to f, it is not
// considered a temporary, so it will also be passed by lvalue reference
{
    Foo scopedFoo;
    f(scopedFoo);
}

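// because localFoo is being returned, we know that it is about to expire,
// therefore it is treated as a temporary: it is passed as an rvalue reference
// to the constructor of the returned value (unless the compiler elides that
// construction entirely, which it is allowed to do here)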
Foo makeFoo() {
    Foo localFoo;
    return localFoo;
}
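These rules can be checked with a small program. Foo, f, g and makeFoo follow the names above, while the fooCopies counter is made up for this sketch; f returns a string naming the overload it picked instead of doing real work:

```cpp
#include <cassert>
#include <string>

int fooCopies = 0; // hypothetical counter, bumped by Foo's copy constructor

struct Foo {
    Foo() = default;
    Foo(const Foo&) { ++fooCopies; }
    Foo(Foo&&) noexcept {}
};

// Same shape as the text's f, but reporting which overload was chosen.
std::string f(const Foo&) { return "lvalue"; }
std::string f(Foo&&) { return "rvalue"; }

Foo g() { return Foo{}; }

Foo makeFoo() {
    Foo localFoo;
    return localFoo; // either elided or moved, never copied
}
```

f(g()) and f(Foo{}) both report "rvalue", f(foo) and f(scopedFoo) report "lvalue", and calling makeFoo() never increments fooCopies: the returned local is either moved or, commonly, not even moved because the compiler elides the construction.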

I won't go into details about how temporaries work here; you can read about value categories on your own if you care. You might discover that lvalues and rvalues can be divided into finer-grained categories (glvalues, prvalues and xvalues), but that should not change the mental model that I present here, which is coarser but still correct and sufficient for programming, including complicated library work.

If you think hard enough about how C++ picks which reference is being used, you might make a key observation: while there is nothing inherent to copying or moving in lvalue and rvalue references, the language itself strongly orients which overload is going to be used based on usage. Values that won't be reused are passed by rvalue references. Values that might are passed by lvalue reference. Knowing this, it makes more sense to reserve destructive behavior for rvalue overloads, since it won't affect anyone else. This is what really makes move semantics possible: the language itself encourages certain overloads to be picked in certain situations. You can still use lvalue/rvalue overloads to do other things, or even swap their copy/move behaviors, but you'll be fighting against the language, constantly casting values to "correct" what C++ does by default.

Finally, you should be aware of a subtlety of passing objects by value in C++: on common ABIs such as the Itanium C++ ABI (used by GCC and Clang on most non-Windows platforms), when passing types with non-trivial copy/move constructors or destructors to a function, the caller actually passes a reference to a temporary that it constructed. This is hidden, but makes intuitive sense at the ABI level: the caller is the one constructing the value and, under that ABI, also the one destructing it, so the object must live in the caller's stack frame, hence the passing by reference even though the function takes a value. In any case, this implementation detail might complicate your mental model of when to pass things by reference vs value. This is particularly relevant for move-only types like std::unique_ptr, which almost always have non-trivial destructors. You might elect to pass them by value because they are cheap to move, but if they are going to be passed by reference anyway, you might as well avoid the extra construction and destruction and just pass them by rvalue reference (which again isn't a silver bullet, as the callee might unpredictably not actually consume that value).

Limitations, pitfalls and comparisons

We're almost done here, I have no more concepts to introduce, no more complexity to add to your mental model. I just want to discuss some of the implications of C++'s move semantics and what the alternatives are. If you've understood everything so far, you should be well-armed to deal with move semantics already.

First off, I consider it a tremendous failure that people need to be "well-armed" to deal with moves in C++. Ironically, I believe that this article shows that rvalue references can easily be explained and understood but that understanding move semantics is much more involved, even though rvalue references are merely a tool for move semantics. In other words, it's easy to build move semantics from scratch for yourself with just function overloads and two colors of references, but the real world is much more complicated. It might be acceptable if this complexity were inherent to the problem of "copying or moving data around", but I claim that it isn't.

When dealing with the complexities of C++, I like to think about how the same problem would be dealt with in C. In this case, you wouldn't need overloads, different references or even constructors at all. You'd simply have two simple functions: void copy_foo(Foo*, Foo*) and void move_foo(Foo*, Foo*)[3]. Sure, this has all of the downsides that come with C in terms of unsafety, code duplication and more, but it would also be much simpler. You would also only have to think about moves for types which can benefit from them and not have to deal with any extra complexity for all other types[4].
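For comparison, here is what the C approach could look like. It is written in the C-compatible subset of C++ so that all examples in this article share one language; the Foo layout and the argument order (destination first, as with memcpy) are assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// A C-style Foo: a plain struct with no constructors; its owner frees data.
struct Foo {
    int* data;
    std::size_t size;
};

// Assumption: the first argument is the destination and does not own a
// buffer yet. Deep copy: allocate and copy every element; src is untouched.
void copy_foo(Foo* dst, Foo* src) {
    dst->size = src->size;
    dst->data = NULL;
    if (src->size > 0) {
        dst->data = (int*)std::malloc(src->size * sizeof(int)); // no error handling
        std::memcpy(dst->data, src->data, src->size * sizeof(int));
    }
}

// Clearing move: take over src's buffer, then empty src so only dst frees it.
void move_foo(Foo* dst, Foo* src) {
    *dst = *src;
    src->data = NULL;
    src->size = 0;
}
```

copy_foo leaves the source untouched and allocates a new buffer, while move_foo only copies a pointer and a size and then clears the source. The caller remains responsible for calling free on whichever Foo owns a buffer.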

If you'd rather see how a modern, safe language does it, have a look at Rust. Its model is also very simple: only moves exist. This is a sensible default, since you never copy by accident, and it also makes life much easier. Should you want to copy something, you can always do it explicitly with a dedicated method, such as clone()[5].

Another difference in Rust is that values cannot be used after a move, while they simply "should not be used, mostly" in C++. The Rust compiler actually enforces that, which has the side-effect of enabling moves from const values: since the value can't be used afterwards, there's no reason to clear it out, so it can be const. This is technically possible in C++ too as we've discussed before, but in the real world, rvalue references are passed around non-const. This, together with the lack of guarantees regarding the state of a moved-from object, enables a unique category of bugs in C++, called use-after-move, on top of the performance cost of having to clear out objects to mitigate those same use-after-move bugs.

I'd like to address a special thank you to Jean-François Marquis for proofreading this article, making sure that it was accurate, readable and helpful: merci JF !

[1] It's a bit more complicated in practice, especially with std::shared_ptr which supports weak references as well, but this will do in our context.

[2] We also need to null our own pointer so that our destructor does not release it down the line. There are of course more subtleties, but again, we're not here to design a correct reference counted pointer.

[3] In fact, if you decide to go for a non-clearing move and your type only has one layer of pointers, C's assignment operator will already functionally be a move, in which case you only need a single deep_copy_foo function!

[4] You can mostly ignore move semantics for POD types, but you still have to deal with them far more often than in the cases where they actually make sense, like containers, smart pointers and ownership tokens.

[5] Yes, the Copy trait exists.