Copy Constructor / Assignment Operators
In C++ you can construct one instance from another via a constructor and also by an assignment operator. In some cases a constructor will be used instead of an assignment:
PersonList x;
PersonList y = x; // Copy constructor, not assignment
PersonList z;
z = x; // Assignment operator
By default C++ generates all the code to copy and assign the members of one instance to another without any effort on our part. Lucky us!
So our class PersonList might look like this:
struct Person {
    //...
};
class PersonList {
    std::vector<Person> *personList_;
public:
    PersonList() : personList_(new std::vector<Person>) {
    }
    ~PersonList() {
        delete personList_;
    }
    // ... Methods to add / search list
};
Except we’re not lucky, we just got slimed. The default copy takes the pointer in personList_ and makes a copy of the pointer, not of the vector it points to. Now if we copy x to y, or assign x to z, we have three instances pointing to the same private data, and each destructor will try to delete that same pointer!
On top of that, z allocated its own personList_ during its default constructor, but the default assignment overwrote it with the one from x, so its old personList_ value just leaks.
Of course we might be able to use a std::unique_ptr to hold our pointer, in which case the compiler would generate an error when we tried to copy. But it might not always be that simple. personList_ may have been opaquely allocated by an external library, so we have no choice but to manage its lifetime through the constructor and destructor.
The Rule of Three
This is such a terrible, bug-enabling problem in C++ that it has given rise to the so-called Rule of Three[^1].
The rule says that if we explicitly declare a destructor, copy constructor or copy assignment operator in a C++ class
then we probably need to implement all three of them to safely handle assignment and construction. In other words
the burden for fixing C++’s default and dangerous behaviour falls onto the developer.
So let’s fix the class:
struct Person {
    //...
};
class PersonList {
    std::vector<Person> *personList_;
public:
    PersonList() : personList_(new std::vector<Person>) {
    }
    PersonList(const PersonList &other) :
        personList_(new std::vector<Person>) {
        personList_->insert(
            personList_->end(), other.personList_->begin(),
            other.personList_->end());
    }
    ~PersonList() {
        delete personList_;
    }
    PersonList & operator=(const PersonList &other) {
        // Don't forget to check if someone assigns an object to itself
        if (&other != this) {
            personList_->clear();
            personList_->insert(
                personList_->end(), other.personList_->begin(),
                other.personList_->end());
        }
        return *this;
    }
    // ... Methods to add / search list
};
What a mess!
We’ve added a copy constructor and an assignment operator to the class to handle copying safely. The code even had to check if it was being assigned to itself in case someone wrote x = x. Without that test, the receiving instance would clear itself in preparation for adding elements from itself, which would of course wipe out all its contents.
Alternatively we might disable copying and assignment by declaring a private copy constructor and assignment operator, which prevents external code from calling them:
class PersonList {
    std::vector<Person> *personList_;
private:
    // Declared but deliberately never defined, so even code inside the
    // class cannot copy by accident. In C++11 use "= delete" instead.
    PersonList(const PersonList &other);
    PersonList & operator=(const PersonList &other);
public:
    PersonList() : personList_(new std::vector<Person>) {
    }
    ~PersonList() {
        delete personList_;
    }
    // ... Methods to add / search list
};
Another alternative would be to use noncopyable types within the class itself. For example, the copy would fail if the pointer were managed with a C++11 std::unique_ptr (or Boost’s boost::scoped_ptr).
Boost also provides a boost::noncopyable class, which offers yet another option. Classes may inherit from noncopyable, which declares a private copy constructor and assignment operator, so any code that tries to copy will generate a compile error.
The Rule of Five
The Rule of Three has become the Rule of Five(!) in C++11 because of the introduction of move semantics.
If you have a class that can benefit from move semantics, the Rule of Five essentially says that the existence of the user-defined destructor, copy constructor and copy assignment operator requires you to also implement a move constructor and a move assignment operator. So in addition to the code we wrote above we must also write two more methods.
class PersonList {
    // See class above for other methods, rule of three....
public:
    PersonList(PersonList &&other) {
        // TODO
    }
    PersonList &operator=(PersonList &&other) {
        if (&other != this) {
            // TODO
        }
        return *this;
    }
};
How Rust helps
Move is the default
Rust helps by making move semantics the default: unless you need to copy data from one instance to another, you don’t. If you assign a struct from one variable to another, ownership moves with it. The old variable is marked invalid by the compiler and it is an error to access it.
But if you do want to copy data from one instance to another then you have two choices.
- Implement the Clone trait. Your struct will have an explicit clone() function you can call to make a copy of the data.
- Implement the Copy trait. Your struct will now implicitly copy on assignment instead of move. Implementing Copy also requires implementing Clone, so you can still explicitly call clone() if you prefer.
Primitive types such as integers, chars, bools etc. implement Copy, so you can just assign one to another:
// This is all good
let x = 8;
let mut y = x; // y must be mutable to be reassigned below
y = 20;
assert_eq!(x, 8);
But a String cannot be copied this way. A String holds a pointer to a heap-allocated buffer, so copying it is a more expensive operation. For that reason String only implements the Clone trait, which requires you to explicitly duplicate it:
let copyright = "Copyright 2017 Acme Factory".to_string();
let copyright2 = copyright.clone();
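To make the independence of the two strings concrete, here is a minimal sketch. The helper clone_and_modify and the " Ltd" suffix are hypothetical names invented for this illustration; only clone() and push_str() come from the standard library.

```rust
// Hypothetical helper: clones the string, modifies only the clone,
// and hands both back so the caller can compare them.
fn clone_and_modify(original: String) -> (String, String) {
    // clone() allocates a new buffer and copies the bytes into it
    let mut copy = original.clone();
    copy.push_str(" Ltd");
    (original, copy)
}

fn main() {
    let copyright = "Copyright 2017 Acme Factory".to_string();
    let (copyright, copyright2) = clone_and_modify(copyright);
    // The original is unchanged because the clone owns separate heap data
    assert_eq!(copyright, "Copyright 2017 Acme Factory");
    assert_eq!(copyright2, "Copyright 2017 Acme Factory Ltd");
    println!("{} / {}", copyright, copyright2);
}
```

Changing the clone leaves the original untouched, which is exactly the guarantee the C++ default copy failed to give us for a raw pointer.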
The default for any struct is that it can neither be copied nor cloned.
struct Person {
    name: String,
    age: u8
}
The following code creates a Person object and assigns it to person1. When person1 is then assigned to person2, ownership of the data also moves:
let person1 = Person { name: "Tony".to_string(), age: 38u8 };
let person2 = person1;
Attempting to use person1 after ownership moves to person2 will generate a compile error:
println!("{}", person1.name); // Error, use of a moved value
To illustrate, consider this Rust struct, which is equivalent to the PersonList we saw in C++:
struct PersonList {
    pub persons: Vec<Person>,
}
We can see that PersonList has a Vec of Person objects. Under the covers the Vec will allocate space in the heap to store its data.
Now let’s use it.
let mut x = PersonList { persons: Vec::new(), };
let mut y = x;
// x is not the owner any more...
x.persons.push(Person{ name: "Fred".to_string(), age: 30u8} );
The variable x is bound to a PersonList on the stack, but the data held by its persons member is allocated on the heap. If assigning x to y merely copied the struct, we would have two stack objects sharing the same pointer to the heap, in the same way we did in C++.
But Rust stops that from happening. When we assign x to y, the compiler will do a bitwise copy of the data in x, but it will bind ownership to y. When we try to access the data through the old variable, Rust generates a compile error.
error[E0382]: use of moved value: `*x.persons`
|
10 | let mut y = x;
| ----- value moved here
11 | x.persons.push(Person{});
| ^^^^^^^^^ value used here after move
|
= note: move occurs because `x` has type `main::PersonList`, which does not implement the `Copy` trait
Rust has stopped the problem that we saw in C++. Not only stopped it but told us why it stopped it - the value moved from x to y and so we can’t use x any more.
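The way out of that error is simply to go through the new owner. The following is a minimal sketch under that assumption; the function move_then_push is a hypothetical helper written for this illustration, and the struct definitions repeat the ones used earlier.

```rust
struct Person {
    name: String,
    age: u8,
}

struct PersonList {
    pub persons: Vec<Person>,
}

// Hypothetical helper: moves the list into a new binding, adds one
// person through the new owner and returns the list to the caller.
fn move_then_push(x: PersonList) -> PersonList {
    let mut y = x; // ownership moves from x to y
    // x.persons.push(...) would be rejected here with error E0382
    y.persons.push(Person { name: "Fred".to_string(), age: 30u8 });
    y
}

fn main() {
    let x = PersonList { persons: Vec::new() };
    let y = move_then_push(x);
    assert_eq!(y.persons.len(), 1);
    println!("{} is {}", y.persons[0].name, y.persons[0].age);
}
```

At every point there is exactly one owner of the vector, so there is never a second variable that could free or modify it behind our back.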
Implementing the Copy trait
The Copy trait allows us to do direct assignment between variables. The trait has no functions, and acts as a marker in the code to denote data that should be duplicated on assignment.
You can implement the Copy trait by deriving it or by implementing it by hand. Either way, the type must also implement Clone, and every member of the struct must itself implement Copy:
#[derive(Clone, Copy)]
struct PersonKey {
    id: u32,
    age: u8,
}
// Alternatively, implement both traits by hand instead of deriving them...
impl Copy for PersonKey {}
impl Clone for PersonKey {
    fn clone(&self) -> PersonKey {
        *self
    }
}
So PersonKey is copyable because the types u32 and u8 are also copyable, and the derived Copy implementation changes the struct's assignment semantics from move to copy.
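As a quick check of those semantics, here is a minimal sketch. The extra Debug and PartialEq derives and the next_year function are assumptions added purely so the example can compare and print values; they are not needed for Copy itself.

```rust
// Debug and PartialEq are derived only so we can use assert_eq! and println!
#[derive(Clone, Copy, Debug, PartialEq)]
struct PersonKey {
    id: u32,
    age: u8,
}

// Hypothetical function: takes the key by value. Because PersonKey is
// Copy, the caller keeps its own, unmodified copy.
fn next_year(mut key: PersonKey) -> PersonKey {
    key.age += 1;
    key
}

fn main() {
    let a = PersonKey { id: 1, age: 38 };
    let b = a;            // copies; a remains valid
    let c = next_year(a); // copies again; a is still usable afterwards
    assert_eq!(a, b);
    assert_eq!(a.age, 38);
    assert_eq!(c.age, 39);
    println!("{:?} {:?} {:?}", a, b, c);
}
```

Without the Copy derive, both the assignment to b and the call to next_year would move a, and the later uses of a would fail to compile.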
But when a struct contains a type that does not implement Copy you will get a compiler error. So this struct Person will cause a compiler error because String does not implement Copy:
#[derive(Clone, Copy)]
struct Person {
    name: String,
    age: u8
}
// Compiler error! The name field is a String, which does not implement Copy
Implementing the Clone trait
The Clone trait adds a clone() function to your struct that produces an independent copy of it. We can derive it if every member of the struct can be cloned, which in the case of Person it can:
#[derive(Clone)]
struct Person {
    name: String,
    age: u8
}
...
let x = Person { /*...*/ };
let y = x.clone();
Now that Person derives Clone, we can do the same for PersonList because all its member types implement that trait - a Person can be cloned, a Vec can be cloned, and a Box can be cloned:
#[derive(Clone)]
struct PersonList {
    pub persons: Box<Vec<Person>>,
}
And now we can clone x into y and we have two independent copies.
//...
let mut x = PersonList { persons: Box::new(Vec::new()), };
let mut y = x.clone();
// x and y are two independent lists now, not shared
x.persons.push(Person{ name: "Fred".to_string(), age: 30} );
y.persons.push(Person{ name: "Mary".to_string(), age: 24} );
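To confirm that the two lists really are independent, here is a minimal self-contained sketch. The helper clone_and_grow is a hypothetical function made up for this illustration; the struct definitions repeat the ones above.

```rust
#[derive(Clone)]
struct Person {
    name: String,
    age: u8,
}

#[derive(Clone)]
struct PersonList {
    pub persons: Box<Vec<Person>>,
}

// Hypothetical helper: clones the list, adds one person to the clone
// only, and returns the lengths of (original, clone).
fn clone_and_grow(x: &PersonList) -> (usize, usize) {
    let mut y = x.clone();
    y.persons.push(Person { name: "Mary".to_string(), age: 24 });
    (x.persons.len(), y.persons.len())
}

fn main() {
    let mut x = PersonList { persons: Box::new(Vec::new()) };
    x.persons.push(Person { name: "Fred".to_string(), age: 30 });
    let (original_len, clone_len) = clone_and_grow(&x);
    // Growing the clone did not touch the original
    assert_eq!(original_len, 1);
    assert_eq!(clone_len, 2);
    println!("{} ({}) is still the only entry in x", x.persons[0].name, x.persons[0].age);
}
```

The derived clone() is a deep copy here: the Box, the Vec and every Person with its String are all duplicated, which is the work the hand-written C++ copy constructor had to do.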
Summary
In summary, Rust stops us from getting into trouble by treating assignments as moves when a non-copyable value is assigned from one variable to another. But if we want to be able to clone / copy we can make our intent explicit and do that too.
C++ just lets us dig a hole and fills the dirt in on top of us.