patterncppMinor
Rule of five and move semantics for a string class
Viewed 0 times
semanticsfiverulemoveforandclassstring
Problem
I have been learning about the rule of five and I tried to implement an example myself. It all works exactly as I'd expect, but I was wondering if I've missed any obvious issues or easier/faster ways of implementing the constructors and operators.
I ran it with valgrind and recieved the following output: I am concerned that there are 11 uses of alloc and only 10 frees? I am vaguely familiar with the difference but not confident enough to tell if it's a mistake I've made or expected output?
```
==24703== HEAP SUMMARY:
==24703== in use at exit: 72,704 bytes in 1 blocks
==24703== total heap usage: 11 allocs, 10 frees, 73,833 bytes allocated
==24703==
==24703== LEAK SUMMARY:
==24703== definitely lost: 0 bytes in 0 blocks
==24703== indirectly lost: 0 bytes in 0 blocks
==24703== possibly lost: 0 bytes in 0 blocks
==24703== still reachable: 72,704 bytes in 1 blocks
==24703== suppressed: 0 bytes in 0 blocks
==24703== Rerun with --leak-check=full to see details of leaked memory
==24703==
==24703== For counts of detected and suppressed errors, rerun with: -v
==
class string {
public:
string() : data(nullptr) {
std::cout (string(*this) += str); //could be replaced with std::move() but I use static cast for education purposes
}
private:
char * data;
size_t m_size;
friend std::ostream& operator<<(std::ostream& os, const string& s) {
os << s.data << "\r\n";
return os;
}
};
int main() {
const char t1[] = "text";
const char t2[] = "more text";
const char * p1 = t1;
const char * p2 = t2;
string a(p1,sizeof(t1));
string b(p2,sizeof(t2));
string c(a);
string d(a + b);
string e;
a += b;
e = d;
c = b + a;
std::cout << "a:\t" << a;
std::cout << "b:\t" << b;
std::cout << "c:\t" << c;
std::cout << "d:\t" << d;
std::cout << "e:\t" << e;
return 0;
}I ran it with valgrind and recieved the following output: I am concerned that there are 11 uses of alloc and only 10 frees? I am vaguely familiar with the difference but not confident enough to tell if it's a mistake I've made or expected output?
```
==24703== HEAP SUMMARY:
==24703== in use at exit: 72,704 bytes in 1 blocks
==24703== total heap usage: 11 allocs, 10 frees, 73,833 bytes allocated
==24703==
==24703== LEAK SUMMARY:
==24703== definitely lost: 0 bytes in 0 blocks
==24703== indirectly lost: 0 bytes in 0 blocks
==24703== possibly lost: 0 bytes in 0 blocks
==24703== still reachable: 72,704 bytes in 1 blocks
==24703== suppressed: 0 bytes in 0 blocks
==24703== Rerun with --leak-check=full to see details of leaked memory
==24703==
==24703== For counts of detected and suppressed errors, rerun with: -v
==
Solution
Comments
Let's consider one of your constructors, and its accompanying comment:
At least in my opinion, this has at least two obvious problems. This first problem is that the comment is just wrong. A default constructor is one that can be invoked without supplying any arguments. In this case, two arguments must be supplied, so it's not a default ctor.
The second problem is almost simpler. As it stands right now, the comment is pointless. If the comment were corrected (or the code changed so it really was a default constructor), it would still wouldn't give any useful information. It's just telling me what's already obvious from the function's parameter list--it's a default ctor if and only if it can be invoked without supplying any arguments.
Comments should convey information that's not immediately obvious from the code itself. Most often, this is things like why you wrote the code the way it is.
The same applies to most of the other comments as well--everything they try to convey is either wrong or obvious (or, as above, both).
Non-templated string type
The days of strings supporting only
Array
The array form of
I'm not particularly fond of your current implementation of
I'd do pretty much the same things, but split it into smaller pieces that made the intent a little more obvious:
I'd have to think things through in detail to be sure, there's at least a reasonable chance that this is actually more efficient. In particular, it allows copy elision of the return value, where casting to an rvalue (whether done explicitly or in the form of
Alternative Rule
In most case, you want to follow the rule of zero rather than the rule of five. In simple form, this says you should delegate essentially all the resource management to a class (most often
Copy assignment
Your current copy assignment operator isn't safe in the face of exceptions. Right now you have a sequence of deleting the existing data, then allocating new space, and then copying the new data into the new space. If (for example) that
One easy way to get defined behavior is the copy and swap idiom, which could look something like this:
Since we pass the parameter by value, the copy constructor is invoked to create a string for the parameter. We then essentially do a move from there to the target. To keep things as simple as possible, we swap the guts of the two strings, so when the function returns, the parameter goes out of scope, and gets destroyed automatically.
In this case, the allocation happens in the ctor, so if it throws, this function is never invoked at all. The function itself contains only operations that we know can't throw, so we don't have to worry about what would happen if an exception were thrown in the middle of it.
This isn't the only possible way, of course, but it's one way that's pretty easy to get right. More important, perhaps, is establishing the basic pattern:
Phase 1: carry out things that might throw, but don't affect any existing data if they do.
Phase 2: carry out operation that affect existing data, but can't throw.
This way, if anything throws, P
Let's consider one of your constructors, and its accompanying comment:
string(const char * p, size_t size) : m_size(size) { //default constructorAt least in my opinion, this has at least two obvious problems. This first problem is that the comment is just wrong. A default constructor is one that can be invoked without supplying any arguments. In this case, two arguments must be supplied, so it's not a default ctor.
The second problem is almost simpler. As it stands right now, the comment is pointless. If the comment were corrected (or the code changed so it really was a default constructor), it would still wouldn't give any useful information. It's just telling me what's already obvious from the function's parameter list--it's a default ctor if and only if it can be invoked without supplying any arguments.
Comments should convey information that's not immediately obvious from the code itself. Most often, this is things like why you wrote the code the way it is.
The same applies to most of the other comments as well--everything they try to convey is either wrong or obvious (or, as above, both).
Non-templated string type
The days of strings supporting only
char as the data type are (IMO) long past. Whether you like Unicode or not, it's better to support it (and for a string, it's much cleaner to manipulate UTF-32 than UTF-8).Array
newThe array form of
new should generally be avoided. In the specific case of char (or unsigned char) it's marginally less evil than usual, but I'd advise avoiding it in general. For containers (and IMO, string is a container) you want to use operator new to allocate "raw" memory, then use placement new to construct objects in that raw memory. It's kind of overkill in the specific case of char, but at worst it's harmless, and at best it's a drastic improvement (it leaves the unused part of the memory as raw memory, where new [] always creates objects in all the memory you allocate).operator+I'm not particularly fond of your current implementation of
operator+. It works, but (IMO) is fairly easy for somebody to misinterpret in subtle ways that could lead to problems down the road.I'd do pretty much the same things, but split it into smaller pieces that made the intent a little more obvious:
string operator+(const string & str) const {
string r(*this);
r += str;
return r;
}I'd have to think things through in detail to be sure, there's at least a reasonable chance that this is actually more efficient. In particular, it allows copy elision of the return value, where casting to an rvalue (whether done explicitly or in the form of
std::move) can inhibit that.Alternative Rule
In most case, you want to follow the rule of zero rather than the rule of five. In simple form, this says you should delegate essentially all the resource management to a class (most often
unique_ptr or shared_ptr) devoted solely to that purpose, so your class only needs to deal with whatever it's really supposed to do. In the case of a (non-COW) string, each string representation has only a single owner, so this should be std::unique_ptr.Copy assignment
Your current copy assignment operator isn't safe in the face of exceptions. Right now you have a sequence of deleting the existing data, then allocating new space, and then copying the new data into the new space. If (for example) that
new throws an exception, your string won't contain either the old data or the new data. Since it's a constructed string, its destructor will execute when that scope is exited. That will attempt to delete the buffer pointed to by its data--which you just deleted, so you get a double delete, giving undefined behavior.One easy way to get defined behavior is the copy and swap idiom, which could look something like this:
string& operator=(string str) {
std::cout << "Copy assignment\n";
using std::swap;
swap(data, str.data);
swap(m_size, str.m_size);
return *this;
}Since we pass the parameter by value, the copy constructor is invoked to create a string for the parameter. We then essentially do a move from there to the target. To keep things as simple as possible, we swap the guts of the two strings, so when the function returns, the parameter goes out of scope, and gets destroyed automatically.
In this case, the allocation happens in the ctor, so if it throws, this function is never invoked at all. The function itself contains only operations that we know can't throw, so we don't have to worry about what would happen if an exception were thrown in the middle of it.
This isn't the only possible way, of course, but it's one way that's pretty easy to get right. More important, perhaps, is establishing the basic pattern:
Phase 1: carry out things that might throw, but don't affect any existing data if they do.
Phase 2: carry out operation that affect existing data, but can't throw.
This way, if anything throws, P
Code Snippets
string(const char * p, size_t size) : m_size(size) { //default constructorstring operator+(const string & str) const {
string r(*this);
r += str;
return r;
}string& operator=(string str) {
std::cout << "Copy assignment\n";
using std::swap;
swap(data, str.data);
swap(m_size, str.m_size);
return *this;
}string& operator=(string && str) {
std::cout << "Move assignment\n";
using std::swap;
swap(data, str.data);
swap(m_size, str.m_size);
return *this;
}string& operator+=(const string & str) {
size_t new_size = m_size + str.m_size;
char * new_data = new char[new_size];
mempy(new_data, data, m_size);
memcpy(new_data+m_size, str.data, str.m_size);
delete[] data;
m_size = new_size;
data = new_data;
return *this;
}Context
StackExchange Code Review Q#149917, answer score: 5
Revisions (0)
No revisions yet.