patterncppMinor

Rule of five and move semantics for a string class

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

semanticsfiverulemoveforandclassstring

Problem

I have been learning about the rule of five and I tried to implement an example myself. It all works exactly as I'd expect, but I was wondering if I've missed any obvious issues or easier/faster ways of implementing the constructors and operators.

class string {
public:
    string() : data(nullptr) {
        std::cout (string(*this) += str);   //could be replaced with std::move() but I use static cast for education purposes
    }

private:
    char * data;
    size_t m_size;
    friend std::ostream& operator<<(std::ostream& os, const string& s) {
        os << s.data << "\r\n";
        return os;
    }
};

    int main() {
        const char t1[] = "text";
        const char t2[] = "more text";
        const char * p1 = t1;
        const char * p2 = t2;
        string a(p1,sizeof(t1));
        string b(p2,sizeof(t2));
        string c(a);
        string d(a + b);
        string e;
        a += b;
        e = d;
        c = b + a;

        std::cout << "a:\t" << a;
        std::cout << "b:\t" << b;
        std::cout << "c:\t" << c;
        std::cout << "d:\t" << d;
        std::cout << "e:\t" << e;

        return 0;
    }

I ran it with valgrind and recieved the following output: I am concerned that there are 11 uses of alloc and only 10 frees? I am vaguely familiar with the difference but not confident enough to tell if it's a mistake I've made or expected output?

```
==24703== HEAP SUMMARY:
==24703== in use at exit: 72,704 bytes in 1 blocks
==24703== total heap usage: 11 allocs, 10 frees, 73,833 bytes allocated
==24703==
==24703== LEAK SUMMARY:
==24703== definitely lost: 0 bytes in 0 blocks
==24703== indirectly lost: 0 bytes in 0 blocks
==24703== possibly lost: 0 bytes in 0 blocks
==24703== still reachable: 72,704 bytes in 1 blocks
==24703== suppressed: 0 bytes in 0 blocks
==24703== Rerun with --leak-check=full to see details of leaked memory
==24703==
==24703== For counts of detected and suppressed errors, rerun with: -v
==

Solution

Comments

Let's consider one of your constructors, and its accompanying comment:

string(const char * p, size_t size) : m_size(size) { //default constructor

At least in my opinion, this has at least two obvious problems. This first problem is that the comment is just wrong. A default constructor is one that can be invoked without supplying any arguments. In this case, two arguments must be supplied, so it's not a default ctor.

The second problem is almost simpler. As it stands right now, the comment is pointless. If the comment were corrected (or the code changed so it really was a default constructor), it would still wouldn't give any useful information. It's just telling me what's already obvious from the function's parameter list--it's a default ctor if and only if it can be invoked without supplying any arguments.

Comments should convey information that's not immediately obvious from the code itself. Most often, this is things like why you wrote the code the way it is.

The same applies to most of the other comments as well--everything they try to convey is either wrong or obvious (or, as above, both).

Non-templated string type

The days of strings supporting only char as the data type are (IMO) long past. Whether you like Unicode or not, it's better to support it (and for a string, it's much cleaner to manipulate UTF-32 than UTF-8).

Array new

The array form of new should generally be avoided. In the specific case of char (or unsigned char) it's marginally less evil than usual, but I'd advise avoiding it in general. For containers (and IMO, string is a container) you want to use operator new to allocate "raw" memory, then use placement new to construct objects in that raw memory. It's kind of overkill in the specific case of char, but at worst it's harmless, and at best it's a drastic improvement (it leaves the unused part of the memory as raw memory, where new [] always creates objects in all the memory you allocate).

operator+

I'm not particularly fond of your current implementation of operator+. It works, but (IMO) is fairly easy for somebody to misinterpret in subtle ways that could lead to problems down the road.

I'd do pretty much the same things, but split it into smaller pieces that made the intent a little more obvious:

string operator+(const string & str) const {
    string r(*this);
    r += str;
    return r;
}

I'd have to think things through in detail to be sure, there's at least a reasonable chance that this is actually more efficient. In particular, it allows copy elision of the return value, where casting to an rvalue (whether done explicitly or in the form of std::move) can inhibit that.

Alternative Rule

In most case, you want to follow the rule of zero rather than the rule of five. In simple form, this says you should delegate essentially all the resource management to a class (most often unique_ptr or shared_ptr) devoted solely to that purpose, so your class only needs to deal with whatever it's really supposed to do. In the case of a (non-COW) string, each string representation has only a single owner, so this should be std::unique_ptr.

Copy assignment

Your current copy assignment operator isn't safe in the face of exceptions. Right now you have a sequence of deleting the existing data, then allocating new space, and then copying the new data into the new space. If (for example) that new throws an exception, your string won't contain either the old data or the new data. Since it's a constructed string, its destructor will execute when that scope is exited. That will attempt to delete the buffer pointed to by its data--which you just deleted, so you get a double delete, giving undefined behavior.

One easy way to get defined behavior is the copy and swap idiom, which could look something like this:

string& operator=(string str) {
    std::cout << "Copy assignment\n";
    using std::swap;
    swap(data, str.data);
    swap(m_size, str.m_size);
    return *this;
}

Since we pass the parameter by value, the copy constructor is invoked to create a string for the parameter. We then essentially do a move from there to the target. To keep things as simple as possible, we swap the guts of the two strings, so when the function returns, the parameter goes out of scope, and gets destroyed automatically.

In this case, the allocation happens in the ctor, so if it throws, this function is never invoked at all. The function itself contains only operations that we know can't throw, so we don't have to worry about what would happen if an exception were thrown in the middle of it.

This isn't the only possible way, of course, but it's one way that's pretty easy to get right. More important, perhaps, is establishing the basic pattern:

Phase 1: carry out things that might throw, but don't affect any existing data if they do.
Phase 2: carry out operation that affect existing data, but can't throw.

This way, if anything throws, P

Code Snippets

string(const char * p, size_t size) : m_size(size) { //default constructor

string operator+(const string & str) const {
    string r(*this);
    r += str;
    return r;
}

string& operator=(string str) {
    std::cout << "Copy assignment\n";
    using std::swap;
    swap(data, str.data);
    swap(m_size, str.m_size);
    return *this;
}

string& operator=(string && str) {
    std::cout << "Move assignment\n";
    using std::swap;
    swap(data, str.data);
    swap(m_size, str.m_size);
    return *this;
}

string& operator+=(const string & str) {
    size_t new_size = m_size + str.m_size;
    char * new_data = new char[new_size];
    mempy(new_data, data, m_size); 
    memcpy(new_data+m_size, str.data, str.m_size); 
    delete[] data;
    m_size = new_size;
    data = new_data;
    return *this;
}

Context

StackExchange Code Review Q#149917, answer score: 5

Revisions (0)

No revisions yet.