HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppCritical

What's the rationale for null terminated strings?

Submitted by: @import:stackoverflow-api··
0
Viewed 0 times
rationaleterminatedtheforstringswhatnull

Problem

As much as I love C and C++, I can't help but scratch my head at the choice of null terminated strings:

  • Length prefixed (i.e. Pascal) strings existed before C



  • Length prefixed strings make several algorithms faster by allowing constant time length lookup.



  • Length prefixed strings make it more difficult to cause buffer overrun errors.



  • Even on a 32 bit machine, if you allow the string to be the size of available memory, a length prefixed string is only three bytes wider than a null terminated string. On 16 bit machines this is a single byte. On 64 bit machines, 4GB is a reasonable string length limit, but even if you want to expand it to the size of the machine word, 64 bit machines usually have ample memory making the extra seven bytes sort of a null argument. I know the original C standard was written for insanely poor machines (in terms of memory), but the efficiency argument doesn't sell me here.



  • Pretty much every other language (i.e. Perl, Pascal, Python, Java, C#, etc) use length prefixed strings. These languages usually beat C in string manipulation benchmarks because they are more efficient with strings.



  • C++ rectified this a bit with the std::basic_string template, but plain character arrays expecting null terminated strings are still pervasive. This is also imperfect because it requires heap allocation.



  • Null terminated strings have to reserve a character (namely, null), which cannot exist in the string, while length prefixed strings can contain embedded nulls.



Several of these things have come to light more recently than C, so it would make sense for C to not have known of them. However, several were plain well before C came to be. Why would null terminated strings have been chosen instead of the obviously superior length prefixing?

EDIT: Since some asked for facts (and didn't like the ones I already provided) on my efficiency point above, they stem from a few things:

  • Concat using null terminated strings requires O(n + m) time comple

Solution

From the horse's mouth


None of BCPL, B, or C supports
character data strongly in the
language; each treats strings much
like vectors of integers and
supplements general rules by a few
conventions. In both BCPL and B a
string literal denotes the address of
a static area initialized with the
characters of the string, packed into
cells. In BCPL, the first packed byte
contains the number of characters in
the string; in B, there is no count
and strings are terminated by a
special character, which B spelled
*e. This change was made partially
to avoid the limitation on the length
of a string caused by holding the
count in an 8- or 9-bit slot, and
partly because maintaining the count
seemed, in our experience, less
convenient than using a terminator.

Dennis M Ritchie, Development of the C Language

Context

Stack Overflow Q#4418708, score: 219

Revisions (0)

No revisions yet.