
ASCII Strings and Zero-Allocation

Submitted by: @import:stackexchange-codereview

Problem

Over the weekend, this article inspired me to write an ASCII string implementation that avoids memory allocation during basic operations like `Substring` and `Trim`. It's still a work in progress, but I think it's a solid foundation to build on, and I wanted some help checking my logic/maths.

The main method of interest is the overload of `Substring` that accepts an offset and a count, since most other methods are implemented by calling it.
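The idea of routing other operations through `Substring(offset, count)` can be sketched with a stripped-down type. `AsciiSlice` is hypothetical and is not the author's `AsciiString`. Each slice is a new `ArraySegment<byte>` over the same backing array, so `Substring` and a `TrimStart` built on top of it never copy bytes. Two simplifications are assumptions of this sketch: the bounds checks are its own, not the question's, and `TrimStart` only skips the space character.

```csharp
using System;
using System.Text;

// Hypothetical minimal type: shows that slicing an ArraySegment<byte>
// reuses the backing array instead of copying it.
public readonly struct AsciiSlice
{
    private readonly ArraySegment<byte> _bytes;

    public AsciiSlice(byte[] bytes)
    {
        _bytes = new ArraySegment<byte>(bytes);
    }

    // Mirrors the question's (array, offset, count) constructor shape.
    private AsciiSlice(byte[] array, int offset, int count)
    {
        _bytes = new ArraySegment<byte>(array, offset, count);
    }

    public int Length => _bytes.Count;

    // Exposed only so callers can observe that no copy was made.
    public byte[] BackingArray => _bytes.Array;

    // Core primitive: a new view over the same array; no bytes are copied.
    public AsciiSlice Substring(int offset, int count)
    {
        // offset > Length - count is offset + count > Length without the overflow risk.
        if (offset < 0 || count < 0 || offset > Length - count)
        {
            throw new ArgumentOutOfRangeException(nameof(offset));
        }
        return new AsciiSlice(_bytes.Array, _bytes.Offset + offset, count);
    }

    // Convenience overload that delegates to the core primitive.
    public AsciiSlice Substring(int offset) => Substring(offset, Length - offset);

    // Skips leading spaces (0x20 only, for brevity), then slices via Substring.
    public AsciiSlice TrimStart()
    {
        var start = 0;
        while (start < Length && _bytes.Array[_bytes.Offset + start] == (byte)' ')
        {
            start++;
        }
        return Substring(start);
    }

    // Allocates a string, but only for display purposes.
    public override string ToString() =>
        _bytes.Array == null ? string.Empty : Encoding.ASCII.GetString(_bytes.Array, _bytes.Offset, _bytes.Count);
}
```

For input `"  hello"`, `TrimStart()` yields `"hello"` while still pointing at the original array.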

```
using System;
using System.Text;

namespace ByteTerrace.CSharp.Data
{
    /// <summary>
    /// Represents text as a series of ASCII characters.
    /// </summary>
    /// <remarks>
    /// Inspired by Christopher Wright's FastString work: https://github.com/dhasenan/FastString.
    /// </remarks>
    public struct AsciiString : IEquatable<AsciiString>
    {
        private readonly ArraySegment<byte> m_bytes;

        /// <summary>
        /// Gets the <see cref="char"/> object at a specified position in this instance.
        /// </summary>
        /// <param name="index">A position in the current string.</param>
        public char this[int index] {
            get {
                // Valid indices are 0 .. Length - 1, relative to the start of the segment.
                if ((index < 0) || (index >= Length)) {
                    throw new ArgumentOutOfRangeException(nameof(index));
                }

                return (char)m_bytes.Array[m_bytes.Offset + index];
            }
        }
        /// <summary>
        /// Indicates whether this instance is empty.
        /// </summary>
        public bool IsEmpty {
            get {
                return m_bytes.Count == 0;
            }
        }
        /// <summary>
        /// Indicates whether this instance is empty or consists of only whitespace characters.
        /// </summary>
        public bool IsEmptyOrWhitespace {
            get {
                return Trim().IsEmpty;
            }
        }
        /// <summary>
        /// The number of characters in this instance.
        /// </summary>
        public int Length {
            get {
                return m_bytes.Count;
            }
        }

        /// <summary>
        /// Initializes a new instance of the <see cref="AsciiString"/> structure to the value indicated by an array of ASCII bytes …
```

Solution

```
public AsciiString Substring(int offset, int count) {
    if ((offset < 0) || ((m_bytes.Offset + offset + count) > (m_bytes.Offset + Length))) {
        throw new ArgumentOutOfRangeException(nameof(offset));
    }
    if (count > m_bytes.Count) {
        throw new ArgumentOutOfRangeException(nameof(count));
    }

    return new AsciiString(m_bytes.Array, m_bytes.Offset + offset, count);
}
```


If we look at the first `if` statement, we can simplify `((m_bytes.Offset + offset + count) > (m_bytes.Offset + Length))` to `((offset + count) > Length)`, because `m_bytes.Offset` appears on both sides and cancels out.
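For a quick sanity check of that simplification, the hypothetical helper below compares the original and the simplified guard over a small grid of inputs. It only demonstrates the algebra of subtracting `m_bytes.Offset` from both sides. The equivalence assumes none of the additions overflow `int`, which holds for the small values used here.

```csharp
public static class BoundsCheckDemo
{
    // Guard as written in the question: both sides include the segment's base offset.
    public static bool Original(int baseOffset, int offset, int count, int length) =>
        (baseOffset + offset + count) > (baseOffset + length);

    // Simplified guard: baseOffset appears on both sides, so it cancels out.
    public static bool Simplified(int offset, int count, int length) =>
        (offset + count) > length;

    // Compares both guards over a small grid, including negative arguments.
    // The values are small enough that none of the additions can overflow int.
    public static bool AgreeOnSmallInputs()
    {
        for (var baseOffset = 0; baseOffset <= 5; baseOffset++)
        {
            for (var length = 0; length <= 10; length++)
            {
                for (var offset = -2; offset <= 12; offset++)
                {
                    for (var count = -2; count <= 12; count++)
                    {
                        if (Original(baseOffset, offset, count, length) != Simplified(offset, count, length))
                        {
                            return false;
                        }
                    }
                }
            }
        }

        return true;
    }
}
```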

If we assume `Length == 10` and we call

```
public AsciiString Substring(int offset) {
    return Substring(offset, Length - offset);
}
```


with `offset == 11`, the overload `Substring(int, int)` is called as `Substring(11, -1)`. This passes both `if` checks: `11 + (-1)` is not greater than `10`, and `-1` is not greater than `m_bytes.Count`. The `ArgumentOutOfRangeException` is therefore thrown by the `ArraySegment<byte>` constructor instead, which means you are exposing implementation details.

Another edge case is `offset == Length`. With `Length == 10` as above, this results in `Substring(10, 0)`, which shouldn't be valid either.
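Both edge cases can be reproduced outside the struct with the hypothetical helpers below.

- `PassesOriginalGuards` mirrors the two `if` checks of the question's `Substring(int, int)`. It cancels out `m_bytes.Offset` and uses `length` in place of `Length`/`m_bytes.Count`.
- `SegmentOutcome` shows what the `ArraySegment<byte>` constructor does with the resulting arguments. It assumes the string wraps a 10-byte array starting at offset 0.

```csharp
using System;

public static class EdgeCaseDemo
{
    // Mirrors the question's guards for Substring(offset, count).
    public static bool PassesOriginalGuards(int offset, int count, int length)
    {
        if ((offset < 0) || ((offset + count) > length))
        {
            return false; // first guard would throw for "offset"
        }
        if (count > length)
        {
            return false; // second guard would throw for "count"
        }
        return true;
    }

    // What happens once the guards have passed: ArraySegment<byte> validates on its own.
    public static string SegmentOutcome(byte[] array, int offset, int count)
    {
        try
        {
            var segment = new ArraySegment<byte>(array, offset, count);
            return "ok, Count = " + segment.Count;
        }
        catch (ArgumentOutOfRangeException)
        {
            return "ArgumentOutOfRangeException";
        }
    }
}
```

Here is what the helpers report:

- `PassesOriginalGuards(11, -1, 10)` and `PassesOriginalGuards(10, 0, 10)` both return `true`.
- `SegmentOutcome(new byte[10], 11, -1)` reports the exception raised by `ArraySegment<byte>` rather than by `AsciiString`.
- `SegmentOutcome(new byte[10], 10, 0)` quietly yields an empty segment.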

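If you change the second `if` to

```
if (count < 1 || count > m_bytes.Count) {
    throw new ArgumentOutOfRangeException(nameof(count));
}
```

then both edge cases are rejected by `AsciiString` itself, with an exception that names its own `count` parameter.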
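A minimal sketch of the revised `Substring(int, int)` follows. It combines the simplified first guard with the new second guard. `CheckedAsciiString` is a hypothetical, trimmed-down stand-in for `AsciiString` containing only what is needed to exercise the guards.

```csharp
using System;

public readonly struct CheckedAsciiString
{
    private readonly ArraySegment<byte> m_bytes;

    public CheckedAsciiString(byte[] bytes)
    {
        m_bytes = new ArraySegment<byte>(bytes);
    }

    private CheckedAsciiString(byte[] array, int offset, int count)
    {
        m_bytes = new ArraySegment<byte>(array, offset, count);
    }

    public int Length => m_bytes.Count;

    public CheckedAsciiString Substring(int offset, int count)
    {
        // Simplified first guard: m_bytes.Offset cancels out on both sides.
        if ((offset < 0) || ((offset + count) > Length))
        {
            throw new ArgumentOutOfRangeException(nameof(offset));
        }
        // Revised second guard: rejects negative and zero counts.
        if ((count < 1) || (count > m_bytes.Count))
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        return new CheckedAsciiString(m_bytes.Array, m_bytes.Offset + offset, count);
    }

    public CheckedAsciiString Substring(int offset) => Substring(offset, Length - offset);
}
```

With `count < 1`, a zero-length result is always rejected. Any member that can legitimately produce an empty string would need to handle that case before calling `Substring`. An example is a `Trim` over an all-whitespace string, if `Trim` is built on `Substring`. For comparison, `System.String.Substring` accepts `startIndex == Length` and returns an empty string, so rejecting it here is a design choice.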

The points about validation and exposing implementation details apply to the constructors as well. The overloaded constructors look nice, but some validation should take place.

At the very least, the arguments passed to `public AsciiString(byte[] bytes, int offset, int count)` should be validated.
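A minimal sketch of what that validation could look like, assuming the constructor wraps its arguments in an `ArraySegment<byte>` as the `Substring` code suggests. `ValidatedAsciiString` is a hypothetical stand-in. Allowing `count == 0`, so that empty strings can still be constructed, is an assumption of this sketch.

```csharp
using System;

public readonly struct ValidatedAsciiString
{
    private readonly ArraySegment<byte> m_bytes;

    public ValidatedAsciiString(byte[] bytes, int offset, int count)
    {
        // Validate up front so exceptions describe this constructor's own parameters.
        if (bytes == null)
        {
            throw new ArgumentNullException(nameof(bytes));
        }
        if ((offset < 0) || (offset > bytes.Length))
        {
            throw new ArgumentOutOfRangeException(nameof(offset));
        }
        if ((count < 0) || (count > (bytes.Length - offset)))
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        m_bytes = new ArraySegment<byte>(bytes, offset, count);
    }

    public int Length => m_bytes.Count;
}
```

Checking before constructing the `ArraySegment<byte>` has two effects:

- A null array is reported under the name `bytes` rather than `ArraySegment`'s `array`.
- An offset/count pair that overruns the array produces an `ArgumentOutOfRangeException` for `count`, instead of the plain `ArgumentException` that `ArraySegment<byte>` throws for that case.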

The `IsWhitespaceCharacter()` method could be simplified by returning the condition of the `if` directly. I don't know how the compiler will optimize it, but I would suggest swapping the two comparisons on the right-hand side of the `||` operator so that `c < 0x0E` is tested first. More characters will be `> 0x08` than `< 0x0E`, so testing `c < 0x0E` first lets `&&` short-circuit for most characters.

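```
private static bool IsWhitespaceCharacter(ArraySegment<byte> bytes, int index) {
    var c = bytes.Array[bytes.Offset + index];
    // 0x20 is space; 0x09-0x0D are tab, line feed, vertical tab, form feed and carriage return.
    return (c == 0x20) || ((c < 0x0E) && (c > 0x08));
}
```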
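To make the short-circuit argument concrete, the hypothetical helper below counts how many range comparisons `(c == 0x20) || (first && second)` would evaluate for a piece of text under either operand order. It counts evaluations as C#'s short-circuit rules define them. The machine code the JIT actually emits may differ.

```csharp
public static class WhitespaceOrderDemo
{
    // Counts the range comparisons evaluated for each character of `text`.
    // upperBoundFirst = true models (c < 0x0E) && (c > 0x08);
    // upperBoundFirst = false models (c > 0x08) && (c < 0x0E).
    public static int CountComparisons(string text, bool upperBoundFirst)
    {
        var comparisons = 0;
        foreach (var ch in text)
        {
            var c = (byte)ch;
            if (c == 0x20)
            {
                continue; // (c == 0x20) is true, so || skips the range test entirely
            }

            comparisons++; // the first operand of && is always evaluated
            var firstPassed = upperBoundFirst ? (c < 0x0E) : (c > 0x08);
            if (firstPassed)
            {
                comparisons++; // the second operand runs only when the first is true
            }
        }
        return comparisons;
    }
}
```

For `"Hello, World!"`, the suggested order (`upperBoundFirst: true`) evaluates 12 range comparisons versus 24 for the other order. Every non-space character there is above `0x0E`, so `c < 0x0E` fails immediately.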


The `IsEmptyOrWhitespace` property could be improved further by checking `IsEmpty` first and introducing an `IsWhitespace()` method that simply checks for whitespace characters, instead of going through `Trim()`.

```
public bool IsEmptyOrWhitespace
{
    get
    {
        return IsEmpty || IsWhitespace();
    }
}

private bool IsWhitespace()
{
    // Check the bytes in place (IsWhitespaceCharacter adds m_bytes.Offset itself),
    // so no temporary array is allocated.
    for (var i = 0; i < m_bytes.Count; i++)
    {
        if (!IsWhitespaceCharacter(m_bytes, i))
        {
            return false;
        }
    }

    return true;
}
```


Context

StackExchange Code Review Q#161009, answer score: 4
