ASCII Strings and Zero-Allocation
Problem
Over the weekend this article inspired me to write an ASCII string implementation that avoids memory allocation during basic operations like `Substring` and `Trim`. It's still a work in progress, but I think it is a solid foundation to build on, and I wanted some help checking my logic/maths. The main method of interest is the overload of `Substring` that accepts an offset and a count, since most other methods are implemented by calling it.

```
using System;
using System.Text;

namespace ByteTerrace.CSharp.Data
{
    /// <summary>
    /// Represents text as a series of ASCII characters.
    /// </summary>
    /// <remarks>
    /// Inspired by Christopher Wright's FastString work: https://github.com/dhasenan/FastString.
    /// </remarks>
    public struct AsciiString : IEquatable<AsciiString>
    {
        private readonly ArraySegment<byte> m_bytes;

        /// <summary>
        /// Gets the object at a specified position in this instance.
        /// </summary>
        /// <param name="index">A position in the current string.</param>
        public char this[int index] {
            get {
                if ((index < 0) || (index > (m_bytes.Offset + Length))) {
                    throw new ArgumentOutOfRangeException(nameof(index));
                }
                return (char)m_bytes.Array[m_bytes.Offset + index];
            }
        }

        /// <summary>
        /// Indicates whether this instance is empty.
        /// </summary>
        public bool IsEmpty {
            get {
                return m_bytes.Count == 0;
            }
        }

        /// <summary>
        /// Indicates whether this instance is empty or consists of only whitespace characters.
        /// </summary>
        public bool IsEmptyOrWhitespace {
            get {
                return Trim().IsEmpty;
            }
        }

        /// <summary>
        /// The number of characters in this instance.
        /// </summary>
        public int Length {
            get {
                return m_bytes.Count;
            }
        }

        /// <summary>
        /// Initializes a new instance of the structure to the value indicated by an of ASCII bytes
```
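The zero-allocation idea hinges on `Substring` returning a new view over the same backing array instead of copying bytes, with `Trim` reduced to finding two boundary indices and then calling `Substring`. The following is a hypothetical, trimmed-down sketch of that idea, not the question's code: its `Trim` is an assumption (the question's is not shown), bounds checks are omitted for brevity, and `BackingArray` exists only so the example can confirm that no copy was made.

```csharp
using System;
using System.Text;

// Hypothetical, trimmed-down AsciiString: a view (ArraySegment<byte>) over a shared byte[].
public struct AsciiString
{
    private readonly ArraySegment<byte> m_bytes;

    // Assumed constructor: wraps the caller's array without copying it.
    public AsciiString(byte[] bytes, int offset, int count) {
        m_bytes = new ArraySegment<byte>(bytes, offset, count);
    }

    public int Length => m_bytes.Count;

    // Exposed only so the demo can check that results share the same array.
    public byte[] BackingArray => m_bytes.Array;

    // Core operation: a new view over the same array, so no bytes are copied.
    // No checks against this view's Length here, to keep the sketch short.
    public AsciiString Substring(int offset, int count) {
        return new AsciiString(m_bytes.Array, m_bytes.Offset + offset, count);
    }

    // Trim expressed through Substring: only the two boundary indices are computed.
    public AsciiString Trim() {
        int start = 0;
        int end = Length;
        while ((start < end) && IsWhitespace(m_bytes.Array[m_bytes.Offset + start])) {
            start++;
        }
        while ((end > start) && IsWhitespace(m_bytes.Array[m_bytes.Offset + end - 1])) {
            end--;
        }
        return Substring(start, end - start);
    }

    // Space, or tab / line feed / vertical tab / form feed / carriage return (0x09 to 0x0D).
    private static bool IsWhitespace(byte c) {
        return (c == 0x20) || ((c < 0x0E) && (c > 0x08));
    }

    public override string ToString() {
        return Encoding.ASCII.GetString(m_bytes.Array, m_bytes.Offset, m_bytes.Count);
    }
}
```

An all-whitespace input ends with `Substring(Length, 0)`, so whatever validation `Substring` ends up with has to decide whether that empty result is allowed.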
Solution
```
public AsciiString Substring(int offset, int count) {
    if ((offset < 0) || ((m_bytes.Offset + offset + count) > (m_bytes.Offset + Length))) {
        throw new ArgumentOutOfRangeException(nameof(offset));
    }
    if (count > m_bytes.Count) {
        throw new ArgumentOutOfRangeException(nameof(count));
    }
    return new AsciiString(m_bytes.Array, m_bytes.Offset + offset, count);
}
```

If we take a look at the first `if` statement, we can simplify `((m_bytes.Offset + offset + count) > (m_bytes.Offset + Length))` to `((offset + count) > Length)`, since `m_bytes.Offset` is added to both sides.

If we assume `Length == 10` and call

```
public AsciiString Substring(int offset) {
    return Substring(offset, Length - offset);
}
```

with `offset == 11`, the overloaded `Substring(int, int)` will be called as `Substring(11, -1)`. That call passes both `if` conditions (`11 + (-1)` is not greater than `10`, and `-1` is not greater than `m_bytes.Count`), but an `ArgumentOutOfRangeException` is then thrown from the constructor of the `ArraySegment`, which means you are exposing implementation details.

Another edge case is `offset == Length`: with the same `Length == 10`, this results in `Substring(10, 0)`, which shouldn't be valid either.

If you change the second `if` to

```
if (count < 1 || count > m_bytes.Count) {
    throw new ArgumentOutOfRangeException(nameof(count));
}
```

this should be fixed.
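Here is a sketch of the same checks written purely against the instance's own `Length`, using a trimmed-down stand-in for `AsciiString` whose constructor body is an assumption (the question's is not shown). It checks `offset` first, so `Substring(11)` on a 10-character string reports `offset` rather than `count`, and it compares `count` with `Length - offset`, which cannot overflow the way `offset + count` can for very large arguments. Unlike a `count < 1` check, it accepts empty results such as `Substring(10, 0)`, which is also what `System.String.Substring` does; tighten it to `count < 1` if empty substrings should be rejected.

```csharp
using System;
using System.Text;

// Trimmed-down stand-in for the question's AsciiString. The constructor body is an
// assumption (the question's constructors are not shown); it wraps without copying.
public struct AsciiString
{
    private readonly ArraySegment<byte> m_bytes;

    public AsciiString(byte[] bytes, int offset, int count) {
        m_bytes = new ArraySegment<byte>(bytes, offset, count);
    }

    public int Length => m_bytes.Count;

    public AsciiString Substring(int offset) {
        return Substring(offset, Length - offset);
    }

    public AsciiString Substring(int offset, int count) {
        // Bounds are relative to this instance, so m_bytes.Offset never enters the checks.
        if ((offset < 0) || (offset > Length)) {
            throw new ArgumentOutOfRangeException(nameof(offset));
        }
        // 0 <= offset <= Length here, so Length - offset cannot overflow,
        // unlike offset + count for very large arguments.
        if ((count < 0) || (count > Length - offset)) {
            throw new ArgumentOutOfRangeException(nameof(count));
        }
        // The result shares the backing array; no bytes are copied.
        return new AsciiString(m_bytes.Array, m_bytes.Offset + offset, count);
    }

    public override string ToString() {
        return Encoding.ASCII.GetString(m_bytes.Array, m_bytes.Offset, m_bytes.Count);
    }
}
```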
For the constructors, the points about validation and exposing implementation details apply as well. Although the overloaded constructors look nice, some validation should take place; at least for `public AsciiString(byte[] bytes, int offset, int count)`, the passed arguments should be validated.
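A sketch of what that validation could look like, assuming the constructor simply wraps the array in an `ArraySegment<byte>` (the question's constructor body is not shown, so this stand-in is hypothetical). The checks cover the same conditions `ArraySegment<byte>` enforces, but the exceptions are raised by `AsciiString` itself, naming its own parameters instead of surfacing from inside the segment type.

```csharp
using System;

// Trimmed-down stand-in for AsciiString; only the validating constructor is shown.
// The body is an assumption, since the question's constructor is not visible.
public struct AsciiString
{
    private readonly ArraySegment<byte> m_bytes;

    public AsciiString(byte[] bytes, int offset, int count) {
        if (bytes == null) {
            throw new ArgumentNullException(nameof(bytes));
        }
        // offset == bytes.Length is allowed so that an empty string can sit at the end.
        if ((offset < 0) || (offset > bytes.Length)) {
            throw new ArgumentOutOfRangeException(nameof(offset));
        }
        // bytes.Length - offset cannot overflow once offset is in range.
        if ((count < 0) || (count > bytes.Length - offset)) {
            throw new ArgumentOutOfRangeException(nameof(count));
        }
        m_bytes = new ArraySegment<byte>(bytes, offset, count);
    }

    public int Length => m_bytes.Count;
}
```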
`IsWhitespaceCharacter()` could be simplified by returning the condition of the `if` directly. Although I don't know how the compiler will optimize it, I would suggest switching the conditions on the right-hand side of the `||` operator: more characters will be > 0x08 than < 0x0E, so testing `c < 0x0E` first lets `&&` short-circuit for most input.

```
private static bool IsWhitespaceCharacter(ArraySegment<byte> bytes, int index) {
    var c = bytes.Array[bytes.Offset + index];
    return (c == 0x20) || ((c < 0x0E) && (c > 0x08));
}
```
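To confirm that reordering the comparisons does not change the result, the sketch below compares the answer's expression against every byte value. `AsciiWhitespace` is a hypothetical helper that takes a single byte instead of the answer's `ArraySegment<byte>` plus index. It also shows a common alternative that is not from the answer: subtracting `0x09` and comparing as `uint` folds the `0x09` to `0x0D` range test into one comparison, which sidesteps the ordering question altogether. For the ASCII range, both agree with `char.IsWhiteSpace`.

```csharp
// Hypothetical helper; takes one byte instead of the answer's ArraySegment<byte> plus index.
public static class AsciiWhitespace
{
    // The answer's expression: space, or a byte from 0x09 to 0x0D
    // (tab, line feed, vertical tab, form feed, carriage return).
    // Most text bytes are >= 0x0E, so testing c < 0x0E first lets && stop after one comparison.
    public static bool IsWhitespace(byte c) {
        return (c == 0x20) || ((c < 0x0E) && (c > 0x08));
    }

    // Alternative, not from the answer: c - 0x09 maps 0x09..0x0D onto 0..4, and bytes
    // below 0x09 become negative ints that wrap to large values when cast to uint,
    // so a single unsigned comparison covers the whole range.
    public static bool IsWhitespaceUnsigned(byte c) {
        return (c == 0x20) || (unchecked((uint)(c - 0x09)) <= (0x0D - 0x09));
    }
}
```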
`IsEmptyOrWhitespace` could be improved further by checking `IsEmpty` first and introducing an `IsWhitespace()` method that simply checks for whitespace characters:

```
public bool IsEmptyOrWhitespace
{
    get
    {
        return IsEmpty || IsWhitespace();
    }
}

private bool IsWhitespace()
{
    // Copy exactly the bytes covered by the segment, starting at m_bytes.Offset.
    byte[] current = new byte[m_bytes.Count];
    Array.Copy(m_bytes.Array, m_bytes.Offset, current, 0, current.Length);
    // Enumerable.All needs "using System.Linq;" at the top of the file.
    return current.All(c => (c == 0x20) || ((c < 0x0E) && (c > 0x08)));
}
```
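Given the question's zero-allocation goal, `IsWhitespace()` can also avoid the temporary array (and the LINQ enumerator and delegate) by scanning the segment in place. The sketch below uses a trimmed-down, hypothetical stand-in for `AsciiString` with an assumed constructor. Its test deliberately uses segments whose `Offset` is not zero: for a segment over `" \t\nabc"` with offset 3 and count 3 (that is, `"abc"`), `IsEmptyOrWhitespace` must be `false`, which only holds if the bytes are read starting at `m_bytes.Offset`.

```csharp
using System;
using System.Text;

// Trimmed-down stand-in for the question's AsciiString; only what IsEmptyOrWhitespace needs.
public struct AsciiString
{
    private readonly ArraySegment<byte> m_bytes;

    // Assumed constructor: wraps the bytes without copying them.
    public AsciiString(byte[] bytes, int offset, int count) {
        m_bytes = new ArraySegment<byte>(bytes, offset, count);
    }

    public bool IsEmpty => m_bytes.Count == 0;

    public bool IsEmptyOrWhitespace => IsEmpty || IsWhitespace();

    // Walks the segment in place: no array copy, no LINQ enumerator, no delegate.
    private bool IsWhitespace() {
        byte[] array = m_bytes.Array;
        int end = m_bytes.Offset + m_bytes.Count;
        for (int i = m_bytes.Offset; i < end; i++) {
            byte c = array[i];
            if (!((c == 0x20) || ((c < 0x0E) && (c > 0x08)))) {
                return false;
            }
        }
        return true;
    }
}
```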
Context
StackExchange Code Review Q#161009, answer score: 4