Custom encoding for BinaryReader

Submitted by: @import:stackexchange-codereview

Problem

I have a file that seems to mix encodings. The text looks like Unicode (UTF-16), but the length prefix of each string looks as if it were counted for UTF-8 or a similar single-byte encoding. Here is an example:

05 41 00 72 00 69 00 61 00 6C 00
5  A  .  r  .  i  .  a  .  l  .


In this example the string is stored as Unicode, with the extra 00 byte after each character, but the length prefix is half of what it should be: 05 instead of 0A, as if the string were encoded as UTF-8.

If I use:

using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
  temp = reader.ReadString();
}


When I run this, temp is "Ar".

I have this code that works. But is there a better way?

using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
  tempByte = reader.ReadByte();
  var length = Convert.ToInt32(tempByte) * 2;
  byteArray = reader.ReadBytes(length);
  for (var ww = 0; ww < length; ww = ww + 2)
  {
    tempString = tempString + (char)byteArray[ww];
  }
}

Solution

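BinaryReader.ReadString() expects the string to be prefixed with its length in bytes, not in characters (I think this is because of variable-length encodings, especially UTF-8, but also UTF-16). With a prefix of 05 it reads only five bytes, which is two complete UTF-16 characters plus a dangling byte, hence "Ar".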
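To see the byte-count prefix for yourself, here is a minimal sketch that writes "Arial" with BinaryWriter (the counterpart of ReadString()) and dumps the result as hex. The PrefixDemo class and WriteAsHex method are names made up for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class PrefixDemo
{
    // Hypothetical helper: writes s with BinaryWriter and returns the raw bytes as hex.
    public static string WriteAsHex(string s)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.Unicode))
            {
                // Writes a length prefix (counted in bytes) followed by the UTF-16LE payload.
                writer.Write(s);
            }
            // ToArray() still works after the writer has closed the stream.
            return BitConverter.ToString(stream.ToArray());
        }
    }

    static void Main()
    {
        Console.WriteLine(WriteAsHex("Arial")); // prints 0A-41-00-72-00-69-00-61-00-6C-00
    }
}
```

The prefix here is 0A (ten bytes), whereas the file in the question stores 05 (five characters) in front of the same payload.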

So, you can't use ReadString() directly, but you also don't have to convert the characters byte by byte like you do (which silently drops the high byte of each UTF-16 code unit, so it breaks for any character above U+00FF).

For this, you can use ReadChars(), which takes as a parameter the number of characters (not bytes) to read.

You also need to figure out what format the character count is stored in. It could be a plain single byte (which means the string can have at most 255 characters), or it could be VLQ-encoded (the 7-bit format that ReadString() itself uses for its prefix), which you can read using Read7BitEncodedInt(). That method is protected in .NET Framework, though (it is public from .NET 5 onward), so I'm going to assume the former for simplicity.

So, the code could look like this:

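using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
    // The prefix is a single byte holding the number of characters, not bytes.
    int characterCount = reader.ReadByte();
    // ReadChars() decodes that many UTF-16 characters using the reader's encoding.
    char[] characters = reader.ReadChars(characterCount);
    return new string(characters);
}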
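If the count turns out to be VLQ-encoded rather than a single byte, one possible approach is a small subclass that can reach Read7BitEncodedInt() even where it is protected. The sketch below is only an illustration: the CharCountPrefixedReader class and both method names are invented here, and it assumes the payload is always UTF-16LE. It runs against the bytes from the question's dump, held in a MemoryStream instead of a file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper type; the name and both methods are assumptions for illustration.
sealed class CharCountPrefixedReader : BinaryReader
{
    public CharCountPrefixedReader(Stream input)
        : base(input, Encoding.Unicode) { }

    // Character count stored as a single byte (0 to 255 characters).
    public string ReadStringWithBytePrefix()
    {
        int characterCount = ReadByte();
        return new string(ReadChars(characterCount));
    }

    // Character count stored as a 7-bit encoded integer (VLQ).
    // Callable from a subclass whether the base method is protected or public.
    public string ReadStringWithVlqPrefix()
    {
        int characterCount = Read7BitEncodedInt();
        return new string(ReadChars(characterCount));
    }
}

static class Program
{
    static void Main()
    {
        // 05 followed by "Arial" in UTF-16LE, as in the question's dump.
        byte[] data = { 0x05, 0x41, 0x00, 0x72, 0x00, 0x69, 0x00, 0x61, 0x00, 0x6C, 0x00 };
        using (var reader = new CharCountPrefixedReader(new MemoryStream(data)))
        {
            Console.WriteLine(reader.ReadStringWithBytePrefix()); // prints Arial
        }
    }
}
```

For counts below 128 the two formats are byte-for-byte identical, so a sample like this one cannot tell them apart; only a string of 128 characters or more in the real file would show which one is in use.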

Context

StackExchange Code Review Q#41666, answer score: 3
