Custom encoding for BinaryReader
Problem
I have a file that seems to mix encodings. The text looks like Unicode (UTF-16), but the string's length prefix is written as if the text were UTF-8 or similar. Here is an example:
05 41 00 72 00 69 00 61 00 6C 00
5 A . r . i . a . l .
In this example the string is stored as Unicode, with an extra byte for each character, but the length is half of what it should be: 05 instead of 0A, as if the string were encoded as UTF-8.
If I use:
using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
    temp = reader.ReadString();
}
When I run this, temp = "Ar".
I have this code that works. But is there a better way?
using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
    tempByte = reader.ReadByte();
    var length = Convert.ToInt32(tempByte) * 2;
    byteArray = reader.ReadBytes(length);
    for (var ww = 0; ww < length; ww = ww + 2)
    {
        tempString = tempString + (char)byteArray[ww];
    }
}
Solution
BinaryReader.ReadString() expects the string to be prefixed with the number of bytes to read, not the number of characters (I think this is because of variable-length encodings, especially UTF-8, but also UTF-16).
So you can't use ReadString() directly, but you also don't have to convert the characters byte by byte as you do (which wouldn't work for non-ASCII characters anyway).
For this, you can use ReadChars(), which takes as a parameter the number of characters (not bytes) to read.
You also need to figure out what format the character count is stored in. It could be a simple single-byte number (which means the string can have at most 255 characters), or it could be VLQ-encoded, which you can read using Read7BitEncodedInt(). That method is protected in .NET Framework, though (it became public in .NET 5), so I'm going to assume the former for simplicity.
So, the code could look like this:
using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
    int characterCount = reader.ReadByte();
    char[] characters = reader.ReadChars(characterCount);
    return new string(characters);
}
Code Snippets
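As a quick check of the ReadChars() approach, here is a minimal, self-contained sketch that runs it against the sample bytes from the question, using an in-memory stream instead of a file. The class and method names (LengthPrefixedStringDemo, ReadCharCountedString) are made up for illustration, and the sketch assumes the prefix is a single byte holding the character count.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixedStringDemo
{
    // Hypothetical helper: reads a one-byte character count, then that many
    // characters decoded with the reader's encoding (UTF-16LE in this demo).
    public static string ReadCharCountedString(BinaryReader reader)
    {
        int characterCount = reader.ReadByte();
        char[] characters = reader.ReadChars(characterCount);
        return new string(characters);
    }

    public static void Main()
    {
        // Sample bytes from the question: count 05, then "Arial" in UTF-16LE.
        byte[] data =
        {
            0x05, 0x41, 0x00, 0x72, 0x00, 0x69, 0x00, 0x61, 0x00, 0x6C, 0x00
        };

        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.Unicode))
        {
            Console.WriteLine(ReadCharCountedString(reader)); // prints "Arial"
        }
    }
}
```

If the count turns out to be VLQ-encoded rather than a single byte, the ReadByte() call is the only part that would need to change.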
using (var reader = new BinaryReader(File.Open(fileName, FileMode.Open), Encoding.Unicode))
{
    int characterCount = reader.ReadByte();
    char[] characters = reader.ReadChars(characterCount);
    return new string(characters);
}
Context
StackExchange Code Review Q#41666, answer score: 3