HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

EncodingTranslatorStream - A stream object that translates character encoding

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
streamcharacterthattranslatesencodingencodingtranslatorstreamobject

Problem

This class is a stream designed to perform character encoding translation. So you instantiate it, pass an input stream, and specify the input encoding and desired output encoding, so that when you read from this stream, is will translate the source data from one encoding type to another.

I'm interested in a general code review on style/organization of code as well as for any apparent bugs and performance considerations

```
///
/// This class is a stream designed to perform character encoding translation from one encoding to another.
///
public class EncodingTranslatorStream : System.IO.Stream
{
///
/// Input data. This is the data that well be decoded, and re-encoded in the specified encoding
///
private System.IO.Stream strInput_m;

///
/// Input stream reader. This will be responsible for decoding the input bytes into unicode characters based on the specified input encoding
///
private StreamReader srInput_m;

///
/// Output stream reader. This will be responsible for encoding unicode characters into bytes based on the specified output encoding
///
private StreamWriter swOutput_m;

///
/// Holds a stream of bytes, and when read, the bytes are automatically removed from the stream
///
private Stream strOut_m;

///
/// Constructor. Specifies the input and output encoding.
///
/// Input data, that will be decoded and re-encode into the specified output encoding
/// The input character encoding to use.
/// Output encoding
///
/// The character encoding is set by the encoding parameter.
/// The StreamReader object attempts to detect the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used.
///
public EncodingTranslatorStream(System.IO.Stream strIn

Solution

Constructors

Because the constructor of the StreamReader (Stream, Encoding) is equal to calling the overloaded constructor (Stream, Encoding, bool) with true for the bool parameter, you should use constructor chaining.

From the first link:

The StreamReader object attempts to detect the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used.

So you could simplify your code like

public EncodingTranslatorStream(System.IO.Stream strInput, Encoding encodingIn, Encoding encodingOut)   
     : this(strInput, encodingIn, true, encodingOut)
{}


Dead code like //this.strOut_m = new MemoryStream(); should be removed because it only adds noise to the code instead of any value.

The same is true for comments which are saying what is done instead of why something is done. Please read this fine answer about comments: https://codereview.stackexchange.com/a/90113/29371

Possible problems

The overridden Position property can cause problems if a not correctly coded subclass of stream is passed to the constructor.

Usually a Stream should throw a NotSupportedException if the Position property is set and the stream is not seekable. So if you check CanSeek before setting the Position you have done everything you can do to prevent any exceptions. If a subclass of the Stream class is passed in, which doesn't follow this pattern, noone can blame you then.

Again the naming

This block of code

if (this.srInput_m.CurrentEncoding.Equals(this.swOutput_m.Encoding))
{
    //The encodings are the same,  lets just bypass the translation stuff
    return this.strInput_m.Read(buffer, offset, count);
}


is very hard to read/grasp at first glance. The difference between srInput_m and strInput_m is extremely small.

You should at least change sr_Input_m to reader or streamReader with or without your postfix of choice where as I would not use it if you always use this.

Validation

In the Read() method you have

//Validate the parameters passed in
this.ValidateBufferArgs(buffer, offset, count);


but you are using the parameters before the check if the encoding of the used StreamReader equals the encoding of the used StreamWriter.

Either omit the check (I guess the stream and streamreader take care of this) or move it to the top of the method.

Read() method

Basically the Read() method is way to long. You should consider to break it into smaller methods which are easier to maintain and also better to read.

Code Snippets

public EncodingTranslatorStream(System.IO.Stream strInput, Encoding encodingIn, Encoding encodingOut)   
     : this(strInput, encodingIn, true, encodingOut)
{}
if (this.srInput_m.CurrentEncoding.Equals(this.swOutput_m.Encoding))
{
    //The encodings are the same,  lets just bypass the translation stuff
    return this.strInput_m.Read(buffer, offset, count);
}
//Validate the parameters passed in
this.ValidateBufferArgs(buffer, offset, count);

Context

StackExchange Code Review Q#93269, answer score: 4

Revisions (0)

No revisions yet.