patterncsharpMinor
EncodingTranslatorStream - A stream object that translates character encoding
Viewed 0 times
streamcharacterthattranslatesencodingencodingtranslatorstreamobject
Problem
This class is a stream designed to perform character encoding translation. So you instantiate it, pass an input stream, and specify the input encoding and desired output encoding, so that when you read from this stream, is will translate the source data from one encoding type to another.
I'm interested in a general code review on style/organization of code as well as for any apparent bugs and performance considerations
```
///
/// This class is a stream designed to perform character encoding translation from one encoding to another.
///
public class EncodingTranslatorStream : System.IO.Stream
{
///
/// Input data. This is the data that well be decoded, and re-encoded in the specified encoding
///
private System.IO.Stream strInput_m;
///
/// Input stream reader. This will be responsible for decoding the input bytes into unicode characters based on the specified input encoding
///
private StreamReader srInput_m;
///
/// Output stream reader. This will be responsible for encoding unicode characters into bytes based on the specified output encoding
///
private StreamWriter swOutput_m;
///
/// Holds a stream of bytes, and when read, the bytes are automatically removed from the stream
///
private Stream strOut_m;
///
/// Constructor. Specifies the input and output encoding.
///
/// Input data, that will be decoded and re-encode into the specified output encoding
/// The input character encoding to use.
/// Output encoding
///
/// The character encoding is set by the encoding parameter.
/// The StreamReader object attempts to detect the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used.
///
public EncodingTranslatorStream(System.IO.Stream strIn
I'm interested in a general code review on style/organization of code as well as for any apparent bugs and performance considerations
```
///
/// This class is a stream designed to perform character encoding translation from one encoding to another.
///
public class EncodingTranslatorStream : System.IO.Stream
{
///
/// Input data. This is the data that well be decoded, and re-encoded in the specified encoding
///
private System.IO.Stream strInput_m;
///
/// Input stream reader. This will be responsible for decoding the input bytes into unicode characters based on the specified input encoding
///
private StreamReader srInput_m;
///
/// Output stream reader. This will be responsible for encoding unicode characters into bytes based on the specified output encoding
///
private StreamWriter swOutput_m;
///
/// Holds a stream of bytes, and when read, the bytes are automatically removed from the stream
///
private Stream strOut_m;
///
/// Constructor. Specifies the input and output encoding.
///
/// Input data, that will be decoded and re-encode into the specified output encoding
/// The input character encoding to use.
/// Output encoding
///
/// The character encoding is set by the encoding parameter.
/// The StreamReader object attempts to detect the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used.
///
public EncodingTranslatorStream(System.IO.Stream strIn
Solution
Constructors
Because the constructor of the StreamReader
From the first link:
The StreamReader object attempts to detect the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used.
So you could simplify your code like
Dead code like
The same is true for comments which are saying what is done instead of why something is done. Please read this fine answer about comments: https://codereview.stackexchange.com/a/90113/29371
Possible problems
The overridden
Usually a
Again the naming
This block of code
is very hard to read/grasp at first glance. The difference between
You should at least change
Validation
In the
but you are using the parameters before the check if the encoding of the used
Either omit the check (I guess the stream and streamreader take care of this) or move it to the top of the method.
Read() method
Basically the
Because the constructor of the StreamReader
(Stream, Encoding) is equal to calling the overloaded constructor (Stream, Encoding, bool) with true for the bool parameter, you should use constructor chaining.From the first link:
The StreamReader object attempts to detect the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used.
So you could simplify your code like
public EncodingTranslatorStream(System.IO.Stream strInput, Encoding encodingIn, Encoding encodingOut)
: this(strInput, encodingIn, true, encodingOut)
{}Dead code like
//this.strOut_m = new MemoryStream(); should be removed because it only adds noise to the code instead of any value.The same is true for comments which are saying what is done instead of why something is done. Please read this fine answer about comments: https://codereview.stackexchange.com/a/90113/29371
Possible problems
The overridden
Position property can cause problems if a not correctly coded subclass of stream is passed to the constructor.Usually a
Stream should throw a NotSupportedException if the Position property is set and the stream is not seekable. So if you check CanSeek before setting the Position you have done everything you can do to prevent any exceptions. If a subclass of the Stream class is passed in, which doesn't follow this pattern, noone can blame you then.Again the naming
This block of code
if (this.srInput_m.CurrentEncoding.Equals(this.swOutput_m.Encoding))
{
//The encodings are the same, lets just bypass the translation stuff
return this.strInput_m.Read(buffer, offset, count);
}is very hard to read/grasp at first glance. The difference between
srInput_m and strInput_m is extremely small.You should at least change
sr_Input_m to reader or streamReader with or without your postfix of choice where as I would not use it if you always use this.Validation
In the
Read() method you have//Validate the parameters passed in
this.ValidateBufferArgs(buffer, offset, count);but you are using the parameters before the check if the encoding of the used
StreamReader equals the encoding of the used StreamWriter.Either omit the check (I guess the stream and streamreader take care of this) or move it to the top of the method.
Read() method
Basically the
Read() method is way to long. You should consider to break it into smaller methods which are easier to maintain and also better to read.Code Snippets
public EncodingTranslatorStream(System.IO.Stream strInput, Encoding encodingIn, Encoding encodingOut)
: this(strInput, encodingIn, true, encodingOut)
{}if (this.srInput_m.CurrentEncoding.Equals(this.swOutput_m.Encoding))
{
//The encodings are the same, lets just bypass the translation stuff
return this.strInput_m.Read(buffer, offset, count);
}//Validate the parameters passed in
this.ValidateBufferArgs(buffer, offset, count);Context
StackExchange Code Review Q#93269, answer score: 4
Revisions (0)
No revisions yet.