HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Text parser code elegance

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
parsercodetextelegance

Problem

I have a text parser that reads certain information from a given file with a specified format. The text file contains some measured properties of a product. The date model number/lot number is written on the file name.

```
class SomeTextFileParser
{
private static int POINT_COUNT = 15;

string filePath;
string date;
string modelName;
string lotName;
string fileDate;
string fileModelName;
string fileLotName;
public string errMessage;
public List dataList = new List();

private bool ErrorReturn(string message)
{
this.errMessage += message + " ";
return false;
}

public SomeTextFileParser(string path)
{
this.filePath = path;
this.errMessage = "";
}

public bool Parse()
{
try
{
string[] lines;

if (!File.Exists(this.filePath))
return ErrorReturn("File does not exist.");

lines = System.IO.File.ReadAllLines(this.filePath, Encoding.GetEncoding(949));

if (lines.Length < 1)
return ErrorReturn("File is empty.");

if (!ParseFilePath())
return ErrorReturn("");

if (!ParseHeaderLine(lines[0]))
return ErrorReturn("");

if (!modelName.Equals(fileModelName) || !lotName.Equals(fileLotName))
return ErrorReturn("File name does not correspond with the model/lot number in the content.");

this.dataList.Clear();

foreach (string line in lines)
{
string[] valueLineParts = line.Split(',');
if (valueLineParts.Length != POINT_COUNT + 7)
{
continue;
}

if (!ParseValueLine(line))
{
this.dataList.Clear();
return ErrorReturn("");
}
}

if(this.dataList.Count == 0)
return

Solution

Answers to specific questions

Property vs. Field

It can be considered A Matter Of Personal Preference (AMOPP) as much as anything else, but I would declare public data as properties rather than as fields. A property can be considered part of the class's contract; it can be included in an interface. A field is implementation. For example, one could have ErrorMessage as string property backed by a list of strings.

private readonly List _errorMessages = new List();
//...
public string ErrorMessage {
    get { return string.Join(Environment.NewLine, _errorMessages); }
}


Note: AMOPP, but one may as well go with ErrorMessage as ErrMessage.

TryParse()?

Unless you have a strong need to avoid throwing an exception from the method I would simply have Parse return IEnumerable and throw an exception on any sort of problem.

Most of the conditions that you have errors returning are exceptions (file not found, unreadable file name, unreadable data). We should be throwing exceptions.

Note: although the ErrMessage code allows for adding messages onto the existing one, that doesn't seem to happen. A single error message in an exception should suit.

Another possibility is to have a constructed return value, say, ParseResult that would include the output of the parsing and any error message.

public class ParseResult {

    public ParseResult(IEnumerable output) 
        : this(null, output) {}

    public ParseResult(string errorMessage ) 
        : this( errorMessage, null) { }

    private ParseResult(string errorMessage, IEnumerable output ) {
        ErrorMessage = errorMessage;

        Output = output;
    }

    public string ErrorMessage { get; private set; }

    public bool IsSuccess {
        get { return ErrorMessage != null; }
    }

    public IEnumerable Output { get; private set; }
}


As in the answer from @mjolka, I would modify the Parse() function to take in the path as input and return the result. There is no value in passing it into the ctor and then using it in parse. Also, this way, an instance of the parser can be used on different files.

ParseResult Parse(string path) {
    // ...
}

// throws exception
IEnumerable Parse(string path) {
    // ...
}


I wouldn't change it to a static method. Again, AMOPP, but if we make Parse() a static method, there are few of the mocking frameworks (and, AFAIK, none of the free ones) that support mocking static methods and this will hamper unit testing.

ParseFilePath - Class vs. Instance members

I would go with returning an class from ParseFileName and then passing that into a constructor the EqusData. It localizes changes if/when the file name format changes.

public class FileInfo {

    public FileInfo(DateTime date, string lotName, string modelName) {
        Date = date;
        LotName = lotName;
        ModelName = modelName;
    }

    public DateTime Date { get; private set; }
    public string LotName { get; private set; }
    public string ModelName { get; private set; }
}

public class EqusData {

    public EqusData(FileInfo info) {
        Values = new List();
        Date = info.Date;
        LotName = info.LotName;
        ModelName = info.ModelName;
    }

    public DateTime Date { get; private set; }
    public string LotName { get; private set; }
    public string ModelName { get; private set; }
    public int Sequence { get; set; }
    public bool IsGood { get; set; }
    public string Usl { get; set; }
    public string Lsl { get; set; }
    public List Values { get; private set; }

}


Now, if the format of the file name changes (say to add a Batch number) we need to change ParseFileName() and EqusData and its ctor but the rest of the code is untouched.

There's a few things is here that count as AMOPP rather than solid do this items.

All the members of EqusData are Properties not fields - note: even if fields, the names should be capitalized
Values, Date, LotName, ModelName are all readonly. They don't need to be settable from outside the class. If they don't need to be, then don't let them be.
Date is stored as a DateTime not a string. Helps with locales (if needed) and is just generally cleaner. If it is a date, treat it as one, not as a string.

Other Points

In terms of readability there are too many returns.

Exceptions should be used to handle exceptional situations and most of the errors here seem to fall into that category. The top level function (Parse) has too many different pieces at too many different levels. By using exceptions and not having to worry about checking and propagating returns we can simplify the layout a lot and make it more readable.

public IEnumerable Parse(string path) {

    var fileInfo = ParseFilePath(path);

    var lines = ReadLinesFromFile(path);

    CheckHeaderLine(lines[0], fileInfo);

    return ReadValueLines(fileInfo, lines.Skip(1));

}


Get the file info from the file name (path)
Read all the lines from the file
Check the header against th

Code Snippets

private readonly List<string> _errorMessages = new List<string>();
//...
public string ErrorMessage {
    get { return string.Join(Environment.NewLine, _errorMessages); }
}
public class ParseResult {

    public ParseResult(IEnumerable<EqusData> output) 
        : this(null, output) {}

    public ParseResult(string errorMessage ) 
        : this( errorMessage, null) { }

    private ParseResult(string errorMessage, IEnumerable<EqusData> output ) {
        ErrorMessage = errorMessage;

        Output = output;
    }

    public string ErrorMessage { get; private set; }

    public bool IsSuccess {
        get { return ErrorMessage != null; }
    }

    public IEnumerable<EqusData> Output { get; private set; }
}
ParseResult Parse(string path) {
    // ...
}

// throws exception
IEnumerable<EqusData> Parse(string path) {
    // ...
}
public class FileInfo {

    public FileInfo(DateTime date, string lotName, string modelName) {
        Date = date;
        LotName = lotName;
        ModelName = modelName;
    }

    public DateTime Date { get; private set; }
    public string LotName { get; private set; }
    public string ModelName { get; private set; }
}

public class EqusData {

    public EqusData(FileInfo info) {
        Values = new List<string>();
        Date = info.Date;
        LotName = info.LotName;
        ModelName = info.ModelName;
    }

    public DateTime Date { get; private set; }
    public string LotName { get; private set; }
    public string ModelName { get; private set; }
    public int Sequence { get; set; }
    public bool IsGood { get; set; }
    public string Usl { get; set; }
    public string Lsl { get; set; }
    public List<string> Values { get; private set; }

}
public IEnumerable<EqusData> Parse(string path) {

    var fileInfo = ParseFilePath(path);

    var lines = ReadLinesFromFile(path);

    CheckHeaderLine(lines[0], fileInfo);

    return ReadValueLines(fileInfo, lines.Skip(1));

}

Context

StackExchange Code Review Q#53951, answer score: 6

Revisions (0)

No revisions yet.