patterncsharpMinor
CSV reader using StreamReader and LINQ
Viewed 0 times
linqcsvusingandstreamreaderreader
Problem
Mostly due to readability, I am using the following code to
However, the process takes about 4-5 seconds, which is simply put too long. Any suggestions on how to improve the following (or maybe even replace it)?
The order inside the CSV file is important - the file will contain between (2000-7000) x 25 entries.
The CSV file has several fields which I am not interested in - and I don't necessarily need them in my
This approach results in a strange type for query - and for that matter was pretty slow (I would have an array with about 9-11 properties). Any ideas on that second issue?
- find a specific csv file on a drive
- read that CSV file using a streamReader
- parse it into a
Listto be able to use LINQ and/or PLINQ later on
However, the process takes about 4-5 seconds, which is simply put too long. Any suggestions on how to improve the following (or maybe even replace it)?
var query = (from x in Directory.GetFiles(_path, "*.csv").AsParallel()
where x.Contains(dateTime)
select x).First();
#region Read CSV File
List _tempList = new List();
FileStream stream = new FileStream(query, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (StreamReader reader = new StreamReader(stream, Encoding.Default, true, 1024))
{
string currentLine;
while ((currentLine = reader.ReadLine()) != null)
{
string[] temp = currentLine.Split(new char[] { ',' }, StringSplitOptions.None);
_tempList.Add(temp);
}
}
#endregionThe order inside the CSV file is important - the file will contain between (2000-7000) x 25 entries.
The CSV file has several fields which I am not interested in - and I don't necessarily need them in my
List (or string[][]). I have tried to filter them with a LINQ statement similar to the following pseudo code:var query = from x in MyListOfStringArray
select new {Col1 = x[1] , Col2 = c[4]}.ToArray();This approach results in a strange type for query - and for that matter was pretty slow (I would have an array with about 9-11 properties). Any ideas on that second issue?
Solution
var query = (from x in Directory.GetFiles(_path, "*.csv").AsParallel()
where x.Contains(dateTime)
select x).First();If
Directory.GetFiles(_path, "*.csv") doesn't return any files where x.Contains(dateTime), this line will blow up with an InvalidOperationException that might be a bit off-putting / surprising.I find the query would be clearer with a 100% method syntax - this would be equivalent:
var query = Directory.GetFiles(_path, "*.csv")
.AsParallel()
.Where(name => name.Contains(dateTime))
.First();The name
query is wrong/misleading. It would be a query if it were an IQueryable (i.e. up to right after the .Where() call)... but it's not, var actually stands for string and what you call query is actually a fileName: .First() materializes the query, and returns the first element... or throws an exception.It would be better to call
.FirstOrDefault(), which would make your result null if no files matched your search criteria.I don't understand why you're using
var here, given that you're using explicit types everywhere else.. especially since var is just a string. Don't get me wrong though: I prefer seeing var here... and everywhere else.#region Read CSV FileUsing
#region inside a method clearly says your method is doing more than one thing. Whenever you do that, you're missing a refactoring opportunity. Consider extracting a method here:private string GetFileName(string formattedDateTime)
{
return Directory.GetFiles(_path, "*.csv")
.AsParallel()
.Where(name => name.Contains(formattedDateTime))
.FirstOrDefault();
}(lacking context to recommend passing in an actual
DateTime)That leaves you with a method that (hopefully) now only has this
#region block, which is now completely redundant and can be discarded (if there's still more to it, extract methods as needed)That said...
FileStream stream = new FileStream(query, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (StreamReader reader = new StreamReader(stream, Encoding.Default, true, 1024))Disposing the
StreamReader will also cleanly close and dispose the FileStream, but that is far from obvious unless you know it, and if Jon Skeet says it's best practice to put it in its own using block, then I can only agree with him.Your code can use a variable (declared outside the loop's body) for the value separator: you're creating a new character array for every single row you're reading!
char[] separator = new char[] { ',' };Or more succinctly:
var separator = new[] { ',' };This makes the
temp assignment even clearer/cleaner:string[] temp = currentLine.Split(separator, StringSplitOptions.None);Looking at the
currentLine, it's far from obvious what's going on here:while ((currentLine = reader.ReadLine()) != null)Using the result of an assignment rarely results in easily readable code. I like how it cleverly combines reading the next line into
currentLine and verifying whether that's a null value though... so I'm not sure I'd recommend rewriting that part after all.It's not clear what happens to your
List when you're done reading the file, so I'll assume you're just returning it. In that case, you could consider yielding the results instead:private IEnumerable ReadCsv(string fileName)
{
char[] separator = new[] { ',' };
string currentLine;
using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var reader = new StreamReader(stream, Encoding.Default, true, 1024))
{
while ((currentLine = reader.ReadLine()) != null)
{
yield return currentLine.Split(separator, StringSplitOptions.None);
}
}
}Code Snippets
var query = (from x in Directory.GetFiles(_path, "*.csv").AsParallel()
where x.Contains(dateTime)
select x).First();var query = Directory.GetFiles(_path, "*.csv")
.AsParallel()
.Where(name => name.Contains(dateTime))
.First();#region Read CSV Fileprivate string GetFileName(string formattedDateTime)
{
return Directory.GetFiles(_path, "*.csv")
.AsParallel()
.Where(name => name.Contains(formattedDateTime))
.FirstOrDefault();
}FileStream stream = new FileStream(query, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (StreamReader reader = new StreamReader(stream, Encoding.Default, true, 1024))Context
StackExchange Code Review Q#61973, answer score: 9
Revisions (0)
No revisions yet.