patterncsharpMinor
Improve my copying of a CSV file
Problem
I need to merge some large CSV files, but I feel that the way I am doing it is not optimized. It takes a number of CSV files and creates one.
Any suggestion or code snippets?
public static void MergeFiles(IEnumerable<string> csvFileNames, string outputPath)
{
    var sb = new StringBuilder();
    bool hasHeader = false;
    foreach (string csvFileName in csvFileNames)
    {
        TextReader tr = new StreamReader(csvFileName);
        string headers = tr.ReadLine();
        if (!hasHeader)
        {
            sb.AppendLine(headers);
            hasHeader = true;
        }
        sb.AppendLine(tr.ReadToEnd());
    }
    File.WriteAllText(outputPath, sb.ToString());
}
Solution
StreamReader is IDisposable, and therefore its use should be wrapped in a using block like this:
using (TextReader tr = new StreamReader(csvFileName))
{
    ...
}
You are reading all files into memory and then writing them out. This probably performs fairly well, since it first does a series of (probably) sequential reads of all the files followed by one big block write, but you could consider writing the output as you read it:
bool hasHeader = false;
using (var writer = new StreamWriter(outputPath))
{
    foreach (var csvFileName in csvFileNames)
    {
        using (var reader = new StreamReader(csvFileName))
        {
            // The first line of each file is the header; write it only once.
            string headers = reader.ReadLine();
            if (!hasHeader)
            {
                writer.WriteLine(headers);
                hasHeader = true;
            }
            writer.Write(reader.ReadToEnd());
        }
    }
}
If the files are very big, you might even want to consider reading the input line by line and writing it to the output line by line. This will likely be slower but will also use much less memory. In the end it's a trade-off between speed and memory consumption (as it often is in software development).
The other advantage of the write-as-you-go method is that you open the output file first, so if there is a problem with writing to the target you will know before you do any work at all.
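The line-by-line variant described above could look roughly like the following. This is only a sketch: the class name CsvMerger and the method name MergeFilesLineByLine are made up for illustration, and it assumes every input file starts with the same header row.

```csharp
using System.Collections.Generic;
using System.IO;

public static class CsvMerger
{
    // Hypothetical helper; the name is not from the original post.
    public static void MergeFilesLineByLine(IEnumerable<string> csvFileNames, string outputPath)
    {
        bool headerWritten = false;

        // Opening the output first surfaces path or permission problems
        // before any input file has been read.
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (string csvFileName in csvFileNames)
            {
                using (var reader = new StreamReader(csvFileName))
                {
                    // Assumption: the first line of every file is the header row.
                    string header = reader.ReadLine();
                    if (header == null)
                    {
                        continue; // empty file, nothing to copy
                    }

                    if (!headerWritten)
                    {
                        writer.WriteLine(header);
                        headerWritten = true;
                    }

                    // Copy the remaining rows one at a time so that only
                    // a single line is held in memory at any moment.
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        writer.WriteLine(line);
                    }
                }
            }
        }
    }
}
```

A side effect of copying with WriteLine is that every row ends with a line terminator, so an input file without a trailing newline does not get its last row fused with the first row of the next file.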
Code Snippets
using (TextReader tr = new StreamReader(csvFileName))
{
    ...
}
bool hasHeader = false;
using (var writer = new StreamWriter(outputPath))
{
    foreach (var csvFileName in csvFileNames)
    {
        using (var reader = new StreamReader(csvFileName))
        {
            // The first line of each file is the header; write it only once.
            string headers = reader.ReadLine();
            if (!hasHeader)
            {
                writer.WriteLine(headers);
                hasHeader = true;
            }
            writer.Write(reader.ReadToEnd());
        }
    }
}
Context
StackExchange Code Review Q#36273, answer score: 5