Combining .txt files
Problem
A guy at the company I work for needed a small application that would combine multiple text files into one, larger text file.
I wrote a console application for this. It seems pretty efficient, but I was wondering if there would be an even more efficient way of doing this.
It has 2 important functions, one that gets the files from a folder, where string input is the folder location:

    static string[] getFiles(string input)
    {
        DirectoryInfo dinfo = new DirectoryInfo(@input);
        FileInfo[] files = dinfo.GetFiles("*.txt");
        List<string> list = new List<string>();
        foreach(FileInfo file in files)
        {
            list.Add(input + @"\" + file.Name);
        }
        string[] arr = list.ToArray();
        return arr;
    }

And of course the function that combines the files together. Its inputs are the name of the new file (string newName) and an array with the names of the files found in the folder by getFiles() (string[] files):

    static void writeDump(string newName, string[] files)
    {
        if (!File.Exists(newName))
        {
            using (StreamWriter sw = File.CreateText(newName))
            {
                for (int i = 0; i < files.Length; i++)
                {
                    using (StreamReader sr = File.OpenText(files[i]))
                    {
                        string s = "";
                        while ((s = sr.ReadLine()) != null)
                        {
                            sw.WriteLine(s);
                        }
                    }
                }
            }
        } else
        {
            Console.Clear();
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine("File already exists");
            start(); //start is called from the main function
        }
    }

And because start(); might be confusing, I'll also add the main function here:

    static void Main(string[] args)
    {
        start();
    }

How efficient is this and could it be more efficient?
Solution
    list.Add(input + @"\" + file.Name);

Seems a bit pointless: file.FullName would get you the fully qualified name without throwing information away and reconstructing it. In fact, that method could be simplified with Linq (which needs using System.Linq;) to

    static string[] getFiles(string input)
    {
        DirectoryInfo dinfo = new DirectoryInfo(@input);
        return dinfo.GetFiles("*.txt").Select(f => f.FullName).ToArray();
    }

I also note that the .Net convention for the name would be GetFiles with an initial uppercase letter.

    for (int i = 0; i < files.Length; i++)
    {
        using (StreamReader sr = File.OpenText(files[i]))
        {
            string s = "";
            while ((s = sr.ReadLine()) != null)
            {
                sw.WriteLine(s);
            }
        }
    }

Since you don't care about i you could simplify things with foreach; and the initial value of s is unnecessary, so you could have

    foreach (var filename in files)
    {
        using (StreamReader sr = File.OpenText(filename))
        {
            string s;
            while ((s = sr.ReadLine()) != null)
            {
                sw.WriteLine(s);
            }
        }
    }

But now we get to two key points of the requirements which aren't explicitly stated:
- If the files don't end with newlines, this code will insert newlines. That may or may not be intended, and it may or may not be desirable.
- This code is using an Encoding to parse the bytes to strings, then using an Encoding to convert the strings back to bytes. The particular encoding used is implicit. This isn't particularly efficient, but it does have some benefits:
  - If the files were generated by Microsoft tools, they are quite likely to start with BOMs (even if they're UTF-8). In the nasty case that they mix UTF-8-BOM, UTF-8, and UTF-16 then you rely on the encoding conversion.
  - Even if the files are consistent, you're going to avoid the appearance of BOMs embedded in the text that a straightforward byte-by-byte concatenation would give.
  It also has at least one non-performance-related disadvantage:
  - Regardless of the encoding of the input files, the output file will be UTF-8 (File.CreateText writes UTF-8, by default without a BOM), which may be an undesirable side-effect if the inputs were all UTF-16 or were expected to keep their BOMs.
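Both points in the list above can be seen with a small experiment. The following is only a sketch: the class name NewlineDemo, the helper CombineLines, and the temporary file names are made up for illustration. It runs the same ReadLine/WriteLine loop over one UTF-8-BOM file and one UTF-16 file, neither ending in a newline, but passes the output encoding explicitly instead of leaving it implicit.

```csharp
using System;
using System.IO;
using System.Text;

public static class NewlineDemo
{
    // Hypothetical helper: the same line-by-line loop as above, but with the
    // output encoding chosen by the caller rather than left implicit.
    public static void CombineLines(string newName, string[] files, Encoding outputEncoding)
    {
        using (var sw = new StreamWriter(newName, false, outputEncoding))
        {
            foreach (var filename in files)
            {
                // UTF-8 is the fallback; the 'true' argument lets a BOM at the
                // start of the file override it (e.g. for UTF-16 input).
                using (var sr = new StreamReader(filename, Encoding.UTF8, true))
                {
                    string s;
                    while ((s = sr.ReadLine()) != null)
                    {
                        sw.WriteLine(s);
                    }
                }
            }
        }
    }

    public static void Main()
    {
        // Scratch directory and file names are assumptions for this demo only.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string a = Path.Combine(dir, "a.txt");
        string b = Path.Combine(dir, "b.txt");
        string combined = Path.Combine(dir, "combined.out");

        File.WriteAllText(a, "first", new UTF8Encoding(true)); // UTF-8 with BOM, no trailing newline
        File.WriteAllText(b, "second", Encoding.Unicode);      // UTF-16 LE with BOM, no trailing newline

        // new UTF8Encoding(false) asks for UTF-8 output without a BOM.
        CombineLines(combined, new[] { a, b }, new UTF8Encoding(false));

        string expected = "first" + Environment.NewLine + "second" + Environment.NewLine;
        byte[] bytes = File.ReadAllBytes(combined);

        // Reports whether each input gained a trailing newline, and whether
        // the output starts directly with the text rather than with a BOM.
        Console.WriteLine(File.ReadAllText(combined) == expected);
        Console.WriteLine(bytes[0] == (byte)'f');

        Directory.Delete(dir, true);
    }
}
```

Passing new UTF8Encoding(true) or Encoding.Unicode instead would be one way to pick a different output format, if that is what the requirements turn out to be.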
If you wanted a straight byte-by-byte conversion then it would be more efficient to use

    using (var strmOut = File.Create(newName))
    {
        foreach (var filename in files)
        {
            using (var strmIn = File.OpenRead(filename))
            {
                strmIn.CopyTo(strmOut);
            }
        }
    }

If you can guarantee that the input files are all UTF-8-BOM then it would be more efficient to use

    using (var strmOut = File.Create(newName))
    {
        foreach (var filename in files)
        {
            using (var strmIn = File.OpenRead(filename))
            {
                strmIn.Position = 3;
                strmIn.CopyTo(strmOut);
            }
        }
    }

although that's not production-quality code (should check that there are 3 bytes and that they correspond to a BOM).
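The missing check could look something like the sketch below. It is one possible approach, not a definitive implementation: the class name BomAwareConcat and the helper Combine are hypothetical. It reads up to three bytes from each input, drops them only if they are exactly the UTF-8 BOM (EF BB BF), and otherwise writes them to the output as ordinary content. Like the Position = 3 version, it strips the BOM from every file, including the first, so the output is BOM-less.

```csharp
using System;
using System.IO;

public static class BomAwareConcat
{
    // The three bytes of the UTF-8 byte order mark.
    static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Hypothetical helper: byte-by-byte concatenation that skips a leading
    // UTF-8 BOM only when one is actually present.
    public static void Combine(string newName, string[] files)
    {
        using (var strmOut = File.Create(newName))
        {
            foreach (var filename in files)
            {
                using (var strmIn = File.OpenRead(filename))
                {
                    var head = new byte[Utf8Bom.Length];
                    int read = 0;
                    // Read may return fewer bytes than requested, so keep
                    // going until the buffer is full or the file ends.
                    while (read < head.Length)
                    {
                        int n = strmIn.Read(head, read, head.Length - read);
                        if (n == 0) break;
                        read += n;
                    }

                    bool hasBom = read == Utf8Bom.Length
                        && head[0] == Utf8Bom[0]
                        && head[1] == Utf8Bom[1]
                        && head[2] == Utf8Bom[2];

                    if (!hasBom)
                    {
                        // Not a BOM: these bytes are real content, keep them.
                        strmOut.Write(head, 0, read);
                    }
                    strmIn.CopyTo(strmOut);
                }
            }
        }
    }

    public static void Main()
    {
        // Scratch directory and file names are assumptions for this demo only.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string a = Path.Combine(dir, "a.txt");
        string b = Path.Combine(dir, "b.txt");
        string combined = Path.Combine(dir, "combined.out");

        File.WriteAllBytes(a, new byte[] { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' }); // with BOM
        File.WriteAllBytes(b, new byte[] { (byte)'y', (byte)'o' });                   // shorter than a BOM

        Combine(combined, new[] { a, b });
        Console.WriteLine(File.ReadAllBytes(combined).Length);

        Directory.Delete(dir, true);
    }
}
```

This still assumes every input is UTF-8; a UTF-16 file would be copied through unchanged, BOM included.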
Context
StackExchange Code Review Q#153019, answer score: 12