patterncsharpMinor
Extracting data from .txt files and writing to Excel
Viewed 0 times
excelwritingfilestxtextractingandfromdata
Problem
I am extracting data from .txt files and writing needed data to Excel. My problem is that my code is very very slow and at one point even slower (+35k txt files). How should I rethink this problem?
```
Microsoft.Office.Interop.Excel.Application excelapp = new Microsoft.Office.Interop.Excel.Application();
excelapp.Visible = true;
_Workbook workbook = (_Workbook)(excelapp.Workbooks.Open(textBox2.Text));
_Worksheet worksheet = (_Worksheet)workbook.ActiveSheet;
DateTime dt = DateTime.Now;
foreach (string fileName in Directory.GetFiles(textBox1.Text, "*.txt"))
{
int row = 1, EmptyRow = 0;
while (Convert.ToString(worksheet.Range["A" + row].Value) != null)
{
row++;
EmptyRow = row;
}
string lines1 = GetLine(fileName, 10);
File.WriteAllText(@"D:/text.txt", lines1);
string[] ParsedLines = File.ReadAllLines(@"D:/text.txt");
string comp = ParsedLines[0];
string compare = comp.Substring(30, comp.Length - 30);
if (compare == "Failed")
{
string[] lines = File.ReadAllLines(fileName);
string serial = lines[3];
string SN = serial.Substring(30, serial.Length - 30);
worksheet.Range["A" + EmptyRow].Value2 = SN;
string data = lines[4];
string DT = data.Substring(30, data.Length - 30);
worksheet.Range["B" + EmptyRow].Value2 = DT;
string time = lines[5];
string TM = time.Substring(30, time.Length - 30);
worksheet.Range["C" + EmptyRow].Value2 = TM;
string operat = lines[6];
string OP = operat.Substring(30, operat.Length - 30);
worksheet.Range["D" + Empty
```
Microsoft.Office.Interop.Excel.Application excelapp = new Microsoft.Office.Interop.Excel.Application();
excelapp.Visible = true;
_Workbook workbook = (_Workbook)(excelapp.Workbooks.Open(textBox2.Text));
_Worksheet worksheet = (_Worksheet)workbook.ActiveSheet;
DateTime dt = DateTime.Now;
foreach (string fileName in Directory.GetFiles(textBox1.Text, "*.txt"))
{
int row = 1, EmptyRow = 0;
while (Convert.ToString(worksheet.Range["A" + row].Value) != null)
{
row++;
EmptyRow = row;
}
string lines1 = GetLine(fileName, 10);
File.WriteAllText(@"D:/text.txt", lines1);
string[] ParsedLines = File.ReadAllLines(@"D:/text.txt");
string comp = ParsedLines[0];
string compare = comp.Substring(30, comp.Length - 30);
if (compare == "Failed")
{
string[] lines = File.ReadAllLines(fileName);
string serial = lines[3];
string SN = serial.Substring(30, serial.Length - 30);
worksheet.Range["A" + EmptyRow].Value2 = SN;
string data = lines[4];
string DT = data.Substring(30, data.Length - 30);
worksheet.Range["B" + EmptyRow].Value2 = DT;
string time = lines[5];
string TM = time.Substring(30, time.Length - 30);
worksheet.Range["C" + EmptyRow].Value2 = TM;
string operat = lines[6];
string OP = operat.Substring(30, operat.Length - 30);
worksheet.Range["D" + Empty
Solution
Input Validation
Currently the code just takes user-input and assumes it's well-formed and valid. That is not a reasonable assumption for all accounts and purposes.
The first place this happens is when the workbook is retrieved. The msdn-Documentation states, that the "file name" is required to open the Workbook. What if the user gives you something obviously incorrect like "my$\important\$Wb.exe"?
It would be a much cleaner UX, if you used a FilePicker here to get the workbook's filename.
The next place is where the code assumes
Stuff ...
The casing on the variable names is inconsistent. You should be consistent at almost any cost. It's important to make sure future maintainers (including you) can read the code easily and know what to expect.
In that vein variable names like
On that note there's quite a bunch of unnecessary assignments in the code. You're iterating
Additionally whenever there's a match for what you need, the first thing you do is copying the line into an intermediary variable (which is busywork for the CPU, if it's not optimized out). Let me show what I mean in an example:
If you want to stop iterating lines when all of these properties have been used (which seems like a micro-optimization to me), you'll need to change your foreach loop to a for-loop over an index and stop iterating depending on flags.
You can see here that I use the
Additionally it integrates nicely with the use of
On a last note: Since the
Currently the code just takes user-input and assumes it's well-formed and valid. That is not a reasonable assumption for all accounts and purposes.
The first place this happens is when the workbook is retrieved. The msdn-Documentation states, that the "file name" is required to open the Workbook. What if the user gives you something obviously incorrect like "my$\important\$Wb.exe"?
It would be a much cleaner UX, if you used a FilePicker here to get the workbook's filename.
The next place is where the code assumes
textBox1.Text contains a valid Path. Here the "correct" solution is to show a FolderPickerStuff ...
The casing on the variable names is inconsistent. You should be consistent at almost any cost. It's important to make sure future maintainers (including you) can read the code easily and know what to expect.
In that vein variable names like
SN and DT don't even remotely help, because nobody beside you right now can tell what they mean. You'll have a very hard time reading the code in 2 or 3 months, because you will have forgotten most of what the names mean. serialNumber instead of SN and timestamp instead of TM as well as better naemes for all these variables will help you when you need to touch this code again.On that note there's quite a bunch of unnecessary assignments in the code. You're iterating
foreach (string line in lines) multiple times. For one you could merge all of these loops, because they're completely independent.Additionally whenever there's a match for what you need, the first thing you do is copying the line into an intermediary variable (which is busywork for the CPU, if it's not optimized out). Let me show what I mean in an example:
foreach (string line in lines)
{
if (line.Contains("FixtureCoverResistance:"))
{
(worksheet.Cells[emptyRow, 6] as Excel.Range).Value2 = line.Substring(31, line.length - 31);
}
if (line.Contains("FwProgrammingCheck:"))
{
(worksheet.Cells[emptyRow, 7] as Excel.Range).Value2 = line.Substring(31, line.length - 31);
}
// ...
}If you want to stop iterating lines when all of these properties have been used (which seems like a micro-optimization to me), you'll need to change your foreach loop to a for-loop over an index and stop iterating depending on flags.
You can see here that I use the
Cells property to access the relevant cells in the worksheet instead of Range. This should be minimally faster, because accessing a Range over a string is always bound to make Excel parse the specifier string into something more suitable for addressing the range.Additionally it integrates nicely with the use of
EmptyRowOn a last note: Since the
else-block is empty, you may want to consider inverting the if-condition to save one level of indentation and switch away early:if (compare != "Failed")
{
continue;
}
string[] lines = File.ReadAllLines(fileName);
// ...Code Snippets
foreach (string line in lines)
{
if (line.Contains("FixtureCoverResistance:"))
{
(worksheet.Cells[emptyRow, 6] as Excel.Range).Value2 = line.Substring(31, line.length - 31);
}
if (line.Contains("FwProgrammingCheck:"))
{
(worksheet.Cells[emptyRow, 7] as Excel.Range).Value2 = line.Substring(31, line.length - 31);
}
// ...
}if (compare != "Failed")
{
continue;
}
string[] lines = File.ReadAllLines(fileName);
// ...Context
StackExchange Code Review Q#148304, answer score: 2
Revisions (0)
No revisions yet.