patterncsharpMinor
Program slows down significantly based on input
Viewed 0 times
programinputslowsdownbasedsignificantly
Problem
I've been writing a program to perform a kind of pattern-matching in XML and text files.
When my program reaches this section of the code, the CPU usage gets very high and the performance slows down to a point where the program seems to be frozen. It actually isn't, but depending on the input (number of text files and their content), it may take several hours to complete the task.
I'm looking for a more efficient way to rewrite this section of the code:
There are many nested loops and I know this is the problem. What do you think should be improved?
When my program reaches this section of the code, the CPU usage gets very high and the performance slows down to a point where the program seems to be frozen. It actually isn't, but depending on the input (number of text files and their content), it may take several hours to complete the task.
I'm looking for a more efficient way to rewrite this section of the code:
List CandidatesRet = new List();
for (int indexCi = 0; indexCi = 0)
{
string[] listTempCi = Ci[indexCommon].Split(new char[] { ' ' });
foreach (string itemCi in listTempCi)
if (!subitem.Contains(itemCi))
commonItem.Add(itemCi);
}
allCommonItems.Add(commonItem);
}
// generate condidate from common item
foreach (string item in oldItemsetCi)
{
bool flagCi = true;
foreach (List listCommItem in allCommonItems)
if (!listCommItem.Contains(item))
{
flagCi = false;
break;
}
if (flagCi)
CandidatesRet.Add((Ci[indexCi] + " " + item).Trim());
}There are many nested loops and I know this is the problem. What do you think should be improved?
Solution
This should help a little bit:
However, the big problem is the still the double loop, which will grow as a power-of-two rather than linearly with your data size.
List CandidatesRet = new List();
var splitChars = new[] { ' ' };
for (var indexCi = 0; indexCi allItems.Where((t, j) => i != j)
.Aggregate(string.Empty, (current, t) => current + (t + " ")))
.Select(tempStr => tempStr.Trim()));
// THE PROBLEM BEGINS HERE
foreach (var subitem in subItemset)
{
for (var indexCommon = indexCi + 1; indexCommon !subitem.Contains(itemCi)));
}
allCommonItems.Add(commonItem);
}
// generate condidate from common item
CandidatesRet.AddRange(oldItemsetCi
.Select(item => new { item, flagCi = allCommonItems.All(listCommItem => listCommItem.Contains(item)) })
.Where(t => t.flagCi)
.Select(t => (Ci[indexCi] + " " + t.item).Trim()));
}
}However, the big problem is the still the double loop, which will grow as a power-of-two rather than linearly with your data size.
Code Snippets
List<string> CandidatesRet = new List<string>();
var splitChars = new[] { ' ' };
for (var indexCi = 0; indexCi < Ci.Count - 1; indexCi++)
{
// generate all sub itemset with length-1
var allItems = Ci[indexCi].Split(splitChars);
subItemset
.AddRange(allItems.Select((s, i) => allItems.Where((t, j) => i != j)
.Aggregate(string.Empty, (current, t) => current + (t + " ")))
.Select(tempStr => tempStr.Trim()));
// THE PROBLEM BEGINS HERE
foreach (var subitem in subItemset)
{
for (var indexCommon = indexCi + 1; indexCommon < Ci.Count; indexCommon++)
{
if (Ci[indexCommon].IndexOf(subitem, StringComparison.Ordinal) < 0)
{
continue;
}
var listTempCi = Ci[indexCommon].Split(splitChars);
commonItem.AddRange(listTempCi.Where(itemCi => !subitem.Contains(itemCi)));
}
allCommonItems.Add(commonItem);
}
// generate condidate from common item
CandidatesRet.AddRange(oldItemsetCi
.Select(item => new { item, flagCi = allCommonItems.All(listCommItem => listCommItem.Contains(item)) })
.Where(t => t.flagCi)
.Select(t => (Ci[indexCi] + " " + t.item).Trim()));
}
}Context
StackExchange Code Review Q#29418, answer score: 2
Revisions (0)
No revisions yet.