patterncsharpMajor
Looping through an array of strings containing approximately 3 million elements
Viewed 0 times
loopingcontainingapproximatelymillionelementsarraythroughstrings
Problem
I have a function that loops through an array of strings containing approximately 3 million lines of strings. My problem is this loop is running too slowly and I want to know how to make it faster.
And my
string[] words = { "Hello", "world", "!", "My", "name", "is", "Something", "."};
List result = new List();
for (int i = 0; i i + 2 && FileContains(words[i] + " " + words[i + 1] + " " + words[i + 2]))
{
result.Add(new MyClass(words[i] + " " + words[i + 1] + " " + words[i + 2]));
words[i + 1] = "&&&";
words[i + 2] = "&&&";
}
else if (words.Length > i + 1 && FileContains(words[i] + " " + words[i + 1]))
{
result.Add(new MyClass(words[i] + " " + words[i + 1]));
words[i + 1] = "&&&";
}
else
result.Add(new MyClass(words[i], Parent));
}And my
FileContains function which loops through the array of strings:public static bool FileContains(string Text)
{
foreach (string line in ARRAYOFWORDS) // Approximately 3 million strings
{
if (Text.ToLower() == line.Substring(0, line.LastIndexOf('|')))
{
return true;
}
}
return false;
}Solution
Well you can optimize
In fact, at that point it's really simple to rewrite. Here's the code for the original question, which didn't have the
If you store the lines in a
With the
Then use
Note that using
FileContains very simply - stop calling Text.ToLower() on every iteration!In fact, at that point it's really simple to rewrite. Here's the code for the original question, which didn't have the
Substring part:public static bool FileContains(string text)
{
// Variable name changed to save my eyes from the horror.
return lines.Contains(text.ToLower());
}If you store the lines in a
HashSet instead of an array or a list, it will be blazingly fast.With the
Substring part, I'd create a new HashSet with just those substrings, assuming you want to look up more than once:HashSet lineStarts;
...
lineStarts = new HashSet(lines.Select(x => x.Substring(0, x.IndexOf('|'));Then use
lineStarts within FileContains.Note that using
ToLower is a fairly crude approach to case sensitivity. You should think carefully about what this needs to do in terms of culture-sensitivity etc. See the relevant MSDN page for lots more information. (Ain't culture fun?)Code Snippets
public static bool FileContains(string text)
{
// Variable name changed to save my eyes from the horror.
return lines.Contains(text.ToLower());
}HashSet<string> lineStarts;
...
lineStarts = new HashSet<string>(lines.Select(x => x.Substring(0, x.IndexOf('|'));Context
StackExchange Code Review Q#59328, answer score: 21
Revisions (0)
No revisions yet.