HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Faster method to find data using search word?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
methodsearchfasterwordusingfinddata

Problem

I need to search through a string array to find every occurrence of "Parameters Table", and between each "Parameters Table" and the next, get another string from a specified index (that remains constant). I have been doing this like so:

public List findlistOfNames(string[] arrayToLookForNames)
{
    List listOfNames = new List();

    const string separator = "Parameters Table"; //This is the string I am searching for

    var cuttedWords = arrayToLookForNames.SkipWhile(x => x != separator).Skip(1);

    while (cuttedWords.Any())
    {
        var variable = cuttedWords.TakeWhile(x => x != separator).ToArray();
        cuttedWords = cuttedWords.Skip(variable.Length + 1);
        listOfNames.Add(variable[2]); //This (always index 2) needs to be added to the list
    }
    return listOfNames;
}


This works but too slowly. Is there a better way to do this?

Here is a snippet of string[] arrayToLookForNames:


Parameters Table


0


41


Baro Pressure


hPa


AFD2


recorded


Parameters Table


0


42


Baro Setting


in-hg

Solution

Perhaps the best way to solve this problem is to not have it in the first place. If you are in control of the code that produce this array of string, probably from a text file or a similar input, you could produce a tree of values instead right from the start. That way, instead of trying to find specific values through a text search, you could simply find the nodes in the tree that you need, for instance the "Parameters Table" node, and then dive into each parameters from there. This would look a little bit like this:


    A
        B1
        C1
    A
        B2
        C2


If you are not in control of this code, then you have no other choice than to search through the entire list. Marcin Juraszek did propose a pretty good solution for this. Another way to do this search, which would also be a little bit faster would be this:

public static class EnumerableExtensions
{
    private class SearchResults
    {
        public List Values = new List();
        public int SkipCounter = -1;
    }

    public static List FindAllSubItems(this IEnumerable source, string keyword, int distance)
    {
        return source.Aggregate(
            new SearchResults(),
            (sr, v) =>
            {
                if (v.Equals(keyword))
                {
                    sr.SkipCounter = distance;
                }

                if (sr.SkipCounter == 0)
                {
                    sr.Values.Add(v);
                }

                --sr.SkipCounter;

                return sr;
            },
            sr => sr.Values
        );
    }
}


Which can be used like so:

var someList = new string[] {"A", "B1", "C1", "A", "B2", "C2" };
someList.FindAllSubItems("A", 1); // returns { "B1", "B2" }


The idea around this solution is to make sure that only one pass through the list will ever happen. This is an O(n) solution, as opposed to Juraszek's one which is an O(n + m) solution where n is the amount of inputs and m is the amount of matching keys. This solution also adds a safety around keys that does not have at least distance elements. For now, it simply skips it, but it is trivial to check for the value of SkipCounter before resting it if an exception needs to be thrown.

Once we find a match for our first parameter, we set a skip counter based on a distance parameter. We then proceed on decrementing the counter until it reaches zero. If the counter reaches zero, we have a found an item at distance items of the parameter so we save it. If it never reaches zero it means that we are looking for an item further than there is in the list. Providing a negative distance to this algorithm will never yield any results as it is a forward only algorithm. When the final value is returned, we simply trash the counter and only keep the values we saved in our temporary SearchResults object.

EDIT

As requested in the comment, the question author is wondering how to trigger updates to a progress bar in a solution like this. Since this code is using LINQ which works wonders with yield return, you can simply call it like this:

var items = someList
    .Select((x, i) => { UpdateMyProgressBarForIndex(i); return x; })
    .FindAllSubItems(...);


As you expect, the lambda specified in the select block will be called for each element in the list. But because of how LINQ methods are designed (using yield return), this will happen as the aggregate method process them. This is because the list of items that Select returns is built only as new items are requested by the bottom most foreach loop which runs inside the Aggregate method. Basically, you will see this behavior:

Initialize aggregate seed
foreach items in the list
Select's lambda for item (x)
Aggregate lambda for item(x)
when done
extract final values from the seed

Code Snippets

<root node>
    A
        B1
        C1
    A
        B2
        C2
public static class EnumerableExtensions
{
    private class SearchResults
    {
        public List<String> Values = new List<string>();
        public int SkipCounter = -1;
    }

    public static List<string> FindAllSubItems(this IEnumerable<string> source, string keyword, int distance)
    {
        return source.Aggregate(
            new SearchResults(),
            (sr, v) =>
            {
                if (v.Equals(keyword))
                {
                    sr.SkipCounter = distance;
                }

                if (sr.SkipCounter == 0)
                {
                    sr.Values.Add(v);
                }

                --sr.SkipCounter;

                return sr;
            },
            sr => sr.Values
        );
    }
}
var someList = new string[] {"A", "B1", "C1", "A", "B2", "C2" };
someList.FindAllSubItems("A", 1); // returns { "B1", "B2" }
var items = someList
    .Select((x, i) => { UpdateMyProgressBarForIndex(i); return x; })
    .FindAllSubItems(...);
Initialize aggregate seed
foreach items in the list
Select's lambda for item (x)
Aggregate lambda for item(x)
when done
extract final values from the seed

Context

StackExchange Code Review Q#54431, answer score: 5

Revisions (0)

No revisions yet.