
Searching files in a directory for a string

Problem

I have the following static class that enumerates the subdirectories of a folder, then searches each file in those subdirectories for a given string and returns an IEnumerable<string> that holds the matching lines. (It seems to only work with text files, even though I don't explicitly specify that.)

It takes about 15 seconds to go through 40 text files of about 250 KB each, and I think it could be faster. Could I use a better algorithm, or is there a better way of achieving this?

using System;
using System.Collections.Generic;
using System.IO;

public static class LogFileReader
{
    public static IEnumerable<string> GetLines(string path, string searchterm)
    {

        var dirs = Directory.EnumerateDirectories(path);
        List<string> thelines = new List<string>();

        foreach (var dir in dirs)
        {

            var files = Directory.EnumerateFiles(dir);

            foreach (var file in files)
            {
                using (StreamReader sr = new StreamReader(file))
                {
                    string line = string.Empty;
                    while ((line = sr.ReadLine()) != null)
                    {
                        if (line.IndexOf(searchterm, StringComparison.CurrentCultureIgnoreCase) >= 0)
                        {
                            thelines.Add(line);
                        }
                    }

                }
            }

        }

        return thelines;
    }
}

Solution

Currently your method does two things: it searches the directory structure and analyzes the files at the same time. These should be separated so that you can maintain each feature without affecting the other. For example, should you want to search the directories recursively, you would only need to change the GetFileNames method, without thinking about reading the files.

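using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LogFileReader
{
    // Streams each file and yields every line that satisfies the predicate.
    // A `return line;` inside a Select lambda would keep only the first match per file.
    public static IEnumerable<string> FindLines(this IEnumerable<string> fileNames, Func<string, bool> predicate)
    {
        foreach (var fileName in fileNames)
        {
            using (var sr = new StreamReader(fileName))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    if (predicate(line))
                    {
                        yield return line;
                    }
                }
            }
        }
    }

    // Lists the files inside each immediate subdirectory of path (not recursive).
    public static IEnumerable<string> GetFileNames(this string path)
    {
        return
            Directory.EnumerateDirectories(path)
            .SelectMany(Directory.EnumerateFiles);
    }
}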
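To make the point about recursive search concrete, here is a minimal sketch of a recursive replacement for GetFileNames. The class name RecursiveFileNames and the method name GetFileNamesRecursive are hypothetical; the sketch relies on the Directory.EnumerateFiles(path, searchPattern, SearchOption) overload. Note that, unlike the original GetFileNames, it also picks up files that sit directly in path.

```csharp
using System.Collections.Generic;
using System.IO;

public static class RecursiveFileNames
{
    // Hypothetical recursive replacement for GetFileNames.
    // SearchOption.AllDirectories walks every nested folder, and unlike the
    // original GetFileNames it also returns files directly inside `path`.
    public static IEnumerable<string> GetFileNamesRecursive(this string path)
    {
        return Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories);
    }
}
```

Because FindLines only consumes a sequence of file names, swapping GetFileNames for a recursive variant like this one needs no change to the code that reads the files.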


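If you want, you can make it parallel later. Apply AsParallel() to the file names, before any file is read: appended after FindLines it would not help much, because PLINQ would pull its input from the sequential FindLines enumerator, so the files would still be read one after another. Results come back in no particular order unless you also call AsOrdered().

var results = 
    @"c:\foo".GetFileNames()
    .AsParallel()
    .SelectMany(fileName => File.ReadLines(fileName))
    .Where(line => line.IndexOf("bar", StringComparison.CurrentCultureIgnoreCase) >= 0)
    .ToList();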
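If you would rather keep the parallel version behind an extension method like FindLines, here is a minimal sketch. ParallelLogSearch and FindLinesParallel are hypothetical names; the sketch applies AsParallel() to the file names rather than to the matched lines, so PLINQ hands whole files to worker threads and each worker reads and filters its own files. Whether this is actually faster depends on the storage: a single spinning disk may gain little from concurrent reads.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ParallelLogSearch
{
    // Hypothetical parallel counterpart to FindLines.
    // AsParallel() is applied to the file names, so separate files can be
    // read and filtered on different threads.
    public static List<string> FindLinesParallel(this IEnumerable<string> fileNames, Func<string, bool> predicate)
    {
        return fileNames
            .AsParallel()                       // add .AsOrdered() here if result order matters
            .SelectMany(fileName => File.ReadLines(fileName).Where(predicate))
            .ToList();                          // order of results is not guaranteed
    }
}
```

Usage mirrors the sequential version, for example @"c:\foo".GetFileNames().FindLinesParallel(line => line.IndexOf("bar", StringComparison.CurrentCultureIgnoreCase) >= 0).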

Context

StackExchange Code Review Q#136159, answer score: 4
