
Console app to compare all directory names for similarity

Submitted by: @import:stackexchange-codereview

Problem

I just got the urge to write a small console app to compare all directory names for similarity. I have more than 3000 directories, and over time some of them have ended up with really similar names, e.g. after an update: Test Case ver 1 vs. Test Case ver 2.
Everything works, but it is really slow; it would probably be faster for me to sort the directories by name and go through them manually...
The code is 200 lines. I understand that this is a lot more than usual, but I could not find anything about that in the help section, and as is often mentioned, the code should be complete, so here goes:

```
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Text.RegularExpressions;

namespace Similarity
{
    /// <summary>
    /// Credit http://www.dotnetperls.com/levenshtein
    /// Contains approximate string matching
    /// </summary>
    static class LevenshteinDistance
    {
        /// <summary>
        /// Compute the distance between two strings.
        /// </summary>
        public static int Compute(string s, string t)
        {
            int n = s.Length;
            int m = t.Length;
            int[,] d = new int[n + 1, m + 1];

            // Step 1
            if (n == 0)
            {
                return m;
            }

            if (m == 0)
            {
                return n;
            }

            // Step 2
            for (int i = 0; i ...
            // ... (listing truncated in the source) ...

        private List<string> _blackList = new List<string>();

        public List<string> blackList
        {
            get
            {
                return this._blackList;
            }
        }

        public void AddBlackListEntry(string line)
        {
            blackList.Add(line);
        }
        #endregion

        static void Main(string[] args)
        {
            var directories = Directory.EnumerateDirectories(Directory.GetCurrentDirectory(), "*", SearchOption.TopDirectoryOnly)
                .Select(x => new DirectoryInfo(x).Name).OrderBy(y => new DirectoryInfo(y).Name).ToList();
            // ... (listing truncated in the source) ...
```

Solution

In this code,
it would be better to move the initialization of int[,] d further down,
after you check n and m.
That way no array is allocated when either string is empty and the method returns early.

    int n = s.Length;
    int m = t.Length;
    int[,] d = new int[n + 1, m + 1];

    // Step 1
    if (n == 0)
    {
        return m;
    }

    if (m == 0)
    {
        return n;
    }
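
To illustrate the point, here is a minimal sketch of Compute with the allocation moved below the early returns. The loop bodies are not taken from the question, whose listing is cut off at Step 2; they are the standard dynamic-programming formulation of Levenshtein distance, so treat them as an assumption about what the original does.

```csharp
using System;

static class LevenshteinDistance
{
    public static int Compute(string s, string t)
    {
        int n = s.Length;
        int m = t.Length;

        // Early returns first: nothing is allocated when either string is empty.
        if (n == 0)
        {
            return m;
        }

        if (m == 0)
        {
            return n;
        }

        // Allocate the table only once it is actually needed.
        int[,] d = new int[n + 1, m + 1];

        // First column and first row: distance to and from the empty prefix.
        for (int i = 0; i <= n; i++)
        {
            d[i, 0] = i;
        }

        for (int j = 0; j <= m; j++)
        {
            d[0, j] = j;
        }

        // Fill the rest of the table (assumed standard recurrence).
        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;

                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), // deletion, insertion
                    d[i - 1, j - 1] + cost);                     // substitution or match
            }
        }

        return d[n, m];
    }
}
```

With this sketch, LevenshteinDistance.Compute("Test Case ver 1", "Test Case ver 2") returns 1, since the two names differ by a single substitution.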


This loop doesn't loop, so it shouldn't be a loop:

    foreach (var item in p.blackList)
    {
        if (name == item)
        {
            return true;
        }
        else
        {
            return false;
        }
    }
    return false; // will not be reached


Also, when you have code like if (cond) return true; else return false;,
then you really should write simply return cond instead, so in this example return name == item;.

Finally, the comment says // will not be reached,
but that's not true: it will be reached when p.blackList is empty.
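
One possible rewrite of the blacklist check, assuming the intent is to report whether name matches any entry rather than only the first one. The class name BlackListChecker and the method name IsBlackListed are hypothetical; the question's listing is truncated before the real method signature appears.

```csharp
using System.Collections.Generic;

static class BlackListChecker
{
    // Hypothetical helper: true when name equals any entry in blackList.
    public static bool IsBlackListed(List<string> blackList, string name)
    {
        // List<T>.Contains walks the whole list and compares with the default
        // string equality (ordinal, case-sensitive), just like name == item.
        // An empty list yields false, which is the case the
        // "will not be reached" comment overlooked.
        return blackList.Contains(name);
    }
}
```

For a blacklist containing "bin" and "obj", IsBlackListed(list, "obj") is true here, whereas the original loop would return false after comparing only against "bin".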

Context

StackExchange Code Review Q#88583, answer score: 5
