HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Count duplicate List<int>

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
countlistduplicateint

Problem

Count duplicate List in List.

A duplicate is not order-dependent. { 1, 2, 3 } is a duplicate of { 2, 1, 3 }

Looking for speed without getting silly.

public static void CountListTest()
{
    List> testData = new List>
    {
            new List { 7, 3, 4 },
            new List { 1, 2, 3 },
            new List { 2, 1, 3 },
            new List { 6, 8, 3 },
            new List { 9, 2, 4 },
            new List { 6, 3, 8 },
            new List { 7, 3, 4 },
    };
    List> countList = CountList(testData);
    Debug.WriteLine("");
    testData = new List>
    {
            new List { 1, 2, 3 },
            new List { 2, 1, 3 },
            new List { 6, 8, 3 },
            new List { 2, 1, 3, 9 },
            new List { 6, 8, 3, 9 },
            new List { 2, 1, 3, 9 },
            new List { 9, 2, 4 },
            new List { 6, 3, 8 },
            new List { 7, 3, 4 },
    };
    countList = CountList(testData);
}
public static List> CountList(List> input)
{
    List> returnListList = new List>();
    foreach (List list in input)
        list.Sort();
    HashSet alreadyMatched = new HashSet();
    int count = 0;
    bool match = true;
    int lastEval = 0;
    for (int i = 0; i  0)
    {
        Debug.WriteLine("Count = {0}  List {1}", 1, string.Join(", ", input[lastEval]));
        returnListList.Add(input[lastEval]);
    }
    return returnListList;
}

Solution

The code itself is not so bad although it could use more {} and maybe more precise names then lastEval or match. Of course it shouldn't print to the console but return a usable result.

I can't tell how fast/slow it would be if it returned a result that can be printed outside the method. I compared it with another search algorithm and for 1.000.000 iterations mine is ~20ms faster.

I think the method could do better by returning a dictionary with the results.

Currently the original method runs twice over the item list. First by sorting them and then by searching for duplicates.

This could be improved by using a helper array to mark which item is already sorted and skip subsequent sortings thus run over the items only once.

The last optimization is to skip items that already have been checked. I think you are doing this with the alreadyMatched helper but I'm not sure about it.

Making the method generic and require T to be IComparable makes it possbile to reuse it for other collections with comparable items.

The value comparison takes place in a local function sequenceEqualFast.

Similarly it sorts the lists and keeps track of the sorting in the local sortList function.

public static Dictionary, int> FindDuplicatesFast(
    List> values
) where T : IComparable
{
    var isDuplicate = new bool[values.Count];
    var isSorted = new bool[values.Count];

    // Key: first of duplicates
    // Value: count of duplicates
    var results = new Dictionary, int>();

    var sequenceEqualFast = new Func, List, bool>((l1, l2) =>
    {
        if (l1.Count != l2.Count) { return false; }
        for (int k = 0; k (i =>
    {
        if (!isSorted[i])
        {
            values[i].Sort();
            isSorted[i] = true;
        }
    });

    for (int i = 0; i < values.Count - 1; i++)
    {
        if (isDuplicate[i]) { continue; }
        sortList(i);

        results[values[i]] = 1;
        isDuplicate[i] = true;

        for (int j = i + 1; j < values.Count; j++)
        {
            if (isDuplicate[j]) { continue; }
            sortList(j);

            if (sequenceEqualFast(values[i], values[j]))
            {
                isDuplicate[j] = true;
                results[values[i]] += 1;
            }
        }
    }
    return results;
}


I have run some more tests and these are the results.

Test collection

var rnd = new Random();

var itemCount = 100;
var testData = Enumerable.Range(0, itemCount).Select(e => 
{
    var count = rnd.Next(3, 5);
    return Enumerable.Range(0, count).Select(en => rnd.Next(0, 10)).ToList();
}).ToList();

var testCount = 10000;


Results

@Paparazzi
00:00:03.2791287

@t3chb0t
00:00:01.6763969

Code Snippets

public static Dictionary<List<T>, int> FindDuplicatesFast<T>(
    List<List<T>> values
) where T : IComparable
{
    var isDuplicate = new bool[values.Count];
    var isSorted = new bool[values.Count];

    // Key: first of duplicates
    // Value: count of duplicates
    var results = new Dictionary<List<T>, int>();

    var sequenceEqualFast = new Func<List<T>, List<T>, bool>((l1, l2) =>
    {
        if (l1.Count != l2.Count) { return false; }
        for (int k = 0; k < l1.Count(); k++)
        {
            if (l1[k].CompareTo(l2[k]) != 0)
            {
                return false;
            }
        }
        return true;
    });

    var sortList = new Action<int>(i =>
    {
        if (!isSorted[i])
        {
            values[i].Sort();
            isSorted[i] = true;
        }
    });

    for (int i = 0; i < values.Count - 1; i++)
    {
        if (isDuplicate[i]) { continue; }
        sortList(i);

        results[values[i]] = 1;
        isDuplicate[i] = true;

        for (int j = i + 1; j < values.Count; j++)
        {
            if (isDuplicate[j]) { continue; }
            sortList(j);

            if (sequenceEqualFast(values[i], values[j]))
            {
                isDuplicate[j] = true;
                results[values[i]] += 1;
            }
        }
    }
    return results;
}
var rnd = new Random();

var itemCount = 100;
var testData = Enumerable.Range(0, itemCount).Select(e => 
{
    var count = rnd.Next(3, 5);
    return Enumerable.Range(0, count).Select(en => rnd.Next(0, 10)).ToList();
}).ToList();

var testCount = 10000;

Context

StackExchange Code Review Q#151606, answer score: 4

Revisions (0)

No revisions yet.