patternMinor
Joining collections... because it's fun
Viewed 0 times
becausefuncollectionsjoining
Problem
I'm working on a
Join
Correlates the elements of two sequences based on matching keys.
I've been told this might be \$O(log(n))\$, but since
LinqEnumerable class (reviewable here), and I'm no Big-O expert and I'm pretty bad with algorithms, so I wonder what the complexity is for what I've implemented here, and whether it could be improved.Join
Correlates the elements of two sequences based on matching keys.
Private encapsulated As New Collection
Option ExplicitPublic Function Join(ByVal inner As LinqEnumerable, _
ByVal outerKeySelector As Delegate, _
ByVal innerKeySelector As Delegate, _
ByVal resultSelector As Delegate) As LinqEnumerable
Dim result As New LinqEnumerable
Dim outerValue As Variant
Dim outerKey As Variant
Dim innerValue As Variant
Dim innerKey As Variant
Dim innerList As System.List
Set innerList = inner.ToList
For Each outerValue In encapsulated
outerKey = outerKeySelector.Execute(outerValue)
For Each innerValue In innerList
innerKey = innerKeySelector.Execute(innerValue)
If outerKey = innerKey Then
result.Add resultSelector.Execute(outerValue, innerValue)
innerList.Remove innerValue
End If
Next
Next
Set Join = result
End FunctionI've been told this might be \$O(log(n))\$, but since
innerList.Remove is \$O(n)\$ I'm not sure. Could it be?Solution
The complexity here is much more than \$O(\log{n})\$. Let's go though it...
There's one \$O(o)\$ loop, where \$o\$ is the size of the
Now, for each outer loop, we then loop through all the members of
Inside that inner loop you then remove members of the
The resulting complexity that I can see here is.... \$O(o \times i^2)\$ which means, if the inner and outer data sets are about the same size (worst case), you have complexity of \$O(n^3)\$ where \$n\$ is the size of each list.
Now, that is tempered somewhat by the fact that you only do the inner remove if it actually matches something, and when it does, it makes the inner data list smaller. I don't believe this affects the complexity all that much though.
If you are looking for performance improvements, there are a few I can see.
First up, convert all the inner-member values to the key outside the outer loop and store them in a dictionary:
Then, this will save you having to recalculate the keys many times, and it also converts the lookup, and remove both to O(1) operations.
I believe, if you do this you end up with a \$O(n)\$ operation where \$n\$ is the combined size of
There's one \$O(o)\$ loop, where \$o\$ is the size of the
outer List.Now, for each outer loop, we then loop through all the members of
innerList. If that has size \$i\$, the combined complexity is now \$O(o \times i)\$.Inside that inner loop you then remove members of the
innerList. The remove operation is horribly expensive too. First, it starts at the beginning of the list, and searches for the item to remove. When it finds it, it then goes through each remaining item in the list, and shuffles them forward one position, to 'remove' the item.The resulting complexity that I can see here is.... \$O(o \times i^2)\$ which means, if the inner and outer data sets are about the same size (worst case), you have complexity of \$O(n^3)\$ where \$n\$ is the size of each list.
Now, that is tempered somewhat by the fact that you only do the inner remove if it actually matches something, and when it does, it makes the inner data list smaller. I don't believe this affects the complexity all that much though.
If you are looking for performance improvements, there are a few I can see.
First up, convert all the inner-member values to the key outside the outer loop and store them in a dictionary:
Dim innerKeyDict As Scripting.Dictionary
....
For Each innerValue In innerList
innerKey = innerKeySelector.Execute(innerValue)
innerKeyDict.Add innerKey, innerValue
NextThen, this will save you having to recalculate the keys many times, and it also converts the lookup, and remove both to O(1) operations.
I believe, if you do this you end up with a \$O(n)\$ operation where \$n\$ is the combined size of
inner and outer.Code Snippets
Dim innerKeyDict As Scripting.Dictionary
....
For Each innerValue In innerList
innerKey = innerKeySelector.Execute(innerValue)
innerKeyDict.Add innerKey, innerValue
NextContext
StackExchange Code Review Q#66784, answer score: 5
Revisions (0)
No revisions yet.