patternMinor
Join two collections by date with no duplicate
Viewed 0 times
withduplicatejoindatetwocollections
Problem
Data
I have 4 Collection objects of the following kind:
The rules of the game are:
-
-
The dates into the two
Problem in words
I would like to create a "merged" collection of these two. In particular,
-
The
-
The
For example:
will result in the following:
What I have tried so far
I have tried to stretch the human logic described above in a code:
Determining with which date the joint collection should start:
The idea, hence, is to keep on looping through the elements like this with the separate in
I have 4 Collection objects of the following kind:
dates1=(12/01/02, 15/01/03, 17/01/03, 25/04/04...)
dates2=(15/02/02, 17/06/07, 10/05/10, ...)
numbers1 = (3,5,2,7,...)
numbers2 = (1,-3,-1,...)The rules of the game are:
-
dates1 will always have the same length of numbers1, dates2 will always have the same length of numbers, but dates1 is not necessarily as long as dates2.-
The dates into the two
dates collections are always growing within the same collection, but not necessarily the first date of dates1 will be before the first date of dates2.Problem in words
I would like to create a "merged" collection of these two. In particular,
-
The
datesTot collection will have to contain only one date for each record. So, for example, if both "dates1" and "dates2" contain the 14/01/13, then this date will appear in the collection only once.-
The
numbersTot collection will have to contain the sum of the two "numbers" collection in case the date is a duplicate, otherwise only the single value.For example:
numbers1 = (1,3,4)
numbers2 = (4,7,12)
dates1 = (13/01/14, 14/01/15, 16/01/17)
dates2 = (12/01/14, 14/01/15, 18/01/17)will result in the following:
datesTot = (12/01/14, 13/01/14, 14/01/15, 16/01/17, 18/01/17)
numbersTot = (4, 1, 10, 4, 12) '<-- where 10 is the sum of 7+3, because they both are on 14/01/15What I have tried so far
I have tried to stretch the human logic described above in a code:
Determining with which date the joint collection should start:
Do While (dates1Index dates2(dates2Index) Then
datesTot.Add dates2(dates2Index)
numbersTot.Add numbers2(dates2Index)
dates2Index = dates2Index+1
Else
datesTot.Add dates1(dates1Index)
numbersTot.Add numbers1(dates1Index) + numbers2(dates2Index)
dates1Index = dates1Index + 1
dates2Index = dates2Index + 1
End If
LoopThe idea, hence, is to keep on looping through the elements like this with the separate in
Solution
You've got a pretty big bug here. You don't actually end up with unique items in your final list. In fact, with the sample data you gave,
I think this stems from the fact that you don't loop all the way through the first collection, before you shift the index on the second. Your attempt at optimizing the performance of the code has led to a bug. A wise dev once told me:
The second problem I see is your data structure. You're keeping related data in two separate, but parallel collections. That's a recipe for disaster. Why? Because you can't overwrite an item in a collection. You have to add the replacement to the collection and then remove the original. That leaves a ton of room to make a mistake.
A correct algorithm requires three loops.
Implementing that in a clean way would be easier said than done though, because you can't overwrite a value in a collection. That means you will need to keep single collection of objects that represent your data structure. For simplicity's sake, you could create a class module as simple as this.
Alternatively, assuming your original data that you want to merge has only one instance of each date, then a
Output:
1/15/14 ends up in the final output twice. You should be entering that Else block, but you're not. This is because you're not comparing what's already stored in datesTot to that value in dates2, you're comparing it to what is at the current index of dates1. (By the way, don't shorten your variable names like that. datesTotal or dateTotals would be a much better name).I think this stems from the fact that you don't loop all the way through the first collection, before you shift the index on the second. Your attempt at optimizing the performance of the code has led to a bug. A wise dev once told me:
- Make it work.
- Make it right.
- Then make it fast.
The second problem I see is your data structure. You're keeping related data in two separate, but parallel collections. That's a recipe for disaster. Why? Because you can't overwrite an item in a collection. You have to add the replacement to the collection and then remove the original. That leaves a ton of room to make a mistake.
A correct algorithm requires three loops.
For Each outerItem in outerCollecton
For Each innerItem in innerCollection
If outerItem = innerItem Then
For Each resultItem In results
If innerItem = resultItem then
resultItem.Count = resultItem.Count + innerItem.Count
Exit For
Else
results.Add innerItem
End If
Next resultItem
End If
Next innerItem
Next outerItemImplementing that in a clean way would be easier said than done though, because you can't overwrite a value in a collection. That means you will need to keep single collection of objects that represent your data structure. For simplicity's sake, you could create a class module as simple as this.
Option Explicit
Public DateValue As Date
Public Count As LongAlternatively, assuming your original data that you want to merge has only one instance of each date, then a
Scripting.Dictionary could be a great data structure to use. See my example below.Public Sub test()
Dim dates1 As New Scripting.Dictionary
dates1.Add #1/14/2013#, 1
dates1.Add #1/15/2014#, 3
dates1.Add #1/17/2016#, 4
Dim dates2 As New Scripting.Dictionary
dates2.Add #12/1/2014#, 4
dates2.Add #1/15/2014#, 7
dates2.Add #1/17/2018#, 12
' first copy the first dict to a new one to return
Dim results As New Dictionary
Dim currentKey
For Each currentKey In dates1.Keys
results.Add currentKey, dates1(currentKey)
Next
' second dict
For Each currentKey In dates2.Keys
If results.Exists(currentKey) Then
results(currentKey) = results(currentKey) + dates2(currentKey)
Else
results.Add currentKey, dates2(currentKey)
End If
Next
' print the results
For Each currentKey In results.Keys
Debug.Print "Date: " & currentKey & vbTab & "Count: " & results(currentKey)
Next
End SubOutput:
Date: 1/14/2013 Count: 1
Date: 1/15/2014 Count: 10
Date: 1/17/2016 Count: 4
Date: 12/1/2014 Count: 4
Date: 1/17/2018 Count: 12Code Snippets
For Each outerItem in outerCollecton
For Each innerItem in innerCollection
If outerItem = innerItem Then
For Each resultItem In results
If innerItem = resultItem then
resultItem.Count = resultItem.Count + innerItem.Count
Exit For
Else
results.Add innerItem
End If
Next resultItem
End If
Next innerItem
Next outerItemOption Explicit
Public DateValue As Date
Public Count As LongPublic Sub test()
Dim dates1 As New Scripting.Dictionary
dates1.Add #1/14/2013#, 1
dates1.Add #1/15/2014#, 3
dates1.Add #1/17/2016#, 4
Dim dates2 As New Scripting.Dictionary
dates2.Add #12/1/2014#, 4
dates2.Add #1/15/2014#, 7
dates2.Add #1/17/2018#, 12
' first copy the first dict to a new one to return
Dim results As New Dictionary
Dim currentKey
For Each currentKey In dates1.Keys
results.Add currentKey, dates1(currentKey)
Next
' second dict
For Each currentKey In dates2.Keys
If results.Exists(currentKey) Then
results(currentKey) = results(currentKey) + dates2(currentKey)
Else
results.Add currentKey, dates2(currentKey)
End If
Next
' print the results
For Each currentKey In results.Keys
Debug.Print "Date: " & currentKey & vbTab & "Count: " & results(currentKey)
Next
End SubDate: 1/14/2013 Count: 1
Date: 1/15/2014 Count: 10
Date: 1/17/2016 Count: 4
Date: 12/1/2014 Count: 4
Date: 1/17/2018 Count: 12Context
StackExchange Code Review Q#74670, answer score: 5
Revisions (0)
No revisions yet.