HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Join two collections by date with no duplicate

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
withduplicatejoindatetwocollections

Problem

Data

I have 4 Collection objects of the following kind:

dates1=(12/01/02, 15/01/03, 17/01/03, 25/04/04...)
dates2=(15/02/02, 17/06/07, 10/05/10, ...)
numbers1 = (3,5,2,7,...)
numbers2 = (1,-3,-1,...)


The rules of the game are:

-
dates1 will always have the same length of numbers1, dates2 will always have the same length of numbers, but dates1 is not necessarily as long as dates2.

-
The dates into the two dates collections are always growing within the same collection, but not necessarily the first date of dates1 will be before the first date of dates2.

Problem in words

I would like to create a "merged" collection of these two. In particular,

-
The datesTot collection will have to contain only one date for each record. So, for example, if both "dates1" and "dates2" contain the 14/01/13, then this date will appear in the collection only once.

-
The numbersTot collection will have to contain the sum of the two "numbers" collection in case the date is a duplicate, otherwise only the single value.

For example:

numbers1 = (1,3,4)
numbers2 = (4,7,12)
dates1 = (13/01/14, 14/01/15, 16/01/17)
dates2 = (12/01/14, 14/01/15, 18/01/17)


will result in the following:

datesTot = (12/01/14, 13/01/14, 14/01/15, 16/01/17, 18/01/17)
numbersTot = (4, 1, 10, 4, 12) '<-- where 10 is the sum of 7+3, because they both are on 14/01/15


What I have tried so far

I have tried to stretch the human logic described above in a code:

Determining with which date the joint collection should start:

Do While (dates1Index  dates2(dates2Index) Then
    datesTot.Add dates2(dates2Index)
    numbersTot.Add numbers2(dates2Index)
    dates2Index = dates2Index+1
Else
    datesTot.Add dates1(dates1Index)
    numbersTot.Add numbers1(dates1Index) + numbers2(dates2Index)
    dates1Index = dates1Index + 1 
    dates2Index = dates2Index + 1
End If  
Loop


The idea, hence, is to keep on looping through the elements like this with the separate in

Solution

You've got a pretty big bug here. You don't actually end up with unique items in your final list. In fact, with the sample data you gave, 1/15/14 ends up in the final output twice. You should be entering that Else block, but you're not. This is because you're not comparing what's already stored in datesTot to that value in dates2, you're comparing it to what is at the current index of dates1. (By the way, don't shorten your variable names like that. datesTotal or dateTotals would be a much better name).

I think this stems from the fact that you don't loop all the way through the first collection, before you shift the index on the second. Your attempt at optimizing the performance of the code has led to a bug. A wise dev once told me:



  • Make it work.



  • Make it right.



  • Then make it fast.




The second problem I see is your data structure. You're keeping related data in two separate, but parallel collections. That's a recipe for disaster. Why? Because you can't overwrite an item in a collection. You have to add the replacement to the collection and then remove the original. That leaves a ton of room to make a mistake.

A correct algorithm requires three loops.

For Each outerItem in outerCollecton

    For Each innerItem in innerCollection

        If outerItem = innerItem Then

            For Each resultItem In results
                If innerItem = resultItem then
                    resultItem.Count = resultItem.Count + innerItem.Count
                    Exit For
                Else
                    results.Add innerItem
                End If
            Next resultItem

        End If

    Next innerItem

Next outerItem


Implementing that in a clean way would be easier said than done though, because you can't overwrite a value in a collection. That means you will need to keep single collection of objects that represent your data structure. For simplicity's sake, you could create a class module as simple as this.

Option Explicit

Public DateValue As Date
Public Count As Long


Alternatively, assuming your original data that you want to merge has only one instance of each date, then a Scripting.Dictionary could be a great data structure to use. See my example below.

Public Sub test()

    Dim dates1 As New Scripting.Dictionary
    dates1.Add #1/14/2013#, 1
    dates1.Add #1/15/2014#, 3
    dates1.Add #1/17/2016#, 4

    Dim dates2 As New Scripting.Dictionary
    dates2.Add #12/1/2014#, 4
    dates2.Add #1/15/2014#, 7
    dates2.Add #1/17/2018#, 12

    ' first copy the first dict to a new one to return
    Dim results As New Dictionary

    Dim currentKey

    For Each currentKey In dates1.Keys
        results.Add currentKey, dates1(currentKey)
    Next

    ' second dict
    For Each currentKey In dates2.Keys

        If results.Exists(currentKey) Then
            results(currentKey) = results(currentKey) + dates2(currentKey)
        Else
            results.Add currentKey, dates2(currentKey)
        End If

    Next

    ' print the results
    For Each currentKey In results.Keys
        Debug.Print "Date: " & currentKey & vbTab & "Count: " & results(currentKey)
    Next

End Sub


Output:

Date: 1/14/2013 Count: 1
Date: 1/15/2014 Count: 10
Date: 1/17/2016 Count: 4
Date: 12/1/2014 Count: 4
Date: 1/17/2018 Count: 12

Code Snippets

For Each outerItem in outerCollecton

    For Each innerItem in innerCollection

        If outerItem = innerItem Then

            For Each resultItem In results
                If innerItem = resultItem then
                    resultItem.Count = resultItem.Count + innerItem.Count
                    Exit For
                Else
                    results.Add innerItem
                End If
            Next resultItem

        End If

    Next innerItem

Next outerItem
Option Explicit

Public DateValue As Date
Public Count As Long
Public Sub test()

    Dim dates1 As New Scripting.Dictionary
    dates1.Add #1/14/2013#, 1
    dates1.Add #1/15/2014#, 3
    dates1.Add #1/17/2016#, 4

    Dim dates2 As New Scripting.Dictionary
    dates2.Add #12/1/2014#, 4
    dates2.Add #1/15/2014#, 7
    dates2.Add #1/17/2018#, 12

    ' first copy the first dict to a new one to return
    Dim results As New Dictionary

    Dim currentKey

    For Each currentKey In dates1.Keys
        results.Add currentKey, dates1(currentKey)
    Next

    ' second dict
    For Each currentKey In dates2.Keys

        If results.Exists(currentKey) Then
            results(currentKey) = results(currentKey) + dates2(currentKey)
        Else
            results.Add currentKey, dates2(currentKey)
        End If

    Next


    ' print the results
    For Each currentKey In results.Keys
        Debug.Print "Date: " & currentKey & vbTab & "Count: " & results(currentKey)
    Next

End Sub
Date: 1/14/2013 Count: 1
Date: 1/15/2014 Count: 10
Date: 1/17/2016 Count: 4
Date: 12/1/2014 Count: 4
Date: 1/17/2018 Count: 12

Context

StackExchange Code Review Q#74670, answer score: 5

Revisions (0)

No revisions yet.