patternMinor

Performance of using objects with scripting dictionary

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

scriptingobjectswithusingperformancedictionary

Problem

The code below is a working excerpt from a larger application that I am re-factoring. My question focuses on the performance aspect of using (or not using) objects in this case.

I have a medium-ish size data set on a worksheet (6500 rows by 12 columns) and am building an application to merge (mostly) new data records with the existing data.

My overall approach is to:

Ingest the new data into a single object. The new dataset is held internally in both an array and a dictionary.

Ingest the existing data into a single object. The existing data is held internally in both an array and a dictionary.

Determine if a record from the new dataset exists in the existing dataset by creating a "key" and using it to search the dictionary. This unique key is created by concatenating the first four fields of a data record.

If the data record does not exist in the existing dataset, add it to the existing data object (to both the dictionary and the array).

So far, so good. I have working code that accomplishes these steps, though without the architecture of classes/objects and good functional design.

My question concerns performance issues I'm seeing in Step 2. (A fully functional example is listed at the bottom of this post.)

I'm seeing wildly different execution performance times when using objects and the dictionary as opposed to when I side-step the objects completely. Searches online for questions related to object performance turned up this reference, but nothing directly related to my experience.

Giving you the answers up front, my performance is measured as:

Array Used is (1000 by 12)
set up array                                = 7.046 ms
populate dict with    objects               = 4775.396 ms
populate dict without objects (store key)   = 11.222 ms
populate dict without objects (store array) = 7502.135 ms

Clearly, when I'm using my full dataset at 6500+ rows, my execution time grows to an unacceptable level.

I would appreciate comments and feedback on

Solution

As far as the relative performance goes:

When you instantiate an object and add it to the Dictionary in DictWithObjects, the only thing that actually gets stored in the Dictionary is a reference to the object. This is basically just a pointer. So in your "populate dict with objects" case, you are making a copy of the object reference and storing that. In this case, you are adding the overhead of instantiating the object and initializing it. However, this case also does something that none of the other test cases do - it uses Application.WorksheetFunction.Index to slice the input array. This alone invalidates your benchmarks because none of the other subs do this.

In your "populate dict without objects (store array)" case, you are actually cramming a significant amount of data into the Dictionary. Not only that, but in this line...

thisDict.Add key, arr          'the Item is the array slice

...you are not only storing the entire array in the Dictionary, you're making a copy of the entire array and storing that in the Dictionary. I would guess that a very large part of the poor performance of DictWithoutObjectsArr is due solely to this bug in the benchmarking code. That said, the missing call to Application.WorksheetFunction.Index(arr, i, 0) makes the performance of this method even worse - when I add the array slicing, it seems like the large memory copies are actually cheaper from a performance standpoint than an array slice and a much smaller memory copy.

In your final case, storing the unique row identifier as both the key and the value, you are simply storing a String. This case is sacrificing long term performance in exchange for not collecting the same information that the other cases do. The trade-off will come later in a finished product when it is time to get information back out. Obviously this isn't possible (since you only have a key-key combination), and you aren't benchmarking everything that is relevant to the overall performance. Assuming that you need to get the row information at some point, this benchmark should simply either be ignored or taken only as a proxy estimation of how long it takes to add a generic key value pair to a Dictionary. Otherwise, at very least the value should be the row - at least in that case you can get some useful information out of it later.

In short, the above make the benchmarking results almolst entirely meaningless.

A couple other things:

Your object shouldn't be exposing all of it's members as public class variables. These should be encapsulated into properties. As currently written, the class is little more than a heavy-weight Type with a initializer property (more on this below).

The objects should be responsible for calculating their own keys - not the calling code.

Your "constructor" seems really misplaced in the write-only property Record. Better would be to simply make it a Sub. Best would be to add the VB_PredeclaredId attribute and create an object factory. However, taking an array as a parameter seems suspect. If the class is intended to mirror what you would find on a row, a Range parameter would probably be clearer and more performant than slicing a single dimension out of a 2D array. If you're going to pull your data range into an array (which makes sense), it's probably much faster to just set the individual properties. If you aren't pulling the dataset into an array, just pass it a Range. There's nothing that says you can only have one class factory. In fact, in a comparison benchmark, setting properties directly cut the object creation overhead by %75 over slicing and passing an array using WorksheetFunction.Index.

The line duplicate = False in Class_Initialize raises red flags about the class design also. It's hard to tell what it is there for (due mainly to the stripped down version of the code submitted), but regardless, it probably shouldn't. An individual object shouldn't have any concept of being a "duplicate" outside of possibly an Equals(other as LaborRecord) utility function. If your class structure is designed properly, a LaborRecord shouldn't know or care about other LaborRecords.

Code Snippets

thisDict.Add key, arr          'the Item is the array slice

Context

StackExchange Code Review Q#141182, answer score: 4

Revisions (0)

No revisions yet.