HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Output the average values of binned columns in Excel VBA

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thecolumnsexceloutputaveragebinnedvbavalues

Problem

This code is meant to run through a column of values, bin the values based on specified ranges, then output the average value of each bin. The problem is the code is running quite slowly (approximately 30 min for around 100000 values). I am definitely a beginner at coding and was hoping there was some way to speed this code along.

Sub BinValues()
'binns seperation distance values for the creation of variogram

Application.ScreenUpdating = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual

Dim Cell As Object
Dim R1 As Range
Dim R2 As Range
Dim rng As Range
'define range before running
Set rng = Range("A1:A105570")
Dim K, n, L As Integer
'n is equal to the number of lags
'L is the lag size
n = 12
L = 600
For K = L To (n * L) Step 600
    For Each Cell In rng
    Dim min As Integer
    min = K - L
    'upper bound exclusive and lower bound inclusive
        If Cell.Value >= min And Cell.Value < K Then
            If R1 Is Nothing Then
                Set R1 = Range(Cell.Address)
            Else
                Set R1 = Union(R1, Range(Cell.Address))
            End If
            Cells((K / L), 5) = WorksheetFunction.Average(R1)     
        End If  

    Next
    Set R1 = Nothing
Next

Application.ScreenUpdating = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic

End Sub

Solution

Data belongs in an Array

A worksheet looks like a grid of data, but there's an enourmous amount of overhead sitting behind it. Every time you do anything to a spreadsheet, events fire, formulas calculate and a million other things happen behind the scenes.

Working with Ranges is computationally expensive, and you're doing it N105,5702 times.

Instead, what you want is an Array. An Array is just a grid of data laid out in memory. Because it is just data there are no overheads, and so you can read/write to it about a Million times faster.

You can create an Array by reading in a range, like so:

Dim dataRange As Range
Set dataRange = Range("A1:A105570")

Dim dataArray As Variant
dataArray = dataRange.Value


And now, the value in "A1" is in dataArray(1, 1), "A2" in dataArray(2, 1) etc.

Let's re-write your code to use an Array:

Option Explicit

Public Sub BinValues()
    'binns seperation distance values for the creation of variogram

    Application.ScreenUpdating = False
    Application.EnableEvents = False
    Application.Calculation = xlCalculationManual

    Dim dataRange As Range
    Set dataRange = Range("A1:A105570")

    Dim dataArray As Variant
    dataArray = dataRange.Value

    Const NUM_LAGS As Long = 12
    Const LAG_SIZE As Long = 600

    Dim minValue As Double
    Dim maxValue As Double
    Dim lagCounter As Long
    Dim ix As Long

    Dim elementValue As Double
    Dim elementSum As Double
    Dim numElements As Double
    Dim elementAverage As Double

    For lagCounter = 1 To NUM_LAGS

        minValue = (lagCounter - 1) * LAG_SIZE
        maxValue = (lagCounter * LAG_SIZE) - 1

        numElements = 0
        elementSum = 0
        For ix = LBound(dataArray, 1) To UBound(dataArray, 1)
            elementValue = dataArray(ix, 1)
            If elementValue >= minValue And elementValue <= maxValue Then
                numElements = numElements + 1
                elementSum = elementSum + elementValue
            End If
        Next ix

        elementAverage = elementSum / numElements
        Cells(lagCounter, 5) = elementAverage

    Next lagCounter

    Application.ScreenUpdating = True
    Application.EnableEvents = True
    Application.Calculation = xlCalculationAutomatic

End Sub


That alone should take your runtime from 1/2 an hour to a couple of seconds (if that).

Code Snippets

Dim dataRange As Range
Set dataRange = Range("A1:A105570")

Dim dataArray As Variant
dataArray = dataRange.Value
Option Explicit

Public Sub BinValues()
    'binns seperation distance values for the creation of variogram

    Application.ScreenUpdating = False
    Application.EnableEvents = False
    Application.Calculation = xlCalculationManual

    Dim dataRange As Range
    Set dataRange = Range("A1:A105570")

    Dim dataArray As Variant
    dataArray = dataRange.Value

    Const NUM_LAGS As Long = 12
    Const LAG_SIZE As Long = 600

    Dim minValue As Double
    Dim maxValue As Double
    Dim lagCounter As Long
    Dim ix As Long

    Dim elementValue As Double
    Dim elementSum As Double
    Dim numElements As Double
    Dim elementAverage As Double

    For lagCounter = 1 To NUM_LAGS

        minValue = (lagCounter - 1) * LAG_SIZE
        maxValue = (lagCounter * LAG_SIZE) - 1

        numElements = 0
        elementSum = 0
        For ix = LBound(dataArray, 1) To UBound(dataArray, 1)
            elementValue = dataArray(ix, 1)
            If elementValue >= minValue And elementValue <= maxValue Then
                numElements = numElements + 1
                elementSum = elementSum + elementValue
            End If
        Next ix

        elementAverage = elementSum / numElements
        Cells(lagCounter, 5) = elementAverage

    Next lagCounter

    Application.ScreenUpdating = True
    Application.EnableEvents = True
    Application.Calculation = xlCalculationAutomatic

End Sub

Context

StackExchange Code Review Q#136787, answer score: 5

Revisions (0)

No revisions yet.