patternMinor
Improving performance of VBA data validation script in Access with large data size
Viewed 0 times
scriptwithsizevalidationlargeperformanceaccessvbadataimproving
Problem
I work in IT compliance and one of my main duties is ensuring the integrity of employee network accounts. Partial VBA script as currently used is shown below:
Given that the size of the database is around 50K of records, the above code works but is slow. There are several more validation functions such as for social security numbers, email, and user account control (disabled or enabled) depending on multiple criteria. Presently, the source data report is run bimonthly from which I import the results into Microsoft Access for auditing.
Is there a more efficient manner in which the validation can be done?
Public Function validateHireDate(hireDate as String)
If Not IsDate(hireDate) Or IsNull(hireDate) Then
validateHireDate = "Invalid hire date"
Else
validateHireDate = "Hire date reviewed"
End if
End Function
Public Function validateEmployeeManager(employeeStatus as String, employeeManager as String)
If employeeStatus <> "Active" Then
Exit Function
Else
If IsNull(employeeManager) Then
validateEmployeeManager = "Invalid employee manager"
Else
validateEmployeeManager = "Employee manager reviewed"
End If
End If
End FunctionGiven that the size of the database is around 50K of records, the above code works but is slow. There are several more validation functions such as for social security numbers, email, and user account control (disabled or enabled) depending on multiple criteria. Presently, the source data report is run bimonthly from which I import the results into Microsoft Access for auditing.
Is there a more efficient manner in which the validation can be done?
Solution
There are a couple of micro-optimizations to be made. They probably won't give you the boost you're looking for, but it's worth trying.
First, you can explicitly define the return type. When you don't define one, the compiler defines it as a
For clarity, this
Is equivalent to this
And really should be
But I find it a bit odd that a validation routine is returning a string. I would consider returning a boolean instead. It has a smaller memory footprint than a string and is, in theory, faster than a string.
The second thing you can do is manually short circuit the logic, but you'll have to sacrifice some DRYness to do it.
VBA does't have short circuiting logical operations, so no matter what the return value of
Now, the order here maybe important, and I might not have it right. First, verify that
I like your
Finally, I would just encourage you to think about whether you need to do this in VBA at all. Access is a SQL based program and, like any SQL, it's better to use set based operations than procedure based ones. I would argue that it's perhaps even more important with Access. You've already noticed how slow the COM based VBA interaction is compared to a regular query, so why not move this logic into a query, or a series of queries? You're not doing anything here that can't be accomplished with the
Some psuedo-SQL for your first function.
That's the faithful translation. I would find out if the requirements can be adjusted to only return the records that are in error. If the valid records are of no interest to the business, then something like this becomes possible.
Which returns only the invalid records.
First, you can explicitly define the return type. When you don't define one, the compiler defines it as a
Variant type, which incurs a lot of overheard compared to the "regular" data types. For clarity, this
Public Function validateHireDate(hireDate as String)Is equivalent to this
Public Function validateHireDate(hireDate as String) As VariantAnd really should be
Public Function validateHireDate(hireDate as String) As StringBut I find it a bit odd that a validation routine is returning a string. I would consider returning a boolean instead. It has a smaller memory footprint than a string and is, in theory, faster than a string.
The second thing you can do is manually short circuit the logic, but you'll have to sacrifice some DRYness to do it.
If Not IsDate(hireDate) Or IsNull(hireDate) Then
validateHireDate = "Invalid hire date"
Else
validateHireDate = "Hire date reviewed"
End ifVBA does't have short circuiting logical operations, so no matter what the return value of
IsDate(hireDate) is, IsNull(hireDate) executes every time as well. We can unroll the logic a bit to make it more efficient, but be sure to leave a comment explaining your actions to your future self. If you're anything like me, you'll instantly want to undo this optimization the next time you see it if there's nothing telling you not to. If IsNull(hireDate) Then
validateHireDate = "Invalid hire date"
Else If Not IsDate(hireDate) Then
validateHireDate = "Invalid hire date"
Else
validateHireDate = "Hire date reviewed"
End ifNow, the order here maybe important, and I might not have it right. First, verify that
IsDate() can handle a null value. If it can, then you need to study the data to determine which is the more common error state. You want the most common case to come first in order to optimize the logic. I like your
validateEmployeeManager function. You're evaluating the most likely situation first and exiting as fast as possible. You can explicitly declare the return type, but I see no further room for optimization. It's a well written function. Finally, I would just encourage you to think about whether you need to do this in VBA at all. Access is a SQL based program and, like any SQL, it's better to use set based operations than procedure based ones. I would argue that it's perhaps even more important with Access. You've already noticed how slow the COM based VBA interaction is compared to a regular query, so why not move this logic into a query, or a series of queries? You're not doing anything here that can't be accomplished with the
IIF and ISNULL functions. Even a faithful translation should be faster than even the optimized version of the VBA code. Although, I would also encourage you to investigate ways to do this truly set based instead. Some psuedo-SQL for your first function.
SELECT IIF(ISNULL(Emp.HireDate),"Invalid",IIF(ISDATE(Emp.HireDate,"Invalid","Reviewed"))
FROM EmployeesThat's the faithful translation. I would find out if the requirements can be adjusted to only return the records that are in error. If the valid records are of no interest to the business, then something like this becomes possible.
SELECT * --don't actually select all
FROM Employees
WHERE HireDate IS NULL
OR (NOT ISDATE(HireDate))Which returns only the invalid records.
Code Snippets
Public Function validateHireDate(hireDate as String)Public Function validateHireDate(hireDate as String) As VariantPublic Function validateHireDate(hireDate as String) As StringIf Not IsDate(hireDate) Or IsNull(hireDate) Then
validateHireDate = "Invalid hire date"
Else
validateHireDate = "Hire date reviewed"
End ifIf IsNull(hireDate) Then
validateHireDate = "Invalid hire date"
Else If Not IsDate(hireDate) Then
validateHireDate = "Invalid hire date"
Else
validateHireDate = "Hire date reviewed"
End ifContext
StackExchange Code Review Q#88605, answer score: 5
Revisions (0)
No revisions yet.