patternMinor
Remove unwanted characters from specified column in all Worksheets
Viewed 0 times
unwantedcolumnallremoveworksheetscharactersfromspecified
Problem
I need to remove some unwanted characters from a given column in all Worksheets from my Workbook. The code below works perfectly, but I'd like to know if there is a better/cleaner/faster way of doing this.
What exactly am I doing?
I am storing all unwanted/forbidden characters in an Array, after that I'm walking through all worksheets and replacing those characters with ""(empty) in each cell in column "J". In this case it will Always be in column J, but for maintainability (in case the template changes in future), I'm declaring it as a constant.
Why do I need to do this?
The data in this Workbook is uploaded to ALM, which does not allow any of those characters in the "Test Subject" field, hence I'm removing them. I chose this approach because I think it is easier, but if you have a better way of doing this (e.g. preventing the user from typing them in the first place, or any other approach that doesn't even require coding) it will be very welcome.
Code:
What exactly am I doing?
I am storing all unwanted/forbidden characters in an Array, after that I'm walking through all worksheets and replacing those characters with ""(empty) in each cell in column "J". In this case it will Always be in column J, but for maintainability (in case the template changes in future), I'm declaring it as a constant.
Why do I need to do this?
The data in this Workbook is uploaded to ALM, which does not allow any of those characters in the "Test Subject" field, hence I'm removing them. I chose this approach because I think it is easier, but if you have a better way of doing this (e.g. preventing the user from typing them in the first place, or any other approach that doesn't even require coding) it will be very welcome.
Code:
Sub replaceAll()
Const FIRST_ROW As Integer = 2, TEST_COLUMN As String = "J"
Dim arrForbidden() As String: arrForbidden = Split("\,/,:,%,',*,?,,|,"",.", ",")
Dim ws As Worksheet
Dim lastRow As Long
Dim rngCells As Range
Dim cell, character
For Each ws In ThisWorkbook.Worksheets
lastRow = ws.Cells(ws.Rows.Count, TEST_COLUMN).End(xlUp).Row
Set rngCells = ws.Range(TEST_COLUMN & FIRST_ROW).Resize(lastRow - FIRST_ROW + 1, 1)
For Each cell In rngCells
For Each character In arrForbidden
cell.Value = Replace(cell.Value, character, "")
Next
Next
Next
End SubSolution
First of all, let me note that your code looks good already. There are only a few things I would change:
-
It might be a good idea to turn the (Public) Sub into a Private Sub parameterized by
-
The name of the sub and its parameters could be improved by replacing them with something more expressive, like
-
I am personally not a fan of declaring multiple variables on one line or initializing on the same line because it often hurts readability. However, here it does not really hurt.
Now, I would like to present a number of different approaches to your problem. Some of these can even be combined for better results.
Using an Array to Query and Write Data
Instead of directly iterating over the cells in the collumn and updating every cell, you can read in the entire range into a two-dimensional (Variant) array using the
Then, you can iterate over the array and finally paste it back by assigning to the
This approach is faster then updating each cell because accessing the cells in a worksheet is rather slow.
Using Regular Expressions
Instead of looping over your array, you could use a regular expression. The VBA support for these is in the library 'Microsoft VBScript Regular Expressions 5.5'. An extensive description on how to use them can be found in the answer to this question on Stack Overflow.
Basically, what you would do is set up the regex matcher with a pattern string—in your case,
I do not know whether this is really much faster than the simple loop in your case. That pretty much depends on how well optimized the regex
Using User-Defined Functions
Instead of using a sub, you could write a function just for replacing all forbidden characters in a cell. Then, you could use this in some new column in the worksheets to automatically produce the cleaned-up version of your column in that new column. However, this would probably require you to actually reformat the worksheets before uploading to ALM (assuming ALM requires a fixed sheet layout for the import).
Personally, I do not really like this approach as it pollutes the worksheets with helper columns. However, if the need arises to remove the forbidden characters in more places, writing such a function might be the way to go.
Using Excel Functions
If you do not want to use VBA at all, you can modify the previous approach to use a suitable Excel workbook function. The base function for this approach is
A simple approach would be to nest calls to
There might be a nicer way to reduce the nested formulas into one array formula, but I personally don't know how to do it.
-
It might be a good idea to turn the (Public) Sub into a Private Sub parameterized by
FIRST_ROW, TEST_COLUMN, and arrForbidden, which then is called by a new Public Sub passing in your special values. This would separate the configuration from the actual algorithm.-
The name of the sub and its parameters could be improved by replacing them with something more expressive, like
RemoveAllForbiddenCharacters instead of ReplaceAll, and removalColumn instead of TEST_COLUMN. (Well, the last one is not very good, but nothing better came to my mind immediately.)-
I am personally not a fan of declaring multiple variables on one line or initializing on the same line because it often hurts readability. However, here it does not really hurt.
Now, I would like to present a number of different approaches to your problem. Some of these can even be combined for better results.
Using an Array to Query and Write Data
Instead of directly iterating over the cells in the collumn and updating every cell, you can read in the entire range into a two-dimensional (Variant) array using the
Value property of the range, or even better the Value2 property, which is faster and usually safer. (See this blog post.) Then, you can iterate over the array and finally paste it back by assigning to the
Value property of the range.This approach is faster then updating each cell because accessing the cells in a worksheet is rather slow.
Using Regular Expressions
Instead of looping over your array, you could use a regular expression. The VBA support for these is in the library 'Microsoft VBScript Regular Expressions 5.5'. An extensive description on how to use them can be found in the answer to this question on Stack Overflow.
Basically, what you would do is set up the regex matcher with a pattern string—in your case,
"[\\/:%'*?<>|"".]". Next, you would use the Replace function of the regex object to replace all occurrences of the forbidden characters in one go.I do not know whether this is really much faster than the simple loop in your case. That pretty much depends on how well optimized the regex
Replace procedure is. But either way, it is a little more expressive and you could reuse the code to do all sorts of more elaborate substitutions.Using User-Defined Functions
Instead of using a sub, you could write a function just for replacing all forbidden characters in a cell. Then, you could use this in some new column in the worksheets to automatically produce the cleaned-up version of your column in that new column. However, this would probably require you to actually reformat the worksheets before uploading to ALM (assuming ALM requires a fixed sheet layout for the import).
Personally, I do not really like this approach as it pollutes the worksheets with helper columns. However, if the need arises to remove the forbidden characters in more places, writing such a function might be the way to go.
Using Excel Functions
If you do not want to use VBA at all, you can modify the previous approach to use a suitable Excel workbook function. The base function for this approach is
SUBSTITUTE, which replaces occurrences of one string in another by a specified value. A simple approach would be to nest calls to
SUBSTITUTE, one for each character to replace. However, this very easily becomes completely unreadable. Moreover, there is an upper limit on how long a formula is allowed to be. (OK, this Microsoft page says the limit is now over 8000 characters, so this should not really be a problem.)There might be a nicer way to reduce the nested formulas into one array formula, but I personally don't know how to do it.
Context
StackExchange Code Review Q#152256, answer score: 4
Revisions (0)
No revisions yet.