patternMinor
Converting all cells to text
Viewed 0 times
cellsconvertingtextall
Problem
Microsoft Excel cells basically have two values: the formatted value that is shown to the user, and the actual value of the cell. These two values can be extremely different when the cell has certain formatting types such as date, datetime, currency, numbers with commas, etc.
I am writing a script that cleans up database data. It uses find, replace, RegEx, and other functions that act on strings of text. I need to be able to convert data into very specific formats so that my upload tool will accept the data. Therefore I need all of the cells in my Excel spreadsheets to be TEXT. I need the displayed values to match the actual values.
This is surprisingly hard to do in Excel. There is no paste special option that I could find that pastes what you see on your screen as text. "Paste Special -> Values" pastes, predictably, unformatted values. So if you try to "Paste Special -> Values" into cells formatted as TEXT, and they are coming from cells formatted as DATES, you might get something like this:
Old Cell: 4/7/2001
New Cell: 36988
If you paste normally, then the paste goes in as 4/7/2001, but in date format. So if I create a RegEx that looks for 4/7/2001 and changes it to 4/7/01, it will not work, because the underlying value is 36988.
Anyway, I discovered the
1) It does not work on cells with more than 255 characters. This is because
2) It is extremely slow. In my 1.5 million cell test sheet, code execution took 394 seconds.
I'm looking for ideas to optimize the code's speed, and to make the code work on all display value lengths.
```
Function ConvertAllCellsToText()
' Convert all cells from non-text to text
I am writing a script that cleans up database data. It uses find, replace, RegEx, and other functions that act on strings of text. I need to be able to convert data into very specific formats so that my upload tool will accept the data. Therefore I need all of the cells in my Excel spreadsheets to be TEXT. I need the displayed values to match the actual values.
This is surprisingly hard to do in Excel. There is no paste special option that I could find that pastes what you see on your screen as text. "Paste Special -> Values" pastes, predictably, unformatted values. So if you try to "Paste Special -> Values" into cells formatted as TEXT, and they are coming from cells formatted as DATES, you might get something like this:
Old Cell: 4/7/2001
New Cell: 36988
If you paste normally, then the paste goes in as 4/7/2001, but in date format. So if I create a RegEx that looks for 4/7/2001 and changes it to 4/7/01, it will not work, because the underlying value is 36988.
Anyway, I discovered the
Cell.Text property and wrote some code that converts all cells in a spreadsheet from formatted to text. But the code has two major problems:1) It does not work on cells with more than 255 characters. This is because
Cell.Text literally displays what the cell is actually showing to the user, and if the column is too narrow, it will display pound signs (########), and columns have a maximum width of 255 characters.2) It is extremely slow. In my 1.5 million cell test sheet, code execution took 394 seconds.
I'm looking for ideas to optimize the code's speed, and to make the code work on all display value lengths.
```
Function ConvertAllCellsToText()
' Convert all cells from non-text to text
Solution
This code has basically the same performance issue as your last question, and the performance solution is pretty much the same. Other things to note:
dead code that is commented out.
dimension my variables in VBA
really? over on SO.
declare it as
explicitly declaring your scope.
the
cell depending on the formatting. However, when you test the length
of the cell values, you're checking against the Value that you
already point out is not dependant on the formatting. You need to
test the Text against your width.
for proportional width fonts, the value referred to by
is based on the width of the character '0' (see the
documentation).
If you happen to be using a font where any character is wider than
'0' (for example 'X' in my browser's default font), you can't reliably use the
length of a string that it can display.
width of a text rendered cell to this value, and using it in
multiple places in your code to provide formatting information, make
it a constant and give it an appropriate name.
sort of indication to the user that something is actually happening
instead of a hung script. I'd remove it completely. If you call
This is not only more performant (it avoids a function call), but it
will also be clearer to the reader what is going on. Even better
would be to simply omit it, as redrawing the screen is what is
killing your in performance in the first place (see below).
are done instead of using the arbitrary value '20'. This upholds
the principle of least astonishment.
end of the
loop is not only redundant, but forces a
killing your performance. Basically, what is happening is that you are updating the display once for every cell in the
This is incredibly expensive. Turn
is more expensive in terms of performance than you might guess.
When you are working with large ranges of cells, you should pull
them into an array if at all possible, then write the entire array
back at once. This lowers your "Excel overhead" almost
quadratically.
dependant on what object is active) and the
(almost every operation can be performed with the
already have a reference to).
'.' operator that you see in your code causes an object deference
call. Wrapping sections where the same object is used repeatedly
in a
instead of repeatedly performing it.
the object is much more expensive than storing it. If you have to
reference the same value twice (looking at you,
Const MaxTextLength = 255
Private Sub ConvertAllCellsToText()
StartNewTask "Converting all cells to text"
Application.ScreenUpdating = False
Dim MyRange As Range
Set MyRange = ActiveSheet.UsedRange
Dim cache() As Long
cache = GetColumnWidths(MyRange)
With MyRange
.ColumnWidth = MaxTextLength
Dim Values() As Variant
ReDim Values(.Rows.Count, .Columns.Count)
Dim col As Long
Dim row As Long
For row = 0 To UBound(Values, 1)
For col = 0 To UBound(Values, 2)
Dim temp As String
temp = .Cells(row + 1, col + 1).Text
If Len(temp) <= MaxTextLength Then
Values(row, col) = temp
End If
Next col
Next row
.NumberFormat = "@"
End With
MyRange = Values
SetColumnWidths MyRange, cache
Application.ScreenUpdating = True
End Sub
Private Function GetColumnWidths(Target As Range) As Long()
Dim output() As Long
- This is more of a nag than anything, but you should really remove
dead code that is commented out.
- The variable
Cellis never declared. You should putOption Explicitat the top of all of your code modules. See Why should I
dimension my variables in VBA
really? over on SO.
- You don't return a value, so this should really be a
Sub, not a
Function. If you don't want it to appear in the macro list,declare it as
Private. I'd get into the habit of alwaysexplicitly declaring your scope.
- You have a bug. You explain pretty clearly in your question that
the
.Value of a cell can differ dramatically from the .Text of acell depending on the formatting. However, when you test the length
of the cell values, you're checking against the Value that you
already point out is not dependant on the formatting. You need to
test the Text against your width.
- You have another possible bug (not addressed in the code below) -
for proportional width fonts, the value referred to by
ColumnWidthis based on the width of the character '0' (see the
documentation).
If you happen to be using a font where any character is wider than
'0' (for example 'X' in my browser's default font), you can't reliably use the
ColumnWidth as a proxy for thelength of a string that it can display.
- 255 is the magic number, yes it is. Since you're testing the maximum
width of a text rendered cell to this value, and using it in
multiple places in your code to provide formatting information, make
it a constant and give it an appropriate name.
- I'm assuming that the call to
RefreshScreenis to provide some
sort of indication to the user that something is actually happening
instead of a hung script. I'd remove it completely. If you call
DoEvents Excel will have a chance to update the screen buffer. This is not only more performant (it avoids a function call), but it
will also be clearer to the reader what is going on. Even better
would be to simply omit it, as redrawing the screen is what is
killing your in performance in the first place (see below).
- Personally, I would cache the column widths and reset them when you
are done instead of using the arbitrary value '20'. This upholds
the principle of least astonishment.
- You set the
NumberFormatto text for the entire Worksheet at the
end of the
Function. Setting it for individual cells inside yourloop is not only redundant, but forces a
ScreenUpdate. This iskilling your performance. Basically, what is happening is that you are updating the display once for every cell in the
UsedRange.This is incredibly expensive. Turn
ScreenUpdating off instead.- As I mentioned in comments to your previous question, updating cells
is more expensive in terms of performance than you might guess.
When you are working with large ranges of cells, you should pull
them into an array if at all possible, then write the entire array
back at once. This lowers your "Excel overhead" almost
quadratically.
- You should avoid using the
Cellscollection (it is entirely
dependant on what object is active) and the
Selection object(almost every operation can be performed with the
Range youalready have a reference to).
- Use
Withblocks to your advantage, especially in loops. Every
'.' operator that you see in your code causes an object deference
call. Wrapping sections where the same object is used repeatedly
in a
With block essentially caches the result of the deferenceinstead of repeatedly performing it.
- Similar note with pulling cell values from a
Range. Accessing
the object is much more expensive than storing it. If you have to
reference the same value twice (looking at you,
If Cell.Text <> Cell.Value And Len(Cell.Value)
The above notes basically boil down to something more like this:
``Const MaxTextLength = 255
Private Sub ConvertAllCellsToText()
StartNewTask "Converting all cells to text"
Application.ScreenUpdating = False
Dim MyRange As Range
Set MyRange = ActiveSheet.UsedRange
Dim cache() As Long
cache = GetColumnWidths(MyRange)
With MyRange
.ColumnWidth = MaxTextLength
Dim Values() As Variant
ReDim Values(.Rows.Count, .Columns.Count)
Dim col As Long
Dim row As Long
For row = 0 To UBound(Values, 1)
For col = 0 To UBound(Values, 2)
Dim temp As String
temp = .Cells(row + 1, col + 1).Text
If Len(temp) <= MaxTextLength Then
Values(row, col) = temp
End If
Next col
Next row
.NumberFormat = "@"
End With
MyRange = Values
SetColumnWidths MyRange, cache
Application.ScreenUpdating = True
End Sub
Private Function GetColumnWidths(Target As Range) As Long()
Dim output() As Long
Code Snippets
Const MaxTextLength = 255
Private Sub ConvertAllCellsToText()
StartNewTask "Converting all cells to text"
Application.ScreenUpdating = False
Dim MyRange As Range
Set MyRange = ActiveSheet.UsedRange
Dim cache() As Long
cache = GetColumnWidths(MyRange)
With MyRange
.ColumnWidth = MaxTextLength
Dim Values() As Variant
ReDim Values(.Rows.Count, .Columns.Count)
Dim col As Long
Dim row As Long
For row = 0 To UBound(Values, 1)
For col = 0 To UBound(Values, 2)
Dim temp As String
temp = .Cells(row + 1, col + 1).Text
If Len(temp) <= MaxTextLength Then
Values(row, col) = temp
End If
Next col
Next row
.NumberFormat = "@"
End With
MyRange = Values
SetColumnWidths MyRange, cache
Application.ScreenUpdating = True
End Sub
Private Function GetColumnWidths(Target As Range) As Long()
Dim output() As Long
ReDim output(1 To Target.Columns.Count)
Dim index As Long
For index = 1 To Target.Columns.Count
output(index) = Target.Columns(index).ColumnWidth
Next index
GetColumnWidths = output
End Function
Private Sub SetColumnWidths(Target As Range, widths() As Long)
Dim index As Long
For index = LBound(widths) To UBound(widths)
Target.Columns(index).ColumnWidth = widths(index)
Next index
End SubContext
StackExchange Code Review Q#124059, answer score: 7
Revisions (0)
No revisions yet.