HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Converting all cells to text

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
cellsconvertingtextall

Problem

Microsoft Excel cells basically have two values: the formatted value that is shown to the user, and the actual value of the cell. These two values can be extremely different when the cell has certain formatting types such as date, datetime, currency, numbers with commas, etc.

I am writing a script that cleans up database data. It uses find, replace, RegEx, and other functions that act on strings of text. I need to be able to convert data into very specific formats so that my upload tool will accept the data. Therefore I need all of the cells in my Excel spreadsheets to be TEXT. I need the displayed values to match the actual values.

This is surprisingly hard to do in Excel. There is no paste special option that I could find that pastes what you see on your screen as text. "Paste Special -> Values" pastes, predictably, unformatted values. So if you try to "Paste Special -> Values" into cells formatted as TEXT, and they are coming from cells formatted as DATES, you might get something like this:

Old Cell: 4/7/2001

New Cell: 36988

If you paste normally, then the paste goes in as 4/7/2001, but in date format. So if I create a RegEx that looks for 4/7/2001 and changes it to 4/7/01, it will not work, because the underlying value is 36988.

Anyway, I discovered the Cell.Text property and wrote some code that converts all cells in a spreadsheet from formatted to text. But the code has two major problems:

1) It does not work on cells with more than 255 characters. This is because Cell.Text literally displays what the cell is actually showing to the user, and if the column is too narrow, it will display pound signs (########), and columns have a maximum width of 255 characters.

2) It is extremely slow. In my 1.5 million cell test sheet, code execution took 394 seconds.

I'm looking for ideas to optimize the code's speed, and to make the code work on all display value lengths.

```
Function ConvertAllCellsToText()

' Convert all cells from non-text to text

Solution

This code has basically the same performance issue as your last question, and the performance solution is pretty much the same. Other things to note:

  • This is more of a nag than anything, but you should really remove


dead code that is commented out.

  • The variable Cell is never declared. You should put Option Explicit at the top of all of your code modules. See Why should I


dimension my variables in VBA
really? over on SO.

  • You don't return a value, so this should really be a Sub, not a


Function. If you don't want it to appear in the macro list,
declare it as Private. I'd get into the habit of always
explicitly declaring your scope.

  • You have a bug. You explain pretty clearly in your question that


the .Value of a cell can differ dramatically from the .Text of a
cell depending on the formatting. However, when you test the length
of the cell values, you're checking against the Value that you
already point out is not dependant on the formatting. You need to
test the Text against your width.

  • You have another possible bug (not addressed in the code below) -


for proportional width fonts, the value referred to by ColumnWidth
is based on the width of the character '0' (see the
documentation).
If you happen to be using a font where any character is wider than
'0' (for example 'X' in my browser's default font), you can't reliably use the ColumnWidth as a proxy for the
length of a string that it can display.

  • 255 is the magic number, yes it is. Since you're testing the maximum


width of a text rendered cell to this value, and using it in
multiple places in your code to provide formatting information, make
it a constant and give it an appropriate name.

  • I'm assuming that the call to RefreshScreen is to provide some


sort of indication to the user that something is actually happening
instead of a hung script. I'd remove it completely. If you call
DoEvents Excel will have a chance to update the screen buffer.
This is not only more performant (it avoids a function call), but it
will also be clearer to the reader what is going on. Even better
would be to simply omit it, as redrawing the screen is what is
killing your in performance in the first place (see below).

  • Personally, I would cache the column widths and reset them when you


are done instead of using the arbitrary value '20'. This upholds
the principle of least astonishment.

  • You set the NumberFormat to text for the entire Worksheet at the


end of the Function. Setting it for individual cells inside your
loop is not only redundant, but forces a ScreenUpdate. This is
killing your performance. Basically, what is happening is that you are updating the display once for every cell in the UsedRange.
This is incredibly expensive. Turn ScreenUpdating off instead.

  • As I mentioned in comments to your previous question, updating cells


is more expensive in terms of performance than you might guess.
When you are working with large ranges of cells, you should pull
them into an array if at all possible, then write the entire array
back at once. This lowers your "Excel overhead" almost
quadratically.

  • You should avoid using the Cells collection (it is entirely


dependant on what object is active) and the Selection object
(almost every operation can be performed with the Range you
already have a reference to).

  • Use With blocks to your advantage, especially in loops. Every


'.' operator that you see in your code causes an object deference
call. Wrapping sections where the same object is used repeatedly
in a With block essentially caches the result of the deference
instead of repeatedly performing it.

  • Similar note with pulling cell values from a Range. Accessing


the object is much more expensive than storing it. If you have to
reference the same value twice (looking at you, If Cell.Text <> Cell.Value And Len(Cell.Value)

The above notes basically boil down to something more like this:

``
Const MaxTextLength = 255

Private Sub ConvertAllCellsToText()
StartNewTask "Converting all cells to text"
Application.ScreenUpdating = False

Dim MyRange As Range
Set MyRange = ActiveSheet.UsedRange
Dim cache() As Long
cache = GetColumnWidths(MyRange)

With MyRange
.ColumnWidth = MaxTextLength
Dim Values() As Variant
ReDim Values(.Rows.Count, .Columns.Count)

Dim col As Long
Dim row As Long

For row = 0 To UBound(Values, 1)
For col = 0 To UBound(Values, 2)
Dim temp As String
temp = .Cells(row + 1, col + 1).Text
If Len(temp) <= MaxTextLength Then
Values(row, col) = temp
End If
Next col
Next row
.NumberFormat = "@"
End With

MyRange = Values
SetColumnWidths MyRange, cache
Application.ScreenUpdating = True
End Sub

Private Function GetColumnWidths(Target As Range) As Long()
Dim output() As Long

Code Snippets

Const MaxTextLength = 255

Private Sub ConvertAllCellsToText()
    StartNewTask "Converting all cells to text"
    Application.ScreenUpdating = False

    Dim MyRange As Range
    Set MyRange = ActiveSheet.UsedRange
    Dim cache() As Long
    cache = GetColumnWidths(MyRange)

    With MyRange
        .ColumnWidth = MaxTextLength
        Dim Values() As Variant
        ReDim Values(.Rows.Count, .Columns.Count)

        Dim col As Long
        Dim row As Long

        For row = 0 To UBound(Values, 1)
            For col = 0 To UBound(Values, 2)
                Dim temp As String
                temp = .Cells(row + 1, col + 1).Text
                If Len(temp) <= MaxTextLength Then
                    Values(row, col) = temp
                End If
            Next col
        Next row
        .NumberFormat = "@"
    End With

    MyRange = Values
    SetColumnWidths MyRange, cache
    Application.ScreenUpdating = True
End Sub

Private Function GetColumnWidths(Target As Range) As Long()
    Dim output() As Long
    ReDim output(1 To Target.Columns.Count)
    Dim index As Long
    For index = 1 To Target.Columns.Count
        output(index) = Target.Columns(index).ColumnWidth
    Next index
    GetColumnWidths = output
End Function

Private Sub SetColumnWidths(Target As Range, widths() As Long)
    Dim index As Long
    For index = LBound(widths) To UBound(widths)
        Target.Columns(index).ColumnWidth = widths(index)
    Next index
End Sub

Context

StackExchange Code Review Q#124059, answer score: 7

Revisions (0)

No revisions yet.