HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Alternate Solution For REPLACE() In ntext Column

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
ntextcolumnreplaceforsolutionalternate

Problem

Overview

I have a picture database that catalogue information of pictures taken with cameras. The database contains a table pictures which has 256 columns which contain information about the picture that has been taken.

One column Comments is formatted as ntext and contains CR/LFs.

There are another 21 columns which have been configured as ntext.

I extract the data from the database into a flat file using the Tasks | Export Data... function found in SSMS. The exported data is then transferred by an external partner to a new system which will be used in the near future. The export file (CSV) contains some 256 columns with 21 columns that could possibly contain CR/LFs.
Problem

The Comments field/column contains numerous CR/LFs (SQL: CHAR(13), CHAR(10)) which is impacting the analysis of the data.

I tried using the REPLACE(expression, value, value) to search for the CR/LF and replace it with @@ and was thinking of implementing this during the export using Export Data in SSMS.

However, the REPLACE() function returns a

Msg 8116, Level 16, State 1, Line 4

Argument data type ntext is invalid for argument 1 of replace function.

...when I execute something like:

SELECT 'Start *******************', REPLACE(Comment,'
','@@'), ID, '********************End' FROM dbo.pictures
WHERE Comment LIKE '%
%';


Sample Data From Column Comment

Issuing the following statement:

SELECT Comment FROM dbo.Pictures
WHERE Comment like '%
%';


...will retrieve the following sample Comment record:

```
Zwei Fotos von Arenenberg auf einer Seite einer englischen Zeitschrift.
Seite 148 der Zeitung "The Graphic" vom 4. August 1906 = News from Abroad.
"The last stage of all": the retreat for aged actors opened last week near Meaux, in France
  1. General view of the home
  2. M. Coquelin reciting in the open-air theater



The château of Arenenberg which has been presented by the Empress Eugénie to the canton of Thurgovie
  1. View from the chateau [Arenenb

Solution

SQL Server's string handling functions are a bit inconsistent when dealing with long strings, but if you take care to make sure input to be searched is NVARCHAR(MAX) then the REPLACE() function will work over data longer than 8,000 bytes (4,000 characters for 'N'-types). This means that you can manipulate long NTEXT values by casting them to NVARCHAR(MAX) and back again, without them getting truncated, like so:

INSERT @t(tt) 
    SELECT CAST(
              REPLICATE(CAST(N'x' AS NVARCHAR(MAX)), 16000) -- Note: for replicate to work as expected make sure its input is explicitly a long type
            + N'☠'
        AS NTEXT)
SELECT DATALENGTH(tt) FROM @t -- 32,002 data length so we have more than the limit of non-MAX character types
-- Now test working with the long value:
SELECT DATALENGTH(CAST(REPLACE(CAST(tt AS NVARCHAR(MAX)),'x','y') AS NTEXT)) FROM @t 
-- still 32,002, has not been truncated
SELECT DATALENGTH(CAST(REPLACE(CAST(tt AS NVARCHAR(MAX)),'x','yz') AS NTEXT)) FROM @t 
-- 64,002
SELECT RIGHT(CAST(CAST(REPLACE(CAST(tt AS NVARCHAR(MAX)),'x','yz') AS NTEXT) AS NVARCHAR(MAX)), 10) FROM @t 
-- paranoia check, we've not lost the non-ASCII character at the end through truncation or conversion


This won't be particularly efficient though, especially if you have truly long values in those NTEXT columns.

So for your db-fiddle example:

select REPLACE(CAST(Comment AS NVARCHAR(MAX)), CHAR(10), '@@') from Pictures;


Note that the fiddle has replaced your double-byte EOL with just CHAR(10). Also you may need to cast back to NTEXT (as done in my earlier examples) if the receiving application expects that, for example SSIS will certainly complain if it received a normal long string value when expecting a legacy blob type.

Code Snippets

INSERT @t(tt) 
    SELECT CAST(
              REPLICATE(CAST(N'x' AS NVARCHAR(MAX)), 16000) -- Note: for replicate to work as expected make sure its input is explicitly a long type
            + N'☠'
        AS NTEXT)
SELECT DATALENGTH(tt) FROM @t -- 32,002 data length so we have more than the limit of non-MAX character types
-- Now test working with the long value:
SELECT DATALENGTH(CAST(REPLACE(CAST(tt AS NVARCHAR(MAX)),'x','y') AS NTEXT)) FROM @t 
-- still 32,002, has not been truncated
SELECT DATALENGTH(CAST(REPLACE(CAST(tt AS NVARCHAR(MAX)),'x','yz') AS NTEXT)) FROM @t 
-- 64,002
SELECT RIGHT(CAST(CAST(REPLACE(CAST(tt AS NVARCHAR(MAX)),'x','yz') AS NTEXT) AS NVARCHAR(MAX)), 10) FROM @t 
-- paranoia check, we've not lost the non-ASCII character at the end through truncation or conversion
select REPLACE(CAST(Comment AS NVARCHAR(MAX)), CHAR(10), '@@') from Pictures;

Context

StackExchange Database Administrators Q#300198, answer score: 8

Revisions (0)

No revisions yet.