debugsqlMinor
Is it possible to use OPENROWSET to import fixed width UTF8 encoded files?
Viewed 0 times
useutf8encodedpossiblewidthfilesfixedopenrowsetimport
Problem
I have an example data file with following contents and saved with UTF8 encoding.
The format of this file is fixed width with columns 1 to 3 each being allocated 1 character and column 4 reserved 5 characters.
I have created an XML format file as below
Disappointingly running the following SQL...
Produces the following results
from which I conclude the
Is there any way I can get this working correctly for fixed character widths with UTF8 encoding?
(Target environment is Azure SQL Database reading from Blob storage)
NB: It was suggested in the comments that adding
oab~opqr
öab~öpqr
öab~öpqrThe format of this file is fixed width with columns 1 to 3 each being allocated 1 character and column 4 reserved 5 characters.
I have created an XML format file as below
Disappointingly running the following SQL...
SELECT *
FROM OPENROWSET
(
BULK 'mydata.txt',
FORMATFILE = 'myformat_file.xml',
CODEPAGE = '65001'
) AS XProduces the following results
Col1 Col2 Col3 Col4
---- ---- ---- -----
o a b ~opqr
� � a b~öp
� � a b~öpfrom which I conclude the
LENGTH is counting bytes rather than characters.Is there any way I can get this working correctly for fixed character widths with UTF8 encoding?
(Target environment is Azure SQL Database reading from Blob storage)
NB: It was suggested in the comments that adding
COLLATION="LATIN1_GENERAL_100_CI_AS_SC_UTF8" to the FIELD elements might help but the results remain unchanged with this.Solution
One workaround is to just change the format file to bring the whole line in, in bulk, and do the substring-ing in TSQL
With format file
The following does return the desired results
With format file
The following does return the desired results
SELECT SUBSTRING(WholeLine, 1,1) AS Col1,
SUBSTRING(WholeLine, 2,1) AS Col2,
SUBSTRING(WholeLine, 3,1) AS Col3,
SUBSTRING(WholeLine, 4,5) AS Col4
FROM OPENROWSET
(
BULK 'mydata.txt',
FORMATFILE = 'myformat_file.xml',
CODEPAGE = '65001'
) AS XCode Snippets
<?xml version = "1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD xsi:type="CharTerm" ID="WholeLine" TERMINATOR="\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="WholeLine" NAME="WholeLine" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>SELECT SUBSTRING(WholeLine, 1,1) AS Col1,
SUBSTRING(WholeLine, 2,1) AS Col2,
SUBSTRING(WholeLine, 3,1) AS Col3,
SUBSTRING(WholeLine, 4,5) AS Col4
FROM OPENROWSET
(
BULK 'mydata.txt',
FORMATFILE = 'myformat_file.xml',
CODEPAGE = '65001'
) AS XContext
StackExchange Database Administrators Q#303337, answer score: 9
Revisions (0)
No revisions yet.