HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Import poorly formatted CSV/JSON to SQL Server

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
serversqlcsvpoorlyformattedjsonimport

Problem

I am trying to parse Android notification logs into a database. The columns are uneven because different notifications omit certain attributes. Essentially, it is a CSV file of key-value pairs. To put it another way, it is a really poorly formatted (& syntactically incorrect) JSON file. Maybe it would be easier to show you...

{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}


I have truncated them for the sake of brevity, but they do not line up toward the end of the rows. Some text values are quoted, some are not. All of the keys are quoted. I can use string functions to trim the extraneous characters once I get the columns sorted, but therein lies the rub: How do I sort the values into columns? I can do this in about 4 different programming languages, but I am just learning T-SQL.

How can I parse this to SQL Server (2016) in a single query? Is it even possible?

I could import it to a temporary table & then transfer it to the permanent one, but I would like to have one query to digest the info, since this will be repeated on a daily (maybe more) basis. The magic would be if I could declare a second-level delimiter, since all values are preceded by a colon. Declaring the colon as the delimiter breaks it the other way...arg(s)!

The data is all around ragged. Even the notifications from the same app may be drastically different. The way I would do it in PHP, C, JS, etc would be to loop through the keys until I hit the ones I needed, grab the appropriate values following them (or the whole pair), th

Solution

You can use the technique shown in Loading line-delimited JSON files in SQL Server 2016 on the SQL Server Database Engine Blog, written by Jovan Popovic .

For example, given a file import.txt:

{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}

and a format file import.fmt:

13.0
1
1 SQLCHAR 0 0 "\r\n" 1 json ""

The file can be bulk imported and parsed with:

SELECT
    Parsed.packageName,
    Parsed.postTime,
    Parsed.systemTime,
    Parsed.offset,
    Parsed.isOngoing
FROM OPENROWSET
(
    BULK 'C:\Temp\import.txt',
    FORMATFILE= 'C:\Temp\import.fmt'
) AS BulkLog
CROSS APPLY OPENJSON(BulkLog.[json])
WITH
(
    packageName varchar(50),
    postTime    bigint,
    systemTime  bigint,
    offset      bigint,
    isOngoing   bit
) AS Parsed;


Giving:

Code Snippets

SELECT
    Parsed.packageName,
    Parsed.postTime,
    Parsed.systemTime,
    Parsed.offset,
    Parsed.isOngoing
FROM OPENROWSET
(
    BULK 'C:\Temp\import.txt',
    FORMATFILE= 'C:\Temp\import.fmt'
) AS BulkLog
CROSS APPLY OPENJSON(BulkLog.[json])
WITH
(
    packageName varchar(50),
    postTime    bigint,
    systemTime  bigint,
    offset      bigint,
    isOngoing   bit
) AS Parsed;

Context

StackExchange Database Administrators Q#167814, answer score: 3

Revisions (0)

No revisions yet.