patternsqlMinor
Import poorly formatted CSV/JSON to SQL Server
Viewed 0 times
serversqlcsvpoorlyformattedjsonimport
Problem
I am trying to parse Android notification logs into a database. The columns are uneven because different notifications omit certain attributes. Essentially, it is a CSV file of key-value pairs. To put it another way, it is a really poorly formatted (& syntactically incorrect) JSON file. Maybe it would be easier to show you...
I have truncated them for the sake of brevity, but they do not line up toward the end of the rows. Some text values are quoted, some are not. All of the keys are quoted. I can use string functions to trim the extraneous characters once I get the columns sorted, but therein lies the rub: How do I sort the values into columns? I can do this in about 4 different programming languages, but I am just learning T-SQL.
How can I parse this to SQL Server (2016) in a single query? Is it even possible?
I could import it to a temporary table & then transfer it to the permanent one, but I would like to have one query to digest the info, since this will be repeated on a daily (maybe more) basis. The magic would be if I could declare a second-level delimiter, since all values are preceded by a colon. Declaring the colon as the delimiter breaks it the other way...arg(s)!
The data is all around ragged. Even the notifications from the same app may be drastically different. The way I would do it in PHP, C, JS, etc would be to loop through the keys until I hit the ones I needed, grab the appropriate values following them (or the whole pair), th
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
I have truncated them for the sake of brevity, but they do not line up toward the end of the rows. Some text values are quoted, some are not. All of the keys are quoted. I can use string functions to trim the extraneous characters once I get the columns sorted, but therein lies the rub: How do I sort the values into columns? I can do this in about 4 different programming languages, but I am just learning T-SQL.
How can I parse this to SQL Server (2016) in a single query? Is it even possible?
I could import it to a temporary table & then transfer it to the permanent one, but I would like to have one query to digest the info, since this will be repeated on a daily (maybe more) basis. The magic would be if I could declare a second-level delimiter, since all values are preceded by a colon. Declaring the colon as the delimiter breaks it the other way...arg(s)!
The data is all around ragged. Even the notifications from the same app may be drastically different. The way I would do it in PHP, C, JS, etc would be to loop through the keys until I hit the ones I needed, grab the appropriate values following them (or the whole pair), th
Solution
You can use the technique shown in Loading line-delimited JSON files in SQL Server 2016 on the SQL Server Database Engine Blog, written by Jovan Popovic .
For example, given a file
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
and a format file
13.0
1
1 SQLCHAR 0 0 "\r\n" 1 json ""
The file can be bulk imported and parsed with:
Giving:
For example, given a file
import.txt:{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
and a format file
import.fmt:13.0
1
1 SQLCHAR 0 0 "\r\n" 1 json ""
The file can be bulk imported and parsed with:
SELECT
Parsed.packageName,
Parsed.postTime,
Parsed.systemTime,
Parsed.offset,
Parsed.isOngoing
FROM OPENROWSET
(
BULK 'C:\Temp\import.txt',
FORMATFILE= 'C:\Temp\import.fmt'
) AS BulkLog
CROSS APPLY OPENJSON(BulkLog.[json])
WITH
(
packageName varchar(50),
postTime bigint,
systemTime bigint,
offset bigint,
isOngoing bit
) AS Parsed;Giving:
Code Snippets
SELECT
Parsed.packageName,
Parsed.postTime,
Parsed.systemTime,
Parsed.offset,
Parsed.isOngoing
FROM OPENROWSET
(
BULK 'C:\Temp\import.txt',
FORMATFILE= 'C:\Temp\import.fmt'
) AS BulkLog
CROSS APPLY OPENJSON(BulkLog.[json])
WITH
(
packageName varchar(50),
postTime bigint,
systemTime bigint,
offset bigint,
isOngoing bit
) AS Parsed;Context
StackExchange Database Administrators Q#167814, answer score: 3
Revisions (0)
No revisions yet.