snippetsqlMinor
Export to format compatible with SQL Server
Viewed 0 times
formatwithsqlcompatibleexportserver
Problem
I need to provide data in a format that could be imported into a Microsoft SQL Server database; however, I do not have Windows or access to a SQL Server instance.
The data is currently in several CSV spreadsheets. I looked at
Is there a more reliable intermediary format that could be generated from Linux/OSX? For example MySQL dump, Excel, Sqlite, XML, etc?
I also need this import process to create the tables and not require coding. The person importing is non-technical. It would be a one-off process.
I won't be able to test the import because I don't have direct access to the SQL Server database, so am looking for a more reliable approach. For example a MySQL dump, XLS, XML, etc.
The data is currently in several CSV spreadsheets. I looked at
BULK INSERT but found their support for CSV is quite limited, such as not recognizing double quotes (documentation: Specifying Field and Row Terminators). I have read that edge cases like when a field happens to include a quote (and so needs to be escaped) are not supported.Is there a more reliable intermediary format that could be generated from Linux/OSX? For example MySQL dump, Excel, Sqlite, XML, etc?
I also need this import process to create the tables and not require coding. The person importing is non-technical. It would be a one-off process.
- The target system is SQL Server 2008, though I am ideally interested in a solution not tied to a specific version.
- The data to import is ~100MB in a single CSV file.
- There are embedded field/row delimiters, which works fine when the fields are quoted with the Python csv module.
- There is no need to handle foreign keys, indexes, constraints, etc.
- The encoding is UTF-8.
- The source is scraped data from German web pages currently stored in a CSV file. I could easily import it from there into MySQL / Postgres if that was a reliable vector.
I won't be able to test the import because I don't have direct access to the SQL Server database, so am looking for a more reliable approach. For example a MySQL dump, XLS, XML, etc.
Solution
There are some issues with this request:
-
What version and edition of SQL Server is the target system?
-
How much data is being imported? 10k, 10 Mb, more?
-
How many CSV files are there?
-
You have stated that handling of double-quotes is required, implying text-qualified fields and embedded text-qualifiers. Will there also be embedded field delimiters (i.e.
-
In a comment on Scott's answer you raise a concern about: "Double quotes is one example and I expect there are other shortcomings if such a basic feature is missing - unicode, size limits, etc.". Can you please update the question to include these and any other missing requirements?
-
In a comment on the question, you ask, regarding SSMS: "will this create the required tables or they must be predefined?". Is creating the tables a requirement of this import process? If so, or even if creating the tables is an optional benefit, can it please be stated clearly in the question?
-
If the tables do not already exist, what needs to happen with regards to Foreign Keys, Indexes, Default Constraints, Check Constraints, etc?
-
What collation is being used, or at least is desired for the SQL Server tables? Is the plan to simply inherit the current default collation for the database that you are importing into? And to be clear about the term "collation" since it can have slightly different meaning depending on the system you are using, I am speaking of: Locale / LCID / Culture / Code Page (if applicable). It would also help to know if the source is sensitive or insensitive for case, accents, etc, or if a binary collation is being used.
-
Is there a more reliable intermediary format that could be generated from Linux/OSX? For example MySQL dump, Excel, Sqlite, XML, etc?
The only truly reliable format will be something generated by SQL Server. XML is generally very reliable for transporting the data (this is what XML was meant to do), and doesn't have the parsing issues that are inherent in delimited files (i.e. embedded text qualifiers and delimiters). BUT, you still need to create the tables, and you need to write, and test the parsing of the XML into a result set so that it can be inserted into the tables. HOWEVER, (continued in the next item)...
-
You stated in that comment on Scott's answer:
Problem is I won't be able to test the import because I don't have direct access to the MS SQL database, so am looking for a more reliable approach.
"Reliability" can only be determined through testing. It doesn't matter what should work, anything can go wrong. For example, many folks are not aware that it is common for XML to have an encoding of UTF-8, yet SQL Server only handles UTF-16 (Little Endian) for XML or even NVARCHAR data. Seeing as how this data is coming from "Linux/OSX", I would expect the default encoding to be UTF-8.
The encoding issue and other nuances should all (well, "mostly") reveal themselves in testing, but you have no way to test. Hence it will be difficult to get a very reliable answer regarding a reliable import mechanism.
-
What is the source of the data? I assume it is either Linux or OSX as those were mentioned in the question. But is it coming from MySQL specifically (since a "MySQL dump" was also mentioned)? Knowing if the source is MySQL or PostgreSQL or flat files, etc will help determine what tools are both available and best suited to this request.
-
Regardless of the source, however, keep in mind that the destination (i.e. Windows / SQL Server) is most "comfortable" dealing with UTF-16 (Little Endian) encoded data. So while UTF-8 might be the default output type for Linux/OSX-based systems, if there is an option for using UTF-16 Little Endian for the output encoding, that will help reduce potential issues once the script is moved over to the destination system.
One last thing to consider: You mention "MySQL dump" as a potential "reliable format", and those dumps include the DDL (i.e.
And to make things even easier:
-
What version and edition of SQL Server is the target system?
-
How much data is being imported? 10k, 10 Mb, more?
-
How many CSV files are there?
-
You have stated that handling of double-quotes is required, implying text-qualified fields and embedded text-qualifiers. Will there also be embedded field delimiters (i.e.
, )? Those usually aren't a problem, but will there also be embedded row delimiters (i.e. \r\n or just \n )? These are the biggest problem for most CSV parsers.-
In a comment on Scott's answer you raise a concern about: "Double quotes is one example and I expect there are other shortcomings if such a basic feature is missing - unicode, size limits, etc.". Can you please update the question to include these and any other missing requirements?
-
In a comment on the question, you ask, regarding SSMS: "will this create the required tables or they must be predefined?". Is creating the tables a requirement of this import process? If so, or even if creating the tables is an optional benefit, can it please be stated clearly in the question?
-
If the tables do not already exist, what needs to happen with regards to Foreign Keys, Indexes, Default Constraints, Check Constraints, etc?
-
What collation is being used, or at least is desired for the SQL Server tables? Is the plan to simply inherit the current default collation for the database that you are importing into? And to be clear about the term "collation" since it can have slightly different meaning depending on the system you are using, I am speaking of: Locale / LCID / Culture / Code Page (if applicable). It would also help to know if the source is sensitive or insensitive for case, accents, etc, or if a binary collation is being used.
-
Is there a more reliable intermediary format that could be generated from Linux/OSX? For example MySQL dump, Excel, Sqlite, XML, etc?
The only truly reliable format will be something generated by SQL Server. XML is generally very reliable for transporting the data (this is what XML was meant to do), and doesn't have the parsing issues that are inherent in delimited files (i.e. embedded text qualifiers and delimiters). BUT, you still need to create the tables, and you need to write, and test the parsing of the XML into a result set so that it can be inserted into the tables. HOWEVER, (continued in the next item)...
-
You stated in that comment on Scott's answer:
Problem is I won't be able to test the import because I don't have direct access to the MS SQL database, so am looking for a more reliable approach.
"Reliability" can only be determined through testing. It doesn't matter what should work, anything can go wrong. For example, many folks are not aware that it is common for XML to have an encoding of UTF-8, yet SQL Server only handles UTF-16 (Little Endian) for XML or even NVARCHAR data. Seeing as how this data is coming from "Linux/OSX", I would expect the default encoding to be UTF-8.
The encoding issue and other nuances should all (well, "mostly") reveal themselves in testing, but you have no way to test. Hence it will be difficult to get a very reliable answer regarding a reliable import mechanism.
-
What is the source of the data? I assume it is either Linux or OSX as those were mentioned in the question. But is it coming from MySQL specifically (since a "MySQL dump" was also mentioned)? Knowing if the source is MySQL or PostgreSQL or flat files, etc will help determine what tools are both available and best suited to this request.
-
Regardless of the source, however, keep in mind that the destination (i.e. Windows / SQL Server) is most "comfortable" dealing with UTF-16 (Little Endian) encoded data. So while UTF-8 might be the default output type for Linux/OSX-based systems, if there is an option for using UTF-16 Little Endian for the output encoding, that will help reduce potential issues once the script is moved over to the destination system.
One last thing to consider: You mention "MySQL dump" as a potential "reliable format", and those dumps include the DDL (i.e.
CREATE TABLE) and DML (i.e. INSERT) statements. If you have the ability to write a script that contains the DDL statements (if necessary) and DML statements, then you don't have to worry about any formatting issues. Just convert the CSV data into INSERT statements and be done. The only issue you would be left with is not being able to test the scripts. But still, outside of which type of quote or brackets to use for object names and string literals, it's really hard to go wrong with:INSERT INTO table (column1_name, column2_name, ...) VALUES (column1_val, column2_val, ...);And to make things even easier:
- Enclose table and column names in square brackets:
[and](e.g.[Customers]), or double-quotes:"(e.g."Customers")
- Enclose string, date, and GUID literals in regular single-quotes / apostrophes:
'. (e.g.'some text')
Code Snippets
INSERT INTO table (column1_name, column2_name, ...) VALUES (column1_val, column2_val, ...);Context
StackExchange Database Administrators Q#150635, answer score: 7
Revisions (0)
No revisions yet.