Multiple Inserts vs. Bulk Insert From Disk for hydrating SQL Server reference tables
Problem
When doing database design, I often use reference/support tables, as we all do. Each time I start up a new project there are inevitably tables which:
- Have a predefined set of values.
- Will never change, and likely never encounter new records (or very infrequently).
A perfect example of this might be a Country table:
create table Country
(
CountryKey int not null identity
,CountryName varchar(64) not null
,IsoNumber int not null
,Iso3 varchar(8) not null
,constraint pk_country primary key clustered (CountryKey)
);
I have used a number of different approaches to "hydrate" these kinds of tables, primarily:
- A hydration SQL script with insert statements.
insert into Country (CountryName, IsoNumber ...) values ('Canada', ...)
- A hydration SQL script, reading from disk, using bulk insert (sketched more fully after this list).
bulk insert MyDatabase.dbo.Country from ...
- And, in extreme cases, a programmatic script (F#, Python).
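As a minimal sketch of the first two approaches against the Country table above (the file path, WITH options, and ISO values here are illustrative, not from the original question):

-- Approach 1: a hydration script of plain insert statements
insert into Country (CountryName, IsoNumber, Iso3)
values ('Canada', 124, 'CAN');

-- Approach 2: bulk insert from a CSV file on disk
-- (assumes the file supplies every column, identity key included)
bulk insert MyDatabase.dbo.Country
from 'C:\data\Country.csv'
with (fieldterminator = ',', rowterminator = '\n', firstrow = 2, keepidentity);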
My preferred approach is the first when the tables contain only a few records each. Beyond that, I typically use a script that reads from CSV files; I choose CSV because it's both compact and human-readable.
How is everyone else handling this situation?
I realize that this is a somewhat opinionated question. But I figured it was worth asking, since it will inevitably have concrete answers with technical rationale.
Solution
Separate Files
Keep the CREATE TABLE, INSERT, and GRANT scripts separate. Data might change per project (eg 'Oui' vs 'YES'). Additionally, this seems to be a common design for most projects I've seen/worked on.
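As a sketch of that separation (the file names and role below are hypothetical), each table might ship as three scripts:

-- Country_create.sql : DDL only (the create table from the question); rarely changes
-- Country_insert.sql : data only, since the values might change per project
-- Country_grant.sql  : permissions only, e.g.:
grant select on dbo.Country to ReportingRole;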
INSERT vs Other
Bulk operations (F#, Python, CSV) are faster than multiple single-row INSERT statements, but I would definitely use INSERT for a few rows. The "cutoff point" between INSERT and code/CSV is a personal decision.
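For the few-rows case, a single multi-row insert (a table value constructor, available since SQL Server 2008) keeps the script compact; the values shown are illustrative:

insert into Country (CountryName, IsoNumber, Iso3)
values ('Canada',        124, 'CAN')
      ,('United States', 840, 'USA')
      ,('Mexico',        484, 'MEX');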
Usage of Code
I would only consider using code (F#, Python) if the data is calculated based on input parameter(s), eg a DAY_DIMENSION table that loads n days at a time.
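A minimal T-SQL sketch of that idea, assuming a hypothetical DAY_DIMENSION table with the columns shown; a recursive CTE generates @Days consecutive dates from the input parameters:

declare @Days int = 365;                 -- input parameter: how many days to load
declare @StartDate date = '2024-01-01';  -- input parameter: first day

with Dates as
(
    select @StartDate as DateValue
    union all
    select dateadd(day, 1, DateValue)
    from Dates
    where DateValue < dateadd(day, @Days - 1, @StartDate)
)
insert into DAY_DIMENSION (DateValue, YearNumber, MonthNumber, DayNumber)
select DateValue, year(DateValue), month(DateValue), day(DateValue)
from Dates
option (maxrecursion 0);                 -- lift the default 100-level recursion cap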
CSV Scripts
For sizable data, you'd want to use a CSV script for speed.
The script that loads the CSV (or other format) file should take the filename as a parameter. All loader utilities (eg SQL*Loader) do this; you just need to pass the parameter up to the shell-script level.
This allows you to load a different (project-specific or updated-revision) file at will.
Make sure you note whether globbing (eg LoadMyData.sh data_0[1-3]*.csv) is allowed.
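On SQL Server (the SQL*Loader example above is Oracle), one way to take the filename as a parameter is a sqlcmd scripting variable; this sketch assumes a hypothetical LoadCountry.sql script and a CSV that supplies every column:

-- invoke as: sqlcmd -S MyServer -d MyDatabase -v CsvPath="C:\data\Country.csv" -i LoadCountry.sql
bulk insert dbo.Country
from '$(CsvPath)'   -- sqlcmd substitutes the variable before the batch runs
with (fieldterminator = ',', rowterminator = '\n', firstrow = 2, keepidentity);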
Dumps
One of the easiest ways to port tables is to use dumps (as Dave has suggested), eg Oracle's expdp/impdp.
You should still have the DDL statements available so that you can recreate the tables (and dumps) as necessary.
Context
StackExchange Database Administrators Q#228087, answer score: 4
Revisions (0)
No revisions yet.