Multiple Inserts vs. Bulk Insert From Disk for hydrating SQL Server reference tables
Problem
When doing database design, I often use reference/support tables, as we all do. Each time I start up a new project there are inevitably tables which:
- Have a predefined set of values.
- Will never change, and likely never encounter new records (or very infrequently).
A perfect example of this might be a Country table:
create table Country
(
CountryKey int not null identity
,CountryName varchar(64) not null
,IsoNumber int not null
,Iso3 varchar(8) not null
,constraint pk_country primary key clustered (CountryKey)
);
I have used a number of different approaches to "hydrate" these kinds of tables, primarily:
- A hydration SQL script with insert statements.
insert into Country (CountryName, IsoNumber ...) values ('Canada', ...)
- A hydration SQL script, reading from disk, using bulk insert (sketched more fully after this list).
bulk insert MyDatabase.dbo.Country from ...
- And, in extreme cases, a programmatic script (F#, Python).
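As a minimal sketch of the first two approaches against the Country table above (the file path, WITH options, and ISO values here are illustrative, not from the original question):

-- Approach 1: a hydration script of plain insert statements
insert into Country (CountryName, IsoNumber, Iso3)
values ('Canada', 124, 'CAN');

-- Approach 2: bulk insert from a CSV file on disk
-- (assumes the file supplies every column, identity key included)
bulk insert MyDatabase.dbo.Country
from 'C:\data\Country.csv'
with (fieldterminator = ',', rowterminator = '\n', firstrow = 2, keepidentity);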
My preferred approach is the first when the tables contain only a few records each. Beyond that, I typically use a script that reads from CSV files; I choose CSV because it's both compact and human-readable.
How is everyone else handling this situation?
I realize that this is a somewhat opinionated question. But I figured it was worth asking, since it will inevitably have concrete answers with technical rationale.
Solution
Separate Files
Keep the CREATE TABLE, INSERT, and GRANT scripts separate. Data might change per project (eg 'Oui' vs 'YES'). Additionally, this seems to be a common design for most projects I've seen/worked on.
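As a sketch of that separation (the file names and role below are hypothetical), each table might ship as three scripts:

-- Country_create.sql : DDL only (the create table from the question); rarely changes
-- Country_insert.sql : data only, since the values might change per project
-- Country_grant.sql  : permissions only, e.g.:
grant select on dbo.Country to ReportingRole;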
INSERT vs Other
Bulk operations (F#, Python, CSV) are faster than multiple single-row INSERT statements, but I would definitely use INSERT for a few rows. The "cutoff point" between INSERT and code/CSV is a personal decision.
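For the few-rows case, a single multi-row insert (a table value constructor, available since SQL Server 2008) keeps the script compact; the values shown are illustrative:

insert into Country (CountryName, IsoNumber, Iso3)
values ('Canada',        124, 'CAN')
      ,('United States', 840, 'USA')
      ,('Mexico',        484, 'MEX');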
Usage of Code
I would only consider using code (F#, Python) if the data is calculated based on input parameter(s), eg a DAY_DIMENSION table that loads n days at a time.
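A minimal T-SQL sketch of that idea, assuming a hypothetical DAY_DIMENSION table with the columns shown; a recursive CTE generates @Days consecutive dates from the input parameters:

declare @Days int = 365;                 -- input parameter: how many days to load
declare @StartDate date = '2024-01-01';  -- input parameter: first day

with Dates as
(
    select @StartDate as DateValue
    union all
    select dateadd(day, 1, DateValue)
    from Dates
    where DateValue < dateadd(day, @Days - 1, @StartDate)
)
insert into DAY_DIMENSION (DateValue, YearNumber, MonthNumber, DayNumber)
select DateValue, year(DateValue), month(DateValue), day(DateValue)
from Dates
option (maxrecursion 0);                 -- lift the default 100-level recursion cap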
CSV Scripts
For sizable data, you'd want to use a CSV script for speed.
The script that loads the CSV (or other format) file should take the filename as a parameter. All loader utilities (eg SQL*Loader) do this; you just need to pass the parameter up to the shell-script level.
This allows you to load a different (project-specific or updated-revision) file at will.
Make sure you note whether globbing (eg LoadMyData.sh data_0[1-3]*.csv) is allowed.
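On SQL Server (the SQL*Loader example above is Oracle), one way to take the filename as a parameter is a sqlcmd scripting variable; this sketch assumes a hypothetical LoadCountry.sql script and a CSV that supplies every column:

-- invoke as: sqlcmd -S MyServer -d MyDatabase -v CsvPath="C:\data\Country.csv" -i LoadCountry.sql
bulk insert dbo.Country
from '$(CsvPath)'   -- sqlcmd substitutes the variable before the batch runs
with (fieldterminator = ',', rowterminator = '\n', firstrow = 2, keepidentity);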
Dumps
One of the easiest ways to port tables is to use dumps (as Dave has suggested), eg Oracle's expdp/impdp.
You should still have the DDL statements available so that you can recreate the tables (and dumps) as necessary.
Context
StackExchange Database Administrators Q#228087, answer score: 4
Revisions (0)
No revisions yet.