Best way to re-import large amount of data with minimal downtime
Problem
I need to import about 500,000 records containing IP lookup (read-only reference) data about once a week (only three int/bigint cols).
I don't really want to worry about merging the data with the existing table, I'd prefer to clear down the old and reimport.
Ideally queries running over the data would continue to run (we don't get a lot of these and it is acceptable for them to run a little bit slower whilst the import occurs, but need to be up 24/7, so running this "out of hours" is not an option).
Things Tried So Far
SSIS: I have created an SSIS package that truncates the table and imports - it takes about 30 seconds to run (too long really).
Temp Table: Importing into a temp table, truncating and copying across also takes about 30 seconds.
BCP: Bulk Import is also rather too slow (for some reason it's slower than SSIS, even with no indices to maintain) - I'm guessing it's something to do with the char->int/bigint conversions.
Mirror table? So, at the moment, I'm wondering about reading the table through a view, importing the data into a mirror table, and altering the view to point to this table... this seems like it will be quick, but it also seems a tiny bit hacky to me.
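The view-switch idea above could be sketched roughly as follows (a minimal sketch only; the table, view, and column names are illustrative and not from the original system - the question only says there are three int/bigint columns):

```sql
-- Two identical tables; the view always exposes whichever one is "live".
CREATE TABLE dbo.IPLookup_A (RangeStart bigint, RangeEnd bigint, CountryId int);
CREATE TABLE dbo.IPLookup_B (RangeStart bigint, RangeEnd bigint, CountryId int);
GO
CREATE VIEW dbo.IPLookupView AS
    SELECT RangeStart, RangeEnd, CountryId FROM dbo.IPLookup_A;
GO
-- Weekly refresh: load the idle table, then repoint the view.
TRUNCATE TABLE dbo.IPLookup_B;
-- ... bulk load dbo.IPLookup_B here ...
GO
-- ALTER VIEW is a metadata change; readers move to the new table
-- once the schema-modification lock is granted.
ALTER VIEW dbo.IPLookupView AS
    SELECT RangeStart, RangeEnd, CountryId FROM dbo.IPLookup_B;
```

The next load would target dbo.IPLookup_A and repoint the view back, alternating each week.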
This seems like it should be a common problem, but I can't find recommended practices - any ideas would be most appreciated!
Thanks
Solution
A solution I've used in the past (and have recommended here and on Stack Overflow before) is to create two additional schemas:

CREATE SCHEMA shadow AUTHORIZATION dbo;
CREATE SCHEMA cache AUTHORIZATION dbo;

Now create a mimic of your table in the cache schema:

CREATE TABLE cache.IPLookup(...columns...);

Now when you are doing your switch operation:

TRUNCATE TABLE cache.IPLookup;
BULK INSERT cache.IPLookup FROM ...;

-- the nice thing about the above is that it doesn't really
-- matter if it takes one minute or ten - you're not messing
-- with a table that anyone is using, so you aren't going to
-- interfere with active users.

-- this is a metadata operation so extremely fast - it will wait
-- for existing locks to be released, but won't block new locks
-- for very long at all:

BEGIN TRANSACTION;
ALTER SCHEMA shadow TRANSFER dbo.IPLookup;
ALTER SCHEMA dbo TRANSFER cache.IPLookup;
COMMIT TRANSACTION;

-- now let's move the shadow table back over to
-- the cache schema so it's ready for next load:

ALTER SCHEMA cache TRANSFER shadow.IPLookup;
TRUNCATE TABLE cache.IPLookup;

-- truncate is optional - I usually keep the data
-- around for debugging, but that's probably not
-- necessary in this case.

This will be more cumbersome if you have foreign keys and other dependencies (since you may have to drop those and re-create them), and of course it completely invalidates statistics etc., and this, in turn, can affect plans, but if the most important thing is getting accurate data in front of your users with minimal interruption, this can be an approach to consider.
Context
StackExchange Database Administrators Q#22293, answer score: 13