patternsqlMinor
Insert an SQL Server Table rows into a Bigquery table in one bloc
Viewed 0 times
rowsinsertsqlintobigquerybloconeservertable
Problem
I want to insert all rows of an SQL server Table into a BigQuery Table having the same schema.
The streaming insert row by row is very slow: to insert 1000 rows the execution of the code below took about 10 minutes.
In this code I loop over the first 10 files in a certain folder, and I insert the content of this file in a unique SQL Server Table. Once I looped over the desire files, I loop over the SQL Server table (which contain all rows of all files) and I insert the content row by row in a BigQuery Table.
This operation is very slow. Does someone have a better (faster) solution to insert the content of an SQL server Table into a BigQuery Table automatically (via a code)?
The streaming insert row by row is very slow: to insert 1000 rows the execution of the code below took about 10 minutes.
In this code I loop over the first 10 files in a certain folder, and I insert the content of this file in a unique SQL Server Table. Once I looped over the desire files, I loop over the SQL Server table (which contain all rows of all files) and I insert the content row by row in a BigQuery Table.
This operation is very slow. Does someone have a better (faster) solution to insert the content of an SQL server Table into a BigQuery Table automatically (via a code)?
FileList: #FileList#
--->
--->
Solution
Wrong Approach
I think what you are trying to do makes sense, however I think that your row-by-row approach is what is making this so slow. Anything you do row-by-row with SQL (e.g., a cursor or loop) will be slow because you are executing the whole section of query anew for each row. All SQL is optimized to work with large data sets, not single rows.
Suggested Approach
I feel sure that you could establish a connection so the BigQuery server directly and pass it sets of data via sp_addlinkedserver or similar approach. Contact the people at BigQuery to help with this.
Do scan through your files to insert the data into your local SQL server. Then you could just do something like:
I think what you are trying to do makes sense, however I think that your row-by-row approach is what is making this so slow. Anything you do row-by-row with SQL (e.g., a cursor or loop) will be slow because you are executing the whole section of query anew for each row. All SQL is optimized to work with large data sets, not single rows.
Suggested Approach
I feel sure that you could establish a connection so the BigQuery server directly and pass it sets of data via sp_addlinkedserver or similar approach. Contact the people at BigQuery to help with this.
Do scan through your files to insert the data into your local SQL server. Then you could just do something like:
INSERT INTO [BigQueryServer].[database].[schema].[table]
SELECT * FROM [LocalServer].[database].[schema].[table]
WHERE [LocalServer].[database].[schema].[table].[added_timestamp] -- or whatever column you use to keep track of records added
>= '2014-07-01' -- or whatever dateCode Snippets
INSERT INTO [BigQueryServer].[database].[schema].[table]
SELECT * FROM [LocalServer].[database].[schema].[table]
WHERE [LocalServer].[database].[schema].[table].[added_timestamp] -- or whatever column you use to keep track of records added
>= '2014-07-01' -- or whatever dateContext
StackExchange Code Review Q#51828, answer score: 3
Revisions (0)
No revisions yet.