HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Insert an SQL Server Table rows into a Bigquery table in one bloc

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rowsinsertsqlintobigquerybloconeservertable

Problem

I want to insert all rows of an SQL server Table into a BigQuery Table having the same schema.

The streaming insert row by row is very slow: to insert 1000 rows the execution of the code below took about 10 minutes.

In this code I loop over the first 10 files in a certain folder, and I insert the content of this file in a unique SQL Server Table. Once I looped over the desire files, I loop over the SQL Server table (which contain all rows of all files) and I insert the content row by row in a BigQuery Table.

This operation is very slow. Does someone have a better (faster) solution to insert the content of an SQL server Table into a BigQuery Table automatically (via a code)?


    
    

 

 

FileList: #FileList#

    
    
     
        
        
         
        
        
            
        
    

 --->

 
 
    
    
    
    

    
    

    

    

      

 

    
    
     
        
        
         --->
        
        
            
        
    

    

 
 

Solution

Wrong Approach

I think what you are trying to do makes sense, however I think that your row-by-row approach is what is making this so slow. Anything you do row-by-row with SQL (e.g., a cursor or loop) will be slow because you are executing the whole section of query anew for each row. All SQL is optimized to work with large data sets, not single rows.
Suggested Approach

I feel sure that you could establish a connection so the BigQuery server directly and pass it sets of data via sp_addlinkedserver or similar approach. Contact the people at BigQuery to help with this.

Do scan through your files to insert the data into your local SQL server. Then you could just do something like:

INSERT INTO [BigQueryServer].[database].[schema].[table]
    SELECT * FROM [LocalServer].[database].[schema].[table]
    WHERE [LocalServer].[database].[schema].[table].[added_timestamp] -- or whatever column you use to keep track of records added
        >= '2014-07-01' -- or whatever date

Code Snippets

INSERT INTO [BigQueryServer].[database].[schema].[table]
    SELECT * FROM [LocalServer].[database].[schema].[table]
    WHERE [LocalServer].[database].[schema].[table].[added_timestamp] -- or whatever column you use to keep track of records added
        >= '2014-07-01' -- or whatever date

Context

StackExchange Code Review Q#51828, answer score: 3

Revisions (0)

No revisions yet.