patternrubyMinor
Optimize mass import of XML to SQLServer in Ruby
Viewed 0 times
masssqlserverxmloptimizerubyimport
Problem
I have the following Ruby code that's designed to update item, price and stock data for items in a MSSQL database. It's running on a Ruby 1.8.6/Rails 1.2.3 installation, in its own controller (for now)
What I'm looking for is ways to optimize performance.
Right now each item takes about 0,2 seconds (200ms) to process. Edit: There may be 10,000+ items in the XML file, and 10,000,000+ items in the "Items" and "Price" SQL tables.
I've been told that
-
composing my own SQL queries directly instead of using the models will be faster, since I'm doing quite a few lookups, how much performance would that lend me?
-
and selecting many rows at once, doing processing, and inserting at once should be faster, instead of doing it one by one. (How would this be done in practice?)
```
# counters for statistics
count = 0
skips = 0
# load + parse XML
file = File.read(XML_PATH)
doc = REXML::Document.new file
# loop through products in XML file
doc.elements.each('products') { |product|
itemid = product.attributes['id'].to_i
itemprice = product.elements['price'].text.to_i
item = Item.find_by_id(itemid)
if item.nil?
skips += 1
next
end
# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save
# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s
What I'm looking for is ways to optimize performance.
Right now each item takes about 0,2 seconds (200ms) to process. Edit: There may be 10,000+ items in the XML file, and 10,000,000+ items in the "Items" and "Price" SQL tables.
I've been told that
-
composing my own SQL queries directly instead of using the models will be faster, since I'm doing quite a few lookups, how much performance would that lend me?
-
and selecting many rows at once, doing processing, and inserting at once should be faster, instead of doing it one by one. (How would this be done in practice?)
```
# counters for statistics
count = 0
skips = 0
# load + parse XML
file = File.read(XML_PATH)
doc = REXML::Document.new file
# loop through products in XML file
doc.elements.each('products') { |product|
itemid = product.attributes['id'].to_i
itemprice = product.elements['price'].text.to_i
item = Item.find_by_id(itemid)
if item.nil?
skips += 1
next
end
# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save
# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s
Solution
As for performance, I definitely think you want to use SQL's Bulk Insert from XML functionality. I don't know how to use it from Ruby, so I'll leave that to another reviewer.
I do want to point out something else though. I had to scroll down two pages to find the end of the loop that this opens up.
So, the refactoring step that should be taken is to define a method that takes
The next step would be to break down the steps inside of the new
For example:
Instead of this:
Define this:
And call it from
I do want to point out something else though. I had to scroll down two pages to find the end of the loop that this opens up.
doc.elements.each('products') { |product|. In fact, the entirety of the code is inside of this loop. Blocks of code should fit on the screen in their entirety. I call it the "Single Page Principle". So, the refactoring step that should be taken is to define a method that takes
product as a parameter and put all of the logic necessary to handle a single doc.element inside of it.def update_product!(product)
#ALL THE CODEZ
end
# loop through products in XML file
doc.elements.each('products') { |product| update_product!(product) }The next step would be to break down the steps inside of the new
update_product method into smaller methods as well. A good start would be to take the code between the comments and place them inside of a method with a good name. This replaces your comment and breaks the code down into logical chunks.For example:
Instead of this:
# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save
# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s
})Define this:
def update_price
# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save
# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s
})
endAnd call it from
update_product!. Continuing doing this until each method has exactly one responsibility. You should end up with code that looks something like this psuedocode.def update_product(product)
update_price
update_item_stock_data
update_stock_site
update_price_lock
count += 1
# progress bar
if count%100 == 0
printf '.'
end
end
# counters for statistics
count = 0
skips = 0
# load + parse XML
file = File.read(XML_PATH)
doc = REXML::Document.new file
# loop through products in XML file
doc.elements.each('products') { |product| update_product!(product) }
puts 'Item Count: ' + count.to_s
puts 'Items skipped: ' + skips.to_sCode Snippets
def update_product!(product)
#ALL THE CODEZ
end
# loop through products in XML file
doc.elements.each('products') { |product| update_product!(product) }# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save
# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s
})def update_price
# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save
# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s
})
enddef update_product(product)
update_price
update_item_stock_data
update_stock_site
update_price_lock
count += 1
# progress bar
if count%100 == 0
printf '.'
end
end
# counters for statistics
count = 0
skips = 0
# load + parse XML
file = File.read(XML_PATH)
doc = REXML::Document.new file
# loop through products in XML file
doc.elements.each('products') { |product| update_product!(product) }
puts 'Item Count: ' + count.to_s
puts 'Items skipped: ' + skips.to_sContext
StackExchange Code Review Q#58483, answer score: 2
Revisions (0)
No revisions yet.