HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Optimize mass import of XML to SQLServer in Ruby

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
masssqlserverxmloptimizerubyimport

Problem

I have the following Ruby code that's designed to update item, price and stock data for items in a MSSQL database. It's running on a Ruby 1.8.6/Rails 1.2.3 installation, in its own controller (for now)

What I'm looking for is ways to optimize performance.

Right now each item takes about 0,2 seconds (200ms) to process. Edit: There may be 10,000+ items in the XML file, and 10,000,000+ items in the "Items" and "Price" SQL tables.

I've been told that

-
composing my own SQL queries directly instead of using the models will be faster, since I'm doing quite a few lookups, how much performance would that lend me?

-
and selecting many rows at once, doing processing, and inserting at once should be faster, instead of doing it one by one. (How would this be done in practice?)

```
# counters for statistics
count = 0
skips = 0

# load + parse XML
file = File.read(XML_PATH)
doc = REXML::Document.new file

# loop through products in XML file
doc.elements.each('products') { |product|
itemid = product.attributes['id'].to_i
itemprice = product.elements['price'].text.to_i

item = Item.find_by_id(itemid)
if item.nil?
skips += 1
next
end

# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save

# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s

Solution

As for performance, I definitely think you want to use SQL's Bulk Insert from XML functionality. I don't know how to use it from Ruby, so I'll leave that to another reviewer.

I do want to point out something else though. I had to scroll down two pages to find the end of the loop that this opens up. doc.elements.each('products') { |product|. In fact, the entirety of the code is inside of this loop. Blocks of code should fit on the screen in their entirety. I call it the "Single Page Principle".

So, the refactoring step that should be taken is to define a method that takes product as a parameter and put all of the logic necessary to handle a single doc.element inside of it.

def update_product!(product)
    #ALL THE CODEZ
end

# loop through products in XML file
doc.elements.each('products') { |product| update_product!(product) }


The next step would be to break down the steps inside of the new update_product method into smaller methods as well. A good start would be to take the code between the comments and place them inside of a method with a good name. This replaces your comment and breaks the code down into logical chunks.

For example:

Instead of this:

# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
    if price.nil?
      price = Price.new
      price.item_id = itemid
    end
    price.price = product.elements['price'].text.to_i
    price.save

    # find + update item stock data
    product.elements.each('stocks/location') { |location|
      item_stock_location_id = location.attributes['location_code']
      item_stock_location = ItemStockLocationCount.find(:first,
                                                    :conditions => {
                                                        :item_stock_location_id => item_stock_location_id,
                                                        :item_id => itemid.to_s
                                                    })


Define this:

def update_price
    # update prices
    price = Price.find(:first, :conditions => { :item_id => itemid })
    # create new price if price not found
    if price.nil?
      price = Price.new
      price.item_id = itemid
    end
    price.price = product.elements['price'].text.to_i
    price.save

    # find + update item stock data
    product.elements.each('stocks/location') { |location|
      item_stock_location_id = location.attributes['location_code']
      item_stock_location = ItemStockLocationCount.find(:first,
                                                    :conditions => {
                                                        :item_stock_location_id => item_stock_location_id,
                                                        :item_id => itemid.to_s
                                                    })
end


And call it from update_product!. Continuing doing this until each method has exactly one responsibility. You should end up with code that looks something like this psuedocode.

def update_product(product)
    update_price
    update_item_stock_data
    update_stock_site
    update_price_lock

    count += 1
    # progress bar
    if count%100 == 0
      printf '.'
    end
end

 # counters for statistics
  count = 0
  skips = 0

  # load + parse XML
  file = File.read(XML_PATH)
  doc = REXML::Document.new file

  # loop through products in XML file
  doc.elements.each('products') { |product| update_product!(product) }

  puts 'Item Count: ' + count.to_s
  puts 'Items skipped: ' + skips.to_s

Code Snippets

def update_product!(product)
    #ALL THE CODEZ
end

# loop through products in XML file
doc.elements.each('products') { |product| update_product!(product) }
# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
    if price.nil?
      price = Price.new
      price.item_id = itemid
    end
    price.price = product.elements['price'].text.to_i
    price.save

    # find + update item stock data
    product.elements.each('stocks/location') { |location|
      item_stock_location_id = location.attributes['location_code']
      item_stock_location = ItemStockLocationCount.find(:first,
                                                    :conditions => {
                                                        :item_stock_location_id => item_stock_location_id,
                                                        :item_id => itemid.to_s
                                                    })
def update_price
    # update prices
    price = Price.find(:first, :conditions => { :item_id => itemid })
    # create new price if price not found
    if price.nil?
      price = Price.new
      price.item_id = itemid
    end
    price.price = product.elements['price'].text.to_i
    price.save

    # find + update item stock data
    product.elements.each('stocks/location') { |location|
      item_stock_location_id = location.attributes['location_code']
      item_stock_location = ItemStockLocationCount.find(:first,
                                                    :conditions => {
                                                        :item_stock_location_id => item_stock_location_id,
                                                        :item_id => itemid.to_s
                                                    })
end
def update_product(product)
    update_price
    update_item_stock_data
    update_stock_site
    update_price_lock

    count += 1
    # progress bar
    if count%100 == 0
      printf '.'
    end
end

 # counters for statistics
  count = 0
  skips = 0

  # load + parse XML
  file = File.read(XML_PATH)
  doc = REXML::Document.new file

  # loop through products in XML file
  doc.elements.each('products') { |product| update_product!(product) }

  puts 'Item Count: ' + count.to_s
  puts 'Items skipped: ' + skips.to_s

Context

StackExchange Code Review Q#58483, answer score: 2

Revisions (0)

No revisions yet.