HiveBrain v1.2.0
Tags: debug, ruby, rails · Severity: Minor

Optimize the CSV Upload function in Ruby On Rails

Submitted by: @import:stackexchange-codereview
Score: 0 · Viewed 0 times

Problem

The code works, but I would like to improve the code after the `unless`. Can I also store the thread ID in the database? And is there a way to improve performance?

```
require 'csv'

class CsvDb
  class << self
    def convert_save(model_name, csv_data, field_name = nil)
      target_model = model_name.classify.constantize
      csv_file = csv_data.read
      row_headers = {}
      counter = 0
      # Thread.new do
      CSV.parse(csv_file) do |row|
        if counter == 0
          row_headers = Hash[row.map.with_index.to_a]
          counter += 1
          next
        end
        unless row[row_headers["name"]].nil?
          temp = {}
          business_type = row[row_headers["business_type_id"]]
          business_type_id = business_type == 'Online' ? 1 : business_type == 'Local' ? 2 : 3
          temp[:business_type_id] = business_type_id
          temp[:user_id] = row[row_headers["user_id"]]
          temp[:name] = row[row_headers["name"]]
          temp[:country_id] = row[row_headers["country_id"]]
          temp[:homepage] = row[row_headers["homepage"]] ||= ""
          temp[:telephone] = row[row_headers["telephone"]] ||= ""
          temp[:email] = row[row_headers["email"]] ||= ""
          temp[:address] = row[row_headers["address"]] ||= ""
          temp[:latitude] = row[row_headers["latitude"]]
          temp[:longitude] = row[row_headers["longitude"]]
          # (the remainder of the original snippet is truncated in the archive)
```

Solution

The following code is not tested, so some minor changes may be necessary. I'm not 100% sure it will work as is; don't hesitate to tell me if anything is wrong with the code.

Refactoring the existing code

The first thing I did was set the `headers` option to `true`. This enables parsing of the first line of the CSV, and instead of giving you an array for each row it returns a hash-like object whose keys are taken from the header. It replaces the code you wrote to parse the file's header.
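To see what that option does in isolation, here is a minimal sketch using an inline CSV string (the field names are just illustrative; with `header_converters: :symbol` each row can be indexed by symbol):

```ruby
require 'csv'

csv_text = "name,business_type_id,user_id\nAcme,Online,7\n"

rows = []
CSV.parse(csv_text, headers: true, header_converters: :symbol) do |row|
  # Each `row` is a CSV::Row, indexable by the symbolized header names
  rows << row
end

rows.first[:name]             # => "Acme"
rows.first[:business_type_id] # => "Online"
```

Note that every field comes back as a string ("7", not 7), so numeric columns still need conversion before use.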

Then I replaced the nested ternary operations with a `case`/`when` expression for the `business_type_id`, which is far more readable.

Next I made two small functions to extract all the needed values; I use `reduce` to automatically build a hash from an array of symbols.
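One pitfall worth noting: `reduce`'s block must return the accumulator on every iteration, because whatever the block returns becomes the accumulator for the next step, and a hash assignment returns the assigned value, not the hash. A minimal sketch, with a plain hash standing in for a CSV row:

```ruby
row = { user_id: 7, name: 'Acme', country_id: 33 }

required = [:user_id, :name, :country_id].reduce({}) do |carry, key|
  carry[key] = row[key]
  carry # without this line, `carry` would become the assigned value next iteration
end
```

`each_with_object` is an alternative that yields the accumulator and returns it automatically, avoiding the extra line.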

Finally, I merge all the hashes in the constructor of the `target_model` before saving.

```
CSV.parse(csv_file, headers: true, header_converters: :symbol) do |row|
  business_type_id = case row[:business_type_id]
                     when 'Online' then 1
                     when 'Local'  then 2
                     else 3
                     end

  # Hashes are combined with merge (Ruby's Hash does not support +)
  attributes = { business_type_id: business_type_id }
               .merge(extract_required_fields(row))
               .merge(extract_optionals_fiels(row))
  target_model.new(attributes).save
end

def extract_required_fields(row)
  [:user_id, :name, :country_id, :free_shipping, :category_ids, :style_ids, :shipping_country_ids]
    .reduce({}) do |carry, item|
      carry[item] = row[item]
      carry # reduce's block must return the accumulator
    end
end

def extract_optionals_fiels(row)
  [:homepage, :telephone, :email, :address, :facebook, :twitter, :google, :instagram, :pinterest, :ship_details]
    .reduce({}) do |carry, item|
      carry[item] = row[item] || ''
      carry
    end
end
```


Bulk update

But this code, even though it has better separation of concerns and less duplication, will not really speed up your import process, because your performance bottleneck is the insert operation performed for each item by the `save` method.

So the solution is to do a bulk insert instead of saving each row separately: you'll batch the inserts!

To do so, you can use the https://github.com/zdennis/activerecord-import gem, which allows you to do something like this:

```
entities = []
CSV.parse(csv_file, headers: true, header_converters: :symbol) do |row|
  business_type_id = case row[:business_type_id]
                     when 'Online' then 1
                     when 'Local'  then 2
                     else 3
                     end

  attributes = { business_type_id: business_type_id }
               .merge(extract_required_fields(row))
               .merge(extract_optionals_fiels(row))
  entities << target_model.new(attributes)

  if entities.size == 100
    target_model.import entities
    entities.clear # reset the buffer so the same rows are not imported twice
  end
end

# don't forget to save the remaining entities:
target_model.import entities unless entities.empty?
```


In this case we perform the insert once every 100 rows, which makes the import process much faster!
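The batching logic can be checked without a database. In this gem-free sketch, integers stand in for ActiveRecord models and collecting into `batches` simulates what `target_model.import` would do with each group of 100:

```ruby
rows = (1..250).to_a
batches = []
buffer = []

rows.each do |row|
  buffer << row
  if buffer.size == 100
    batches << buffer.dup # one bulk insert per 100 rows
    buffer.clear          # reset the buffer so nothing is imported twice
  end
end
batches << buffer.dup unless buffer.empty? # flush the remainder

batches.map(&:size) # => [100, 100, 50]
```

Forgetting to clear the buffer after each import is the classic bug here: the buffer keeps growing past 100, the size check never matches again, and earlier rows get re-inserted at the final flush.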

Getting the Thread's ID

To collect the thread's ID, I suggest using the `inspect` method on `Thread.current`. You'll get more than just the thread's ID, but that way you'll have all the information you need about which thread processed the insert operation.

```
thread_identifier = Thread.current.inspect
```
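If all you need is a compact value for a database column, `Thread.current.object_id` gives a plain integer, while `inspect` returns the fuller descriptive string:

```ruby
thread_info = Thread.current.inspect   # descriptive string, e.g. "#<Thread:0x... run>"
thread_id   = Thread.current.object_id # a plain Integer, easy to store in a column
```

Note that `object_id` identifies the Ruby thread object, not the operating-system thread; for this use case (tagging which worker did the insert) either is fine.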


Context

StackExchange Code Review Q#125464, answer score: 3
