Optimize the CSV Upload function in Ruby On Rails
Problem
The code works fine; however, I would like to improve the code after the unless. Can I also store the thread ID in the database? And is there a way to improve performance?
```
require 'csv'

class CsvDb
  class << self
    def convert_save(model_name, csv_data, field_name = nil)
      target_model = model_name.classify.constantize
      csv_file = csv_data.read
      row_headers = {}
      counter = 0
      # Thread.new do
      CSV.parse(csv_file) do |row|
        if counter == 0
          # First row: build a header-name => column-index map
          row_headers = Hash[row.map.with_index.to_a]
          counter += 1
          next
        end
        unless row[row_headers["name"]].nil?
          temp = {}
          business_type = row[row_headers["business_type_id"]]
          business_type_id = business_type == 'Online' ? 1 : business_type == 'Local' ? 2 : 3
          temp[:business_type_id] = business_type_id
          temp[:user_id] = row[row_headers["user_id"]]
          temp[:name] = row[row_headers["name"]]
          temp[:country_id] = row[row_headers["country_id"]]
          temp[:homepage] = row[row_headers["homepage"]] || ""
          temp[:telephone] = row[row_headers["telephone"]] || ""
          temp[:email] = row[row_headers["email"]] || ""
          temp[:address] = row[row_headers["address"]] || ""
          temp[:latitude] = row[row_headers["latitude"]]
          temp[:longitude] = row[row_headers["longitude"]]
          # ... remaining fields assigned the same way ...
          target_model.new(temp).save
        end
      end
    end
  end
end
```
Solution
The following code is not tested, so some minor changes may be necessary. I'm not 100% sure it will work as is; don't hesitate to tell me if anything is wrong with it.
Refactoring the existing code
The first thing I did was to set the headers option to true: it enables parsing of the first line of the CSV, and instead of giving you an array for each row it returns a hash-like CSV::Row with keys extracted from the header, which replaces the code you wrote to parse the header of the file. Then I replaced the nested ternary operations with a case/when expression for business_type_id, which is far more readable. Next I made two small functions to extract all the needed values, using reduce to build a hash from an array of symbols. Finally, I merge all the hashes in the constructor of the target_model before saving.
```
CSV.parse(csv_file, headers: true, header_converters: :symbol) do |row|
  business_type_id = case row[:business_type_id]
                     when 'Online' then 1
                     when 'Local' then 2
                     else 3
                     end

  target_model.new(
    { business_type_id: business_type_id }
      .merge(extract_required_fields(row))
      .merge(extract_optional_fields(row))
  ).save
end

def extract_required_fields(row)
  [:user_id, :name, :country_id, :free_shipping, :category_ids, :style_ids, :shipping_country_ids]
    .reduce({}) do |carry, item|
      carry[item] = row[item]
      carry
    end
end

def extract_optional_fields(row)
  [:homepage, :telephone, :email, :address, :facebook, :twitter, :google, :instagram, :pinterest, :ship_details]
    .reduce({}) do |carry, item|
      carry[item] = row[item] || ''
      carry
    end
end
```
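As a quick standalone illustration of what headers: true and header_converters: :symbol do (a sketch with made-up sample data, not part of the answer's code): each yielded row becomes a CSV::Row indexed by symbolized header names instead of a positional array.

```ruby
require 'csv'

# Two-line sample CSV: a header row followed by one data row.
csv_file = "name,business_type_id\nAcme,Online\n"

rows = []
CSV.parse(csv_file, headers: true, header_converters: :symbol) do |row|
  # row[:name] looks the value up by header name; no manual index map needed.
  rows << { name: row[:name], business_type_id: row[:business_type_id] }
end

rows.first # => { name: "Acme", business_type_id: "Online" }
```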
Bulk update
But this code, even though it has better separation of concerns and less duplication, will not really speed up your import process, because your performance bottleneck is the insert operation performed for each item by the save method. So the solution is to do a bulk insert instead of saving each row separately: you group the inserts!
To do so, you can use the https://github.com/zdennis/activerecord-import gem, which will allow you to do something like this:
```
entities = []
CSV.parse(csv_file, headers: true, header_converters: :symbol) do |row|
  business_type_id = case row[:business_type_id]
                     when 'Online' then 1
                     when 'Local' then 2
                     else 3
                     end

  entities << target_model.new(
    { business_type_id: business_type_id }
      .merge(extract_required_fields(row))
      .merge(extract_optional_fields(row))
  )

  # Flush a batch of 100, then reset the buffer so the next batch starts empty
  if entities.size >= 100
    target_model.import entities
    entities = []
  end
end

# Don't forget to save the remaining entities:
target_model.import entities unless entities.empty?
```
In this example the insert is performed once every 100 rows, which will make the import process much faster!
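The flush-every-N pattern above can be exercised in isolation. In this sketch the target_model.import call is replaced by appending to a flushed array, so the batching logic runs without a database:

```ruby
# Standalone sketch of the flush-every-N batching: records accumulate in
# `entities`, and each full batch is "flushed" (here: collected) and reset.
BATCH_SIZE = 3
entities = []
flushed = []

(1..7).each do |record|
  entities << record
  if entities.size >= BATCH_SIZE
    flushed << entities # stands in for target_model.import entities
    entities = []
  end
end
flushed << entities unless entities.empty? # remaining records

flushed # => [[1, 2, 3], [4, 5, 6], [7]]
```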
Getting the Thread's ID
To collect the thread's id, I suggest using the inspect method of the current thread: you'll get more information than just the thread's id, so you'll have everything you need to know about which thread processed the insert operation.
```
thread_identifier = Thread.current.inspect
```
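To also store that identifier in the database, as the question asks, one option is to merge it into the attributes before building the record. This is a sketch: the thread_id column name is hypothetical and would need a migration adding a string column to the table.

```ruby
# Hypothetical: assumes the table has a `thread_id` string column.
# Thread.current.inspect yields an identifier for the running thread.
attributes = { name: 'Acme' }
attributes = attributes.merge(thread_id: Thread.current.inspect)

# In the answer's code this would become:
#   target_model.new(attributes).save
```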
StackExchange Code Review Q#125464, answer score: 3