patternrubyMinor
Generating a complicated data structure in a more efficient manner
Viewed 0 times
moreefficientmannergeneratingstructuredatacomplicated
Problem
I need to convert some CSV data into a format recognizable by some old script my work uses. Changing the format is pretty much out of the question. So I've written this:
Yeah, complicated. This is running on up to a million lines per file, usually two files, every day. So it needs to be faster than it currently is.
FYI, the CSV is loosely separated into many products belong to many offers, and products have many
$rate_plan_terms = [
"pp",
"01",
"12",
"24",
"36"
]
def generate(fname)
puts "generating doc for #{fname}"
file = File.open(fname)
standard_offer_hash = {}
features_hash = {}
file.readlines[1..-1].each do |row|
# standard_offers
_region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.split(",")
#used? no yes no yes kinda yes yes yes
standard_offer_hash[product_cid] ||= {}
standard_offer_hash[product_cid][:offerType] = product_name # just keep it as product_cid?
standard_offer_hash[product_cid][:ratePlan] ||= []
$rate_plan_terms.each do |term|
rate_plan = {
:ratePlanName => offer_name,
:ratePlanCode => offer_cid,
:ratePlanTerm => term,
:offerEffectiveDate => "",
:ratePlanMRC => ""
}
standard_offer_hash[product_cid][:ratePlan] component_name,
:featureCode => component_cid,
:featureType => "FEATURE",
:featureTerm => "",
:featureEffectiveDate => "",
:featureMRC => ""
}
end
feature_arr = features_hash.map {|k,f| f}
standard_offer_arr = standard_offer_hash.map {|k,o| o}
product_arr = []
pricelist = {
:features => { :feature => feature_arr }
:ChannelOrgType => "OCM",
:Region => "ALL",
:product => product_arr,
:businessRelationship => "",
:standardOffer => standard_offer_arr,
}
{:PriceList => pricelist}
endYeah, complicated. This is running on up to a million lines per file, usually two files, every day. So it needs to be faster than it currently is.
FYI, the CSV is loosely separated into many products belong to many offers, and products have many
Solution
-
You open the file, but you don't explicitly close it. Bad karma.
-
You read all lines into memory, it seems, rather than go through them one by one. Memory-wise that's pretty inefficient.
-
Any reason for
-
Have you considered using Ruby's bundled CSV parser? It'll be more robust.
-
It'd be easier to just use
As for the use of
Still not pretty (it's a tad over the 5-10 line limit you might strive for for methods in Ruby).
It might be nice to break things into methods like
Either way you cut it, though, it probably won't win any beauty contests.
You open the file, but you don't explicitly close it. Bad karma.
-
You read all lines into memory, it seems, rather than go through them one by one. Memory-wise that's pretty inefficient.
-
Any reason for
$rate_plan_terms to be global? That too seems like bad karma. A constant would make more sense, I think.-
Have you considered using Ruby's bundled CSV parser? It'll be more robust.
-
It'd be easier to just use
Hash#values rather than map { |k, v| v }As for the use of
include?, you skip that by making :rate_plan a hash, and use a key like "#{offer_name}#{offer_cid}". Simpler to explain in code, so see below.require "csv"
def generate(fname)
puts "generating doc for #{fname}"
terms = %w{pp 01 12 24 36}.freeze # just made this local here, a const would still be better
offers = {}
features = {}
CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
# If the CSV file has a proper header line, you can also access cells by their name
_region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields
offers[product_cid] ||= {
offerType: product_name
ratePlan: {}
}
key = "#{offer_name}#{offer_cid}"
offers[product_cid][:ratePlan][key] ||= terms.map do |term|
{
ratePlanName: offer_name,
ratePlanCode: offer_cid,
ratePlanTerm: term,
offerEffectiveDate: "",
ratePlanMRC: ""
}
end
features[component_cid] ||= {
featureName: component_name,
featureCode: component_cid,
featureType: "FEATURE",
featureTerm: "",
featureEffectiveDate: "",
featureMRC: ""
}
end
offsers.each do |key, offer|
offer[:ratePlan] = offer[:ratePlan].values
end
{
PriceList: {
features: { feature: features.values }
ChannelOrgType: "OCM",
Region: "ALL",
product: [],
businessRelationship: "",
standardOffer: offers.values
}
}
endStill not pretty (it's a tad over the 5-10 line limit you might strive for for methods in Ruby).
It might be nice to break things into methods like
feature_hash(name, cid), which would generate the separate hashes.require "csv"
TERMS = %w{pp 01 12 24 36}.freeze
def generate(fname)
puts "generating doc for #{fname}"
offers = {}
features = {}
CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
# If the CSV file has a proper header line, you can also access cells by their name
_region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields
offers[product_cid] ||= {
offerType: product_name
ratePlan: {}
}
key = "#{offer_name}#{offer_cid}"
offers[product_cid][:ratePlan][key] ||= rate_plan_array(offer_name, offer_cid)
features[component_cid] ||= feature_hash(component_name, component_cid)
end
offsers.each do |key, offer|
offer[:ratePlan] = offer[:ratePlan].values
end
{
PriceList: {
features: { feature: features.values }
ChannelOrgType: "OCM",
Region: "ALL",
product: [],
businessRelationship: "",
standardOffer: offers.values
}
}
end
def rate_plan_array(name, cid)
TERMS.map do |term|
{
ratePlanName: offer_name,
ratePlanCode: offer_cid,
ratePlanTerm: term,
offerEffectiveDate: "",
ratePlanMRC: ""
}
end
end
def feature_hash(name, cid)
{
featureName: name,
featureCode: cid,
featureType: "FEATURE",
featureTerm: "",
featureEffectiveDate: "",
featureMRC: ""
}
endEither way you cut it, though, it probably won't win any beauty contests.
Code Snippets
require "csv"
def generate(fname)
puts "generating doc for #{fname}"
terms = %w{pp 01 12 24 36}.freeze # just made this local here, a const would still be better
offers = {}
features = {}
CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
# If the CSV file has a proper header line, you can also access cells by their name
_region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields
offers[product_cid] ||= {
offerType: product_name
ratePlan: {}
}
key = "#{offer_name}#{offer_cid}"
offers[product_cid][:ratePlan][key] ||= terms.map do |term|
{
ratePlanName: offer_name,
ratePlanCode: offer_cid,
ratePlanTerm: term,
offerEffectiveDate: "",
ratePlanMRC: ""
}
end
features[component_cid] ||= {
featureName: component_name,
featureCode: component_cid,
featureType: "FEATURE",
featureTerm: "",
featureEffectiveDate: "",
featureMRC: ""
}
end
offsers.each do |key, offer|
offer[:ratePlan] = offer[:ratePlan].values
end
{
PriceList: {
features: { feature: features.values }
ChannelOrgType: "OCM",
Region: "ALL",
product: [],
businessRelationship: "",
standardOffer: offers.values
}
}
endrequire "csv"
TERMS = %w{pp 01 12 24 36}.freeze
def generate(fname)
puts "generating doc for #{fname}"
offers = {}
features = {}
CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
# If the CSV file has a proper header line, you can also access cells by their name
_region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields
offers[product_cid] ||= {
offerType: product_name
ratePlan: {}
}
key = "#{offer_name}#{offer_cid}"
offers[product_cid][:ratePlan][key] ||= rate_plan_array(offer_name, offer_cid)
features[component_cid] ||= feature_hash(component_name, component_cid)
end
offsers.each do |key, offer|
offer[:ratePlan] = offer[:ratePlan].values
end
{
PriceList: {
features: { feature: features.values }
ChannelOrgType: "OCM",
Region: "ALL",
product: [],
businessRelationship: "",
standardOffer: offers.values
}
}
end
def rate_plan_array(name, cid)
TERMS.map do |term|
{
ratePlanName: offer_name,
ratePlanCode: offer_cid,
ratePlanTerm: term,
offerEffectiveDate: "",
ratePlanMRC: ""
}
end
end
def feature_hash(name, cid)
{
featureName: name,
featureCode: cid,
featureType: "FEATURE",
featureTerm: "",
featureEffectiveDate: "",
featureMRC: ""
}
endContext
StackExchange Code Review Q#68094, answer score: 3
Revisions (0)
No revisions yet.