HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Generating a complicated data structure in a more efficient manner

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
moreefficientmannergeneratingstructuredatacomplicated

Problem

I need to convert some CSV data into a format recognizable by some old script my work uses. Changing the format is pretty much out of the question. So I've written this:

$rate_plan_terms = [
  "pp",
  "01",
  "12",
  "24",
  "36"
]

def generate(fname)
  puts "generating doc for #{fname}"
  file = File.open(fname)
  standard_offer_hash = {}
  features_hash = {}

  file.readlines[1..-1].each do |row|
    # standard_offers
    _region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.split(",")
#used? no   yes        no          yes         kinda        yes           yes             yes

    standard_offer_hash[product_cid] ||= {}  
    standard_offer_hash[product_cid][:offerType] = product_name # just keep it as product_cid?
    standard_offer_hash[product_cid][:ratePlan] ||= []

    $rate_plan_terms.each do |term|
      rate_plan = {
        :ratePlanName       => offer_name,
        :ratePlanCode       => offer_cid,
        :ratePlanTerm       => term,
        :offerEffectiveDate => "",
        :ratePlanMRC        => ""
      }

      standard_offer_hash[product_cid][:ratePlan]  component_name,
      :featureCode          => component_cid,
      :featureType          => "FEATURE",
      :featureTerm          => "",
      :featureEffectiveDate => "",
      :featureMRC           => ""
    }

  end
  feature_arr = features_hash.map {|k,f| f}
  standard_offer_arr = standard_offer_hash.map {|k,o| o}
  product_arr = []

  pricelist = {
    :features => { :feature => feature_arr }
    :ChannelOrgType => "OCM",
    :Region => "ALL",
    :product => product_arr,
    :businessRelationship => "",
    :standardOffer => standard_offer_arr,
  }

  {:PriceList => pricelist}
end


Yeah, complicated. This is running on up to a million lines per file, usually two files, every day. So it needs to be faster than it currently is.

FYI, the CSV is loosely separated into many products belong to many offers, and products have many

Solution

-
You open the file, but you don't explicitly close it. Bad karma.

-
You read all lines into memory, it seems, rather than go through them one by one. Memory-wise that's pretty inefficient.

-
Any reason for $rate_plan_terms to be global? That too seems like bad karma. A constant would make more sense, I think.

-
Have you considered using Ruby's bundled CSV parser? It'll be more robust.

-
It'd be easier to just use Hash#values rather than map { |k, v| v }

As for the use of include?, you skip that by making :rate_plan a hash, and use a key like "#{offer_name}#{offer_cid}". Simpler to explain in code, so see below.

require "csv"

def generate(fname)
  puts "generating doc for #{fname}"

  terms = %w{pp 01 12 24 36}.freeze # just made this local here, a const would still be better
  offers = {}
  features = {}

  CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
    # If the CSV file has a proper header line, you can also access cells by their name
    _region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields

    offers[product_cid] ||= {
      offerType: product_name
      ratePlan: {}
    }

    key = "#{offer_name}#{offer_cid}"
    offers[product_cid][:ratePlan][key] ||= terms.map do |term|
      {
        ratePlanName: offer_name,
        ratePlanCode: offer_cid,
        ratePlanTerm: term,
        offerEffectiveDate: "",
        ratePlanMRC: ""
      }
    end

    features[component_cid] ||= {
      featureName: component_name,
      featureCode: component_cid,
      featureType: "FEATURE",
      featureTerm: "",
      featureEffectiveDate: "",
      featureMRC: ""
    }
  end

  offsers.each do |key, offer|
    offer[:ratePlan] = offer[:ratePlan].values
  end

  {
    PriceList: {
      features: { feature: features.values }
      ChannelOrgType: "OCM",
      Region: "ALL",
      product: [],
      businessRelationship: "",
      standardOffer: offers.values
    }
  }
end


Still not pretty (it's a tad over the 5-10 line limit you might strive for for methods in Ruby).

It might be nice to break things into methods like feature_hash(name, cid), which would generate the separate hashes.

require "csv"

TERMS = %w{pp 01 12 24 36}.freeze

def generate(fname)
  puts "generating doc for #{fname}"

  offers = {}
  features = {}

  CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
    # If the CSV file has a proper header line, you can also access cells by their name
    _region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields

    offers[product_cid] ||= {
      offerType: product_name
      ratePlan: {}
    }

    key = "#{offer_name}#{offer_cid}"
    offers[product_cid][:ratePlan][key] ||= rate_plan_array(offer_name, offer_cid)

    features[component_cid] ||= feature_hash(component_name, component_cid)
  end

  offsers.each do |key, offer|
    offer[:ratePlan] = offer[:ratePlan].values
  end

  {
    PriceList: {
      features: { feature: features.values }
      ChannelOrgType: "OCM",
      Region: "ALL",
      product: [],
      businessRelationship: "",
      standardOffer: offers.values
    }
  }
end

def rate_plan_array(name, cid)
  TERMS.map do |term|
    {
      ratePlanName: offer_name,
      ratePlanCode: offer_cid,
      ratePlanTerm: term,
      offerEffectiveDate: "",
      ratePlanMRC: ""
    }
  end
end

def feature_hash(name, cid)
  {
    featureName: name,
    featureCode: cid,
    featureType: "FEATURE",
    featureTerm: "",
    featureEffectiveDate: "",
    featureMRC: ""
  }
end


Either way you cut it, though, it probably won't win any beauty contests.

Code Snippets

require "csv"

def generate(fname)
  puts "generating doc for #{fname}"

  terms = %w{pp 01 12 24 36}.freeze # just made this local here, a const would still be better
  offers = {}
  features = {}

  CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
    # If the CSV file has a proper header line, you can also access cells by their name
    _region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields

    offers[product_cid] ||= {
      offerType: product_name
      ratePlan: {}
    }

    key = "#{offer_name}#{offer_cid}"
    offers[product_cid][:ratePlan][key] ||= terms.map do |term|
      {
        ratePlanName: offer_name,
        ratePlanCode: offer_cid,
        ratePlanTerm: term,
        offerEffectiveDate: "",
        ratePlanMRC: ""
      }
    end

    features[component_cid] ||= {
      featureName: component_name,
      featureCode: component_cid,
      featureType: "FEATURE",
      featureTerm: "",
      featureEffectiveDate: "",
      featureMRC: ""
    }
  end

  offsers.each do |key, offer|
    offer[:ratePlan] = offer[:ratePlan].values
  end

  {
    PriceList: {
      features: { feature: features.values }
      ChannelOrgType: "OCM",
      Region: "ALL",
      product: [],
      businessRelationship: "",
      standardOffer: offers.values
    }
  }
end
require "csv"

TERMS = %w{pp 01 12 24 36}.freeze

def generate(fname)
  puts "generating doc for #{fname}"

  offers = {}
  features = {}

  CSV.foreach(fname, headers: true, skip_blanks: true) do |row|
    # If the CSV file has a proper header line, you can also access cells by their name
    _region, offer_cid, _offer_code, offer_name, product_cid, product_name, component_name, component_cid = *row.fields

    offers[product_cid] ||= {
      offerType: product_name
      ratePlan: {}
    }

    key = "#{offer_name}#{offer_cid}"
    offers[product_cid][:ratePlan][key] ||= rate_plan_array(offer_name, offer_cid)

    features[component_cid] ||= feature_hash(component_name, component_cid)
  end

  offsers.each do |key, offer|
    offer[:ratePlan] = offer[:ratePlan].values
  end

  {
    PriceList: {
      features: { feature: features.values }
      ChannelOrgType: "OCM",
      Region: "ALL",
      product: [],
      businessRelationship: "",
      standardOffer: offers.values
    }
  }
end

def rate_plan_array(name, cid)
  TERMS.map do |term|
    {
      ratePlanName: offer_name,
      ratePlanCode: offer_cid,
      ratePlanTerm: term,
      offerEffectiveDate: "",
      ratePlanMRC: ""
    }
  end
end

def feature_hash(name, cid)
  {
    featureName: name,
    featureCode: cid,
    featureType: "FEATURE",
    featureTerm: "",
    featureEffectiveDate: "",
    featureMRC: ""
  }
end

Context

StackExchange Code Review Q#68094, answer score: 3

Revisions (0)

No revisions yet.