patternrubyMinor
Consolidating a raw csv of two ids to grouped set of ids in json
Viewed 0 times
consolidatingcsvidsrawtwogroupedjsonset
Problem
The file I'm processing here is an outputted file from a Java library called duke.
Its goal was go through all purchases made on a site and group them into a single customer. The output is a csv file in the format, you'll find an example of it below the code.
I want to know if I've written this in a readable way or if there are any problems with the way I've written it.
```
#!/usr/bin/env ruby
require 'thor'
require 'csv'
require 'json'
# This is the class we use to extend and use thor
class DukeFileProcessor ', 'Will process a file generated by Duke.'
# Thor dedupe command
#
# @param csv_file_path [String]
#
# @return File
def dedupe(csv_file_path)
unmerged_ids = parse_csv(csv_file_path)
merged_ids = merge_ids(unmerged_ids)
File.open(csv_file_path + '.matches.json', 'w').write(merged_ids.to_json)
end
private
# Merges the unmerged ids so they are all consolidated under a single
# group that we can call a customer
#
# Process:
# - Iterate through all unmerged_ids
# - Assign the current value of what's being iterated over to matches
# - Iterate through the current value to find matches in unmerged_ids
# and then merge matches into the current matches
# - Finally add the current key and matches to the merged_ids variable
#
# @param unmerged_ids [Array]
#
# @return File
def merge_ids(unmerged_ids)
merged_ids = {}
unmerged_ids.each_pair do |key, value|
value.each do |match_id|
next if unmerged_ids[match_id].nil?
unmerged_ids[match_id].each do |policy_id|
value.push(policy_id) unless value.include?(policy_id)
end
end
merged_ids[key] = value - [key]
end
end
# Parses the given file and does a simple merge on ids to create
# an array of all secondary_ids that match the primary_id
#
# CSV Format: (is_match?(-/+),target,match,probability)
#
# @param csv_file_path [String]
#
# @return Hash
def parse_csv
Its goal was go through all purchases made on a site and group them into a single customer. The output is a csv file in the format, you'll find an example of it below the code.
I want to know if I've written this in a readable way or if there are any problems with the way I've written it.
```
#!/usr/bin/env ruby
require 'thor'
require 'csv'
require 'json'
# This is the class we use to extend and use thor
class DukeFileProcessor ', 'Will process a file generated by Duke.'
# Thor dedupe command
#
# @param csv_file_path [String]
#
# @return File
def dedupe(csv_file_path)
unmerged_ids = parse_csv(csv_file_path)
merged_ids = merge_ids(unmerged_ids)
File.open(csv_file_path + '.matches.json', 'w').write(merged_ids.to_json)
end
private
# Merges the unmerged ids so they are all consolidated under a single
# group that we can call a customer
#
# Process:
# - Iterate through all unmerged_ids
# - Assign the current value of what's being iterated over to matches
# - Iterate through the current value to find matches in unmerged_ids
# and then merge matches into the current matches
# - Finally add the current key and matches to the merged_ids variable
#
# @param unmerged_ids [Array]
#
# @return File
def merge_ids(unmerged_ids)
merged_ids = {}
unmerged_ids.each_pair do |key, value|
value.each do |match_id|
next if unmerged_ids[match_id].nil?
unmerged_ids[match_id].each do |policy_id|
value.push(policy_id) unless value.include?(policy_id)
end
end
merged_ids[key] = value - [key]
end
end
# Parses the given file and does a simple merge on ids to create
# an array of all secondary_ids that match the primary_id
#
# CSV Format: (is_match?(-/+),target,match,probability)
#
# @param csv_file_path [String]
#
# @return Hash
def parse_csv
Solution
In Ruby, it's better to use string interpolation instead of adding strings together. So I'd replace
Also,
And one last thing is that this:
could be replaced with
csv_file_path + '.matches.json' with "#{csv_file_path}.matches.json".Also,
unmerged_ids[primary_id] = [] if unmerged_ids[primary_id].nil? can be replaced with unmerged_ids[primary_id] ||= []. That will assign unmerged_ids[primary_id] to [] only if unmerged_ids[primary_id] is false or nil. There's a good question on it here.And one last thing is that this:
primary_id = row[1].to_i
secondary_id = row[2].to_icould be replaced with
primary_id, secondary_id = row[1, 2].map &:to_i. There's also a good stackoverflow question on that syntax here.Code Snippets
primary_id = row[1].to_i
secondary_id = row[2].to_iContext
StackExchange Code Review Q#94425, answer score: 2
Revisions (0)
No revisions yet.