HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Consolidating a raw csv of two ids to grouped set of ids in json

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
consolidatingcsvidsrawtwogroupedjsonset

Problem

The file I'm processing here is an outputted file from a Java library called duke.

Its goal was go through all purchases made on a site and group them into a single customer. The output is a csv file in the format, you'll find an example of it below the code.

I want to know if I've written this in a readable way or if there are any problems with the way I've written it.

```
#!/usr/bin/env ruby

require 'thor'
require 'csv'
require 'json'

# This is the class we use to extend and use thor
class DukeFileProcessor ', 'Will process a file generated by Duke.'

# Thor dedupe command
#
# @param csv_file_path [String]
#
# @return File
def dedupe(csv_file_path)
unmerged_ids = parse_csv(csv_file_path)
merged_ids = merge_ids(unmerged_ids)

File.open(csv_file_path + '.matches.json', 'w').write(merged_ids.to_json)
end

private

# Merges the unmerged ids so they are all consolidated under a single
# group that we can call a customer
#
# Process:
# - Iterate through all unmerged_ids
# - Assign the current value of what's being iterated over to matches
# - Iterate through the current value to find matches in unmerged_ids
# and then merge matches into the current matches
# - Finally add the current key and matches to the merged_ids variable
#
# @param unmerged_ids [Array]
#
# @return File
def merge_ids(unmerged_ids)
merged_ids = {}

unmerged_ids.each_pair do |key, value|
value.each do |match_id|
next if unmerged_ids[match_id].nil?

unmerged_ids[match_id].each do |policy_id|
value.push(policy_id) unless value.include?(policy_id)
end
end

merged_ids[key] = value - [key]
end
end

# Parses the given file and does a simple merge on ids to create
# an array of all secondary_ids that match the primary_id
#
# CSV Format: (is_match?(-/+),target,match,probability)
#
# @param csv_file_path [String]
#
# @return Hash
def parse_csv

Solution

In Ruby, it's better to use string interpolation instead of adding strings together. So I'd replace csv_file_path + '.matches.json' with "#{csv_file_path}.matches.json".

Also, unmerged_ids[primary_id] = [] if unmerged_ids[primary_id].nil? can be replaced with unmerged_ids[primary_id] ||= []. That will assign unmerged_ids[primary_id] to [] only if unmerged_ids[primary_id] is false or nil. There's a good question on it here.

And one last thing is that this:

primary_id   = row[1].to_i
secondary_id = row[2].to_i


could be replaced with primary_id, secondary_id = row[1, 2].map &:to_i. There's also a good stackoverflow question on that syntax here.

Code Snippets

primary_id   = row[1].to_i
secondary_id = row[2].to_i

Context

StackExchange Code Review Q#94425, answer score: 2

Revisions (0)

No revisions yet.