HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Distribution problem optimization

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
distributionproblemoptimization

Problem

The situation is this: I have a bunch of servers which are synced so that they all have the same data. Unfortunately, a disaster happened and the servers are all out of sync. My task is to re-sync all of the servers so that they all have the same sets of data. That is, I need to ensure that each server has a copy of every data set.

The first line of input denotes the number of servers. Following that will be one line of input for each server with a space-separated list of data set ID's. i.e, x y z. Data set id's are positive integers.

I must output a list of instructions to most optimally re-sync the servers in the format of:

#input = {
#1:[1, 3, 4],
#2:[1, 2, 3],
#3:[1, 3],
#4:[1, 4, 2]}

numCenters = int(raw_input('Number of data centers: '))

input = {}
print("Input data set information as: x y z")

# Grab dataset ID information from stdin
for x in range(1, numCenters+1):
        dataSet = raw_input("Data set %s: " % (x))
        input[x] = sorted(map(int, dataSet.split()))

#Map datasets (the numbers / dataset ID) to data centers (the individual lists) that they belong to

#New dictionary for the map
alldatasets = {}

for k,v in input.iteritems():
        for dataset in v: #
                if dataset not in alldatasets:
                        alldatasets[dataset] = [] # Make a dictionary with the key as the dataset ID,
                alldatasets[dataset].append(k) # and the value as a list of which datacenters have that value.

allsets = list(alldatasets.keys())

print("One Possible Correct Output:\n")
for Id,datacenter in input.iteritems():
        for sets in allsets: #go through every datacenter, and compare the datasets it has to a list of all datasets.
                if sets not in datacenter:
                        print("%s %s %s" % (sets, alldatasets[sets][0], Id))
print("done")


Input:

4
1 3 4
1 2 3
1 3
1 4 2


Output:

One possible correct output:

2 2 1
4 1 2
2 2 3
4 1 3
3 1 4
done


I am looking to improve the code that I have

Solution

Your code is full of variables named somethingSet — but they aren't actually sets! Why not?

This solution, which takes advantage of Python's built-in set operations, is shorter. Just being able to write have_nots = all_centers - haves is worth it.

from collections import defaultdict

num_centers = int(raw_input('Number of data centers: '))
print("Input data set information as: x y z")

all_centers = set(xrange(1, num_centers + 1))
centers_with_data = defaultdict(set)

# Grab dataset ID information from stdin
for center in range(1, num_centers + 1):
    for data_set in map(int, raw_input("Data set %s: " % (center)).split()):
        centers_with_data[data_set].add(center)

print "One possible solution:\n"

for data_set, haves in centers_with_data.iteritems():
    have_nots = all_centers - haves
    donor = next(iter(haves))   # Pick an arbitrary source
    for acceptor in have_nots:
        print "%d %d %d" % (data_set, donor, acceptor)

Code Snippets

from collections import defaultdict

num_centers = int(raw_input('Number of data centers: '))
print("Input data set information as: x y z")

all_centers = set(xrange(1, num_centers + 1))
centers_with_data = defaultdict(set)

# Grab dataset ID information from stdin
for center in range(1, num_centers + 1):
    for data_set in map(int, raw_input("Data set %s: " % (center)).split()):
        centers_with_data[data_set].add(center)

print "One possible solution:\n"

for data_set, haves in centers_with_data.iteritems():
    have_nots = all_centers - haves
    donor = next(iter(haves))   # Pick an arbitrary source
    for acceptor in have_nots:
        print "%d %d %d" % (data_set, donor, acceptor)

Context

StackExchange Code Review Q#73405, answer score: 3

Revisions (0)

No revisions yet.