patternpythonMinor
Perfect Hash Family Non-Distinct t Columns Calculator
Viewed 0 times
distinctcolumnsnonperfecthashfamilycalculator
Problem
Perfect Hash Families (PHFs) are a widely studied combinatorial object. Essentially, they are 4-tuples
I want to be able to count how many "non-distinct"
I plan to use this code as a metric for how "good" the array is - what I mean by this is that I want, given the number of rows
(N; t, k, v), where the PHF is an N by k array of v symbols. For any t columns chosen, there is at least one row of those chosen columns that has no symbol duplicated in that row.I want to be able to count how many "non-distinct"
t-column choices there are in a given PHF. My code is below, with some sample inputs that I made up:from itertools import combinations
def get_column(phf, c):
return [x[c] for x in phf]
# returns the PHF as if we restrict to the "cs" columns (a list)
def combine_columns_into_array(phf, cs):
r = []
for c in cs:
r.append(get_column(phf, c))
return list(map(list, zip(*r)))
# returns the number of t-column choices that have no distinct rows
def count_non_distinct_combinations(phf, t):
k = len(phf[0])
true_count = 0
# get all combinations
for c in combinations([i for i in range(k)], t):
b = False
choices = combine_columns_into_array(phf, list(c))
for row in choices:
# if the row is completely distinct
if len(row) == len(set(row)):
b = True
# if all rows are non-distinct:
if not b:
true_count += 1
return true_count
phf = [
[4, 2, 3],
[2, 1, 3],
[1, 1, 4]]
# first 2 columns have no distinct rows
phf2 = [
[4, 4, 3],
[2, 2, 3],
[1, 1, 4]]
print(count_non_distinct_combinations(phf, 2)) # 0
print(count_non_distinct_combinations(phf2, 2)) # 1I plan to use this code as a metric for how "good" the array is - what I mean by this is that I want, given the number of rows
N, try to reduce this metric as much as possible. Therefore, speed/efficiency is the most important part of any review of this code.Solution
For a start, use iterators. Don't convert to lists constantly:
In
Rather than:
To further simplify,
Combining
The
The whole
section can be simplified (I think) to:
And so to:
I admit I lost readability somewhere along the line, but this new 6 line version takes 55.3% of the time needed to complete the
get_column can be simplified to (use generator rather than list):def get_column(phf, c):
return (x[c] for x in phf)combine_columns_into_array to:def combine_columns_into_array(phf, cs):
return zip(*[get_column(phf, c) for c in cs])In
count_non_distinct_combinations:for c in combinations(range(k), t):Rather than:
for c in combinations([i for i in range(k)], t):To further simplify,
combine_columns_into_array can be merged with get_column:def combine_columns_into_array(phf, cs):
return zip(*[[x[c] for x in phf] for c in cs])Combining
combine_columns_into_array into count_non_distinct_combinations:def count_non_distinct_combinations(phf, t):
k = len(phf[0])
true_count = 0
# get all combinations
for c in combinations(range(k), t):
b = False
for row in zip(*[[x[i] for x in phf] for i in c]):
# if the row is completely distinct
if len(row) == len(set(row)):
b = True
# if all rows are non-distinct:
if not b:
true_count += 1
return true_countThe
zip(*[[x[i] for x in phf] for i in c]) can be simplified to ([x[i] for i in c] for x in phf)The whole
k = len(phf[0])
true_count = 0
# get all combinations
for c in combinations([i for i in range(k)], t):
b = False
choices = combine_columns_into_array(phf, list(c))
for row in choices:
# if the row is completely distinct
if len(row) == len(set(row)):
b = True
# if all rows are non-distinct:
if not b:
true_count += 1
return true_countsection can be simplified (I think) to:
true_count = 0
# get all combinations
for c in combinations(range(len(phf[0])), t):
s = [[x[i] for i in c] for x in phf]
# if all rows are non-distinct:
if not [len(i) for i in s] == [len(set(i)) for i in s]:
true_count += 1
return true_countAnd so to:
def count_non_distinct_combinations(phf, t):
true_count = 0
for c in combinations(range(len(phf[0])), t):
if not any(len(set(i)) == len(i) for i in ([x[i] for i in c] for x in phf)):
true_count += 1
return true_countI admit I lost readability somewhere along the line, but this new 6 line version takes 55.3% of the time needed to complete the
phf2 case, so its almost twice as fast as your version. There may be improvements that can be done to the algorithm, but I'll stop here.Code Snippets
def get_column(phf, c):
return (x[c] for x in phf)def combine_columns_into_array(phf, cs):
return zip(*[get_column(phf, c) for c in cs])for c in combinations(range(k), t):for c in combinations([i for i in range(k)], t):def combine_columns_into_array(phf, cs):
return zip(*[[x[c] for x in phf] for c in cs])Context
StackExchange Code Review Q#96807, answer score: 2
Revisions (0)
No revisions yet.