patternpythonMinor
High execution time to count overlapping substrings
Viewed 0 times
substringshightimeoverlappingcountexecution
Problem
I was doing this HackerRank problem which basically boils down to counting overlapping substrings in a string. I used this solution from StackOverflow to build this program -
I've used a dictionary
My problem is that this program takes a very long time and exceeds the time limit on most of the test cases. I do think that this algorithm is CORRECT, since for the few test cases where it doesn't time out, it gives the correct answer. So I'm just looking for a way to speed up the pr
def overlapping_occurrences(string, sub):
count = start = 0
while True:
start = string.find(sub, start) + 1
if start > 0:
count+=1
else:
return count
def find_health(start, end, d, genes, gene_health):
hv = 0
beneficial = set(genes[start:end+1])
for b in beneficial:
# b is just a single letter, so no possibilities of overlapping
if len(b) == 1:
occurrences = d.count(b)
else:
occurrences = overlapping_occurrences(d, b)
total_value = sum(gene_health[b])
hv += total_value*occurrences
return hv
if __name__ == '__main__':
n = int(raw_input())
genes = list(raw_input().split())
health = list(raw_input().split())
gene_health = {}
for idx in xrange(n):
try:
gene_health[genes[idx]].append(int(health[idx]))
except:
gene_health[genes[idx]] = []
gene_health[genes[idx]].append(int(health[idx]))
s = int(raw_input())
health_values = []
for _ in xrange(s):
start, end, d = raw_input().split()
health_values.append(find_health(int(start), int(end), d, genes, gene_health))
low = min(health_values)
high = max(health_values)
print "%d %d" % (low, high)I've used a dictionary
gene_health because as the first example in the problem description shows, a single gene can have multiple different values, so I store all those values in a list which I sum over to get the total_value. My problem is that this program takes a very long time and exceeds the time limit on most of the test cases. I do think that this algorithm is CORRECT, since for the few test cases where it doesn't time out, it gives the correct answer. So I'm just looking for a way to speed up the pr
Solution
overlapping_occurrences has \$O(MN)\$ complexity where \$M\$ and \$N\$ are the sizes of the string and the substring respectively. You can reduce it to \$O(M+N)\$ using KMP algorithm. However, looking at the constraints, I feel that even that might not be sufficient - the highest score for the problem is only 50.Context
StackExchange Code Review Q#158388, answer score: 6
Revisions (0)
No revisions yet.