HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

High execution time to count overlapping substrings

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
substringshightimeoverlappingcountexecution

Problem

I was doing this HackerRank problem which basically boils down to counting overlapping substrings in a string. I used this solution from StackOverflow to build this program -

def overlapping_occurrences(string, sub):
    count = start = 0
    while True:
        start = string.find(sub, start) + 1
        if start > 0:
            count+=1
        else:
            return count

def find_health(start, end, d, genes, gene_health):
    hv = 0
    beneficial = set(genes[start:end+1])
    for b in beneficial:
        # b is just a single letter, so no possibilities of overlapping
        if len(b) == 1:
            occurrences = d.count(b)
        else:
            occurrences = overlapping_occurrences(d, b)
        total_value = sum(gene_health[b])
        hv += total_value*occurrences
    return hv

if __name__ == '__main__':
    n = int(raw_input())
    genes = list(raw_input().split())
    health = list(raw_input().split())
    gene_health = {}
    for idx in xrange(n):
        try:
            gene_health[genes[idx]].append(int(health[idx]))
        except:
            gene_health[genes[idx]] = []
            gene_health[genes[idx]].append(int(health[idx]))
    s = int(raw_input())
    health_values = []
    for _ in xrange(s):
        start, end, d = raw_input().split()
        health_values.append(find_health(int(start), int(end), d, genes, gene_health))
    low = min(health_values)
    high = max(health_values)
    print "%d %d" % (low, high)


I've used a dictionary gene_health because as the first example in the problem description shows, a single gene can have multiple different values, so I store all those values in a list which I sum over to get the total_value.

My problem is that this program takes a very long time and exceeds the time limit on most of the test cases. I do think that this algorithm is CORRECT, since for the few test cases where it doesn't time out, it gives the correct answer. So I'm just looking for a way to speed up the pr

Solution

overlapping_occurrences has \$O(MN)\$ complexity where \$M\$ and \$N\$ are the sizes of the string and the substring respectively. You can reduce it to \$O(M+N)\$ using KMP algorithm. However, looking at the constraints, I feel that even that might not be sufficient - the highest score for the problem is only 50.

Context

StackExchange Code Review Q#158388, answer score: 6

Revisions (0)

No revisions yet.