HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Number of sub strings with same first and last character

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
samenumberlastwithsubcharacterfirstandstrings

Problem

Problem Statement:


You are given a function \$f(x)\$, where \$f(x)\$ is \$1\$ if the first and last characters of string \$x\$ are equal; else it is \$0\$.
You are given a string \$S\$ and you have to find the sum of \$f(x)\$ for all substrings \$x\$ of given string \$S\$.

Sample Input:

7
ababaca


Sample Output:

14


Explanation:

f("a")=1, f("aba")=1, f("abaca")=1 but f("ab")=0, f("bac")=0. Hence counting all substrings we get 14.

The 14 substring are

a - 4(times) 
b - 2 
c - 1 
aba - 2 
bab - 1 
aca - 1 
ababa - 1 
abaca - 1 
ababaca - 1


My code:

l = int(input())
s = [x for x in input().strip()]
print(l + sum([1 for i in range(0,l-1) for j in range(i+1,l) if s[i] == s[j]]))


I have tried the following snippets too:

Timeout:

from collections import Counter
import math
l, s = int(input()), [x for x in input()]
print(l + sum([math.factorial(value-1) for value in Counter(s).values() if value != 1]))


Wrong answer:

import itertools
l, s = int(input()), input()
print(sum(int(i[0] == i[-1]) for i in itertools.combinations_with_replacement(s,2)))


The solutions is working fine except that it times out in certain test cases. Can anyone suggest a better way to do this?

Solution

First review

It is almost always a good idea to put your logic in a function (or class) that can be easily documented and tested. That way, in anything goes wrong during the optimisation, you'll know it straight-away. Also, this can be useful if you need to measure your result to ensure your optimisation is an actual optimisation.

Just moving the code around, here is what I start with :

def get_nb_substring(s):
    l = len(s)
    return(l + sum([1 for i in range(0,l-1) for j in range(i+1,l) if s[i] == s[j]]))

def test_from_input():
    _ = int(input()) # useful ?
    s = [x for x in input().strip()]
    print(get_nb_substring(s))

def automatic_tests():
    assert get_nb_substring("ababaca") == 14

if __name__ == '__main__':
    automatic_tests()


Now, the actual review may start.

-
you don't need 0 as a first argument for range.

-
you usually don't need the length when you are using iterations in Python. When you do need it, it usually means you are doing something wrong.

-
for that reason, what you are doing with indices can be achieved with itertools.combinations(s, 2)

Now, the code can be rewritten :

def get_nb_substring(s):
    return len(s) + sum(1 for i, j in itertools.combinations(s, 2) if i == j)


which is still 0(n^2).

Optimisations

An idea could be to check where the different letters appear. Then if a letter c appears at positions p1, p2, ..., pn, you know it will contribute for substrings : p1, p1-p2, p1-p3, ..., p1-pn, p2,p2-p3, p2-p4, ... p2-pn, .... pn. There will be 1+2+...+n = n*(n+1)/2 such substrings.

There, if I did everything correctly, I end up with O(n) code :

def get_nb_substring(s):
    position = dict()
    for i, c in enumerate(s):
        position.setdefault(c, []).append(i)
    return sum(l*(l+1)/2 for l in (len (l) for l in position.values()))


On the test example, it seems to work. On other cases, I obtain different results than your code so I don't know which is correct.

I've used a dict to map characters to a list of position but a mapping from characters to number of positions would have been enough. This corresponds to using the Counter collection)

def get_nb_substring(s):
    return sum(v*(v+1)/2 for v in collections.Counter(s).values())

Code Snippets

def get_nb_substring(s):
    l = len(s)
    return(l + sum([1 for i in range(0,l-1) for j in range(i+1,l) if s[i] == s[j]]))

def test_from_input():
    _ = int(input()) # useful ?
    s = [x for x in input().strip()]
    print(get_nb_substring(s))

def automatic_tests():
    assert get_nb_substring("ababaca") == 14

if __name__ == '__main__':
    automatic_tests()
def get_nb_substring(s):
    return len(s) + sum(1 for i, j in itertools.combinations(s, 2) if i == j)
def get_nb_substring(s):
    position = dict()
    for i, c in enumerate(s):
        position.setdefault(c, []).append(i)
    return sum(l*(l+1)/2 for l in (len (l) for l in position.values()))
def get_nb_substring(s):
    return sum(v*(v+1)/2 for v in collections.Counter(s).values())

Context

StackExchange Code Review Q#94214, answer score: 4

Revisions (0)

No revisions yet.