patternpythonMinor

Finding the shortest excerpt that contains all search terms

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

theshortestallsearchthatcontainsfindingtermsexcerpt

Problem

Write a function called answer(document, searchTerms) which
returns the shortest snippet of the document, containing all of the
given search terms. The search terms can appear in any order.

Inputs:
(string) document = "many google employees can program"
(string list) searchTerms = ["google", "program"]
Output:
(string) "google employees can program"

 Inputs:
(string) document = "a b c d a"
(string list) searchTerms = ["a", "c", "d"]
 Output:
(string) "c d a"

My program below is giving the correct answer but the time complexity is very high since I am doing the Cartesian product. If the input is very high then I am not able to clear to those test cases. I am not able to reduce the complexity of this program, and any help will be greatly appreciated.

import itertools

import sys

def answer(document, searchTerms):

    min = sys.maxint

    matchedString = ""

    stringList = document.split(" ")

    d = dict()

    for j in range(len(searchTerms)):

        for i in range(len(stringList)):

            if searchTerms[j] == stringList[i]:

                d.setdefault(searchTerms[j], []).append(i)

    for element in itertools.product(*d.values()):

        sortedList = sorted(list(element))

        differenceList = [t - s for s, t in zip(sortedList, sortedList[1:])]

       if min > sum(differenceList):

          min = sum(differenceList)
          sortedElement = sortedList

          if sum(differenceList) == len(sortedElement) - 1:
            break

    try:
        for i in range(sortedElement[0], sortedElement[len(sortedElement)-1]+1):

            matchedString += "".join(stringList[i]) + " "

    except:
        pass

    return matchedString

If anyone wants to clone my program here is code

Solution

Pythonicity

Your solution feels very procedural and not really Pythonic. I believe that this function accomplishes the same task using roughly the same approach:

from itertools import product

def answer(document, search_terms):
    words = document.split(" ")

    search_term_positions = [
        [i for i, word in enumerate(words) if word == term] 
        for term in search_terms
    ]

    combos = [sorted(combo) for combo in product(*search_term_positions)]
    if not combos:
        return None    # Could not find all search terms
    best_combo = min(combos, key=lambda combo: combo[-1] - combo[0])

    return " ".join(words[best_combo[0] : best_combo[-1] + 1])

Observations:

d is a variable name that suggests that it's a dict, but is not much more helpful than that. Instead of d.values(), I've defined search_term_positions, which is a much more descriptive variable name. You don't really even need a dictionary, since you only care about the values, not the keys.

You can simplify the second for loop by taking full advantage of the built-in min() function.

-
You wrote sum(differenceList) three times! sum(differenceList) is just sortedList[-1] - sortedList[0], because

$$(n_1 - n_0) + (n_2 - n_1) + (n_3 - n_2) + \ldots + (n_N - n_{N - 1})$$

is a telescoping series where all of the terms cancel except \$-n_0\$ and \$n_N\$.

sortedElement[len(sortedElement)-1] can be written as sortedElement[-1].

Swallowing all exceptions with except: pass is a no-no.

Bugs

This function only recognizes words that are delimited by spaces. Any other punctuation could cause words to fail to be recognized.

Your algorithm works based on word count rather than by character positions. Therefore, you end up finding the excerpt with the fewest intervening words, which is not necessarily what I would consider to be the "shortest snippet". Example:

>>> answer('many google supercalifragilistic program to google a program', ['google', 'program'])
'google supercalifragilistic program'

I've also written a different solution to solve the same problem. There, as here, I've made no attempt to choose a speedy algorithm, but rather aimed for clarity.

Code Snippets

from itertools import product

def answer(document, search_terms):
    words = document.split(" ")

    search_term_positions = [
        [i for i, word in enumerate(words) if word == term] 
        for term in search_terms
    ]

    combos = [sorted(combo) for combo in product(*search_term_positions)]
    if not combos:
        return None    # Could not find all search terms
    best_combo = min(combos, key=lambda combo: combo[-1] - combo[0])

    return " ".join(words[best_combo[0] : best_combo[-1] + 1])

>>> answer('many google supercalifragilistic program to google a program', ['google', 'program'])
'google supercalifragilistic program'

Context

StackExchange Code Review Q#102351, answer score: 4

Revisions (0)

No revisions yet.