patternpythonMajor
Algorithm to transform one word to another through valid words
Viewed 0 times
wordsalgorithmonewordanothervalidthroughtransform
Problem
I have been practicing backtracking and I wanted to know how I can improve my code. For eg, I don't want to use global. Also, I am not sure if my code will work for all the cases.
# Given two words of equal length that are in a dictionary, write a method to transform one word into another word by changing only
# one letter at a time. The new word you get in each step must be in the
# dictionary.
# def transform(english_words, start, end):
# transform(english_words, 'damp', 'like')
# ['damp', 'lamp', 'limp', 'lime', 'like']
# ['damp', 'camp', 'came', 'lame', 'lime', 'like']
def is_diff_one(str1, str2):
if len(str1) != len(str2):
return False
count = 0
for i in range(0, len(str1)):
if str1[i] != str2[i]:
count = count + 1
if count == 1:
return True
return False
potential_ans = []
def transform(english_words, start, end, count):
global potential_ans
if count == 0:
count = count + 1
potential_ans = [start]
if start == end:
print potential_ans
return potential_ans
for w in english_words:
if is_diff_one(w, start) and w not in potential_ans:
potential_ans.append(w)
transform(english_words, w, end, count)
potential_ans[:-1]
return None
english_words = set(['damp', 'camp', 'came', 'lame', 'lime', 'like'])
transform(english_words, 'damp', 'lame', 0)Solution
There are four other answers here, and they all have very sensible advice. But none of them, as far as I can see, points out the terrible runtime complexity of the chosen algorithm, so I think an additional answer is necessary.
As pointed out by Barry, ths is because the line
is a mistake for:
In what follows, I'm going to assume that the bug is fixed.
The problem being solved here can be thought of as a problem on a graph, in which each word is a node of the graph, and two words are connected if they differ by one letter. Supposing that the words are cake, came, camp, dame, damp, dike, dime, lake, lame, lamp, like, lime, and limp, then the graph looks like this:
A transformation of one word to another corresponds to a path in this graph. For example one possible transformation of cake to limp corresponds to the highlighted path here:
Finding the shortest transformation between two words corresponds to finding the shortest path between the two nodes in the graph.
There are several benefits of thinking about a problem in terms of abstract data structures like graphs, including:
-
There are standard terms for describing the problem: here nodes, edges, paths.
-
There are concrete representations of the data structure that support efficient operations.
-
You can look up the best known algorithms for solving the problem, instead of having to invent your own.
With the bug noted in §1 fixed, what your program does is to generate all the simple paths (paths with no repeated nodes) in the graph of words beginning at
But this can take a very very long time:
Why does the algorithm take three-quarters of an hour to find this result? Well, the graph being searched looks like this:
The algorithm starts at bar and keeps adding nodes to its path until it can't go any further without repeating a node. For example (depending on the order in which the words are retrieved from the
After backtracking a couple of steps, it will find this path:
and so on. You'll see that it will explore every path that starts bar—bat and then visits all the —at words. There are \$ 11! = 39,916,800 \$ such paths, and only when it has explored them all will it consider the path bar—war.
Checking to see if two words with \$ m \$ letters differ by one letter takes time \$ O(m) \$, so if there are \$ n \$ words of \$ m \$ letters each, the total runtime is \$ O(n!m) \$.
It is possible to do much better than this if you use one of the standard graph search algorithms such as Dijkstra's or A*. Here's an implementation using the latter.
```
from collections import defaultdict, namedtuple
from heapq import heappush, heappop
class NotFound(Exception):
pass
def word_ladder(words, start, end):
"""Return a word ladder (a list of words each of which differs from
the last by one letter) linking start and end, using the given
collection of words. Raise NotFound if there is no ladder.
>>> words = 'card care cold cord core ward warm'.split()
>>> ' '.join(word_ladder(words, 'cold', 'warm'))
'cold cord card ward warm'
"""
# Find the neighbourhood of each word.
placeholder = object()
matches = defaultdict(list)
neighbours = defaultdict(list)
for word in words:
for i in range(len(word)):
pattern = tuple(placeholder if i == j else c
for j, c in enumerate(word))
m = matches[pattern]
m.append(word)
neighbours[word].append(m)
# A algorithm: see https://en.wikipedia.org/wiki/A_search_algorithm
# Admissible estimate of the steps to get from word to end.
def h_score(word):
return sum(a != b for a, b in zip(word, end))
# Closed set: of words visited in the search.
closed_set = set()
# Open set: search nodes that have been found but not yet
# processed. Accompanied by a min-heap of 4-tuples (f-score,
# g-score, word, previous-node) so that we can efficiently find
# the node with the smallest f-score.
Node = namedtuple('Node', 'f g word previous')
open_set = set([start])
open_heap = [Node(h_score(start), 0, start, None)]
while open_heap:
node = heappop(open_heap)
if node.word == end:
result = []
while node:
result.append(node.word)
node = node.previous
return result[::
- Bug
>>> english_words = set('bar bat cat war'.split())
>>> transform(english_words, 'bar', 'war', 0)
['bar', 'bat', 'cat', 'war']As pointed out by Barry, ths is because the line
potential_ans[:-1]is a mistake for:
potential_ans.pop()In what follows, I'm going to assume that the bug is fixed.
- Abstracting the problem
The problem being solved here can be thought of as a problem on a graph, in which each word is a node of the graph, and two words are connected if they differ by one letter. Supposing that the words are cake, came, camp, dame, damp, dike, dime, lake, lame, lamp, like, lime, and limp, then the graph looks like this:
A transformation of one word to another corresponds to a path in this graph. For example one possible transformation of cake to limp corresponds to the highlighted path here:
Finding the shortest transformation between two words corresponds to finding the shortest path between the two nodes in the graph.
There are several benefits of thinking about a problem in terms of abstract data structures like graphs, including:
-
There are standard terms for describing the problem: here nodes, edges, paths.
-
There are concrete representations of the data structure that support efficient operations.
-
You can look up the best known algorithms for solving the problem, instead of having to invent your own.
- Description and complexity of algorithm
With the bug noted in §1 fixed, what your program does is to generate all the simple paths (paths with no repeated nodes) in the graph of words beginning at
start, until it finds a path that leads to end.But this can take a very very long time:
>>> from timeit import timeit
>>> english_words = set('bar bat cat eat fat hat mat oat pat rat sat tat vat war'.split())
>>> timeit(lambda:transform(english_words, 'bar', 'war', 0), number=1)
['bar', 'war']
2732.44891167Why does the algorithm take three-quarters of an hour to find this result? Well, the graph being searched looks like this:
The algorithm starts at bar and keeps adding nodes to its path until it can't go any further without repeating a node. For example (depending on the order in which the words are retrieved from the
set), it might end up with this path:After backtracking a couple of steps, it will find this path:
and so on. You'll see that it will explore every path that starts bar—bat and then visits all the —at words. There are \$ 11! = 39,916,800 \$ such paths, and only when it has explored them all will it consider the path bar—war.
Checking to see if two words with \$ m \$ letters differ by one letter takes time \$ O(m) \$, so if there are \$ n \$ words of \$ m \$ letters each, the total runtime is \$ O(n!m) \$.
- Better algorithm
It is possible to do much better than this if you use one of the standard graph search algorithms such as Dijkstra's or A*. Here's an implementation using the latter.
```
from collections import defaultdict, namedtuple
from heapq import heappush, heappop
class NotFound(Exception):
pass
def word_ladder(words, start, end):
"""Return a word ladder (a list of words each of which differs from
the last by one letter) linking start and end, using the given
collection of words. Raise NotFound if there is no ladder.
>>> words = 'card care cold cord core ward warm'.split()
>>> ' '.join(word_ladder(words, 'cold', 'warm'))
'cold cord card ward warm'
"""
# Find the neighbourhood of each word.
placeholder = object()
matches = defaultdict(list)
neighbours = defaultdict(list)
for word in words:
for i in range(len(word)):
pattern = tuple(placeholder if i == j else c
for j, c in enumerate(word))
m = matches[pattern]
m.append(word)
neighbours[word].append(m)
# A algorithm: see https://en.wikipedia.org/wiki/A_search_algorithm
# Admissible estimate of the steps to get from word to end.
def h_score(word):
return sum(a != b for a, b in zip(word, end))
# Closed set: of words visited in the search.
closed_set = set()
# Open set: search nodes that have been found but not yet
# processed. Accompanied by a min-heap of 4-tuples (f-score,
# g-score, word, previous-node) so that we can efficiently find
# the node with the smallest f-score.
Node = namedtuple('Node', 'f g word previous')
open_set = set([start])
open_heap = [Node(h_score(start), 0, start, None)]
while open_heap:
node = heappop(open_heap)
if node.word == end:
result = []
while node:
result.append(node.word)
node = node.previous
return result[::
Code Snippets
>>> english_words = set('bar bat cat war'.split())
>>> transform(english_words, 'bar', 'war', 0)
['bar', 'bat', 'cat', 'war']potential_ans[:-1]potential_ans.pop()>>> from timeit import timeit
>>> english_words = set('bar bat cat eat fat hat mat oat pat rat sat tat vat war'.split())
>>> timeit(lambda:transform(english_words, 'bar', 'war', 0), number=1)
['bar', 'war']
2732.44891167from collections import defaultdict, namedtuple
from heapq import heappush, heappop
class NotFound(Exception):
pass
def word_ladder(words, start, end):
"""Return a word ladder (a list of words each of which differs from
the last by one letter) linking start and end, using the given
collection of words. Raise NotFound if there is no ladder.
>>> words = 'card care cold cord core ward warm'.split()
>>> ' '.join(word_ladder(words, 'cold', 'warm'))
'cold cord card ward warm'
"""
# Find the neighbourhood of each word.
placeholder = object()
matches = defaultdict(list)
neighbours = defaultdict(list)
for word in words:
for i in range(len(word)):
pattern = tuple(placeholder if i == j else c
for j, c in enumerate(word))
m = matches[pattern]
m.append(word)
neighbours[word].append(m)
# A* algorithm: see https://en.wikipedia.org/wiki/A*_search_algorithm
# Admissible estimate of the steps to get from word to end.
def h_score(word):
return sum(a != b for a, b in zip(word, end))
# Closed set: of words visited in the search.
closed_set = set()
# Open set: search nodes that have been found but not yet
# processed. Accompanied by a min-heap of 4-tuples (f-score,
# g-score, word, previous-node) so that we can efficiently find
# the node with the smallest f-score.
Node = namedtuple('Node', 'f g word previous')
open_set = set([start])
open_heap = [Node(h_score(start), 0, start, None)]
while open_heap:
node = heappop(open_heap)
if node.word == end:
result = []
while node:
result.append(node.word)
node = node.previous
return result[::-1]
open_set.remove(node.word)
closed_set.add(node.word)
g = node.g + 1
for neighbourhood in neighbours[node.word]:
for w in neighbourhood:
if w not in closed_set and w not in open_set:
next_node = Node(h_score(w) + g, g, w, node)
heappush(open_heap, next_node)
open_set.add(w)
raise NotFound("No ladder from {} to {}".format(start, end))Context
StackExchange Code Review Q#107823, answer score: 30
Revisions (0)
No revisions yet.