patternpythonMinor
Implementation of a Markov Chain
implementationmarkovchain
Problem
I read that Markov chains are handy for building text generators and wanted to give it a try in Python.
I'm not sure if this is the proper way to make a Markov chain. I've left comments in the code. Any feedback would be appreciated.
import random
def Markov(text_file):
    with open(text_file) as f: # provide a text-file to parse
        data = f.read()
    data = [i for i in data.split(' ') if i != ''] # create a list of all words
    data = [i.lower() for i in data if i.isalpha()] # i've been removing punctuation
    markov = {i:[] for i in data} # i create a dict with the words as keys and empty lists as values
    pos = 0
    while pos < len(data) - 1: # add a word to the word-key's list if it immediately follows that word
        markov[data[pos]].append(data[pos+1])
        pos += 1
    new = {k:v for k,v in zip(range(len(markov)), [i for i in markov])} # create another dict for the seed to match up with
    length_sentence = random.randint(15, 20) # create a random length for a sentence stopping point
    seed = random.randint(0, len(new) - 1) # randomly pick a starting point
    sentence_data = [new[seed]] # use that word as the first word and starting point
    current_word = new[seed]
    while len(sentence_data) < length_sentence:
        next_index = random.randint(0, len(markov[current_word]) - 1) # randomly pick a word from the last words list.
        next_word = markov[current_word][next_index]
        sentence_data.append(next_word)
        current_word = next_word
    return ' '.join([i for i in sentence_data])

Solution
import random
def Markov(text_file):

Python convention is to name functions lowercase_with_underscores. I'd also probably have this function take a string as input rather than a filename. That way the function doesn't make assumptions about where the data is coming from.
    with open(text_file) as f: # provide a text-file to parse
        data = f.read()

data is a bit too generic. I'd call it text.
    data = [i for i in data.split(' ') if i != ''] # create a list of all words
    data = [i.lower() for i in data if i.isalpha()] # i've been removing punctuation

Since ''.isalpha() == False, you could easily combine these two lines.
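As a quick check of that claim, the sketch below runs both versions on a made-up sample string and confirms that a single comprehension yields the same list. The sample text and the names `two_step` and `words` are illustrative, not from the original code.

```python
# Hypothetical sample text; note the double space, which produces an empty token.
text = "The  quick brown fox, the lazy dog."

# Original two-step version: drop empty strings, then keep alphabetic words.
two_step = [i for i in text.split(' ') if i != '']
two_step = [i.lower() for i in two_step if i.isalpha()]

# Combined version: ''.isalpha() is False, so empty strings are filtered out too.
words = [i.lower() for i in text.split(' ') if i.isalpha()]

print(words)
print(words == two_step)
```

Tokens such as "fox," are dropped entirely by `isalpha()` in both versions, so punctuation removal here means losing the attached word as well.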
    markov = {i:[] for i in data} # i create a dict with the words as keys and empty lists as values
    pos = 0
    while pos < len(data) - 1: # add a word to the word-key's list if it immediately follows that word
        markov[data[pos]].append(data[pos+1])
        pos += 1

Whenever possible, avoid iterating over indexes. In this case I'd use

    for before, after in zip(data, data[1:]):
        markov[before].append(after)

I think that's much clearer.
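To see what the pairing produces, here is a small sketch on a hypothetical word list. It uses `append`, because `+=` on a list with a string operand would extend the list one character at a time rather than adding the word.

```python
# Hypothetical word list standing in for the parsed text.
data = ["the", "cat", "sat", "on", "the", "mat"]

# Every word gets an (initially empty) list of successors.
markov = {word: [] for word in data}

# zip(data, data[1:]) yields each word together with the word that follows it.
for before, after in zip(data, data[1:]):
    # append adds the whole word; `+= after` would add its characters instead.
    markov[before].append(after)

print(markov)
```

The last word of the text ("mat" here) ends up with an empty successor list unless it also occurs earlier, which matters when walking the chain later.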
    new = {k:v for k,v in zip(range(len(markov)), [i for i in markov])} # create another dict for the seed to match up with

[i for i in markov] can be written list(markov); it produces a list of the keys of markov. But there is no reason to make that copy here, so just pass markov directly.
zip(range(len(x)), x) can be written as enumerate(x).
{k:v for k,v in x} is the same as dict(x).
So that whole line can be written as

    new = dict(enumerate(markov))

But that's a strange construct to build. Since you are indexing with numbers, it'd make more sense to have a list. An equivalent list would be

    new = list(markov)

which gives you a list of the keys. (On Python 2, markov.keys() returns the same list; on Python 3 it returns a view that cannot be indexed.)
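These equivalences can be checked on a toy dictionary. The contents of `markov` below are made up for illustration, and the comparison relies on dicts preserving insertion order (Python 3.7 and later).

```python
# Toy stand-in for the successor dictionary.
markov = {"the": ["cat"], "cat": ["sat"], "sat": []}

# The original construction: index -> key.
original = {k: v for k, v in zip(range(len(markov)), [i for i in markov])}

# The simplified spelling builds the same dict.
simplified = dict(enumerate(markov))

# A plain list supports the same integer indexing without the extra dict.
keys = list(markov)

print(original == simplified)
print([original[i] for i in range(len(original))] == keys)
```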
    length_sentence = random.randint(15, 20) # create a random length for a sentence stopping point
    seed = random.randint(0, len(new) - 1) # randomly pick a starting point

Python has a function random.randrange such that random.randrange(x) is equivalent to random.randint(0, x - 1). It's good to use that when selecting from a range of indexes like this.
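A minimal sketch of the two spellings side by side, using a hypothetical list named `new` in place of the real keys:

```python
import random

# Hypothetical list of candidate starting words.
new = ["the", "cat", "sat", "on", "mat"]

# randrange(n) draws from 0..n-1, so no "- 1" adjustment is needed.
seed = random.randrange(len(new))

# Equivalent draw with randint, which includes both endpoints.
seed_alt = random.randint(0, len(new) - 1)

print(new[seed], new[seed_alt])
```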
    sentence_data = [new[seed]] # use that word as the first word and starting point
    current_word = new[seed]

To select a random item from a list, use random.choice, so in this case I'd use

    current_word = random.choice(list(markov))
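As a sketch of picking the starting word this way, assuming a toy successor dictionary: on Python 3, `markov.keys()` is a view that `random.choice` cannot index, so the example wraps the keys in `list(...)`.

```python
import random

# Toy successor dictionary; the contents are illustrative only.
markov = {"the": ["cat", "mat"], "cat": ["sat"], "sat": ["on"], "on": ["the"], "mat": []}

# list(markov) gives the keys as a sequence that random.choice can index.
current_word = random.choice(list(markov))
sentence_data = [current_word]

print(sentence_data)
```

This removes the need for both the `new` lookup table and the separate seed index.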
    while len(sentence_data) < length_sentence:

Since you know how many iterations you'll need, I'd use a for loop here.
        next_index = random.randint(0, len(markov[current_word]) - 1) # randomly pick a word from the last words list.
        next_word = markov[current_word][next_index]

Instead do

        next_word = random.choice(markov[current_word])

        sentence_data.append(next_word)
        current_word = next_word
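Here is one way the loop might look with a `for` loop and `random.choice` combined. The toy dictionary and the early `break` are assumptions added for this sketch: a word that occurs only at the very end of the text has no successors, and `random.choice` raises `IndexError` on an empty list.

```python
import random

# Toy successor dictionary; "mat" has no successors because it ends the text.
markov = {"the": ["cat", "mat"], "cat": ["sat"], "sat": ["on"], "on": ["the"], "mat": []}

length_sentence = random.randint(15, 20)
current_word = random.choice(list(markov))
sentence_data = [current_word]

# One word is already chosen, so at most length_sentence - 1 more are needed.
for _ in range(length_sentence - 1):
    successors = markov[current_word]
    if not successors:
        # Assumption: stop early at a dead end instead of raising IndexError.
        break
    current_word = random.choice(successors)
    sentence_data.append(current_word)

print(' '.join(sentence_data))
```

Stopping early is only one option. Restarting from a random word would keep the sentence at full length, at the cost of an unrelated jump in the text.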
    return ' '.join([i for i in sentence_data])

Again, no reason to be doing this i for i dance. Just use ' '.join(sentence_data).

Code Snippets
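Pulling the review points together, a minimal sketch of the whole function might look like the following. The names `build_chain`, `markov_sentence`, `min_words` and `max_words` are hypothetical, and the early exit at a dead end is an assumption rather than something the review prescribes.

```python
import random


def build_chain(words):
    """Map each word to the list of words that directly follow it."""
    chain = {word: [] for word in words}
    for before, after in zip(words, words[1:]):
        chain[before].append(after)
    return chain


def markov_sentence(text, min_words=15, max_words=20):
    """Generate a random sentence from the words in `text`."""
    # Keep purely alphabetic tokens, lowercased (this drops punctuated words).
    words = [w.lower() for w in text.split(' ') if w.isalpha()]
    if not words:
        return ''
    chain = build_chain(words)

    current_word = random.choice(list(chain))
    sentence = [current_word]
    for _ in range(random.randint(min_words, max_words) - 1):
        successors = chain[current_word]
        if not successors:
            break  # dead end: the word only appears at the end of the text
        current_word = random.choice(successors)
        sentence.append(current_word)
    return ' '.join(sentence)


# Usage with an in-memory string; reading a file is left to the caller.
sample = "the cat sat on the mat and the dog sat on the cat"
print(markov_sentence(sample))
```

Because the function takes a string, it can be exercised without touching the filesystem. A caller that has a file can read it with `open(...).read()` and pass the result in.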
import random
def Markov(text_file):

    with open(text_file) as f: # provide a text-file to parse
        data = f.read()

    data = [i for i in data.split(' ') if i != ''] # create a list of all words
    data = [i.lower() for i in data if i.isalpha()] # i've been removing punctuation

    markov = {i:[] for i in data} # i create a dict with the words as keys and empty lists as values
    pos = 0
    while pos < len(data) - 1: # add a word to the word-key's list if it immediately follows that word
        markov[data[pos]].append(data[pos+1])
        pos += 1

    for before, after in zip(data, data[1:]):
        markov[before].append(after)

Context
StackExchange Code Review Q#24276, answer score: 8