
Implementation of a Markov Chain

Submitted by: @import:stackexchange-codereview

Problem

I read about how Markov chains were handy for creating text generators and wanted to give it a try in Python.

I'm not sure if this is the proper way to make a Markov chain. I've left comments in the code. Any feedback would be appreciated.

import random

def Markov(text_file):

    with open(text_file) as f:    # provide a text-file to parse
        data = f.read()

    data = [i for i in data.split(' ') if i != '']     # create a list of all words 
    data = [i.lower() for i in data if i.isalpha()]    # i've been removing punctuation

    markov = {i:[] for i in data}    # i create a dict with the words as keys and empty lists as values

    pos = 0
    while pos < len(data) - 1:    # add a word to the word-key's list if it immediately follows that word
        markov[data[pos]].append(data[pos+1])
        pos += 1

    new = {k:v for k,v in zip(range(len(markov)), [i for i in markov])}    # create another dict for the seed to match up with

    length_sentence = random.randint(15, 20)    # create a random length for a sentence stopping point

    seed = random.randint(0, len(new) - 1)    # randomly pick a starting point

    sentence_data = [new[seed]]     # use that word as the first word and starting point
    current_word = new[seed]

    while len(sentence_data) < length_sentence:
        next_index = random.randint(0, len(markov[current_word]) - 1)    # randomly pick a word from the last words list.
        next_word = markov[current_word][next_index]
        sentence_data.append(next_word)
        current_word = next_word

    return ' '.join([i for i in sentence_data])

Solution

import random

def Markov(text_file):


Python convention is to name functions lowercase_with_underscores. I'd also probably have this function take a string as input rather than a filename. That way the function doesn't make assumptions about where the data is coming from.
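A minimal sketch of that split, assuming Python 3. The names build_chain and build_chain_from_file are hypothetical, not from the original post: the core function only ever sees a string, and a thin wrapper is the only place that knows about files.

```python
def build_chain(text):
    """Map each word in text to the list of words that immediately follow it."""
    # Same tokenisation as the question: split on spaces, keep purely
    # alphabetic tokens, lowercase them.
    words = [w.lower() for w in text.split(' ') if w.isalpha()]
    chain = {w: [] for w in words}
    for before, after in zip(words, words[1:]):
        chain[before].append(after)
    return chain


def build_chain_from_file(path):
    """Thin wrapper: only this function knows the data lives in a file."""
    with open(path) as f:
        return build_chain(f.read())


chain = build_chain("the cat sat on the mat")
print(chain)  # {'the': ['cat', 'mat'], 'cat': ['sat'], 'sat': ['on'], 'on': ['the'], 'mat': []}
```

Because build_chain takes a plain string, it can be exercised directly with literals like the one above, without creating any files.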

with open(text_file) as f:    # provide a text-file to parse
    data = f.read()


data is a bit too generic. I'd call it text.

data = [i for i in data.split(' ') if i != '']     # create a list of all words
data = [i.lower() for i in data if i.isalpha()]    # i've been removing punctuation


Since ''.isalpha() is False, the isalpha() filter already discards the empty strings, so you can combine these two lines into one.
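A small Python 3 check that the one-line version gives the same result as the two-step version; the sample sentence is made up, with a double space to produce an empty token.

```python
text = "The cat  sat on the mat, then slept"

# Two-step version from the question.
words = [w for w in text.split(' ') if w != '']
words = [w.lower() for w in words if w.isalpha()]

# One step: ''.isalpha() is False, so the isalpha() test also drops the
# empty strings that split(' ') produces for repeated spaces.
combined = [w.lower() for w in text.split(' ') if w.isalpha()]

print(combined)  # ['the', 'cat', 'sat', 'on', 'the', 'then', 'slept']
```

Note that "mat," disappears entirely: the isalpha() filter drops any token that contains punctuation rather than stripping the punctuation from it.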

markov = {i:[] for i in data}    # i create a dict with the words as keys and empty lists as values

pos = 0
while pos < len(data) - 1:    # add a word to the word-key's list if it immediately follows that word
    markov[data[pos]].append(data[pos+1])
    pos += 1


Whenever possible, avoid iterating over indexes. In this case I'd use

for before, after in zip(data, data[1:]):
    markov[before].append(after)


I think that's much clearer.
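One detail worth checking in that zip loop: each value list has to grow with append(). Writing markov[before] += after would not add the word, because list += iterable behaves like extend(), and a string iterates character by character. A small Python 3 sketch with a made-up six-word input:

```python
words = ['the', 'cat', 'sat', 'on', 'the', 'mat']

# Pair each word with its successor: ('the', 'cat'), ('cat', 'sat'), ...
chain = {w: [] for w in words}
for before, after in zip(words, words[1:]):
    chain[before].append(after)      # adds the whole word

broken = {w: [] for w in words}
for before, after in zip(words, words[1:]):
    broken[before] += after          # list += str extends with each character

print(chain['the'])   # ['cat', 'mat']
print(broken['the'])  # ['c', 'a', 't', 'm', 'a', 't']
```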

new = {k:v for k,v in zip(range(len(markov)), [i for i in markov])}    # create another dict for the seed to match up with


[i for i in markov] can be written list(markov); either way it builds a new list of the dict's keys. But there is no reason to build that list here, because zip iterates over a dict's keys directly, so just pass markov itself.

zip(range(len(x)), x) can be written as enumerate(x)

{k:v for k,v in x} is the same as dict(x)

So that whole line can be written as

new = dict(enumerate(markov))
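A quick check that the simplified line builds the same dict as the original, using a made-up three-word chain and assuming Python 3.7+ (where dicts keep insertion order):

```python
markov = {'the': ['cat'], 'cat': ['sat'], 'sat': []}

# The question's construction...
original = {k: v for k, v in zip(range(len(markov)), [i for i in markov])}
# ...and the simplified form: enumerate pairs each key with its position.
simplified = dict(enumerate(markov))

print(simplified)  # {0: 'the', 1: 'cat', 2: 'sat'}
```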


But that's a strange construct to build. Since you are indexing with numbers, it'd make more sense to have a list. An equivalent list would be

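new = list(markov)


That gives you a list of the keys. (In Python 2, markov.keys() returned such a list directly; in Python 3 it returns a view object that can't be indexed, so wrap the dict in list() instead.)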
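A short Python 3 illustration of the difference between the keys view and an actual list, using a made-up three-word chain:

```python
markov = {'the': ['cat'], 'cat': ['sat'], 'sat': []}

view = markov.keys()
try:
    view[0]  # dict_keys supports iteration and len(), but not indexing
except TypeError as exc:
    print('indexing markov.keys() failed:', exc)

new = list(markov)  # the keys in insertion order, as a real list
print(new[0])       # the
```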

length_sentence = random.randint(15, 20)    # create a random length for a sentence stopping point

seed = random.randint(0, len(new) - 1)    # randomly pick a starting point


Python has a function random.randrange such that random.randrange(x) is equivalent to random.randint(0, x - 1). It's good to use that when selecting from a range of indexes like this.
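A quick sketch of that usage, assuming Python 3; the seed and sample count are arbitrary choices made only so the run is repeatable.

```python
import random

n = 5
rng = random.Random(42)  # seeded only to make the run repeatable

# randrange(n) draws from 0, 1, ..., n - 1: the same range as randint(0, n - 1),
# without having to remember the "- 1".
samples = [rng.randrange(n) for _ in range(1000)]
print(min(samples), max(samples))
```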

sentence_data = [new[seed]]     # use that word as the first word and starting point
current_word = new[seed]


To select a random item from a list, use random.choice, so in this case I'd use

current_word = random.choice(list(markov))
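A minimal Python 3 sketch of picking the starting word, with a made-up chain. random.choice needs a sequence it can index, which is why the dict is wrapped in list(); passing markov.keys() directly raises TypeError in Python 3.

```python
import random

markov = {'the': ['cat', 'mat'], 'cat': ['sat'], 'sat': ['on'], 'on': ['the'], 'mat': []}

# list(markov) is an indexable list of the keys; no separate "new" dict needed.
current_word = random.choice(list(markov))
print(current_word)
```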

while len(sentence_data) < length_sentence:


Since you know how many iterations you'll need, I'd use a for loop here.

next_index = random.randint(0, len(markov[current_word]) - 1)    # randomly pick a word from the last words list.
next_word = markov[current_word][next_index]


Instead, use next_word = random.choice(markov[current_word]).
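A sketch of the generation loop with both suggestions applied (a for loop and random.choice), assuming Python 3 and a small hand-written chain. The early break is an addition the review doesn't discuss: a word that occurs only at the very end of the text has an empty follower list, and random.choice([]) raises IndexError (the original randint(0, -1) fails the same way with ValueError).

```python
import random

markov = {'the': ['cat', 'mat'], 'cat': ['sat'], 'sat': ['on'], 'on': ['the'], 'mat': []}
length_sentence = random.randint(15, 20)

current_word = random.choice(list(markov))
sentence_data = [current_word]
# The number of extra words is known up front, so a for loop fits.
for _ in range(length_sentence - 1):
    followers = markov[current_word]
    if not followers:
        # 'mat' has no successor in this chain, so the sentence ends early.
        break
    current_word = random.choice(followers)
    sentence_data.append(current_word)

print(' '.join(sentence_data))
```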

sentence_data.append(next_word)
current_word = next_word

return ' '.join([i for i in sentence_data])


Again, there's no reason for the i for i dance. Just use ' '.join(sentence_data).
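Putting the review's suggestions together, a consolidated version might look like the following sketch. It assumes Python 3; markov_sentence and its min_len/max_len parameters are hypothetical names, and the early stop on a word with no followers is an addition the review doesn't discuss. An input with no alphabetic words would still make random.choice raise IndexError.

```python
import random


def markov_sentence(text, min_len=15, max_len=20):
    """Generate a random sentence from a word-level Markov chain built from text."""
    # Tokenise in one pass: ''.isalpha() is False, so empty strings drop out too.
    words = [w.lower() for w in text.split(' ') if w.isalpha()]

    # Map each word to the words that follow it, pairing neighbours with zip.
    markov = {w: [] for w in words}
    for before, after in zip(words, words[1:]):
        markov[before].append(after)

    length_sentence = random.randint(min_len, max_len)
    current_word = random.choice(list(markov))
    sentence_data = [current_word]
    for _ in range(length_sentence - 1):
        followers = markov[current_word]
        if not followers:  # dead end: the word only occurs at the end of the text
            break
        current_word = random.choice(followers)
        sentence_data.append(current_word)

    return ' '.join(sentence_data)


print(markov_sentence("the cat sat on the mat and the dog sat on the cat"))
```

Since the function takes a string, the file handling stays outside it, in line with the first suggestion in the review.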


Context

StackExchange Code Review Q#24276, answer score: 8
