HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Genetic algorithm for “Hello World”

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
geneticalgorithmworldforhello

Problem

This programme uses a genetic algorithm to get the string "Hello, World!". It can be summarized as follows:

  • Create a random initial population.



  • Let the best reproduce and kill the worst until we get a 'good_enough' result.



I used doctest to verify correctness.

```
import doctest
import random

def random_char(start=' ',end='|'):
"""
Returns a random char from ' ' [SPACE] to '|'.

>>> random.seed('EXAMPLE')

>>> random_char()
'v'
"""
return chr(random.randint(ord(' '),ord('|')))

def random_string(length):
"""
Returns a random string os the given length.

>>> random.seed('EXAMPLE')

>>> random_string(10)
"vG5iHyPWG'"
"""
return ''.join([random_char() for _ in range(length)])

def create_strings_pool(length,number):
"""
Creates a 'pool' (list) of number strings of the
given length.

>>> random.seed('EXAMPLE')

>>> create_strings_pool(3,4)
['vG5', 'iHy', 'PWG', "'TH"]
"""
return [random_string(length) for _ in range(number)]

def char_difference(char_1,char_2):
"""
Return the absolute difference between the ASCII values
of two chars.

>>> char_difference('a','b')
1

>>> char_difference('a','f')
5

>>> char_difference('#','A')
30
"""
return abs(ord(char_1) - ord(char_2))

def fitness(candidate,result):
"""
Detrmines how much similar the candidate string is
compared to the result. Char distance is counted:
ex. 'cbr' is more similar to 'car' than 'ccr'
because 'b' is nearer to 'a' then 'c'.

>>> fitness('Hfllo','Hello')
1

>>> fitness('! QWE','World')
218

>>> fitness('ccc','abc')
3
"""
fitness = 0
for index,candidate_char in enumerate(candidate):
result_char = result[index]
fitness += char_difference(candidate_char,result_char)
return fitness

def sort_by_fitness(population,result):
"""
Returns a list where the most fit elements are at the start.

Solution

Python comes with "batteries included", and you can simplify your code by using them. For example:

from difflib import SequenceMatcher

def fitness(candidate, result):
    """Determine how similar the candidate and result strings are.

    >>> fitness('Hfllo','Hello')
    0.8
    >>> fitness('! QWE','World')
    0.2
    >>> fitness('ccc','abc')
    0.3333333333333333

    """
    return SequenceMatcher(None, candidate, result).ratio()


Note that the ratio is always between 0 and 1, and you would need to invert the ratio or your sorting (reverse=True) as low numbers are now worse.

Rather than playing with ord and chr, I would use the constants defined by string:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;?@[\\]^_`{|}~'
>>> string.digits
'0123456789'
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'


You could define e.g.

CHARACTERS = "".join([" ", string.punctuation, string.digits, 
                      string.ascii_uppercase, string.ascii_lowercase])


then simplify your random_char function:

def random_char():
    """Provide a randomly-selected character.

    >>> random.seed(0)
    >>> for _ in range(5):
        random_char()

    'l'
    'd'
    '6'
    '\\'
    'F'

    """
    return random.choice(CHARACTERS)


crossover could also be much simpler:

def crossover(father, mother, mutation):
    """Return a combination of father and mother with random mutations.

    >>> random.seed(0)
    >>> crossover("hello", "world", 0)
    'welld'
    >>> crossover("hello", "world", 0.5)
    'wclld'
    >>> crossover("hello", "world", 1)
    'aV)Pw'

    """
    return "".join([random.choice(pair) if random.random() > mutation
                    else random_char() for pair in zip(father, mother)])


I would move the constants from main up to the top of the script, so it's easier for someone to open the script and quickly tweak them. Then you can remove main entirely, and just call genetic_process from the if __name__ == '__main__' block.

Code Snippets

from difflib import SequenceMatcher

def fitness(candidate, result):
    """Determine how similar the candidate and result strings are.

    >>> fitness('Hfllo','Hello')
    0.8
    >>> fitness('! QWE','World')
    0.2
    >>> fitness('ccc','abc')
    0.3333333333333333

    """
    return SequenceMatcher(None, candidate, result).ratio()
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> string.digits
'0123456789'
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
CHARACTERS = "".join([" ", string.punctuation, string.digits, 
                      string.ascii_uppercase, string.ascii_lowercase])
def random_char():
    """Provide a randomly-selected character.

    >>> random.seed(0)
    >>> for _ in range(5):
        random_char()


    'l'
    'd'
    '6'
    '\\'
    'F'

    """
    return random.choice(CHARACTERS)
def crossover(father, mother, mutation):
    """Return a combination of father and mother with random mutations.

    >>> random.seed(0)
    >>> crossover("hello", "world", 0)
    'welld'
    >>> crossover("hello", "world", 0.5)
    'wclld'
    >>> crossover("hello", "world", 1)
    'aV)Pw'

    """
    return "".join([random.choice(pair) if random.random() > mutation
                    else random_char() for pair in zip(father, mother)])

Context

StackExchange Code Review Q#77915, answer score: 8

Revisions (0)

No revisions yet.