HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Performing an algebraic regression on syntax trees

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
performingtreessyntaxregressionalgebraic

Problem

I am quite new to Clojure and would need some advice on the following genetic programming gist I wrote as my first working clj program. The aim of it is to perform an algebraic regression using genetic programming on syntax trees with +, - and x, 1 genes to fit \$3x + 1\$ function.

The code works, but I doubt it does so in the most elegant way available. More than understanding of the actual algorithm, I would appreciate any advice on improvement of the coding style or some shortcuts to avoid hard coding present in the code (for example, in the bracket counting in crossover function?)

```
; Algebraic regression test v.1
;
; This is a simple GP programming trial.
; Motivated by: Introduction to Genetic Programing by M. Walker

; As an example, perform a fit through the following
; set of points 3x + 1
(def to-fit '([0, 1] [1, 4]))

; The following is an implementation of Koza's growth
; Genes to use are (di-arity functions only),
; terminals are specified as expressions for simpler parsing.
(def genes {:functions ["+", "-"], :terminals ["(+ 0 1)", "(+ 0 x)"]})

;; Function to reproduce:
;; Koza's growth:
(def settings {:terminal-probability 0.3 ; probability a Koza growth node will be terminal
:max-depth 5
:init-population-size 100})

;; Recursive definition of the 2-ary growth
(defn grow-rec [depth]
(if (and (> (rand) (:terminal-probability settings)) ( new 2 individuals
; 90% of the population!
; Randomly select a subtrees from both individuals & swap

;; Select a left bracket at random:
(defn get-left-bracket-at-random [tree]
(loop [index 0
max-index 0
local-male tree
max-rand 0]

(if (= (first local-male) \( )
(let [r (rand)]
(if (> r max-rand)
(recur (inc index) index (rest local-male) r)
(recur (inc index) max-index (rest local-male) max-rand)))

(if (= nil (next local-male))
max-index
(recur (inc index) max-index (rest local-male)

Solution

Most glaring problem is abuse of def and dotimes as previously noted.

Rule of thumb:

-
If you are calling def in a function body, you are doing it wrong.

-
If you are redefining a variable at all, you are doing it wrong.

But as you are reassigning the vars you will not be able to convert them to lets. But using reduce you won't need temporary variables.

For example:

(defn fitness-test [element]
    (def weight 0)
    (dotimes [n (count to-fit)]
      (let [point (nth to-fit n)]
        (def x (first point)) ; defined as global binding ...
        (def dist (- (eval (read-string element)) (last point)))
        (def weight (+ weight (* dist dist))))
      )
  {:weight weight :std-weight (/ 1.0 (+ 1.0 weight)) :expression element})


can be rewritten like this:

(defn sum-sq-err [f to-fit]
  (let [errors (for [[x y] to-fit] (- (f x) y))]
        (reduce + (map #(* % %) errors))))

(defn make-function [exp-str]
  (eval (read-string (str "(fn [x]" element ")"))))

(defn fitness-test [element]
  (let [f (make-function element)
        weight (sum-sq-err f to-fit)]
    {:weight weight 
     :std-weight (/ 1.0 (+ 1.0 weight)) 
     :expression element}))


Some remarks:

-
Now obtaining fitness function from the string genome (element) is logically separated.

-
This calculation is done once, and not for each data point in the to-fit.

-
No global var defined, no def abuse.

-
(eval (read-string (str "(fn [x]" element ")"))) can be equivalently written as

(let [params '[x] body (read-string element)] (eval (list 'fn params body)))


This version will be more useful if the names and number of parameters may change; or you start manipulating clojure lists instead of strings, you just pass in such lists as body.

-
Problem decomposed into smaller, easy-to-understand functions.

Similarly this:

(def population [])
(dotimes [x (:init-population-size settings)]
  (def population (conj population (full-rec (+ 1 x)))) ;; do the looping nicely?
  (def population (conj population (grow-rec (+ 1 x)))))


can be rewritten as:

(defn make-population [size]
  (apply concat
         (for [n (range size)]
           [(full-rec (inc n)) (grow-rec (inc n))])))


A good rule of thumb: If it is not constant, or if it is not data it is a function. Think defn not def.

Do not make your functions depend on global vars, including settings. Use Hollywood Principle to reduce coupling between your functions.

You can break dependency of :

(defn grow-rec [depth]
  (if (and (> (rand) (:terminal-probability settings)) (< depth (:max-depth settings)))

    ; True, grow further:
    (str "(" (rand-nth (:functions genes)) \space
         (grow-rec (inc depth)) \space
         (grow-rec (inc depth)) ")" )

    ; False, stop growing
    (str (rand-nth (:terminals genes)))))


on settings like this:

(defn grow-rec [terminal-probability max-depth functions terminals depth]
  (letfn [(f [d] 
             (if (and (> (rand) terminal-probability) (< depth max-depth))
               (str "(" (rand-nth functions) \space
                    (f (inc d)) \space
                    (f (inc d)) ")" )
               (str (rand-nth terminals))))] 
    (f depth))


You can use partial, if you need to convert a function that takes many configuration parameters to a function that takes fewer parameters (to pass it as an argument for example).

Style note: Never leave )s on a line by themselves.

Code Snippets

(defn fitness-test [element]
    (def weight 0)
    (dotimes [n (count to-fit)]
      (let [point (nth to-fit n)]
        (def x (first point)) ; defined as global binding ...
        (def dist (- (eval (read-string element)) (last point)))
        (def weight (+ weight (* dist dist))))
      )
  {:weight weight :std-weight (/ 1.0 (+ 1.0 weight)) :expression element})
(defn sum-sq-err [f to-fit]
  (let [errors (for [[x y] to-fit] (- (f x) y))]
        (reduce + (map #(* % %) errors))))

(defn make-function [exp-str]
  (eval (read-string (str "(fn [x]" element ")"))))

(defn fitness-test [element]
  (let [f (make-function element)
        weight (sum-sq-err f to-fit)]
    {:weight weight 
     :std-weight (/ 1.0 (+ 1.0 weight)) 
     :expression element}))
(let [params '[x] body (read-string element)] (eval (list 'fn params body)))
(def population [])
(dotimes [x (:init-population-size settings)]
  (def population (conj population (full-rec (+ 1 x)))) ;; do the looping nicely?
  (def population (conj population (grow-rec (+ 1 x)))))
(defn make-population [size]
  (apply concat
         (for [n (range size)]
           [(full-rec (inc n)) (grow-rec (inc n))])))

Context

StackExchange Code Review Q#64253, answer score: 3

Revisions (0)

No revisions yet.