HiveBrain v1.2.0
Get Started
← Back to all entries
patternswiftMinor

Classification tree in Swift

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
swiftclassificationtree

Problem

As an effort to teach myself Swift as well as to get familiar with machine learning algorithms, I've been trying to implement common algorithms, starting with a Random Forest. This is, for the moment just one of the tree, but I have been trying to implement it just from the theory, without looking at pseudo-code, in order to really understand the process.

It was harder than I thought, due to the lack of convenience statistical and data-related functions and methods that are common in R or Python. This code seems to work and builds correct trees, although some of the methods I use, (lots of mapping...) sometimes seem a bit convoluted.

First, a node to store the splits:

import Foundation

class Node: CustomStringConvertible
{
    let isTerminal:Bool
    var value:Double? = nil
    var leftChild:Node? = nil
    var rightChild:Node? = nil
    var variable:Int? = nil
    var description: String
    var result:Int? = nil

    init(value:Double, variable:Int)
    {
        //Split node
        self.value = value
        self.isTerminal = false
        self.variable = variable
        self.description = "\(variable): \(value)"
    }

    init(result: Int)
    {
        //Terminal node
        self.result = result
        self.isTerminal = true
        self.description = "Terminal node: \(result)\n"
    }

    func addLeftChild(child:Node)
    {
        self.leftChild = child
        self.description += " L -> \(child)\n"
    }

    func addRightChild(child:Node)
    {
        self.rightChild = child
        self.description += " R -> \(child)\n"
    }

    //For prediction
    func getChild(x:Double) -> Node
    {
        if x < value
        {
            return leftChild!
        }
        else
        {
            return rightChild!
        }
    }
}


Some data taken from the famous Iris dataset:

```
let x = [[6.8,6.2,5.9,5.9,5.7,7.7,4.5,5.8,5,6.3,5.1,4.3,5.7,4.9,7],
[3,3.4,3.2,3,2.6,3,2.3,2.7,2.3,2.5,3.8,3,3.8,3,3.2],
[5.5,5.4,4.8,5.1,3.5,6.

Solution

So, let's focus on your Node class at the top. I started off, without looking at anything else, and just swiftlinting it.

You have a 55 line file with 20 violations.

Fortunately, 19 of them are autocorrectable. Seven of the violations are for opening brace placement. In Swift, we prefer our opening brace to be on the same line rather than new line. Another twelve of the violations are for your colon spacing. When declaring variables or parameters, the colon should appear next to the variable name without a space, followed by a space, and then the type.

Running

$ swiftlint autocorrect


Results in a file which looks like this:

import Foundation

class Node: CustomStringConvertible {
    let isTerminal: Bool
    var value: Double? = nil
    var leftChild: Node? = nil
    var rightChild: Node? = nil
    var variable: Int? = nil
    var description: String
    var result: Int? = nil

    init(value: Double, variable: Int) {
        //Split node
        self.value = value
        self.isTerminal = false
        self.variable = variable
        self.description = "\(variable): \(value)"
    }

    init(result: Int) {
        //Terminal node
        self.result = result
        self.isTerminal = true
        self.description = "Terminal node: \(result)\n"
    }

    func addLeftChild(child: Node) {
        self.leftChild = child
        self.description += " L -> \(child)\n"
    }

    func addRightChild(child: Node) {
        self.rightChild = child
        self.description += " R -> \(child)\n"
    }

    //For prediction
    func getChild(x: Double) -> Node {
        if x < value {
            return leftChild!
        }
        else {
            return rightChild!
        }
    }
}


But swiftlint still identifies one major error.


Node.swift:39:19: error: Variable Name Violation: Variable name should be between 3 and 40 characters long: 'x' (variable_name)

The parameter name to your getChild function is unacceptably short. x does not serve as a descriptive variable name, and it makes the code hard to read.

But... this linting was with swiftlint's default configuration, which is FAR too lax in my opinion, and egregiously, it lets you get away with force unwrapping!

If I pull in the configuration file that I use*, swiftlint finds six new serious violations.

Node.swift:14:1: error: Space After Comment Violation: There should be a space after // (comments_space)
Node.swift:22:1: error: Space After Comment Violation: There should be a space after // (comments_space)
Node.swift:38:1: error: Space After Comment Violation: There should be a space after // (comments_space)
Node.swift:3:1: error: Empty First Line Violation: There should be an empty line after a declaration (empty_first_line)
Node.swift:41:29: error: Force Unwrapping Violation: Force unwrapping should be avoided. (force_unwrapping)
Node.swift:44:30: error: Force Unwrapping Violation: Force unwrapping should be avoided. (force_unwrapping)


The first four violations here, I believe, are pretty self explanatory.

The last two are also pretty explanatory I believe... but it probably requires a lot more thinking on your part in terms of how you're going to handle your child nodes.

And that's where we can start getting into an actual discussion on the logic here.

First, let's handle the simplest problem you have...

var description: String


Anyone can edit this property. Its setter has the same scope as its getter, and really, it shouldn't be writable at all. In fact, it should simply be a computed property which is calculated when it is called.

var description: String {
    return "" // build description of the object
}


Some of your comments make it clear that you're thinking of two different sorts of objects...

init(value: Double, variable: Int) {
    //Split node


init(result: Int) {
    //Terminal node


Importantly, the getChild method doesn't even make sense to call on a terminal node. And the isTerminal property becomes completely obsolete if we just make these two different nodes into two different types.

It also allows us to get rid of all of the optional values (unless it makes sense for a split node to have just one child).

So, I'd propose something approximately looking like this...

protocol Node: CustomStringConvertible {}


class TerminalNode: Node {
    let result: Int

    var description: String {
        // build & return description
    }

    init(result: Int) {
        self.result = result
    }
}


```
class SplitNode: Node {
let value: Double
let variable: Int
let leftChild: Node
let rightChild: Node

var description: String {
// build and return description string
}

init(value: Double, variable: Int, leftChild: Node, rightChild: Node) {
self.value = value
self.variable = variable
self.leftChild = leftChild
self.rightChild = rightChild
}

func childForVal

Code Snippets

$ swiftlint autocorrect
import Foundation

class Node: CustomStringConvertible {
    let isTerminal: Bool
    var value: Double? = nil
    var leftChild: Node? = nil
    var rightChild: Node? = nil
    var variable: Int? = nil
    var description: String
    var result: Int? = nil


    init(value: Double, variable: Int) {
        //Split node
        self.value = value
        self.isTerminal = false
        self.variable = variable
        self.description = "\(variable): \(value)"
    }

    init(result: Int) {
        //Terminal node
        self.result = result
        self.isTerminal = true
        self.description = "Terminal node: \(result)\n"
    }

    func addLeftChild(child: Node) {
        self.leftChild = child
        self.description += " L -> \(child)\n"
    }

    func addRightChild(child: Node) {
        self.rightChild = child
        self.description += " R -> \(child)\n"
    }

    //For prediction
    func getChild(x: Double) -> Node {
        if x < value {
            return leftChild!
        }
        else {
            return rightChild!
        }
    }
}
var description: String
var description: String {
    return "" // build description of the object
}
init(value: Double, variable: Int) {
    //Split node

Context

StackExchange Code Review Q#124518, answer score: 6

Revisions (0)

No revisions yet.