patternswiftMinor
Classification tree in Swift
Viewed 0 times
swiftclassificationtree
Problem
As an effort to teach myself Swift as well as to get familiar with machine learning algorithms, I've been trying to implement common algorithms, starting with a Random Forest. This is, for the moment just one of the tree, but I have been trying to implement it just from the theory, without looking at pseudo-code, in order to really understand the process.
It was harder than I thought, due to the lack of convenience statistical and data-related functions and methods that are common in R or Python. This code seems to work and builds correct trees, although some of the methods I use, (lots of mapping...) sometimes seem a bit convoluted.
First, a node to store the splits:
Some data taken from the famous Iris dataset:
```
let x = [[6.8,6.2,5.9,5.9,5.7,7.7,4.5,5.8,5,6.3,5.1,4.3,5.7,4.9,7],
[3,3.4,3.2,3,2.6,3,2.3,2.7,2.3,2.5,3.8,3,3.8,3,3.2],
[5.5,5.4,4.8,5.1,3.5,6.
It was harder than I thought, due to the lack of convenience statistical and data-related functions and methods that are common in R or Python. This code seems to work and builds correct trees, although some of the methods I use, (lots of mapping...) sometimes seem a bit convoluted.
First, a node to store the splits:
import Foundation
class Node: CustomStringConvertible
{
let isTerminal:Bool
var value:Double? = nil
var leftChild:Node? = nil
var rightChild:Node? = nil
var variable:Int? = nil
var description: String
var result:Int? = nil
init(value:Double, variable:Int)
{
//Split node
self.value = value
self.isTerminal = false
self.variable = variable
self.description = "\(variable): \(value)"
}
init(result: Int)
{
//Terminal node
self.result = result
self.isTerminal = true
self.description = "Terminal node: \(result)\n"
}
func addLeftChild(child:Node)
{
self.leftChild = child
self.description += " L -> \(child)\n"
}
func addRightChild(child:Node)
{
self.rightChild = child
self.description += " R -> \(child)\n"
}
//For prediction
func getChild(x:Double) -> Node
{
if x < value
{
return leftChild!
}
else
{
return rightChild!
}
}
}Some data taken from the famous Iris dataset:
```
let x = [[6.8,6.2,5.9,5.9,5.7,7.7,4.5,5.8,5,6.3,5.1,4.3,5.7,4.9,7],
[3,3.4,3.2,3,2.6,3,2.3,2.7,2.3,2.5,3.8,3,3.8,3,3.2],
[5.5,5.4,4.8,5.1,3.5,6.
Solution
So, let's focus on your
You have a 55 line file with 20 violations.
Fortunately, 19 of them are autocorrectable. Seven of the violations are for opening brace placement. In Swift, we prefer our opening brace to be on the same line rather than new line. Another twelve of the violations are for your colon spacing. When declaring variables or parameters, the colon should appear next to the variable name without a space, followed by a space, and then the type.
Running
Results in a file which looks like this:
But swiftlint still identifies one major error.
Node.swift:39:19: error: Variable Name Violation: Variable name should be between 3 and 40 characters long: 'x' (variable_name)
The parameter name to your
But... this linting was with swiftlint's default configuration, which is FAR too lax in my opinion, and egregiously, it lets you get away with force unwrapping!
If I pull in the configuration file that I use*, swiftlint finds six new serious violations.
The first four violations here, I believe, are pretty self explanatory.
The last two are also pretty explanatory I believe... but it probably requires a lot more thinking on your part in terms of how you're going to handle your child nodes.
And that's where we can start getting into an actual discussion on the logic here.
First, let's handle the simplest problem you have...
Anyone can edit this property. Its setter has the same scope as its getter, and really, it shouldn't be writable at all. In fact, it should simply be a computed property which is calculated when it is called.
Some of your comments make it clear that you're thinking of two different sorts of objects...
Importantly, the
It also allows us to get rid of all of the optional values (unless it makes sense for a split node to have just one child).
So, I'd propose something approximately looking like this...
```
class SplitNode: Node {
let value: Double
let variable: Int
let leftChild: Node
let rightChild: Node
var description: String {
// build and return description string
}
init(value: Double, variable: Int, leftChild: Node, rightChild: Node) {
self.value = value
self.variable = variable
self.leftChild = leftChild
self.rightChild = rightChild
}
func childForVal
Node class at the top. I started off, without looking at anything else, and just swiftlinting it. You have a 55 line file with 20 violations.
Fortunately, 19 of them are autocorrectable. Seven of the violations are for opening brace placement. In Swift, we prefer our opening brace to be on the same line rather than new line. Another twelve of the violations are for your colon spacing. When declaring variables or parameters, the colon should appear next to the variable name without a space, followed by a space, and then the type.
Running
$ swiftlint autocorrectResults in a file which looks like this:
import Foundation
class Node: CustomStringConvertible {
let isTerminal: Bool
var value: Double? = nil
var leftChild: Node? = nil
var rightChild: Node? = nil
var variable: Int? = nil
var description: String
var result: Int? = nil
init(value: Double, variable: Int) {
//Split node
self.value = value
self.isTerminal = false
self.variable = variable
self.description = "\(variable): \(value)"
}
init(result: Int) {
//Terminal node
self.result = result
self.isTerminal = true
self.description = "Terminal node: \(result)\n"
}
func addLeftChild(child: Node) {
self.leftChild = child
self.description += " L -> \(child)\n"
}
func addRightChild(child: Node) {
self.rightChild = child
self.description += " R -> \(child)\n"
}
//For prediction
func getChild(x: Double) -> Node {
if x < value {
return leftChild!
}
else {
return rightChild!
}
}
}But swiftlint still identifies one major error.
Node.swift:39:19: error: Variable Name Violation: Variable name should be between 3 and 40 characters long: 'x' (variable_name)
The parameter name to your
getChild function is unacceptably short. x does not serve as a descriptive variable name, and it makes the code hard to read.But... this linting was with swiftlint's default configuration, which is FAR too lax in my opinion, and egregiously, it lets you get away with force unwrapping!
If I pull in the configuration file that I use*, swiftlint finds six new serious violations.
Node.swift:14:1: error: Space After Comment Violation: There should be a space after // (comments_space)
Node.swift:22:1: error: Space After Comment Violation: There should be a space after // (comments_space)
Node.swift:38:1: error: Space After Comment Violation: There should be a space after // (comments_space)
Node.swift:3:1: error: Empty First Line Violation: There should be an empty line after a declaration (empty_first_line)
Node.swift:41:29: error: Force Unwrapping Violation: Force unwrapping should be avoided. (force_unwrapping)
Node.swift:44:30: error: Force Unwrapping Violation: Force unwrapping should be avoided. (force_unwrapping)
The first four violations here, I believe, are pretty self explanatory.
The last two are also pretty explanatory I believe... but it probably requires a lot more thinking on your part in terms of how you're going to handle your child nodes.
And that's where we can start getting into an actual discussion on the logic here.
First, let's handle the simplest problem you have...
var description: StringAnyone can edit this property. Its setter has the same scope as its getter, and really, it shouldn't be writable at all. In fact, it should simply be a computed property which is calculated when it is called.
var description: String {
return "" // build description of the object
}Some of your comments make it clear that you're thinking of two different sorts of objects...
init(value: Double, variable: Int) {
//Split nodeinit(result: Int) {
//Terminal nodeImportantly, the
getChild method doesn't even make sense to call on a terminal node. And the isTerminal property becomes completely obsolete if we just make these two different nodes into two different types.It also allows us to get rid of all of the optional values (unless it makes sense for a split node to have just one child).
So, I'd propose something approximately looking like this...
protocol Node: CustomStringConvertible {}class TerminalNode: Node {
let result: Int
var description: String {
// build & return description
}
init(result: Int) {
self.result = result
}
}```
class SplitNode: Node {
let value: Double
let variable: Int
let leftChild: Node
let rightChild: Node
var description: String {
// build and return description string
}
init(value: Double, variable: Int, leftChild: Node, rightChild: Node) {
self.value = value
self.variable = variable
self.leftChild = leftChild
self.rightChild = rightChild
}
func childForVal
Code Snippets
$ swiftlint autocorrectimport Foundation
class Node: CustomStringConvertible {
let isTerminal: Bool
var value: Double? = nil
var leftChild: Node? = nil
var rightChild: Node? = nil
var variable: Int? = nil
var description: String
var result: Int? = nil
init(value: Double, variable: Int) {
//Split node
self.value = value
self.isTerminal = false
self.variable = variable
self.description = "\(variable): \(value)"
}
init(result: Int) {
//Terminal node
self.result = result
self.isTerminal = true
self.description = "Terminal node: \(result)\n"
}
func addLeftChild(child: Node) {
self.leftChild = child
self.description += " L -> \(child)\n"
}
func addRightChild(child: Node) {
self.rightChild = child
self.description += " R -> \(child)\n"
}
//For prediction
func getChild(x: Double) -> Node {
if x < value {
return leftChild!
}
else {
return rightChild!
}
}
}var description: Stringvar description: String {
return "" // build description of the object
}init(value: Double, variable: Int) {
//Split nodeContext
StackExchange Code Review Q#124518, answer score: 6
Revisions (0)
No revisions yet.