HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Averages and standard deviations for a Knn application

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
applicationdeviationsstandardaveragesforknnand

Problem

I'm working on a small command line application in c# where data is read from a CSV file along with a query and KNN is calculated for the query based on the data.

My problem isn't the algorithm at all, I understand it. Im having difficulty working with my code though. I think it's down to bad planning.

Here's my object for holding a "row" of data. the data is based on weather data and if something was played that day or not.

class dataRow
{
    public double sun {get; set;} 
    public double temp {get; set;} 
    public double humidity {get; set;} 
    public double wind {get; set;}
    public bool play {get; set;}

    public override string ToString()
    {
        return string.Format("\n\tSun: {0}\n\tTemp: {1}\n\tHumidity: {2}\n\tWind: {3}\n\tPlay: {4}", sun, temp, humidity, wind, play);
    }

    public double[] toArray()
    {
        return new double[] { sun, temp, humidity, wind };
    }    
}


I'm trying to Znormalize the data and the query before I work out euclidean distance between points but I'm stuck in a rut of annoying nested loops and I have a hunch there's a better way to code what I'm doing

```
class KNN
{
public int numK;
List dataSet;
dataRow QueryRow;
dataRow stdDev;
dataRow avg;

public KNN(String[] input)
{
numK = Int32.Parse(input[0]);
dataSet = dataImport.import(input[1]);
//inputRow = dataImport.import(input[2])[0];
avg = getAverage(toDoubleList(dataSet));
}

public List toDoubleList(List input)
{
var doubleList = new List();

for(int i =0; i input)
{
var avgTotals = new double[4];
var avgRow = new dataRow();

for (int y = 0; y input)
{
var stdDevTotals = new double[4];
var stdDevRow = new dataRow();

for (int y = 0; y < 3; y++)
{
double stdDev = 0;

for (int x = 0; x < input.Count; x++)
{
stdDev += Math.Pow(input[

Solution

-
As others mentioned in comments, you should follow the .Net naming guidelines. It will make your code consistent with all other .Net code.

-
Try to avoid abbreviated names like KNN, because they can be confusing. And if that's supposed to mean k-nearest neighbors, then I don't see how does your class compute that.

This applies also to names like numK.

-
Computing some statistics on your data should be separate from parsing inputs. So the constructor of KNN should take directly numK and dataSet and it should be the job of its caller to parse the inputs.

-
If the common approach is to create a DataRow once and then never change it again, consider making it immutable: make the property setters private and create a constructor that can be used to set them.

You could also add a constructor from an array of doubles, as an counterpart of ToArray().

-
Both your GetAverage() and GetStdDev() follow the same pattern. I would try to abstract away this pattern into a separate method, to avoid repetition:

public static DataRow AggregateRows(
    this IList rows, Func, double> func)
{
    return new DataRow
    {
        Sun = func(rows.Select(row => row.Sun)),
        Temp = func(rows.Select(row => row.Temp)),
        Humidity = func(rows.Select(row => row.Humidity)),
        Wind = func(rows.Select(row => row.Wind))
    };
}


With this, both methods can be significantly simplified using the LINQ method Average():

public DataRow Average()
{
    return dataSet.AggregateRows(values => values.Average());
}

public DataRow StdDev()
{
    return dataSet.AggregateRows(
        values =>
        {
            var average = values.Average();
            var variance = values.Average(value => Math.Pow(value - average, 2));
            return Math.Sqrt(variance);
        });
}


Note that the structure of the expression for computing variance is very similar to the way it's normally written: E[(X - E[X])2].

Also, if you do it this way, you won't need the ToArray() method anymore.

-
If you have many types whose ToString() follows the same pattern, you could implement it just once using reflection and then use that implementation everywhere.

Code Snippets

public static DataRow AggregateRows(
    this IList<DataRow> rows, Func<IEnumerable<double>, double> func)
{
    return new DataRow
    {
        Sun = func(rows.Select(row => row.Sun)),
        Temp = func(rows.Select(row => row.Temp)),
        Humidity = func(rows.Select(row => row.Humidity)),
        Wind = func(rows.Select(row => row.Wind))
    };
}
public DataRow Average()
{
    return dataSet.AggregateRows(values => values.Average());
}

public DataRow StdDev()
{
    return dataSet.AggregateRows(
        values =>
        {
            var average = values.Average();
            var variance = values.Average(value => Math.Pow(value - average, 2));
            return Math.Sqrt(variance);
        });
}

Context

StackExchange Code Review Q#41146, answer score: 2

Revisions (0)

No revisions yet.