patterncsharpMinor
Averages and standard deviations for a Knn application
Viewed 0 times
applicationdeviationsstandardaveragesforknnand
Problem
I'm working on a small command line application in c# where data is read from a CSV file along with a query and KNN is calculated for the query based on the data.
My problem isn't the algorithm at all, I understand it. Im having difficulty working with my code though. I think it's down to bad planning.
Here's my object for holding a "row" of data. the data is based on weather data and if something was played that day or not.
I'm trying to Znormalize the data and the query before I work out euclidean distance between points but I'm stuck in a rut of annoying nested loops and I have a hunch there's a better way to code what I'm doing
```
class KNN
{
public int numK;
List dataSet;
dataRow QueryRow;
dataRow stdDev;
dataRow avg;
public KNN(String[] input)
{
numK = Int32.Parse(input[0]);
dataSet = dataImport.import(input[1]);
//inputRow = dataImport.import(input[2])[0];
avg = getAverage(toDoubleList(dataSet));
}
public List toDoubleList(List input)
{
var doubleList = new List();
for(int i =0; i input)
{
var avgTotals = new double[4];
var avgRow = new dataRow();
for (int y = 0; y input)
{
var stdDevTotals = new double[4];
var stdDevRow = new dataRow();
for (int y = 0; y < 3; y++)
{
double stdDev = 0;
for (int x = 0; x < input.Count; x++)
{
stdDev += Math.Pow(input[
My problem isn't the algorithm at all, I understand it. Im having difficulty working with my code though. I think it's down to bad planning.
Here's my object for holding a "row" of data. the data is based on weather data and if something was played that day or not.
class dataRow
{
public double sun {get; set;}
public double temp {get; set;}
public double humidity {get; set;}
public double wind {get; set;}
public bool play {get; set;}
public override string ToString()
{
return string.Format("\n\tSun: {0}\n\tTemp: {1}\n\tHumidity: {2}\n\tWind: {3}\n\tPlay: {4}", sun, temp, humidity, wind, play);
}
public double[] toArray()
{
return new double[] { sun, temp, humidity, wind };
}
}I'm trying to Znormalize the data and the query before I work out euclidean distance between points but I'm stuck in a rut of annoying nested loops and I have a hunch there's a better way to code what I'm doing
```
class KNN
{
public int numK;
List dataSet;
dataRow QueryRow;
dataRow stdDev;
dataRow avg;
public KNN(String[] input)
{
numK = Int32.Parse(input[0]);
dataSet = dataImport.import(input[1]);
//inputRow = dataImport.import(input[2])[0];
avg = getAverage(toDoubleList(dataSet));
}
public List toDoubleList(List input)
{
var doubleList = new List();
for(int i =0; i input)
{
var avgTotals = new double[4];
var avgRow = new dataRow();
for (int y = 0; y input)
{
var stdDevTotals = new double[4];
var stdDevRow = new dataRow();
for (int y = 0; y < 3; y++)
{
double stdDev = 0;
for (int x = 0; x < input.Count; x++)
{
stdDev += Math.Pow(input[
Solution
-
As others mentioned in comments, you should follow the .Net naming guidelines. It will make your code consistent with all other .Net code.
-
Try to avoid abbreviated names like
This applies also to names like
-
Computing some statistics on your data should be separate from parsing inputs. So the constructor of
-
If the common approach is to create a
You could also add a constructor from an array of
-
Both your
With this, both methods can be significantly simplified using the LINQ method
Note that the structure of the expression for computing variance is very similar to the way it's normally written: E[(X - E[X])2].
Also, if you do it this way, you won't need the
-
If you have many types whose
As others mentioned in comments, you should follow the .Net naming guidelines. It will make your code consistent with all other .Net code.
-
Try to avoid abbreviated names like
KNN, because they can be confusing. And if that's supposed to mean k-nearest neighbors, then I don't see how does your class compute that.This applies also to names like
numK.-
Computing some statistics on your data should be separate from parsing inputs. So the constructor of
KNN should take directly numK and dataSet and it should be the job of its caller to parse the inputs.-
If the common approach is to create a
DataRow once and then never change it again, consider making it immutable: make the property setters private and create a constructor that can be used to set them.You could also add a constructor from an array of
doubles, as an counterpart of ToArray().-
Both your
GetAverage() and GetStdDev() follow the same pattern. I would try to abstract away this pattern into a separate method, to avoid repetition:public static DataRow AggregateRows(
this IList rows, Func, double> func)
{
return new DataRow
{
Sun = func(rows.Select(row => row.Sun)),
Temp = func(rows.Select(row => row.Temp)),
Humidity = func(rows.Select(row => row.Humidity)),
Wind = func(rows.Select(row => row.Wind))
};
}With this, both methods can be significantly simplified using the LINQ method
Average():public DataRow Average()
{
return dataSet.AggregateRows(values => values.Average());
}
public DataRow StdDev()
{
return dataSet.AggregateRows(
values =>
{
var average = values.Average();
var variance = values.Average(value => Math.Pow(value - average, 2));
return Math.Sqrt(variance);
});
}Note that the structure of the expression for computing variance is very similar to the way it's normally written: E[(X - E[X])2].
Also, if you do it this way, you won't need the
ToArray() method anymore.-
If you have many types whose
ToString() follows the same pattern, you could implement it just once using reflection and then use that implementation everywhere.Code Snippets
public static DataRow AggregateRows(
this IList<DataRow> rows, Func<IEnumerable<double>, double> func)
{
return new DataRow
{
Sun = func(rows.Select(row => row.Sun)),
Temp = func(rows.Select(row => row.Temp)),
Humidity = func(rows.Select(row => row.Humidity)),
Wind = func(rows.Select(row => row.Wind))
};
}public DataRow Average()
{
return dataSet.AggregateRows(values => values.Average());
}
public DataRow StdDev()
{
return dataSet.AggregateRows(
values =>
{
var average = values.Average();
var variance = values.Average(value => Math.Pow(value - average, 2));
return Math.Sqrt(variance);
});
}Context
StackExchange Code Review Q#41146, answer score: 2
Revisions (0)
No revisions yet.