patternMinor
Predicting energy consumption of households
householdspredictingenergyconsumption
Problem
I have a dataset (which you can find here) containing many characteristics of different houses, such as their type of heating and the number of adults and children living in them. In total there are about 500 records. I want an algorithm that can be trained on this dataset to predict the electricity consumption of a house that is not in the set.
I have tried every machine learning algorithm I could (using Weka: linear regression, SVM, etc.). However, the mean absolute error was about 350, which is not good. I tried rescaling my data to the range 0 to 1 and deleting some characteristics, but I did not manage to get good results.
I also tried R, and did not get good results either.
I would be very grateful if someone could give me some advice, or examine the dataset a little and run some algorithms on it. What type of preprocessing should I use, and what type of algorithm?
I posted a similar question last month, but I did not get any useful answers.
Solution
I am not an expert in machine learning, but here is one problem: most of your data is binary. Since you have many such parameters, very little can be derived in terms of correlation between any given parameter and the target quantity. Therefore, statistical methods will have a hard time.
Furthermore, you have a small data set but many parameters. Get rid of some.
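One possible way to decide which parameters to drop is to rank them by how strongly each correlates with the target. The sketch below is only an illustration of that idea, not a recommendation from the dataset itself: the column names (`HAS_AC`, `HAS_GARDEN`, `ALWAYS_ZERO`) and all values are invented, and Pearson correlation is just one screening criterion among several.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(columns, target):
    """Sort (name, |correlation|) pairs, strongest association first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only (NOT taken from the real dataset).
columns = {
    "HAS_AC":      [1, 0, 1, 1, 0, 0],
    "HAS_GARDEN":  [1, 1, 0, 1, 0, 1],
    "ALWAYS_ZERO": [0, 0, 0, 0, 0, 0],
}
consumption = [900, 400, 950, 870, 420, 380]

for name, score in rank_features(columns, consumption):
    print(f"{name}: {score:.2f}")
```

Parameters at the bottom of such a ranking (in particular constant columns) are candidates for removal, though a weak individual correlation does not prove a parameter is useless in combination with others.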
Another problem can be that you have mutually exclusive data: for example, the first three parameters (URBAN, RURAL and MOUNTAINOUS) cannot be set at the same time. You might want to combine them into one categorical parameter; that way, the algorithms don't have to discover a multi-dimensional anti-correlation and then correlate the vector with energy consumption, but can use that one parameter directly. Note how this also reduces the number of parameters.
If you can make assumptions about which areas inherently cause more energy consumption, consider turning the parameter into an appropriate interval (of reals).
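As a minimal sketch of this merging step, assuming each record is a dictionary with 0/1 flags named `URBAN`, `RURAL` and `MOUNTAINOUS`: the `ADULTS` key, the new `AREA` key and the numeric scale in `AREA_SCALE` are hypothetical, and the scale values in particular are placeholders rather than researched figures.

```python
AREA_COLUMNS = ("URBAN", "RURAL", "MOUNTAINOUS")

# Placeholder scale (an assumption, not a researched ordering): only use
# something like this if you can justify which areas consume more energy.
AREA_SCALE = {"URBAN": 0.0, "RURAL": 0.5, "MOUNTAINOUS": 1.0}


def collapse_area(record):
    """Replace the three mutually exclusive flags with one 'AREA' category."""
    active = [name for name in AREA_COLUMNS if record.get(name) == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one area flag set, got {active}")
    collapsed = {k: v for k, v in record.items() if k not in AREA_COLUMNS}
    collapsed["AREA"] = active[0]
    return collapsed


def area_as_number(record):
    """Map the single 'AREA' category onto the (hypothetical) real-valued scale."""
    return AREA_SCALE[record["AREA"]]


# Hypothetical record; the ADULTS column name is an assumption.
house = {"URBAN": 0, "RURAL": 1, "MOUNTAINOUS": 0, "ADULTS": 2}
merged = collapse_area(house)
print(merged, area_as_number(merged))
```

Raising an error when zero or several flags are set also doubles as a sanity check on the data.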
As a general rule, machine learning is what you do if other methods don't work. In this case, I don't see the need for machine learning unless you show that reasonable other approaches fail.
For instance: research the energy consumption of the appliances you have parameters for, and assume average values or values that fit the total area covered. Research and/or make reasonable assumptions for the number of light bulbs, air conditioning units, etc. based on total area. If this does not already solve your problem, it should at least reduce the number of parameters with unknown influence.
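A rough sketch of such a non-learning baseline might look like the following. Every number in it is a made-up placeholder to be replaced by researched values, and the column names (`AREA_M2`, `HAS_AC`, `HAS_DISHWASHER`) as well as the unit are assumptions, since the dataset's actual columns and units are not stated here.

```python
# All figures are PLACEHOLDERS invented for illustration; replace them with
# researched per-appliance averages before drawing any conclusion.
APPLIANCE_CONSUMPTION = {"HAS_AC": 600.0, "HAS_DISHWASHER": 250.0}
LIGHTING_PER_M2 = 5.0  # assumed consumption attributed to lighting per square metre


def baseline_estimate(record):
    """Sum an area-based lighting term and a fixed term per appliance present."""
    total = record["AREA_M2"] * LIGHTING_PER_M2
    for flag, consumption in APPLIANCE_CONSUMPTION.items():
        if record.get(flag) == 1:
            total += consumption
    return total


def mean_absolute_error(predictions, actuals):
    """Average absolute difference between predicted and measured values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Hypothetical houses and measured consumption values.
houses = [
    {"AREA_M2": 100, "HAS_AC": 1, "HAS_DISHWASHER": 0},
    {"AREA_M2": 80, "HAS_AC": 0, "HAS_DISHWASHER": 0},
]
measured = [1000.0, 450.0]

predicted = [baseline_estimate(h) for h in houses]
print(predicted, mean_absolute_error(predicted, measured))
```

Comparing the mean absolute error of this kind of estimate with the roughly 350 reported in the question would show whether the learned models add anything over simple assumptions.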
Context
StackExchange Computer Science Q#11527, answer score: 2