patternModerate
Selection of parameters for genetic algorithm
Viewed 0 times
selectiongeneticalgorithmforparameters
Problem
How can one select the proper number of parameters for a genetic algorithm to model a given system?
For example, say you want to optimize production of cars, and you have 1,000 measurements of hourly efficiency at various tasks for each of 1,000 different employees. So, you have 1,000,000 data points. Most of these are likely to be weakly correlated to the overall efficiency of your factory, but not so weakly that you can say they are irrelevant with statistical confidence. How do you go about picking inputs for your GA so that you don't have 1,000,000+ degrees of freedom, resulting in very slow convergence or no convergence at all?
Specifically, what are the algorithms one could use to pre-select or selectively eliminate features?
One approach I have used myself in this scenario is to evolve the parameter selection itself, so I might have parents like
For example, say you want to optimize production of cars, and you have 1,000 measurements of hourly efficiency at various tasks for each of 1,000 different employees. So, you have 1,000,000 data points. Most of these are likely to be weakly correlated to the overall efficiency of your factory, but not so weakly that you can say they are irrelevant with statistical confidence. How do you go about picking inputs for your GA so that you don't have 1,000,000+ degrees of freedom, resulting in very slow convergence or no convergence at all?
Specifically, what are the algorithms one could use to pre-select or selectively eliminate features?
One approach I have used myself in this scenario is to evolve the parameter selection itself, so I might have parents like
{a,b,c}, {b,d,e,q,x,y,z}, and so on. I would then mutate the children to add or drop features. This works well for a few dozen features. But the problem is that it is inefficient if there is a large number of degrees of freedom. In that case, you are looking at 10^n combinations (in the example above, 10^1,000,000), which makes some pre-filtering of features critical to get any kind of useful performance.Solution
First of all - the example doesn't seem well suited because you would probably use some regression or classical ML methods to solve this. Secondly - you are referring to a general problem of feature selection (Kira, Rendell, 1992) or attribute selection (Hall, Holmes, 2003) or variable selection (Guyon, Elisseeff, 2003) or variable subset selection (Stecking, Schebesch, 2005) or feature extraction (Hillion, Masson, Roux, 1988) or dimensionality reduction (Roweis, Saul, 200) or state abstraction (Amarel, 1968). This problem is relevant not only to genetic algorithms but for almost all machine learning techniques when dealing with high dimensional data.
Three cases can be distinguished here: the last instance of this problem known as state abstraction is usually related to process modelling (which suits your example, but not the GA context). The first three, i.e. feature selection, attribute selection or variable selection seem to be most relevant when taking your question literally. In this context a common solution is the mRMR approach (Peng, Long, Ding, 2005). From my experience it doesn't always work well with continuous data - however, mutual information can be substituted with other coefficients, like correlation for example. Another possible approach is to use cross-validation (Picard, Cook, 1984) for this. You can have multiple models each using different features, and by means of model selection with cross-validation techniques you choose the best model, which gives you the information on which features work best for the given task.
The feature extraction and dimensionality reduction cases allow to not only select initial features, but also their combinations. A well-known example solution for this case is the PCA algorithm (Pearson, 1901), which produces the optimal, in terms of explained variance, set of features being linear combinations of input features.
Also note, that there are many models that handle feature extraction task by themselves. Some examples are: Growing Neural Gas Network (Fritzke, 1995), LASSO (Tibshirani, 2011), RFE SVM (Zeng, Chen, Tao, 2009), Decision Trees (Quinlan, 1986).
References:
-
Kenji Kira and Larry A. Rendell. 1992. A practical approach to feature selection. In Proceedings of the ninth international workshop on Machine learning (ML92), Derek Sleeman and Peter Edwards (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 249-256.
-
Hall, M.A.; Holmes, G., "Benchmarking attribute selection techniques for discrete class data mining," Knowledge and Data Engineering, IEEE Transactions on , vol.15, no.6, pp.1437,1447, Nov.-Dec. 2003
-
Isabelle Guyon and André Elisseeff. 2003. An introduction to variable and feature selection. J. Mach. Learn. Res. 3 (March 2003), 1157-1182.
-
Ralf Stecking, Klaus B. Schebesch. 2005. Variable Subset Selection for Credit Scoring with Support Vector Machines, Operations Research Proceedings 2005, Springer Berlin Heidelberg, (2005), pp.251-256
-
Hillion, A.; Masson, P.; Roux, C., "A nonparametric approach to linear feature extraction; application to classification of binary synthetic textures," Pattern Recognition, 1988., 9th International Conference on , vol., no., pp.1036,1039 vol.2, 14-17 Nov 1988
-
Nonlinear dimensionality reduction by locally linear embedding.
Sam Roweis & Lawrence Saul.
Science, v.290 no.5500 , Dec.22, 2000. pp.2323--2326.
-
Amarel, S. (1968). On representations of problems of reasoning about actions, Machine Intelligence, (3), 131--171
-
Peng, H.; Fulmi Long; Ding, C., "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.27, no.8, pp.1226,1238, Aug. 2005
-
Richard R. Picard and R. Dennis Cook, Journal of the American Statistical Association Vol. 79, No. 387 (Sep., 1984), pp. 575-583
-
On lines and planes of closest fit to systems of points in space Philosophical Magazine, Vol. 2, No. 6. (1901), pp. 559-572 by K. Pearson
-
A Growing Neural Gas Network Learns Topologies, Bernd Fritzke, Conference: Neural Information Processing Systems - NIPS , pp. 625-632, 1994
-
Regression shrinkage and selection via the lasso: a retrospective, Robert Tibshirani, Journal of the Royal Statistical Society Series B, 2011, vol. 73, issue 3, pages 273-282
-
Xiangyan Zeng; Yen-wei Chen; Caixia Tao; Van Alphen, D., "Feature Selection Using Recursive Feature Elimination for Handwritten Digit Recognition," Intelligent Information Hiding and Multimedia Signal Processing, 2009. IIH-MSP '09. Fifth International Conference on , vol., no., pp.1205,1208, 12-14 Sept. 2009
-
J. R. Quinlan. 1986. Induction of Decision Trees. Mach. Learn. 1, 1 (March 1986), 81-106.
Three cases can be distinguished here: the last instance of this problem known as state abstraction is usually related to process modelling (which suits your example, but not the GA context). The first three, i.e. feature selection, attribute selection or variable selection seem to be most relevant when taking your question literally. In this context a common solution is the mRMR approach (Peng, Long, Ding, 2005). From my experience it doesn't always work well with continuous data - however, mutual information can be substituted with other coefficients, like correlation for example. Another possible approach is to use cross-validation (Picard, Cook, 1984) for this. You can have multiple models each using different features, and by means of model selection with cross-validation techniques you choose the best model, which gives you the information on which features work best for the given task.
The feature extraction and dimensionality reduction cases allow to not only select initial features, but also their combinations. A well-known example solution for this case is the PCA algorithm (Pearson, 1901), which produces the optimal, in terms of explained variance, set of features being linear combinations of input features.
Also note, that there are many models that handle feature extraction task by themselves. Some examples are: Growing Neural Gas Network (Fritzke, 1995), LASSO (Tibshirani, 2011), RFE SVM (Zeng, Chen, Tao, 2009), Decision Trees (Quinlan, 1986).
References:
-
Kenji Kira and Larry A. Rendell. 1992. A practical approach to feature selection. In Proceedings of the ninth international workshop on Machine learning (ML92), Derek Sleeman and Peter Edwards (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 249-256.
-
Hall, M.A.; Holmes, G., "Benchmarking attribute selection techniques for discrete class data mining," Knowledge and Data Engineering, IEEE Transactions on , vol.15, no.6, pp.1437,1447, Nov.-Dec. 2003
-
Isabelle Guyon and André Elisseeff. 2003. An introduction to variable and feature selection. J. Mach. Learn. Res. 3 (March 2003), 1157-1182.
-
Ralf Stecking, Klaus B. Schebesch. 2005. Variable Subset Selection for Credit Scoring with Support Vector Machines, Operations Research Proceedings 2005, Springer Berlin Heidelberg, (2005), pp.251-256
-
Hillion, A.; Masson, P.; Roux, C., "A nonparametric approach to linear feature extraction; application to classification of binary synthetic textures," Pattern Recognition, 1988., 9th International Conference on , vol., no., pp.1036,1039 vol.2, 14-17 Nov 1988
-
Nonlinear dimensionality reduction by locally linear embedding.
Sam Roweis & Lawrence Saul.
Science, v.290 no.5500 , Dec.22, 2000. pp.2323--2326.
-
Amarel, S. (1968). On representations of problems of reasoning about actions, Machine Intelligence, (3), 131--171
-
Peng, H.; Fulmi Long; Ding, C., "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.27, no.8, pp.1226,1238, Aug. 2005
-
Richard R. Picard and R. Dennis Cook, Journal of the American Statistical Association Vol. 79, No. 387 (Sep., 1984), pp. 575-583
-
On lines and planes of closest fit to systems of points in space Philosophical Magazine, Vol. 2, No. 6. (1901), pp. 559-572 by K. Pearson
-
A Growing Neural Gas Network Learns Topologies, Bernd Fritzke, Conference: Neural Information Processing Systems - NIPS , pp. 625-632, 1994
-
Regression shrinkage and selection via the lasso: a retrospective, Robert Tibshirani, Journal of the Royal Statistical Society Series B, 2011, vol. 73, issue 3, pages 273-282
-
Xiangyan Zeng; Yen-wei Chen; Caixia Tao; Van Alphen, D., "Feature Selection Using Recursive Feature Elimination for Handwritten Digit Recognition," Intelligent Information Hiding and Multimedia Signal Processing, 2009. IIH-MSP '09. Fifth International Conference on , vol., no., pp.1205,1208, 12-14 Sept. 2009
-
J. R. Quinlan. 1986. Induction of Decision Trees. Mach. Learn. 1, 1 (March 1986), 81-106.
Context
StackExchange Computer Science Q#21974, answer score: 11
Revisions (0)
No revisions yet.