HiveBrain v1.2.0

Do linearly dependent features in feature vectors improve the feature vector?

Submitted by: @import:stackexchange-cs

Problem

I was reading Wiki on feature vectors, and as far as I can see, it suggests creating new features from already existing features:


Higher-level features can be obtained from already available features
and added to the feature vector, for example for the study of diseases
the feature 'Age' is useful and is defined as Age = 'Year of death' -
'Year of birth'. This process is referred to as feature construction.

But assuming that you have already included 'Year of birth' and 'Year of death' as features, will adding 'Age' (that is, 'Year of death' - 'Year of birth') as a feature improve the feature vector in any way? I'm thinking not, since the variables are linearly dependent.

If it depends on the machine learning algorithm used, I am mostly interested in SVMs.

Solution

Though edron's thought experiment is nice, it assumes that you do not already have both of those features. If you do, then adding the third feature cannot help because, as you say, it is linearly dependent. Assume features x1 = Year of birth, x2 = Year of death, and x3 = Age = x2 - x1. Then any linear predictor gives:

x1*w1 + x2*w2 + x3*w3 = x1*(w1 - w3) + x2*(w2 + w3)

So, nothing has been gained and we could have learned the same thing with the original features.
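This collapse of weights is easy to check numerically. Below is a minimal sketch with made-up birth/death years and arbitrary weights (all values are illustrative, not from the question): the three-feature linear score is identical to a two-feature score with the redundant weight folded into the originals.

```python
import numpy as np

# Hypothetical example data: x1 = year of birth, x2 = year of death.
x1 = np.array([1900.0, 1931.0, 1945.0, 1960.0])
x2 = np.array([1970.0, 1999.0, 2003.0, 2020.0])
x3 = x2 - x1  # Age: linearly dependent on x1 and x2

# Arbitrary weights for a linear predictor over the three features.
w1, w2, w3 = 0.5, -1.2, 0.8

score_3_features = x1 * w1 + x2 * w2 + x3 * w3
# Fold the dependent feature's weight back into the original two:
score_2_features = x1 * (w1 - w3) + x2 * (w2 + w3)

print(np.allclose(score_3_features, score_2_features))  # True
```

Any weight w3 a linear model (including a linear SVM) could assign to Age is thus already expressible with the original two features alone.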

A better way to augment the features is to add a nonlinear function of features, such as (x2-x1)^2.
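One way to see that such a nonlinear feature genuinely adds information, unlike the linearly dependent Age, is to project it onto the span of the original features and observe a nonzero residual. A small sketch with synthetic data (the distributions are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(1900, 1960, size=200)   # year of birth
x2 = x1 + rng.uniform(20, 90, size=200)  # year of death

x3 = (x2 - x1) ** 2  # nonlinear feature: squared age

# Least-squares projection of x3 onto span{1, x1, x2}.
A = np.column_stack([np.ones_like(x1), x1, x2])
coef, residual, *_ = np.linalg.lstsq(A, x3, rcond=None)

# A nonzero residual means no linear combination of the original
# features (plus a bias) can reproduce x3, so a linear model gains
# genuinely new expressive power from it.
print(residual[0] > 0)  # True
```

By contrast, running the same projection with x3 = x2 - x1 gives a residual of (numerically) zero, which is exactly the redundancy argument above.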

Context

StackExchange Computer Science Q#7232, answer score: 12
