patternMinor
Classification training data, but regression prediction
Problem
Suppose I'm performing machine learning on a simple dataset, and have a bunch of training data of the form:
x (feature)    y (label)
------------------------
1              0
2              1
3              1
4              0
5              1
6              1
...
Here the labels take one of two class values, $\{0, 1\}$. Clearly, this training data leads one to believe that this is a classification task.
However, suppose I instead want to output the probability that a given input belongs to class $1$. Then my output looks more like a regression task.
Consequently, when I'm designing a simple neural network with just a single input layer and single output layer, how many output units should I have? Should I have two output units, one for each class, and if so, how do I ensure that each pair of outputs will be a valid probability distribution (i.e. sum to one)? Or should I have only one output unit, and treat the entire problem as a regression task?
There are probably pros/cons to each approach... thanks for your help!
Solution
This is still a binary classification task. In the abstract, there are two ways to handle this:
- Most classifiers can output a predicted class and a confidence score (which indicates how confident the classifier is in its prediction). If you don't need a probability, you can use the confidence score as-is. If you do need one, you can apply a calibration procedure to map the score to a probability in the range $[0,1]$.
- Some classifiers can output a probability directly. Probabilistic models in particular are good at this. For instance, logistic regression outputs both a predicted class and a probability for that class.
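To make the calibration idea in the first approach concrete, here is a minimal sketch of one such procedure, Platt scaling, which fits a logistic curve to held-out confidence scores. The answer does not say which calibration method to use, so this choice is an assumption; the function name `fit_platt`, the learning rate, and the scores and labels are all made up for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss.

    Hypothetical helper for illustration, not a library API.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # gradient of the log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Made-up held-out confidence scores (e.g. signed margins) and their true labels.
scores = [-2.1, -1.3, -0.4, 0.2, 0.3, 0.9, 1.8, 2.5]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
print(calibrated)
```

The calibration map should be fitted on data the classifier was not trained on; otherwise the resulting probabilities tend to be too optimistic.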
If you construct a neural network with $k$ outputs, one per class, with a softmax output layer, and train it to minimize the cross-entropy loss, then you can interpret the output as a probability distribution over the classes (though beware that it might be biased or over-confident). The softmax ensures that the outputs each lie in the range $[0,1]$ and sum to one.
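As a small illustration of the normalization described above, the following sketch computes a softmax over two raw outputs and the cross-entropy loss for one example. The logit values are arbitrary placeholders, not outputs of any trained network.

```python
import math

def softmax(logits):
    # Subtracting the largest logit avoids overflow and leaves the result unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[true_class])

logits = [0.3, 1.7]  # placeholder raw outputs of the two output units
probs = softmax(logits)
loss = cross_entropy(probs, true_class=1)

print(probs, sum(probs))
print(loss)
```

Whatever the raw outputs are, the softmax values are positive and sum to one, which is what makes it reasonable to read them as class probabilities.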
In your case, you can try both approaches (two outputs with softmax; or a single output with a sigmoidal activation function). The only way to know which will perform better is to try both, but personally, I'd lean towards using two outputs and softmax.
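One point that may help when comparing the two designs: for exactly two classes, a softmax over two logits assigns class $1$ the same probability as a sigmoid applied to the difference of the logits. The sketch below checks this numerically on a few arbitrary logit pairs; the function names are illustrative only.

```python
import math

def softmax2(z0, z1):
    # Two-class softmax, shifted by the max logit for numerical stability.
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary logit pairs (z0 for class 0, z1 for class 1).
pairs = []
for z0, z1 in [(0.0, 0.0), (-1.2, 0.7), (3.0, -2.0)]:
    p_softmax = softmax2(z0, z1)[1]  # probability of class 1 from two outputs
    p_sigmoid = sigmoid(z1 - z0)     # probability of class 1 from one output
    pairs.append((p_softmax, p_sigmoid))

print(pairs)
```

So the two architectures can represent the same set of probability functions. Any difference in practice would come from the extra, redundant parameters of the two-output version and how they interact with training, which is why trying both remains a sensible test.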
Context
StackExchange Computer Science Q#60176, answer score: 2