How to determine the size of training data using VC dimension?
Problem
I want to determine the size of the training data ($m$) when I know the parameters $VC(H)$, $δ$ and $e$. As far as I know, the VC bound satisfies this inequality:
$$ \mathrm{error}_{\mathrm{true}}(h) \le \mathrm{error}_{\mathrm{train}}(h) + \sqrt{\frac{VC(H) \left(\ln\frac{2m}{VC(H)} + 1\right) + \ln\frac{4}{δ}}{m}}
$$
but how can I determine the size of training data ($m$) if I know the others?
Solution
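Suppose you're aiming for a specific true error rate, and suppose for the moment that there is no training error. The bound is then an inequality involving all your known parameters and the unknown $m$: the square-root term must be at most the target error. You can solve it to obtain a value of $m$ that guarantees the target true error rate, assuming that there is no training error. Because $m$ appears both inside the logarithm and in the denominator, this is usually easiest to do numerically. If you then run classification and observe a nonzero training error, you can correct for it (the square-root term must now be at most the target error minus the training error) and repeat.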
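As a rough illustration of solving for $m$ numerically, here is a minimal Python sketch. It assumes the commonly stated form of the bound, in which the square-root term is $\sqrt{\left(VC(H)\left(\ln\frac{2m}{VC(H)} + 1\right) + \ln\frac{4}{δ}\right)/m}$, and it assumes zero training error. The names `vc_gap`, `min_samples`, `vc_dim`, `delta` and `epsilon` are hypothetical and chosen for this example; `epsilon` plays the role of $e$ in the question. The search relies on the square-root term decreasing in $m$ once $m \ge VC(H)/2$.

```python
import math


def vc_gap(m, vc_dim, delta):
    """Square-root term of the (assumed) VC bound for m training samples."""
    numerator = vc_dim * (math.log(2 * m / vc_dim) + 1) + math.log(4 / delta)
    return math.sqrt(numerator / m)


def min_samples(vc_dim, delta, epsilon):
    """Smallest integer m >= vc_dim whose VC gap is at most epsilon.

    vc_dim is assumed to be a positive integer, 0 < delta < 1, epsilon > 0.
    """
    lo = vc_dim
    if vc_gap(lo, vc_dim, delta) <= epsilon:
        return lo
    # Double the upper end until the gap drops to epsilon or below.
    hi = 2 * lo
    while vc_gap(hi, vc_dim, delta) > epsilon:
        lo, hi = hi, 2 * hi
    # Invariant: gap(lo) > epsilon >= gap(hi). Binary search on integers.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if vc_gap(mid, vc_dim, delta) <= epsilon:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    m = min_samples(vc_dim=10, delta=0.05, epsilon=0.1)
    print(m, vc_gap(m, 10, 0.05))
```

To account for a nonzero training error observed after classification, one option is to call `min_samples(vc_dim, delta, epsilon - train_error)`, which only makes sense while `epsilon - train_error` stays positive.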
The bound obtained from the VC dimension can be very pessimistic, and I don't know how useful it is in practice. Perhaps your instructor will address this later on in the course (or perhaps the issue will be swept under the rug).
Context
StackExchange Computer Science Q#18276, answer score: 2