
How to determine the size of training data using VC dimension?

Submitted by: @import:stackexchange-cs

Problem

I want to determine the size of the training data ($m$) when I know the parameters $VC(H)$, $\delta$ and $e$. As far as I know, the VC bound satisfies this inequality:

$$ \mathrm{error}_{\mathrm{true}}(h) \le \mathrm{error}_{\mathrm{train}}(h) + \sqrt{\frac{VC(H)\left(\ln\frac{2m}{VC(H)} + 1\right) + \ln\frac{4}{\delta}}{m}}
$$

But how can I determine the size of the training data ($m$) if I know the others?

Solution

Suppose you're aiming for a specific true error rate, and suppose for the moment that the training error is zero. The bound is then an inequality involving all your known parameters and the unknown $m$. Because $m$ appears both inside the logarithm and in the denominator, there is no elementary closed-form solution, but you can solve the inequality numerically to obtain a value of $m$ that guarantees the target error rate, assuming that there is no training error. If you then run classification and observe a nonzero training error, you can correct for it and repeat.
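As a rough illustration, the numerical solve could be sketched in Python as follows. This assumes the common textbook form of the bound, with gap term $\sqrt{\left(VC(H)\left(\ln\frac{2m}{VC(H)} + 1\right) + \ln\frac{4}{\delta}\right)/m}$; the function names `vc_gap` and `min_samples` and the parameter names are made up for this example. It relies on the gap term decreasing in $m$ once $m \ge VC(H)/2$, so a doubling search followed by bisection finds the smallest suitable $m$.

```python
import math


def vc_gap(m, vc_dim, delta):
    """Square-root term of the (assumed) VC bound for m training samples."""
    numerator = vc_dim * (math.log(2 * m / vc_dim) + 1) + math.log(4 / delta)
    return math.sqrt(numerator / m)


def min_samples(vc_dim, delta, epsilon, train_error=0.0):
    """Smallest integer m >= vc_dim with train_error + vc_gap(m) <= epsilon.

    Assumes vc_dim is a positive integer and 0 < delta < 1.
    """
    target = epsilon - train_error
    if target <= 0:
        raise ValueError("epsilon must be larger than the training error")

    # Doubling phase: grow hi until the bound is satisfied there.
    lo = hi = vc_dim
    while vc_gap(hi, vc_dim, delta) > target:
        lo, hi = hi, hi * 2

    # Bisection phase: the gap is decreasing on [lo, hi], so find the
    # first m at which it drops to the target or below.
    while lo < hi:
        mid = (lo + hi) // 2
        if vc_gap(mid, vc_dim, delta) <= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Hypothetical inputs: VC(H) = 10, delta = 0.05, target true error 0.1.
    print(min_samples(vc_dim=10, delta=0.05, epsilon=0.1))
```

To follow the "correct and repeat" step, one could pass the observed training error as `train_error` and recompute; the required $m$ then grows, since less of the error budget is left for the gap term.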

The bound obtained from the VC dimension can be very pessimistic, and I don't know how useful it is in practice. Perhaps your instructor will address this later on in the course (or perhaps the issue will be swept under the rug).

Context

StackExchange Computer Science Q#18276, answer score: 2
