patternMinor
Do I have the right definition of VC dimension?
Viewed 0 times
definitionthedimensionhaveright
Problem
I am having some trouble understanding the notion of the VC dimension. The definition I have is the following:
The VC dimension of a set of hypothesis functions $H$ is the cardinality of the largest set which $H$ can shatter. We say that $H$ shatters some set $S \subseteq \mathcal{X}$ if we can realise any labelings on $S$ using functions from $H$.
To me, this means that for any (countable) set of labels $L \subseteq \mathbb{Z}$, there is some $h \in H$ that provides a surjection $h: S \mapsto T$, where $T \subseteq \mathcal{X}$ with $|T| = |L| \leq |S|$. (There is then a trivial bijection from $T$ to $L$.)
Is this the correct interpretation?
The reason I ask is because the above interpretation leads me to think that the singleton set $H = \{h\}$ has $VC(H) = 1$, since it always sends a single point to a single point, so we can get any labelling we like on this single point by a trivial bijection.
That is, suppose we wanted to label the point $x \in \mathbb{R}$ as $l$, then $h: y \in \mathbb{R} \mapsto 0$ will do since we can just change the name of 0 to $l$. Note that this chosen $h$ does not shatter a two-set from $\mathbb{R}$, since we can only label the points in one distinct way.
However, I have read that the VC of a singleton is 0. I don't understand this.
(I see hypothesis functions $g: \mathcal{X} \to \{1, \dots, n\}$ as belonging to an equivalence class of functions that sends $\mathcal{X}$ to any set of size $n$. Please correct me if this is the wrong intuition.)
The VC dimension of a set of hypothesis functions $H$ is the cardinality of the largest set which $H$ can shatter. We say that $H$ shatters some set $S \subseteq \mathcal{X}$ if we can realise any labelings on $S$ using functions from $H$.
To me, this means that for any (countable) set of labels $L \subseteq \mathbb{Z}$, there is some $h \in H$ that provides a surjection $h: S \mapsto T$, where $T \subseteq \mathcal{X}$ with $|T| = |L| \leq |S|$. (There is then a trivial bijection from $T$ to $L$.)
Is this the correct interpretation?
The reason I ask is because the above interpretation leads me to think that the singleton set $H = \{h\}$ has $VC(H) = 1$, since it always sends a single point to a single point, so we can get any labelling we like on this single point by a trivial bijection.
That is, suppose we wanted to label the point $x \in \mathbb{R}$ as $l$, then $h: y \in \mathbb{R} \mapsto 0$ will do since we can just change the name of 0 to $l$. Note that this chosen $h$ does not shatter a two-set from $\mathbb{R}$, since we can only label the points in one distinct way.
However, I have read that the VC of a singleton is 0. I don't understand this.
(I see hypothesis functions $g: \mathcal{X} \to \{1, \dots, n\}$ as belonging to an equivalence class of functions that sends $\mathcal{X}$ to any set of size $n$. Please correct me if this is the wrong intuition.)
Solution
A family of hypothesis functions on domain $\cal X$ is a subset of $\{0,1\}^{\cal X}$. A family $H$ shatters a set $S \subseteq \cal X$ if for every subset $T \subseteq S$ there exists a function $h \in H$ such that $h(s) = 1_{s \in T}$ for all $s \in S$, that is, $h(s) = 1$ if $s \in T$ and $h(s) = 0$ if $s \in S \setminus T$.
A singleton family $H$ cannot shatter any non-empty set $S$. Indeed, suppose that $H = \{h\}$ shatters $S \neq \emptyset$, and take any $s \in S$. Taking $T = \emptyset$, we see that $h(s) = 0$, whereas taking $T = S$, we see that $h(s) = 1$.
A singleton family $H$ cannot shatter any non-empty set $S$. Indeed, suppose that $H = \{h\}$ shatters $S \neq \emptyset$, and take any $s \in S$. Taking $T = \emptyset$, we see that $h(s) = 0$, whereas taking $T = S$, we see that $h(s) = 1$.
Context
StackExchange Computer Science Q#81080, answer score: 4
Revisions (0)
No revisions yet.