Proving a certain hypothesis class on a given distribution is not learnable
Problem
I had this question in learning theory, but honestly it's really just a question in probability theory, so I'll rephrase it in a way that emphasizes what I was trying to do to solve it:
Let $\mathcal X_N=\{0,1\}^N$, $\mathcal Y=\{0,1\}$ and let $\mathcal D_N$ be the uniform distribution on $\mathcal X_N \times \mathcal Y$. Prove that for any $m \in \mathbb N$, $\epsilon\in(0,\frac{1}{2})$ and $\delta \in(0,1)$ there exists a number $N$ such that $$\mathbb P_{S\sim \mathcal D^m} \left(\exists \,1\leq i \leq N: L_{\mathcal D}(i) > \epsilon \,\wedge\, \forall \, 1\leq j \leq m,\ x_j(i)=y_j\right)>\delta.$$
Original phrasing:
Let $\mathcal X_N=\{0,1\}^N$, $\mathcal Y=\{0,1\}$ and define a uniform distribution, $\mathcal D_N$, on $\mathcal X_N \times \mathcal Y$. Define the hypothesis class $\mathcal H_N = \{h_i: \mathcal X_N \to \mathcal Y \mid 1 \leq i \leq N \}$, where $h_i(x) = x(i)$.
Prove that for any $m \in \mathbb N$, $\epsilon\in(0,\frac{1}{2})$ and $\delta \in(0,1)$ there exists a number $N$ such that $$\mathbb P_{S\sim \mathcal D^m} \left(\exists \,1\leq i \leq N: L_{\mathcal D}(i) > \epsilon \,\wedge\, L_S(h_i)=0\right)>\delta.$$
So a few things:
$S$ is obtained by randomly and independently drawing $m$ pairs $(x_j,y_j)$ from $\mathcal D_N$,
$L_{\mathcal D}(i) = \mathbb P_{(x,y)\sim \mathcal D} (x(i)\neq y)$,
$x(i)$ is the $i$'th coordinate of the $N$-dimensional vector $x$.
What I tried to do was:
First of all, fix an $i\in \{1,\dots,N\}$ and consider $\mathbb P_{(x,y)\sim \mathcal D} (x(i)\neq y) = \frac{2\cdot 2^{N-1}}{2^{N+1}} = \frac{1}{2} > \epsilon$ for any $\epsilon \in (0,\frac{1}{2})$ (for each of the two labels $y$, the coordinate $x(i)$ is forced to the opposite value and the remaining $N-1$ coordinates are free, giving $2\cdot 2^{N-1}$ mismatching pairs out of $2^{N+1}$), so the first requirement always holds and is therefore redundant.
Second, independently drawing a sequence $S$ of $m$ pairs (with repetition) such that every pair has $x(i)=y$ happens with probability $\mathbb P_{S\sim \mathcal D^m} (\forall \, 1\leq j \leq m : x_j(i)=y_j) = \prod_{j=1}^m \mathbb P_{(x_j,y_j)\sim \mathcal D} (x_j(i)=y_j) = \frac{1}{2^m}$.
Now that's for a single fixed $i\in [N]$. To lower bound the probability that such an $i$ exists, I thought of using the inclusion-exclusion principle over the $N$ events, but I couldn't see how to finish the argument.
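As a quick sanity check of the two probabilities computed above, here is a minimal Monte Carlo sketch (the simulation setup, parameter values and variable names are my own, not part of the original question):

```python
import random

N, m, trials = 8, 5, 100_000
i = 0  # fixed coordinate

mismatch = 0   # samples with x(i) != y                       (expected rate: 1/2)
all_match = 0  # samples S with x_j(i) = y_j for every j      (expected rate: 2^-m)
for _ in range(trials):
    # One sample S of m pairs (x, y) drawn uniformly from {0,1}^N x {0,1}.
    S = [([random.randint(0, 1) for _ in range(N)], random.randint(0, 1))
         for _ in range(m)]
    x0, y0 = S[0]
    mismatch += (x0[i] != y0)
    all_match += all(x[i] == y for x, y in S)

print(f"P(x(i) != y)         ~ {mismatch / trials:.3f}  (expected 0.5)")
print(f"P(all m pairs match) ~ {all_match / trials:.4f}  (expected {2**-m:.4f})")
```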
Solution
Since each $x$ is chosen uniformly, the value of its $i$'th coordinate is independent of the value of any other coordinate. You showed this already, but notice it means that you don't need the inclusion-exclusion principle: the events $E_i = \{\forall\, 1\leq j \leq m:\ x_j(i)=y_j\}$, $1\leq i\leq N$, are mutually independent, each with probability $\frac{1}{2^m}$. The probability that at least one of them occurs is therefore one minus the probability of the complement (no $E_i$ occurs), namely $1-\left(1-\frac{1}{2^m}\right)^N$. Since this tends to $1$ as $N\to\infty$, you can now easily choose $N$ such that this probability is bigger than $\delta$.
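To make the last step concrete, here is a small sketch (my own illustration, not part of the original answer) that solves $1-(1-2^{-m})^N>\delta$ for $N$ and checks the result by simulation:

```python
import math
import random

m, delta = 5, 0.95

# Solve 1 - (1 - 2^-m)^N > delta for N:
# (1 - 2^-m)^N < 1 - delta  <=>  N > ln(1 - delta) / ln(1 - 2^-m).
p = 2.0 ** -m
N = math.floor(math.log(1 - delta) / math.log(1 - p)) + 1
print(f"m={m}, delta={delta}: take N = {N}")
print(f"theoretical P(exists i) = {1 - (1 - p)**N:.4f}")

# Monte Carlo check: with this N, a sample S of m uniform pairs should,
# with probability > delta, contain some coordinate i matching every label.
trials, hits = 20_000, 0
for _ in range(trials):
    S = [([random.randint(0, 1) for _ in range(N)], random.randint(0, 1))
         for _ in range(m)]
    hits += any(all(x[i] == y for x, y in S) for i in range(N))

print(f"empirical P(exists i) ~ {hits / trials:.3f}  (want > {delta})")
```

Note that $N$ is chosen minimally here, so the empirical frequency will sit just above $\delta$; any larger $N$ only increases the probability.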
Context
StackExchange Computer Science Q#138279, answer score: 3