Proving a certain hypothesis class on a given distribution is not learnable
Problem
I had this question in learning theory, but honestly it's really just a question in probability theory, so I'll rephrase it in a way that emphasizes what I was trying to do to solve it:
Let $\mathcal X_N=\{0,1\}^N$, $\mathcal Y=\{0,1\}$ and let $\mathcal D_N$ be the uniform distribution on $\mathcal X_N \times \mathcal Y$. Prove that for any $m \in \mathbb N$, $\epsilon\in(0,\frac{1}{2})$ and $\delta \in(0,1)$ there exists a number $N$ such that $$\mathbb P_{S\sim \mathcal D^m} \left(\exists \,1\leq i \leq N: L_{\mathcal D}(i) > \epsilon \,\wedge\, \forall \, 1\leq j \leq m,\ x_j(i)=y_j\right)>\delta.$$
Original phrasing:
Let $\mathcal X_N=\{0,1\}^N$, $\mathcal Y=\{0,1\}$ and define a uniform distribution, $\mathcal D_N$, on $\mathcal X_N \times \mathcal Y$. Define the hypothesis class $\mathcal H_N = \{h_i: \mathcal X_N \to \mathcal Y \mid 1 \leq i \leq N \}$, where $h_i(x) = x(i)$.
Prove that for any $m \in \mathbb N$, $\epsilon\in(0,\frac{1}{2})$ and $\delta \in(0,1)$ there exists a number $N$ such that $$\mathbb P_{S\sim \mathcal D^m} \left(\exists \,1\leq i \leq N: L_{\mathcal D}(i) > \epsilon \,\wedge\, L_S(h_i)=0\right)>\delta.$$
So a few things:
$S$ is obtained by randomly and independently drawing $m$ pairs $(x_j,y_j)$ from $\mathcal D_N$,
$L_{\mathcal D}(i) = \mathbb P_{(x,y)\sim \mathcal D} (x(i)\neq y)$,
$x(i)$ is the $i$'th coordinate of the $N$-dimensional vector $x$.
What I tried to do was:
First of all, fix an $i\in \{1,\dots,N\}$ and consider $\mathbb P_{(x,y)\sim \mathcal D} (x(i)\neq y) = \frac{2\cdot 2^{N-1}}{2^{N+1}} = \frac{1}{2} > \epsilon$ for any $\epsilon \in (0,\frac{1}{2})$ (for each of the two labels $y$, the coordinate $x(i)$ is forced to the opposite value and the remaining $N-1$ coordinates are free, giving $2\cdot 2^{N-1}$ mismatching pairs out of $2^{N+1}$), so the first requirement always holds and is therefore redundant.
Second, independently drawing a sequence $S$ of $m$ pairs (with repetition) such that every pair has $x(i)=y$ happens with probability $\mathbb P_{S\sim \mathcal D^m} (\forall \, 1\leq j \leq m : x_j(i)=y_j) = \prod_{j=1}^m \mathbb P_{(x_j,y_j)\sim \mathcal D} (x_j(i)=y_j) = \frac{1}{2^m}$.
Now that's for a single fixed $i\in [N]$. To lower bound the probability that such an $i$ exists, I thought of using the inclusion-exclusion principle over the $N$ events, but I couldn't see how to finish the argument.
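As a quick sanity check of the two probabilities computed above, here is a minimal Monte Carlo sketch (the simulation setup, parameter values and variable names are my own, not part of the original question):

```python
import random

N, m, trials = 8, 5, 100_000
i = 0  # fixed coordinate

mismatch = 0   # samples with x(i) != y                       (expected rate: 1/2)
all_match = 0  # samples S with x_j(i) = y_j for every j      (expected rate: 2^-m)
for _ in range(trials):
    # One sample S of m pairs (x, y) drawn uniformly from {0,1}^N x {0,1}.
    S = [([random.randint(0, 1) for _ in range(N)], random.randint(0, 1))
         for _ in range(m)]
    x0, y0 = S[0]
    mismatch += (x0[i] != y0)
    all_match += all(x[i] == y for x, y in S)

print(f"P(x(i) != y)         ~ {mismatch / trials:.3f}  (expected 0.5)")
print(f"P(all m pairs match) ~ {all_match / trials:.4f}  (expected {2**-m:.4f})")
```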
Solution
Since each $x$ is chosen uniformly, the value of its $i$'th coordinate is independent of the value of any other coordinate. You showed this already, but notice it means that you don't need the inclusion-exclusion principle: the events $E_i = \{\forall\, 1\leq j \leq m:\ x_j(i)=y_j\}$, $1\leq i\leq N$, are mutually independent, each with probability $\frac{1}{2^m}$. The probability that at least one of them occurs is therefore one minus the probability of the complement (no $E_i$ occurs), namely $1-\left(1-\frac{1}{2^m}\right)^N$. Since this tends to $1$ as $N\to\infty$, you can now easily choose $N$ such that this probability is bigger than $\delta$.
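To make the last step concrete, here is a small sketch (my own illustration, not part of the original answer) that solves $1-(1-2^{-m})^N>\delta$ for $N$ and checks the result by simulation:

```python
import math
import random

m, delta = 5, 0.95

# Solve 1 - (1 - 2^-m)^N > delta for N:
# (1 - 2^-m)^N < 1 - delta  <=>  N > ln(1 - delta) / ln(1 - 2^-m).
p = 2.0 ** -m
N = math.floor(math.log(1 - delta) / math.log(1 - p)) + 1
print(f"m={m}, delta={delta}: take N = {N}")
print(f"theoretical P(exists i) = {1 - (1 - p)**N:.4f}")

# Monte Carlo check: with this N, a sample S of m uniform pairs should,
# with probability > delta, contain some coordinate i matching every label.
trials, hits = 20_000, 0
for _ in range(trials):
    S = [([random.randint(0, 1) for _ in range(N)], random.randint(0, 1))
         for _ in range(m)]
    hits += any(all(x[i] == y for x, y in S) for i in range(N))

print(f"empirical P(exists i) ~ {hits / trials:.3f}  (want > {delta})")
```

Note that $N$ is chosen minimally here, so the empirical frequency will sit just above $\delta$; any larger $N$ only increases the probability.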
Context
StackExchange Computer Science Q#138279, answer score: 3