patternMinor
PAC learning of axis-aligned rectangles
Viewed 0 times
axisrectanglespaclearningaligned
Problem
I've been reading the proof that axis-aligned rectangles are PAC learnable from the book Foundations of Machine Learning by Mohri (Proof pt. 1, Proof pt. 2), and a small technical detail stuck out to me.
The proof goes through dividing the target rectangle $R$ to four rectangular regions $r_i$ (Fig 2.3), each having probability at least $\frac{\epsilon}{4}$. It is then claimed that:
It seems that for the first point, we need to pick $r_i$ such that $\Pr[r_i]\le\frac{\epsilon}{4}$, which means that we must pick the $r_i$'s such that $\Pr[R_i]$ equals exactly $\frac{\epsilon}{4}$. Is this reasoning correct? If so, how can you ensure that $\Pr[r_i] = \epsilon/4$, without assuming something like absolute continuity of our distribution on $\mathbb{R}^2$?
The proof goes through dividing the target rectangle $R$ to four rectangular regions $r_i$ (Fig 2.3), each having probability at least $\frac{\epsilon}{4}$. It is then claimed that:
- If the tightest sample-consistent rectangle $R_S$ (Fig 2.2) meets any of these regions, then the mass of the possible erroneous region $R\setminus R_S$ is bounded by epsilon, which in turn bounds the error probability.
- If $R_S$ misses at least one of those rectangles, then we reduce to upper bounding the probability that the number of trials in a geometric random variable with probability $\ge\frac{\epsilon}{4}$ exceeds $m$.
It seems that for the first point, we need to pick $r_i$ such that $\Pr[r_i]\le\frac{\epsilon}{4}$, which means that we must pick the $r_i$'s such that $\Pr[R_i]$ equals exactly $\frac{\epsilon}{4}$. Is this reasoning correct? If so, how can you ensure that $\Pr[r_i] = \epsilon/4$, without assuming something like absolute continuity of our distribution on $\mathbb{R}^2$?
Solution
Good question! The proof given in Mohri's book indeed assumes continuity of the distribution, but this can be avoided with a finer definition of the $r_i$'s. Such a correction appears in these lecture notes who cite this paper.
Recall that you want the rectangles to be small enough, such that if $R_S$ intersects all of them, then $R\setminus R_s$ is small enough. Simultaneously, you want the rectangles to have large enough mass such that with high probability you do not miss a rectangle (here you use a simple bound on a geometric random variable as stated in the second point).
Suppose $R=[a,b]\times [c,d]$, and let us focus on $r_1$, the subrectangle having $[a,b]$ as an edge (if we are able to satisfy the above constraints for $r_1$, you can apply a similar construction for the rest of the rectangles). Define $r_1=[a,b]\times[c,c+h]$, where:
$h=\inf\big\{h'\big|\Pr\left[[a,b]\times[c,c+h']\right]\ge \frac{\epsilon}{4}\big\}$.
Note that $\Pr[r_1]\ge\frac{\epsilon}{4}$. To see why, let us assume that $\Pr[R]\ge\epsilon$ (otherwise we are done) and let $[a,b]\times[c,c+h']$ be some rectangle with mass $\ge \epsilon/4$. Now define the sequence of events
$x_i=[a,b]\times[c,c+h+2^{-i}\Delta]$, where $\Delta=h'-h$, and identify the rectangle $x_i$ with the event that a point drawn from our distribution lies inside the rectangle. Now we have:
$\Pr\left[r_1\right]=\Pr\left[\bigcap\limits_{n=1}^\infty x_n\right]=\Pr\left[\lim\limits_{n\rightarrow\infty}x_n\right]=\lim\limits_{n\rightarrow\infty}\Pr[x_n]\ge \frac{\epsilon}{4}$.
To see why the last inequality holds note that the series converges, since it is decreasing and lower bounded by $\frac{\epsilon}{4}$ (by our choice of $h$). See this question for a discussion on the probability of limit events. This shows that you are covered on the second point, and can upper bound the probability that at least one of the rectangles is missed.
It remains to show what we should do in the case that $r_i\cap R_s\neq\emptyset$ for all $i$. We can no longer immediately claim that $\Pr\left[\bigcup\limits_{i}r_i\right]\le\epsilon$, but a more sensitive approach would do the trick. Let us define $r_i'$ to be the interior of the rectangle $r_i$ along with its corresponding edge from $R$. Note that if $R_S\cap r_i\neq\emptyset$ then $R\setminus R_S\subseteq\bigcup\limits_{i}r_i'$ (geometric argument). We can now conclude the proof by showing that $\Pr[r_i']\le\epsilon/4$. You can prove this by a similar approach, define a sequence of events converging to $r_1'$, and use the fact the the probability for each event in the sequence is upper bounded by $\epsilon/4$.
Recall that you want the rectangles to be small enough, such that if $R_S$ intersects all of them, then $R\setminus R_s$ is small enough. Simultaneously, you want the rectangles to have large enough mass such that with high probability you do not miss a rectangle (here you use a simple bound on a geometric random variable as stated in the second point).
Suppose $R=[a,b]\times [c,d]$, and let us focus on $r_1$, the subrectangle having $[a,b]$ as an edge (if we are able to satisfy the above constraints for $r_1$, you can apply a similar construction for the rest of the rectangles). Define $r_1=[a,b]\times[c,c+h]$, where:
$h=\inf\big\{h'\big|\Pr\left[[a,b]\times[c,c+h']\right]\ge \frac{\epsilon}{4}\big\}$.
Note that $\Pr[r_1]\ge\frac{\epsilon}{4}$. To see why, let us assume that $\Pr[R]\ge\epsilon$ (otherwise we are done) and let $[a,b]\times[c,c+h']$ be some rectangle with mass $\ge \epsilon/4$. Now define the sequence of events
$x_i=[a,b]\times[c,c+h+2^{-i}\Delta]$, where $\Delta=h'-h$, and identify the rectangle $x_i$ with the event that a point drawn from our distribution lies inside the rectangle. Now we have:
$\Pr\left[r_1\right]=\Pr\left[\bigcap\limits_{n=1}^\infty x_n\right]=\Pr\left[\lim\limits_{n\rightarrow\infty}x_n\right]=\lim\limits_{n\rightarrow\infty}\Pr[x_n]\ge \frac{\epsilon}{4}$.
To see why the last inequality holds note that the series converges, since it is decreasing and lower bounded by $\frac{\epsilon}{4}$ (by our choice of $h$). See this question for a discussion on the probability of limit events. This shows that you are covered on the second point, and can upper bound the probability that at least one of the rectangles is missed.
It remains to show what we should do in the case that $r_i\cap R_s\neq\emptyset$ for all $i$. We can no longer immediately claim that $\Pr\left[\bigcup\limits_{i}r_i\right]\le\epsilon$, but a more sensitive approach would do the trick. Let us define $r_i'$ to be the interior of the rectangle $r_i$ along with its corresponding edge from $R$. Note that if $R_S\cap r_i\neq\emptyset$ then $R\setminus R_S\subseteq\bigcup\limits_{i}r_i'$ (geometric argument). We can now conclude the proof by showing that $\Pr[r_i']\le\epsilon/4$. You can prove this by a similar approach, define a sequence of events converging to $r_1'$, and use the fact the the probability for each event in the sequence is upper bounded by $\epsilon/4$.
Context
StackExchange Computer Science Q#88442, answer score: 2
Revisions (0)
No revisions yet.