Mathematical optimization on a noisy function
Tags: optimization, noisy function, mathematical
Problem
Let $f:\mathbb{R}^d \to \mathbb{R}$ be a function that is fairly nice (e.g., continuous, differentiable, not too many local maxima, maybe concave, etc.). I want to find a maximum of $f$: a value $x \in \mathbb{R}^d$ that makes $f(x)$ as large as possible.
If I had a procedure to evaluate $f$ precisely on any input of my choice, I could use standard mathematical optimization techniques: hill-climbing, gradient descent (well, gradient ascent), etc. However, in my application I don't have a way to evaluate $f(x)$ exactly. Instead, I have a way to estimate the value of $f(x)$.
In particular, given any $x$ and any $\varepsilon$, I have an oracle that will output an estimate of $f(x)$, and whose expected error is approximately $\varepsilon$. The running time of this oracle invocation is proportional to $1/\varepsilon^2$. (It is implemented by a kind of simulation; the accuracy of the simulation increases with the square root of the number of trials, and I can choose how many trials to run, so I can choose the desired accuracy.) So this gives me a way to get an estimate of any accuracy I desire, but the more accurate I want the estimate to be, the longer it will take me.
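As a concrete illustration of this oracle model (the function `noisy_oracle` and its arguments are hypothetical names, not anything from an actual library), here is a minimal Monte Carlo sketch: averaging $n \propto 1/\varepsilon^2$ trials drives the standard error of the mean down to roughly $\varepsilon$, at cost proportional to $n$:

```python
import random

def noisy_oracle(f_true, x, eps, seed=None):
    """Monte Carlo estimate of f_true(x) with expected error ~ eps.

    Runs n ~ 1/eps^2 trials, so runtime scales as 1/eps^2 while the
    standard error of the mean shrinks as 1/sqrt(n).
    """
    rng = random.Random(seed)
    n = max(1, int(1.0 / eps ** 2))  # number of simulation trials
    # Each trial returns f_true(x) plus unit-variance noise; averaging
    # n such trials gives a standard error of about 1/sqrt(n) ~ eps.
    total = sum(f_true(x) + rng.gauss(0.0, 1.0) for _ in range(n))
    return total / n
```

Halving $\varepsilon$ quadruples $n$, which is exactly the cost/accuracy tradeoff described above.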
Given this noisy oracle for $f$, are there any techniques for computing a maximum of $f$ as efficiently as possible? (Or, more precisely, finding an approximate maximum.) Are there variants of hill-climbing, gradient descent, etc., that work within this model?
Of course, I could fix a very small value of $\varepsilon$ and apply hill-climbing or gradient descent with this oracle, keeping the same $\varepsilon$ throughout. However, this might be unnecessarily inefficient: we might not need such a precise estimate near the beginning, whereas precision near the end, when we are zeroing in on the solution, is more important. So is there any way to take advantage of my ability to control the accuracy of my estimate dynamically, to make the optimization process more efficient? Has this kind of problem been studied before?
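To make the "dynamic accuracy" idea concrete, here is one possible sketch (all names are invented for illustration; the schedule $\varepsilon_k = \varepsilon_0/(k+1)$ is just one plausible choice, not a recommendation from the literature). It performs finite-difference gradient ascent, spending little oracle time early and tightening the accuracy as the iterate converges:

```python
def adaptive_ascent(oracle, x0, iters=50, step0=0.5, eps0=0.5):
    """Hill-climb in 1-D with a shrinking accuracy target.

    oracle(x, eps) returns an estimate of f(x) with expected error ~ eps
    at cost ~ 1/eps^2.  Early iterations use a loose eps (cheap); later
    iterations tighten it, e.g. via the schedule eps_k = eps0 / (k + 1).
    """
    x = x0
    for k in range(iters):
        eps = eps0 / (k + 1)       # tighten the accuracy over time
        h = max(eps, 1e-3)         # finite-difference width >= noise level
        # Central-difference estimate of f'(x) from two noisy queries.
        grad = (oracle(x + h, eps) - oracle(x - h, eps)) / (2 * h)
        x += (step0 / (k + 1)) * grad  # diminishing step size
    return x
```

Since each query costs $\propto 1/\varepsilon_k^2$, the total cost of this schedule grows like $\sum_k (k+1)^2$; whether that beats a fixed small $\varepsilon$ depends on how many iterations the fixed-accuracy variant would need.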
Solution
One could replace the exact function $f(x,p)$ by the noisy function $f(x+\Delta x, p + \Delta p)$, where $p$ is an artificial parameter that captures the noise dependency, so that $\Delta x$ and $\Delta p$ carry the noise.
- Some techniques used in stochastic optimization and robust optimization might be applicable.
- Because $\frac{\partial f}{\partial x}\approx 0$ near a maximum, $\Delta x$ is less dangerous than $\Delta p$.
- Sometimes $\frac{\partial f}{\partial x}(\tilde{x}, \tilde{p})$ can be approximated accurately while evaluating $f(\tilde{x}, \tilde{p})$. Often this holds only in theory, because the derivative computation is not actually implemented, and some parts would require special care.
- The desired "smallness" of $\Delta p$ (and $\Delta x$) is an "end user" decision. One can offer heuristics to control it, but a runtime proportional to $1/\varepsilon^2$ is too slow for fully automatic accuracy handling.
- The given noise-versus-runtime tradeoff is what sets this problem apart from better-studied problems. Problems where the noise is simply unavoidable (fixed, not tunable) are more common and better studied.
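For the better-studied "unavoidable noise" setting mentioned in the last bullet, the classical tool is Kiefer-Wolfowitz stochastic approximation. The sketch below (function and parameter names are mine, 1-D for simplicity) estimates the gradient by finite differences on noisy evaluations of fixed variance, with step sizes $a_k$ and perturbation widths $c_k$ chosen to satisfy the standard convergence conditions $\sum_k a_k = \infty$, $\sum_k a_k c_k < \infty$, $\sum_k a_k^2/c_k^2 < \infty$:

```python
import random

def kiefer_wolfowitz(noisy_f, x0, iters=2000, a0=1.0, c0=0.5, seed=0):
    """Kiefer-Wolfowitz stochastic approximation (1-D ascent sketch).

    noisy_f(x, rng) returns f(x) plus zero-mean noise of fixed variance,
    i.e. the accuracy is NOT tunable.  The schedules a_k = a0/k and
    c_k = c0 / k^(1/4) satisfy the classical convergence conditions.
    """
    rng = random.Random(seed)
    x = x0
    for k in range(1, iters + 1):
        a_k = a0 / k
        c_k = c0 / k ** 0.25
        # Finite-difference gradient estimate from two noisy evaluations.
        g = (noisy_f(x + c_k, rng) - noisy_f(x - c_k, rng)) / (2 * c_k)
        x += a_k * g  # ascent step along the estimated gradient
    return x
```

In the tunable-accuracy model of the question, one could instead shrink the oracle's $\varepsilon$ over iterations rather than relying solely on diminishing step sizes, which is precisely the dynamic-accuracy idea raised above.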
Context
StackExchange Computer Science Q#39546, answer score: 4