patternpythonMinor
Iterate over coordinates and correct constraint violations
Viewed 0 times
coordinatesiterateviolationscorrectconstraintandover
Problem
I am creating an algorithm to take a series of labelled points placed randomly and move them until they fulfil a set of constraints.
The constraints have been pre-computed and are each a lower and upper bound of the distance between two points. There is definitely a solution that satisfies all constraints.
The procedure is as follows:
The code below works but it is slow when I have, say, 2000 points and each has a constraint with each other.
I am looking for input on how to speed up the code below. The procedure outlined above must be followed, but any algorithm that implements it is allowed.
I have a few ideas for improving it, including:
But I would appreciate input on which of these is fastest, works best in Python etc. Does anyone have a feel for how good the 'best' performance of Python will be compared to, for example, C++?
```
import numpy as np
import random
# inters is a list of lists of lists: the first index is point_1_no, the second index is point_2_no and
# the third index is either 0 for the lower bound or 1 for the upper bound.
# See below for sample data and code for reading it into the inters list.
random_size = 20
locations = [np.array([random.random() * random_size for j in range(3)]) for i in range(len(inters))]
while
The constraints have been pre-computed and are each a lower and upper bound of the distance between two points. There is definitely a solution that satisfies all constraints.
The procedure is as follows:
- Every pair of points is checked. If the distance violates the constraint, the points are moved an equal amount apart/together to be at a uniformly-random distance between the bounds.
- After each pair has been checked, the number of violations is recomputed (as some pairs will be moved having been fixed) and if it is greater than zero, every pair is iterated over again.
- This process is repeated up to a max number of times. If violations are still present, the coordinates are re-randomised and the whole process starts again.
The code below works but it is slow when I have, say, 2000 points and each has a constraint with each other.
I am looking for input on how to speed up the code below. The procedure outlined above must be followed, but any algorithm that implements it is allowed.
I have a few ideas for improving it, including:
- Don't take the square roots: compare square distances instead.
- Have a list of pairs that need to be 'checked' rather than checking the distances of all pairs.
- Be clever with NumPy arrays.
But I would appreciate input on which of these is fastest, works best in Python etc. Does anyone have a feel for how good the 'best' performance of Python will be compared to, for example, C++?
```
import numpy as np
import random
# inters is a list of lists of lists: the first index is point_1_no, the second index is point_2_no and
# the third index is either 0 for the lower bound or 1 for the upper bound.
# See below for sample data and code for reading it into the inters list.
random_size = 20
locations = [np.array([random.random() * random_size for j in range(3)]) for i in range(len(inters))]
while
Solution
\$
\newcommand{\pt}[2]{(#1 \!\times\! #2)}
\$Here's some commentary about potential algorithms particularly suited to Python. Note that a general optimal algorithm might not be suitable for Python because Numpy almost requires you to be able to vectorize your calculations.
The aim, basically, would be to make maximal usage of
You should then utilize
to find the \$\pt nn\$ matrix of distances.
The problem with the current technique is the quick invalidation of these generated distances. When you call
This means that \$\mathcal{O}(n^2)\$ work must be done to move at most \$n\$ points once each. The alternative is what you have done: calculate distances and directions independently with each
A faster algorithm (in Python) would require vectorized calls to move the points. The sensible thing to me would be a pseudo-physical simulation; each point has two forces it applies to every other point according to some potential. One should fall off exponentially (or at \$1/d\$) whereas the other should rise exponentially. Where the distance, \$d\$, is between the limits the forces should (approximately) cancel.
The equations can be alike to
I have seen something like this used effectively for an online visualisation, although cannot remember what. I suggest doing something similar to simulated annealing where the sharpness of the curve starts pretty flat (say, approx. linearithmic) and ends up close to an infinite potential well.
Be careful not to ramp up the energy as well; annealing works as the states move slower towards the solution.
When reaching a non-solution, instead of restarting one should consider finding the most displaced fraction of points and permuting their positions with their close neighbours. This will allow near-solutions to complete more quickly. The radius of permuted points should probably increase with the number of failed attempts.
Note that these are very hand-wavy thoughts and I haven't tested them for efficiency or even effectivity. It's possible this algorithm would be rather slow at reaching a solution or would get to a close solution and then stop. It, however, seems like a sensible option.
The main problem with fast convergence with the original solution was its convergence; imagine something like
With the natural state
A-B
A-C
B-C
A-B
A-C
...and so on.
With a more physical, simultaneous simulation, I'd expect convergence more like
Namely, prior movements won't cause unwanted collateral damage since no move will be independent of other point's requirements.
Again, just rambling. I've not based this off of any paper nor any trials of my own. It just feels appropriate.
\newcommand{\pt}[2]{(#1 \!\times\! #2)}
\$Here's some commentary about potential algorithms particularly suited to Python. Note that a general optimal algorithm might not be suitable for Python because Numpy almost requires you to be able to vectorize your calculations.
The aim, basically, would be to make maximal usage of
scipy.spatial.distance. This will require moving the data into homogeneous pure-Numpy arrays. I suggest- \$\pt n3\$ array of points
- \$\pt nn\$ array of lower limits
- \$\pt nn\$ array of upper limits
You should then utilize
from scipy.spatial import distance
distances = distance.squareform(distance.pdist(points))to find the \$\pt nn\$ matrix of distances.
The problem with the current technique is the quick invalidation of these generated distances. When you call
move_pair or anything that emulates it, you can not further involve it in movements until a new call to distance.pdist(points).This means that \$\mathcal{O}(n^2)\$ work must be done to move at most \$n\$ points once each. The alternative is what you have done: calculate distances and directions independently with each
move_pair.A faster algorithm (in Python) would require vectorized calls to move the points. The sensible thing to me would be a pseudo-physical simulation; each point has two forces it applies to every other point according to some potential. One should fall off exponentially (or at \$1/d\$) whereas the other should rise exponentially. Where the distance, \$d\$, is between the limits the forces should (approximately) cancel.
The equations can be alike to
import numpy
force_out = numpy.exp(distances - upper_limits)
force_in = numpy.reciprocal(lower_limits - distances)I have seen something like this used effectively for an online visualisation, although cannot remember what. I suggest doing something similar to simulated annealing where the sharpness of the curve starts pretty flat (say, approx. linearithmic) and ends up close to an infinite potential well.
from numpy import linalg, newaxis
# Normalized vectors
differences = points - points[:, newaxis]
differences /= linalg.norm(differences, axis=-1)[..., newaxis]
# Apply "simulated annealing"... sort of
total_force = (force_out + force_in) ** time
# Get displacements
displacements = differences * total_forceBe careful not to ramp up the energy as well; annealing works as the states move slower towards the solution.
When reaching a non-solution, instead of restarting one should consider finding the most displaced fraction of points and permuting their positions with their close neighbours. This will allow near-solutions to complete more quickly. The radius of permuted points should probably increase with the number of failed attempts.
Note that these are very hand-wavy thoughts and I haven't tested them for efficiency or even effectivity. It's possible this algorithm would be rather slow at reaching a solution or would get to a close solution and then stop. It, however, seems like a sensible option.
The main problem with fast convergence with the original solution was its convergence; imagine something like
A C
BWith the natural state
A B C. This could iterateA-B
A C
BA-C
A C
BB-C
A C
BA-B
A
C
BA-C
A
C
B...and so on.
With a more physical, simultaneous simulation, I'd expect convergence more like
A→→→ ←←←C
B→A→→ ←←C
B→A ↑ C
BA B CNamely, prior movements won't cause unwanted collateral damage since no move will be independent of other point's requirements.
Again, just rambling. I've not based this off of any paper nor any trials of my own. It just feels appropriate.
Code Snippets
from scipy.spatial import distance
distances = distance.squareform(distance.pdist(points))import numpy
force_out = numpy.exp(distances - upper_limits)
force_in = numpy.reciprocal(lower_limits - distances)from numpy import linalg, newaxis
# Normalized vectors
differences = points - points[:, newaxis]
differences /= linalg.norm(differences, axis=-1)[..., newaxis]
# Apply "simulated annealing"... sort of
total_force = (force_out + force_in) ** time
# Get displacements
displacements = differences * total_forceA C
BA C
BContext
StackExchange Code Review Q#77905, answer score: 2
Revisions (0)
No revisions yet.