snippetpythonModerate
Order a list of tuples or a NumPy array in a specific format
Viewed 0 times
formatordernumpyarrayspecificlisttuples
Problem
I have a list of tuples or a NumPy array (it can be either of them as the variable comes as a list and will end up being a NumPy array) and I want to order them in a specific way (that I am not able to phrase it).
Let's make an example:
Desired output:
(or a NumPy array of 2x4 will do also)
The logic behind it is to get the first row, read its last value and find that value in the array. Get that tuple, swap it to make the found value the first one (if needed) and repeat.
The idea is that ...N),(N... , N is the same. There will always be 2 and only 2 repetitions of the numbers.
I can figure out how to do it "C-like", iterating, searching, swapping etc, but I wonder if there is a pythonic way (those amazing one-liners) of achieving it. Ultimately I am just trying to increase my Python-fu.
Let's make an example:
boundary=[(1, 2), (1, 3), (3, 4), (2, 4)];Desired output:
[(1, 2), (2, 4), (4, 3), (3, 1)](or a NumPy array of 2x4 will do also)
The logic behind it is to get the first row, read its last value and find that value in the array. Get that tuple, swap it to make the found value the first one (if needed) and repeat.
The idea is that ...N),(N... , N is the same. There will always be 2 and only 2 repetitions of the numbers.
I can figure out how to do it "C-like", iterating, searching, swapping etc, but I wonder if there is a pythonic way (those amazing one-liners) of achieving it. Ultimately I am just trying to increase my Python-fu.
boundary=np.array(boundary)
ordered=np.zeros([boundary.shape[0],boundary.shape[1]]);
idx=0
idy=0
for i in range(len(boundary)):
ordered[i,0]=boundary[idx,idy]
ordered[i,1]=boundary[idx,(idy+1)%2]
ix,iy=np.where(boundary==boundary[idx,(idy+1)%2])
for j in range(len(ix)):
isused=np.where(np.all(ordered==boundary[ix[j],],axis=1))
if isused[0].size==0:
idx=ix[j]
idy=iy[j]
print idx, idy
breakSolution
This is really a graph problem! Draw a graph with a vertex for every number in your list, and an edge between vertices \$a\$ and \$b\$ if there is a pair \$(a, b)\$ or \$(b, a)\$ in your list:
What you want to do is to find a path in this graph that visits all the edges. Such a path is known as an Eulerian path. Your condition that each number appears exactly twice in the list means that the path must return to the original vertex: that makes it an Eulerian cycle.
To find this cycle, we must first represent the graph using a suitable data structure. Here we can use the adjacency list representation (really, adjacency set):
Then, to find the path, we can use Hierholzer's algorithm. This is really simple: we just pick a starting vertex and then repeatedly pick any unvisited edge from that vertex. Your condition that each number appears exactly twice in the list of edges means that the algorithm can't get stuck: there are exactly two edges meeting each vertex, so having gone in on one we can go out on the other:
What might happen, however, is that your data contains multiple disjoint cycles. For example if the input is:
then this corresponds to the graph:
The
Update
If you're determined to stick to NumPy, then consider using the Compressed Sparse Graph Routines. The idea is to use
and then call
If you need to turn this array of vertices back into an array of edges (as in the OP), you could use
What you want to do is to find a path in this graph that visits all the edges. Such a path is known as an Eulerian path. Your condition that each number appears exactly twice in the list means that the path must return to the original vertex: that makes it an Eulerian cycle.
To find this cycle, we must first represent the graph using a suitable data structure. Here we can use the adjacency list representation (really, adjacency set):
from collections import defaultdict
def graph(edges):
"""Given a list of edges, return a map from vertex to a set of
adjacent vertices.
>>> graph([(1, 2), (1, 3), (3, 4), (2, 4)])
defaultdict(, {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}})
"""
g = defaultdict(set)
for v, w in edges:
g[v].add(w)
g[w].add(v)
return gThen, to find the path, we can use Hierholzer's algorithm. This is really simple: we just pick a starting vertex and then repeatedly pick any unvisited edge from that vertex. Your condition that each number appears exactly twice in the list of edges means that the algorithm can't get stuck: there are exactly two edges meeting each vertex, so having gone in on one we can go out on the other:
def eulerian_path(edges, v):
"""Generate an Eulerian path over the given collection of edges,
starting at the vertex v.
>>> list(eulerian_path([(1, 2), (1, 3), (3, 4), (2, 4)], 1))
[(1, 2), (2, 4), (4, 3), (3, 1)]
"""
g = graph(edges)
while True:
try:
w = g[v].pop()
except KeyError:
return
g[w].remove(v)
yield v, w
v = wWhat might happen, however, is that your data contains multiple disjoint cycles. For example if the input is:
[(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)]then this corresponds to the graph:
The
eulerian_path algorithm only finds one of the cycles. What should you do in ths case? Stop at the end of the first cycle? Keep going and find all the other cycles? Maybe this can't happen in your problem, so raise an error? You didn't explain what your problem really is, so I can't tell you what the right thing is to do in this case.Update
If you're determined to stick to NumPy, then consider using the Compressed Sparse Graph Routines. The idea is to use
scipy.sparse.csr_matrix to convert your array into a CSR matrix:>>> edges = np.array([(1, 2), (1, 3), (3, 4), (2, 4)])
>>> m = scipy.sparse.csr_matrix((np.ones(len(edges)), (edges[:,0], edges[:,1])), shape=(5, 5))and then call
scipy.sparse.csgraph.depth_first_order to get the vertices on the Eulerian path:>>> scipy.sparse.csgraph.depth_first_order(m, 1, False)[0]
array([1, 2, 4, 3], dtype=int32)If you need to turn this array of vertices back into an array of edges (as in the OP), you could use
numpy.vstack and numpy.roll, like this:>>> np.vstack((vertices, np.roll(vertices, -1))).T
array([[1, 2],
[2, 4],
[4, 3],
[3, 1]], dtype=int32)Code Snippets
from collections import defaultdict
def graph(edges):
"""Given a list of edges, return a map from vertex to a set of
adjacent vertices.
>>> graph([(1, 2), (1, 3), (3, 4), (2, 4)])
defaultdict(<class 'set'>, {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}})
"""
g = defaultdict(set)
for v, w in edges:
g[v].add(w)
g[w].add(v)
return gdef eulerian_path(edges, v):
"""Generate an Eulerian path over the given collection of edges,
starting at the vertex v.
>>> list(eulerian_path([(1, 2), (1, 3), (3, 4), (2, 4)], 1))
[(1, 2), (2, 4), (4, 3), (3, 1)]
"""
g = graph(edges)
while True:
try:
w = g[v].pop()
except KeyError:
return
g[w].remove(v)
yield v, w
v = w[(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)]>>> edges = np.array([(1, 2), (1, 3), (3, 4), (2, 4)])
>>> m = scipy.sparse.csr_matrix((np.ones(len(edges)), (edges[:,0], edges[:,1])), shape=(5, 5))>>> scipy.sparse.csgraph.depth_first_order(m, 1, False)[0]
array([1, 2, 4, 3], dtype=int32)Context
StackExchange Code Review Q#95598, answer score: 11
Revisions (0)
No revisions yet.