HiveBrain v1.2.0
Get Started
← Back to all entries
snippetpythonMinor

Sort a stack of numbers in ascending order only using push

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
orderstacknumberspushusingsortonlyascending

Problem

I have a stack of numbers ranging from 1 to 1 million representing a pile of numbered DVDs. Given the stack in random order I am to sort the stack by only taking an element from the stack and push it to the top and then calculate the minimum of such pushes necessary. My code is way too inefficient when dealing with large amounts of elements in the stack and I have a requirement that it must run in less than 4 seconds for any input size < 1 million. Right now it took me 28 secs to run with an input of 10 000 DVDs, so it's nowhere close to the goal!

How can I make the code more efficient?

```
import heapq

class Node:
# Creates a DVD node
# right och left corresponds to up and down in the stack
def __init__(self, value):
self.value = value
self.right = None
self.left = None

def __lt__(self, other): # necessary for comparing node size in the heap
return self.value lowest_number: # Then last_element must be moved
if not remember_used[last_element.value-1]: # A new DVD not already in the heap , O(1)
heapq.heappush(elements_to_be_moved, last_element) # Heapq sortes the heap upon appending , O(logN)
count_movements += 1 # Every DVD's number is unique so no crashes will occur
remember_used[last_element.value-1] = True
if last_element.value < lowest_number:
lowest_number = last_element.value

last_element = last_element.left

return elements_to_be_moved, count_movements, remember_used

def move_elements(stack, elements_to_be_moved, nodes):
# pushes the dvd in the right order that gives us a sorted stack in least amount of movements
next_item = elements_to_be_moved[0]
item_value = next_item.value
heapq.heappop(elements_to_be_moved) # O(logN)
first_element = stack._first
keep_going = True
stack.push(nodes[item_value]) # O(1)

return elements_to_be_moved

def read_data():
#Reads input
#No

Solution

As always, you can make your code much more efficient by analysing the problem algorithmically before you start coding. Please note that the problem - Kattis DVDs - does not ask you to sort the given input; it asks for the minimum number of steps needed for sorting in the manner described.

The problem is special in two regards. First, the input is a permutation of the numbers 1 through n. No duplicates, no gaps. Every number between 1 and n occurs exactly once. Second, the sorting procedure can only move DVDs to the back of the sequence (top of the stack); this means that the only DVDs which need not be moved are those which are part of the perfectly increasing subsequence that starts with the DVD labelled '1'. Everything else needs to be moved, and in the optimal case (minimal number of sorting steps) each out-of-order DVD will be moved exactly once.

Hence it is trivial to compute the minimum number of moves as asked in the programming challenge, like in this pseudo code:

int next = 1;

foreach (int x in input_vector)
if (x == next)
++next;

result = input_vector.Length - (next - 1);


That takes care of the Kattis challenge but the real challenge that presents itself here - should one be so inclined - is to actually sort the DVD stack in the described manner, because that turns out to be surprisingly tricky.

One could imagine processing the input in order to generate a list of numbers for a hypothetical robot, where each number k signifies 'pull the kth DVD from the stack and put it on top'. Or one could imagine writing a control program for said robot.

A simple strategy that sorts the stack using the minimal number of moves (and a lot of time) would be this:

  • find the last DVD which is part of the perfectly increasing subsequence that starts with 1



  • set k to the name of this DVD



  • while ++k



This algorithm is obviously quadratic in n and hence awfully slow.

Building a list of DVDs to be moved based on their initial positions is easy enough and efficiently done. A single pass through the input yields the initial slot indices for all DVDs that are not part of a perfect increasing subsequence that starts at #1, and sorting the pairs of slot index and DVD label into label order isn't rocket science.

However, when outputting the list of slot indices for the robot it is necessary to adjust the indices, because pulling a DVD causes all DVDs on top of it to move down by one. In other words, it is necessary to subtract from each index the number of lower slot indices which have already been output at that point. This boils down to a linear scan again, unless a data structure is employed that reduces the effort to log(n) or better.

The hypothetical robot control program could use the same strategy directly, or something analogous; in any case the same subproblem would be encountered. Hence the really interesting element in this challenge is the support structure for counting elements that are less in both label order and slot order.

The ideal structure for this purpose would be an ordered set with efficient rank computation, since the label element of each pair can be implied via query/insert order during the processing. Van Emde Boas trees come to mind but I've yet to meet anyone who'd program such a beast voluntarily, without a lot of green-backed inducement. A sorted vector would certainly do the job up to the point where insert cost becomes prohibitive, which is somewhere between 10^4 and 10^5. It would essentially replace the effort of the scan with a more efficient memory move but bigger inputs require something smarter.

In an imperative language the best bang for buck would be a simple insert-only form of a B-tree, or a hybrid like a summarising index on top of a simple bitset. The difficulty with the usual builtin tree-based maps/sets - apart from their poor performance - is the fact they normally don't expose the underlying tree structure, which makes it impossible to implement the log(n) rank order computation with a few simple tweaks.

A structure that I've used occasionally is a brutally simple ersatz B-tree with just two levels, which is comparatively easy to implement because it is insert-only and elements move only left to right overall. The frequency of inserts into the root vector - and hence the cost of moving stuff around in it - is effectively divided by the size of the leaves, and hence the 'reach' of the simple sorted vector solution gets effectively multiplied by the leaf capacity (modulo average utilisation).

The case under consideration is a bit more demanding than what I've used this for (e.g. questions like 'what are then ten million best results out of a few billion') because the top level needs to be scanned linearly towards the nearer end in order to determine the base rank of a given page. Again, the cost of the original linear scan gets effectively divided by the page size.

In a competitive situation - in order to achieve unrivalled speed - it mig

Context

StackExchange Code Review Q#118191, answer score: 4

Revisions (0)

No revisions yet.