patterncppMinor
Curvesort in C++
Viewed 0 times
curvesortstackoverflowprogramming
Problem
Curvesort is an adaptive integer sorting algorithm I discovered independently. It maintains a doubly-linked list, whose nodes have the following structure (in pseudo-C code):
Above,
The intuition behind the adaptiveness of the algorithm is as follows: if you draw a "graph" of the input sequence \$A\$ as a set of points \$(i, A_i)\$, the "smoother" the graph, the faster the algorithm finishes its work.
So the best running time is linear. Otherwise, it is not worse than \$\Theta(nk)\$, where \$k\$ is the number of distinct integers. Since \$k \leq n\$, the worst case running time is not worse than \$\Theta(n^2)\$.
My code is as follows:
curvesort.h:
```
#ifndef CURVESORT_H
#define CURVESORT_H
#include
#include
#include
#include
namespace net {
namespace coderodde {
namespace sorting {
template
void sort_impl(Iter begin, Iter end, std::true_type)
{
using value_type =
typename std::iterator_traits::value_type;
class Curvesort {
struct Node {
value_type key;
size_t count;
struct Node* prev;
struct Node* next;
Node(value_type key) : key{key}, count{1} {}
};
value_type previous_key;
Node* head;
Node* tail;
Node* prev_updated; // Points to the most recently added or
// updated nod
struct node {
int key;
int count;
node* next;
node* prev;
}Above,
count gives you the number of times key appeared in the sequence so far. The entire list is sorted by node.key values. Also, as the algorithm scans the input sequence, it keeps track of the last processed integer and the node that holds it. This allows us to insert a new element faster if it is "closer" to the previously processed integer.The intuition behind the adaptiveness of the algorithm is as follows: if you draw a "graph" of the input sequence \$A\$ as a set of points \$(i, A_i)\$, the "smoother" the graph, the faster the algorithm finishes its work.
So the best running time is linear. Otherwise, it is not worse than \$\Theta(nk)\$, where \$k\$ is the number of distinct integers. Since \$k \leq n\$, the worst case running time is not worse than \$\Theta(n^2)\$.
My code is as follows:
curvesort.h:
```
#ifndef CURVESORT_H
#define CURVESORT_H
#include
#include
#include
#include
namespace net {
namespace coderodde {
namespace sorting {
template
void sort_impl(Iter begin, Iter end, std::true_type)
{
using value_type =
typename std::iterator_traits::value_type;
class Curvesort {
struct Node {
value_type key;
size_t count;
struct Node* prev;
struct Node* next;
Node(value_type key) : key{key}, count{1} {}
};
value_type previous_key;
Node* head;
Node* tail;
Node* prev_updated; // Points to the most recently added or
// updated nod
Solution
-
You forgot to
-
There is no need for your limitation on integral types. You could just as well use the algorithm for any type with overloaded
You forgot to
#include in main.cpp.-
There is no need for your limitation on integral types. You could just as well use the algorithm for any type with overloaded
operator
-
I don't know why you write struct Node, Node is just fine in C++. It might even mask some errors, because struct SomeTypo x; will introduce SomeTypo as incomplete type, while SomeTypo x; will give an error if SomeTypo was not declared at that point.
-
In C++11, which your code seems to be, you can easily offload memory management to std::unique_ptr in most cases. This makes the code shorter, clearer and safer, e.g. make Node::next and Curvesort::head std::unqiue_ptr and leave the rest as Node*. Then each node is owning its successor node and head owns the first node. As soon as Curvesort is destroyed, head will be destoyed, which is owning the first node and it too will be destroyed, cascading down to the last node. There might be a problem with recursion for every node here if tail recursion optimization is not possible, though. In that case you can still use a custom destructor moving through nodes iteratively.
-
You can spare a few lines of code, if you put prev{nullptr} and next{nullptr} in the constructor of Node.
-
Your code has undefined behavior if the input length is zero. You either need to check that this is not the case or change up your code a bit to work without handling the first node in a special way.
-
It is unnecessary to repeat the node creation code so often. Just make a function insert_before_node(Node** node, const value_type& value) which inserts a new node before the node pointed to by node and updated node to point to the new node. In fact I think this belongs into Node, rather than Curvesort, so that Node will handle its own memory.
-
I think holding the previous_key is unnecesary, as you can just get it from one indirection to prev_updated.
-
You are basically reimplementing a std::list holding pairs of value_type and std::size_t. Maybe try using the standard library directly. I tried it out (I deleted the file accidentially though) and got your algorithm down in about 15 lines or so with std::list and no performance impact on the list test case and only about 10% additional time on the vector test case (but that might be improvable).
-
If you used a tree (std::map) instead of a list to store your sorted values, then you can reduce the time to O(log(k)n). I think this should also be possible while still keeping your O(n) behavior for smooth input.
-
If you used a hash map (std::unordered_map) instead of a list to store your sorted values, then you can reduce the time to O(n+log(k)*k) by counting all elements with the same key and ordering keys with std::sort` afterwards. However the complexity for smooth data would not be better than that in this case, except if many consecutive values are identical.Context
StackExchange Code Review Q#142976, answer score: 4
Revisions (0)
No revisions yet.