patterncppMinor
Optimising a custom tree data structure
Viewed 0 times
optimisingcustomstructuredatatree
Problem
I have the following 'custom' tree data structure. This concept was to be thought of as, if given ("abb", 20) and ("abc", 30), then
Or
With b == 30, and c == 20.
Therefore taking up very limited space, and a very fast search.
However, as depth increases, the number of bytes for a fully explored tree was exponential. In the above case, 4 * 256 bytes were required for valid indexing to occur. This was not good.
It now works in the following way; for each branch, compare its value against the n character; i.e. introducing a worst case O(256) character comparison for each depth. This is not ideal, and what I want to attempt to change.
The other question was (and still is), what is it? I couldn't quite classify it as anything but a Tree; nor relate it to any
Obviously, this code is a bit sketchy in how it handles the templates at the moment. A number of things could go wrong, but my interest (other than classification) is on improving this data structure.
```
template
class Tree {
unsigned int depth;
Tree* parent;
std::vector branches;
public:
K key;
V val;
int n_trees;
Tree(K key = 0, int depth = 0, Tree* parent = 0) : depth(depth), parent(parent), key(key) {
val = 0;
n_trees = 0;
inc();
}
void inc() {
if (parent) parent->inc();
n_trees++;
}
void add(const K* search, const unsigned int length, const V value) {
const int index = search[depth];
if (length <= depth) {
val = value;
return;
}
for (unsigned int i = 0; i < branches.size(); ++i) {
if (branches[i].key == index) {
Tree['a']['b']['b'] == 20Tree['a']['b']['c'] == 30Or
a
\
b
/ \
b cWith b == 30, and c == 20.
Therefore taking up very limited space, and a very fast search.
However, as depth increases, the number of bytes for a fully explored tree was exponential. In the above case, 4 * 256 bytes were required for valid indexing to occur. This was not good.
It now works in the following way; for each branch, compare its value against the n character; i.e. introducing a worst case O(256) character comparison for each depth. This is not ideal, and what I want to attempt to change.
The other question was (and still is), what is it? I couldn't quite classify it as anything but a Tree; nor relate it to any
std:: containers for comparison. It is very good at what it does though, out-performing a std::map with small strings and very large strings (sizes ~20, ~500 respectively); past 600, the data structure begins to get too heavy.Obviously, this code is a bit sketchy in how it handles the templates at the moment. A number of things could go wrong, but my interest (other than classification) is on improving this data structure.
```
template
class Tree {
unsigned int depth;
Tree* parent;
std::vector branches;
public:
K key;
V val;
int n_trees;
Tree(K key = 0, int depth = 0, Tree* parent = 0) : depth(depth), parent(parent), key(key) {
val = 0;
n_trees = 0;
inc();
}
void inc() {
if (parent) parent->inc();
n_trees++;
}
void add(const K* search, const unsigned int length, const V value) {
const int index = search[depth];
if (length <= depth) {
val = value;
return;
}
for (unsigned int i = 0; i < branches.size(); ++i) {
if (branches[i].key == index) {
Solution
Why do you have
I don't love using a
I suggest that you keep the children sorted by
Now you can rewrite
The size of
Speaking of
It's not really clear that you want
or
(if you have access to a C++14 implementation, even better is
Whichever you choose, use the same overloads as with
It is very weird to me to see an
m_parent? I don't think there are any reasonable operations on a trie that require a parent pointer in each node, and removing it can reduce the space used a lot.I don't love using a
depth argument in add. I'd use the following instead:void add(const std::string& key, const T& value) {
return add(key.data(), key.size(), value);
}
void add(const char *key, const T& value) {
return add(key, std::strlen(key), value);
}
void add(const char* key, std::size_t len, const T& value); // impl belowI suggest that you keep the children sorted by
m_key. Write a comparison functor like so:struct KeyLessThan {
bool operator()(const PrefixTree& l, const PrefixTree& r) const {
return l.m_key < r.m_key;
}
bool operator()(const PrefixTree& tree, char c) const {
return l.m_key < c;
}
};Now you can rewrite
add like this (using the signature above):void PrefixTree::add(const char* key, std::size_t len, const T& value) {
assert(len >= 0);
if (len == 0) {
m_value = value;
return;
}
auto it = std::lower_bound(begin(m_branches), end(m_branches),
*key, KeyLessThan());
if (it == end || it->m_key != *key) {
it = m_branches.emplace(it, *key);
}
it->add(key + 1, len - 1, value);
}The size of
m_branches is bounded to be quite small (256 elements in theory; in practice if you're doing English words only it'll be 26 or 52 at the root and much smaller in interior nodes), so this should be quite fast, and it'll make find faster.Speaking of
m_branches, m_children would be a more usual name for it.It's not really clear that you want
find to return a T*; it's odd to give out handles to internal memory in a data structure like this. I would make the signature one ofconst T* find(const std::string& key) const
std::pair find(const std::string& key) constor
std::pair find(const std::string& key) const(if you have access to a C++14 implementation, even better is
std::optional find(const std::string& key) constWhichever you choose, use the same overloads as with
add for const char *, etc.It is very weird to me to see an
#include that's not at the top of the file, and to see printfs in the middle of a C++ file.Code Snippets
void add(const std::string& key, const T& value) {
return add(key.data(), key.size(), value);
}
void add(const char *key, const T& value) {
return add(key, std::strlen(key), value);
}
void add(const char* key, std::size_t len, const T& value); // impl belowstruct KeyLessThan {
bool operator()(const PrefixTree& l, const PrefixTree& r) const {
return l.m_key < r.m_key;
}
bool operator()(const PrefixTree& tree, char c) const {
return l.m_key < c;
}
};void PrefixTree::add(const char* key, std::size_t len, const T& value) {
assert(len >= 0);
if (len == 0) {
m_value = value;
return;
}
auto it = std::lower_bound(begin(m_branches), end(m_branches),
*key, KeyLessThan());
if (it == end || it->m_key != *key) {
it = m_branches.emplace(it, *key);
}
it->add(key + 1, len - 1, value);
}const T* find(const std::string& key) const
std::pair<bool, const T&> find(const std::string& key) conststd::pair<bool, T> find(const std::string& key) constContext
StackExchange Code Review Q#1322, answer score: 7
Revisions (0)
No revisions yet.