patternMinor
What is the minimum required storage for a sparse, depth-first octree?
Viewed 0 times
depththewhatminimumstorageoctreefirstforsparserequired
Problem
For a numerical simulation framework, I use a hierarchical Cartesian grid in 3D to discretize the computational domain. I am thus looking for the most space-efficient way to store the resulting octree on disk, given the following conditions:
The best I can come up with is 6 7 bits per node: 3 bits to indicate the position of a node with respect to its parent, and 3 4 bits to store how many child nodes exist. However, intuition tells me there should be a more efficient way. Please note that algorithmic efficiency is not part of the question, as its representation in memory will be different anyways.
P.S.: Please let me know if CS is not the right SE venue for this kind of question.
Update
The resulting data structure will be stored in a file on disk, thus the need for an efficient encoding of the octree. There are additional data files that contain information associated with each existing node (e.g., solution data). To have a one-to-one relationship between nodes ("cells") in the octree file and the datasets, it is required to store all internal nodes (i.e., non-leaf nodes) in the octree file as well.
Example
An octree with 5 nodes: 0, 1, ..., 4. Their relationship is the following:
The resulting tree would look something like this (with missing nodes omitted)
and the nodes need to be stored
- It is very sparse (i.e., not all nodes exist), but potentially deep (a depth of 20 or beyond)
- It is stored depth-first (pre-order)
- All parent nodes up to the root node must be stored (i.e., it is not sufficient to store just the leave nodes)
- The amount of data per node is a global constant
The best I can come up with is 6 7 bits per node: 3 bits to indicate the position of a node with respect to its parent, and 3 4 bits to store how many child nodes exist. However, intuition tells me there should be a more efficient way. Please note that algorithmic efficiency is not part of the question, as its representation in memory will be different anyways.
P.S.: Please let me know if CS is not the right SE venue for this kind of question.
Update
The resulting data structure will be stored in a file on disk, thus the need for an efficient encoding of the octree. There are additional data files that contain information associated with each existing node (e.g., solution data). To have a one-to-one relationship between nodes ("cells") in the octree file and the datasets, it is required to store all internal nodes (i.e., non-leaf nodes) in the octree file as well.
Example
An octree with 5 nodes: 0, 1, ..., 4. Their relationship is the following:
- 0 is the root
- 1 is child 0 of node 0
- 2 is child 1 of node 1
- 3 is child 5 of node 1
- 4 is child 1 of node 0
The resulting tree would look something like this (with missing nodes omitted)
0
/ \
1 4
/ \
2 3
and the nodes need to be stored
0 1 2 3 4 (pre-order depth-first).Solution
The number of octrees with $n$ nodes is OEIS sequence A007556:
$$T(n) = {8n \choose n}/(7n+1)$$
I worked this out, incidentally, by writing a short program to generate the first few values of $T(n)$ and then searching for it.
If you had no other information than $n$, then the minimum number of bits required to store an octree with $n$ nodes is $\log\,T(n)$. (We're doing information theory here, so all logarithms are in base 2 unless otherwise specified.) Using Stirling's approximation to the logarithm of the factorial:
$$\log n! = n \log n - n \log e + O(\log n)$$
We find:
$$\log\,T(n) = (24 - 7 \log 7) n + O(\log n)$$
Which works out to $24 - 7 \log 7 \approx 4.349$ bits per node. So to answer the first part of your question, this is the theoretical minimum. You cannot do better than this.
Now we just have to address the second part: How close can we get?
Consider a bit vector of size $8n$, $8$ bits per node, where you put a 1 if the node has a child in that slot, and a 0 if it doesn't. This obviously uses about twice the space theoretically needed.
However, 0's and 1's don't occur with the same frequency. Specifically, given an octree with $n$ nodes, exactly $n-1$ of the bits must be 1 (exercise: why?), leaving $7n+1$ of them 0. The entropy of a bit is, therefore, approximately:
$$H = -\frac{1}{7} \log \frac{1}{7} - \frac{6}{7} \log \frac{6}{7} \approx 0.59$$
So you should be able to compress the bit vector to $8nH \approx 4.733n$ bits.
The actual compression method that I will suggest, because it's easy to understand and doesn't use anything complicated to implement like arithmetic coding, is Elias-Fano coding with parameter $b=2$.
We'll use your octree as an example. The bit vector corresponding to your exanple is:
We know that because $n=5$, the number of bits that are set must be $n-1=4$. We work out the indexes of the bits that are set:
$$\left\{0, 1, 9, 13\right\}$$
Or in binary, using $\left\lceil \log(8n) \right\rceil = 6$ bits per entry:
I've separated the low order $b$ bits for a reason. The compressed representation has two parts, one for the high-order bits and one for the low-order bits.
For the low-order bits, we just store these in an array. That is:
For the high-order bits, we count the patterns. Since there are no indexes that are larger than $8n-1=39$, There are only 10 possible patterns for the high-order bits:
$$\{0000,0001,0010,0011,0100,0101,0110,0111,1000,1001\}$$
We count the number of indices with each of the high-order patterns, and then encode those counts using a unary code:
(In practice, you probably don't need to work out all the high-order bit patterns, just stop when you run out of indices. But I digress.)
On this example, the representation uses 8 bits for the first array, and 14 bits for the unary-encoded bit vector, for a total of 22 bits, or 4.4 bits per node.
Recall that the asymptotic theoretical minimum is 4.349 bits per node.
Working out the asymptotic size of this representation as $n \rightarrow \infty$ is left as an exercise. So is actual code to do the compression and decompression.
Finally, I know you didn't ask about algorithmic efficiency, but storing the bit vector in an SDArray with succinct rank/select indices (there are implementations all over the place; look them up), you should be able to make this data structure random access in close to constant time. That is, you should be able to query the octree in its compressed form. The details are, again, left as an exercise.
$$T(n) = {8n \choose n}/(7n+1)$$
I worked this out, incidentally, by writing a short program to generate the first few values of $T(n)$ and then searching for it.
If you had no other information than $n$, then the minimum number of bits required to store an octree with $n$ nodes is $\log\,T(n)$. (We're doing information theory here, so all logarithms are in base 2 unless otherwise specified.) Using Stirling's approximation to the logarithm of the factorial:
$$\log n! = n \log n - n \log e + O(\log n)$$
We find:
$$\log\,T(n) = (24 - 7 \log 7) n + O(\log n)$$
Which works out to $24 - 7 \log 7 \approx 4.349$ bits per node. So to answer the first part of your question, this is the theoretical minimum. You cannot do better than this.
Now we just have to address the second part: How close can we get?
Consider a bit vector of size $8n$, $8$ bits per node, where you put a 1 if the node has a child in that slot, and a 0 if it doesn't. This obviously uses about twice the space theoretically needed.
However, 0's and 1's don't occur with the same frequency. Specifically, given an octree with $n$ nodes, exactly $n-1$ of the bits must be 1 (exercise: why?), leaving $7n+1$ of them 0. The entropy of a bit is, therefore, approximately:
$$H = -\frac{1}{7} \log \frac{1}{7} - \frac{6}{7} \log \frac{6}{7} \approx 0.59$$
So you should be able to compress the bit vector to $8nH \approx 4.733n$ bits.
The actual compression method that I will suggest, because it's easy to understand and doesn't use anything complicated to implement like arithmetic coding, is Elias-Fano coding with parameter $b=2$.
We'll use your octree as an example. The bit vector corresponding to your exanple is:
11000000 01000100 00000000 00000000 00000000We know that because $n=5$, the number of bits that are set must be $n-1=4$. We work out the indexes of the bits that are set:
$$\left\{0, 1, 9, 13\right\}$$
Or in binary, using $\left\lceil \log(8n) \right\rceil = 6$ bits per entry:
0000 00
0000 01
0010 01
0011 01I've separated the low order $b$ bits for a reason. The compressed representation has two parts, one for the high-order bits and one for the low-order bits.
For the low-order bits, we just store these in an array. That is:
00 01 01 01For the high-order bits, we count the patterns. Since there are no indexes that are larger than $8n-1=39$, There are only 10 possible patterns for the high-order bits:
$$\{0000,0001,0010,0011,0100,0101,0110,0111,1000,1001\}$$
We count the number of indices with each of the high-order patterns, and then encode those counts using a unary code:
110 0 10 10 0 0 0 0 0 0(In practice, you probably don't need to work out all the high-order bit patterns, just stop when you run out of indices. But I digress.)
On this example, the representation uses 8 bits for the first array, and 14 bits for the unary-encoded bit vector, for a total of 22 bits, or 4.4 bits per node.
Recall that the asymptotic theoretical minimum is 4.349 bits per node.
Working out the asymptotic size of this representation as $n \rightarrow \infty$ is left as an exercise. So is actual code to do the compression and decompression.
Finally, I know you didn't ask about algorithmic efficiency, but storing the bit vector in an SDArray with succinct rank/select indices (there are implementations all over the place; look them up), you should be able to make this data structure random access in close to constant time. That is, you should be able to query the octree in its compressed form. The details are, again, left as an exercise.
Code Snippets
11000000 01000100 00000000 00000000 000000000000 00
0000 01
0010 01
0011 0100 01 01 01110 0 10 10 0 0 0 0 0 0Context
StackExchange Computer Science Q#77115, answer score: 2
Revisions (0)
No revisions yet.