HiveBrain v1.2.0
Get Started
← Back to all entries
debugcppMinor

Fixed size double-ended queue

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sizedoublefixedqueueended

Problem

I require hundreds of equal, small sized buffers in my current project. The bulk of the computation in the program operates on data stored directly in those containers, so high performance is imperative. As the number of the buffers stays constant at runtime, my current approach is to use a std::vector> as the buffer, but I do not like the low cache locality (since the actual storage of each deque is spread on the heap) and possibility for memory fragmentation that this approach entails.

Thus I would like to replace my current container with something like a std::vector>, where somecontainer has at least a part of the contained objects stored locally, within the container object itself, and thus in contigous storage in the vector.

My attempt to implement such a class is presented below. The container Ring is implemented as a circular buffer, permitting fast insertion and deletion at either end, and supports indexed access to the contents. The size is fixed at compile time, and is restricted to powers of two in order to effectively utilize binary modular integer arithmetic. It is still incomplete, lacking mainly iterators and an assignment operator capable of handling Ring objects to others with different capacities. I do not plan to add exception support to the class or bounds checking to the access functions.

Are there any serious pitfalls in the design, or any significant improvements to be made? Will the memory alignment of the elements be correct, and will move constructors be used when inserting elements if possible? Could the container even be adapted to work as a thread safe, single producer - single consumer queue by substituting the current indexes of type uintN_t with indexes of type std::atomic_uintN_t along with other small modifications?

Usage example:

```
Ring buf(4, 0);
buf.popFront();
buf.pushBack(1);
buf.pushFront(-1);
while(!buf.isFull())
buf.pushBack(2);
auto buf2(buf);
buf.clear();
for(size_t n = 0; n < buf2.getSize(); n++)
s

Solution

Here are some observations that may help you improve your program.

Add type to getReqdBitCount

The C++ standard does not allow definition of functions without type, so this program is technically malformed. I suspect that getReqdBitCount should return unsigned.

Use std::size_t

Since this is C++11, you should use the standard std::size_t and #include rather than using the plain C version which may or may not be defined in `.

Use appropriate
#includes

In addition to
mentioned above, the code is using std::aligned_storage but does not #include where that is defined and is using std::forward but does not #include .

Consider simplifying some of the templates

There are a series of templates and
constexpr functions near the top that are ulimately all about finding the smallest size for MinUint but there are two changes I would like to suggest. The first is simplification. You can do all of that with just two templates:

#include 
template 
using Conditional = typename std::conditional::type;

template
    using MinUint = 
    Conditional::max()>=MaxVal,  uint_fast8_t, 
        Conditional::max()>=MaxVal,  uint_fast16_t, 
            Conditional::max()>=MaxVal,  uint_fast32_t, 
                uint_fast64_t
            >
        >
    >;


The second change is to use types such as
uint_fast8_t instead of uint8_t for the resulting type. There may be no difference, but the intent is to designate the fastest uint type with at least 8 bits. Also note the use of std::numeric_limits means that we don't have to manually type a bunch of potentially error-prone constant values. The difference is that if you type one too few Fs in a long constant, the compiler will happily generate the wrong code anyway, but if you make an error typing std::numeric_limits... the compiler will halt with an error.

Think of the data cache

Each of your data structures has three data items --
ringFront_, ringBack_ and the actual buffer. Because the cache line size in a typical machine these days is 64 bytes, you'll want to put the "hottest" (that is, the most frequently accessed) data toward the front of any object. For that reason, you should probably make the ring_ element the last of the three data items instead of the first to improve cacheing performance. Naturally, you should actually test the performance change (if any) rather than simply making assumptions.

A somewhat more radical possibility is to put all of the cache pointers in one object and all of the cache buffer area in another.

Consider keeping explicit count

At the moment, the
isEmpty, isFull and getSize are relatively complex because of the storage mechanism used. Instead, consider instead if a separate count_ variable were used. Then, those functions would be trivial:

bool isEmpty() const { return count_==0; }
bool isFull() const { return count_==BufSize; }
bool getSize() const { return count_; }


Increment and decrement would be done in each dequeue operation instead, but it may be worthwhile to check for a performance difference. This would also allow for the front and back to be pointers (
*T`) rather than indices, which would also likely speed access.

Code Snippets

#include <limits>
template <bool C, typename T, typename F>
using Conditional = typename std::conditional<C, T, F>::type;

template<uint64_t MaxVal>
    using MinUint = 
    Conditional<std::numeric_limits<uint8_t>::max()>=MaxVal,  uint_fast8_t, 
        Conditional<std::numeric_limits<uint16_t>::max()>=MaxVal,  uint_fast16_t, 
            Conditional<std::numeric_limits<uint32_t>::max()>=MaxVal,  uint_fast32_t, 
                uint_fast64_t
            >
        >
    >;
bool isEmpty() const { return count_==0; }
bool isFull() const { return count_==BufSize; }
bool getSize() const { return count_; }

Context

StackExchange Code Review Q#86725, answer score: 4

Revisions (0)

No revisions yet.