HiveBrain v1.2.0
Get Started
← Back to all entries
patternrustMinor

A rusty Bit Vector library

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rustybitvectorlibrary

Problem

As an exercise to learn Rust I've decided to write a Bit Vector library, furthermore I also intend to use it later because my version can share slices of the bit vector between threads whereas the existing bit_vec library cannot do that and development does not seem to be active anymore.

The library has the following functionality:

  • Create a bit vector backed by a storage such as u8, u16, u32, u64 or usize.



  • Set a bit in the bit vector.



  • Get a bit from the bit vector.



  • Split the bit vector into either immutable or mutable bit slices.



  • Retrieve an iterator over the bit vector or bit slices.



I have tried to mimic Vec as much as possible, it for example has a vec.split_at() resp vec.split_at_mut() method that return slices ([T]) to the original vector, the following types are available in this library:

  • BitStorage: A trait with implementation that specifies which types are eligible for storing the bits, existing types that fit this definition are u8, u16, u32, u64 and usize, but an user could also define their own types.



  • BitVector: The struct that represents the bit vector.



  • BitSlice: The struct that represents the immutable bit slice.



  • BitSliceMut: The struct that represents the mutable bit slice.



Below is the code to be reviewed, all code including unit tests can be found on my GitHub project page and I'd especially like advice on reducing code duplication.

The main lib.rs file:

extern crate num;

macro_rules! bool_ref {
    ($cond:expr) => (if $cond { &TRUE } else { &FALSE })
}

mod bit_storage;
mod bit_vector;
mod bit_slice;
mod bit_slice_mut;

pub use bit_storage::BitStorage;
pub use bit_vector::BitVector;
pub use bit_slice::BitSlice;
pub use bit_slice_mut::BitSliceMut;

static TRUE: bool = true;
static FALSE: bool = false;


The bit_storage.rs file:

```
use std::mem;
use std::ops::{BitAnd,BitAndAssign,BitOr,BitOrAssign,BitXor,BitXorAssign,Not,Shl,ShlAssign,Shr,ShrAssign};
use num;
use num::{One,Zero,Unsigned,Num

Solution

There are some correctness issues with this code that need to be addressed immediately.

-
There is no need to use pointers for the BitSlice(Mut). These should happily accept regular slices as the backing storage.

-
Check if you are going to go off the end of the vector before doing it! The code for the slice iterators currently looks like this:

let next = self.get_unchecked_by_data_index_and_remainder(self.data_index_counter, remainder);
if self.calculate_index() == self.capacity { return None; }


but it should be

if self.calculate_index() == self.capacity { return None; }
let next = self.get_unchecked_by_data_index_and_remainder(self.data_index_counter, remainder);


This was hidden from you due to the use of pointers.

-
I do not believe that it the Send / Sync implementations are correct. I did not test, but I believe that you could drop the underlying BitVec, leaving the pointers dangling. You shouldn't need these after switching to slices, anyway.

You should become familiar with tools like Valgrind if you want to do pointer-level work. Additionally, tools like quickcheck may help you explore a larger space of inputs with which to trigger bad behavior.

Here's the stylistic things I saw before the correctness ones above:

-
Use &self or &mut self if the first argument is a &Self, specifically on the traits. You can still call it with the long form of Universal Function Call Syntax (UFCS), but this also allows method syntax:

// definition
fn set(&mut self, storage_index: Self, value: bool);
fn get(&self, storage_index: Self) -> bool;

// call site
self.data[data_index].get(remainder)


-
Why increment the capacity by 1 at creation time? Feels like it's supposed to be rounding up? If so, it should be either documented by a comment or a well-named method. Consider using floats and ceiling to get an accurate value, instead of adding overhead when the value fits perfectly.

-
When I got to split_at, my first thought was "What about splitting at a section inside a byte?". This leads to my next point...

-
You have to have documentation for public functions that do non-obvious things. Method names and types serve as a good base level of documentation, but they aren't sufficient. Panic conditions should especially be documented!

-
You should include the conditions that make unsafe code not unsafe inside the unsafe block. This helps future programmers understand the preconditions and prevents them from being split up.

-
The "unchecked" methods call the underlying slice / vectors Idx operation, which panics on out of bounds. It's likely you should be calling the underlying "unchecked" methods.

-
I would then expect the "unchecked" methods to be unsafe as you could easily walk off the array and into undefined memory.

-
Marking non-public functions inline probably has no effect. inline means to include a raw form of the code in the compiled library so that downstream consumers can choose to inline it. If it's private, nothing downstream can use it. Note that inline != inline(always).

-
The iterators could be built on top of the slice / Vec iterator instead of reindexing into them each time.

-
"Capacity" is a strange name. The capacity is how many values a vector could hold, size is how many it is holding. If I use 1 bit in a u64, the capacity should be 64 and the size should be 1.

At this point, I saw the correctness issues and stopped looking at idiomatic issues.

Code Snippets

let next = self.get_unchecked_by_data_index_and_remainder(self.data_index_counter, remainder);
if self.calculate_index() == self.capacity { return None; }
if self.calculate_index() == self.capacity { return None; }
let next = self.get_unchecked_by_data_index_and_remainder(self.data_index_counter, remainder);
// definition
fn set(&mut self, storage_index: Self, value: bool);
fn get(&self, storage_index: Self) -> bool;

// call site
self.data[data_index].get(remainder)

Context

StackExchange Code Review Q#132621, answer score: 4

Revisions (0)

No revisions yet.