patterncppMinor
GPU and CPU templated matrix classes
Viewed 0 times
templatedgpuclassesandcpumatrix
Problem
I'm learning CUDA C++ (and C++). Can you guys give me some feedback on my basic templated matrix implementation?
A brief explanation of my decisions:
And my most important doubts:
How can I improve the code without making it bloated? By bloated I mean overloading all kinds of operators that won't be used.
```
#ifndef VK_MATRIX_CUH_INCLUDED
#define VK_MATRIX_CUH_INCLUDED
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include
#include
#include
namespace vk {
template
class matrix {
private:
T* _values;
const size_t _width;
const size_t _height;
public:
__host__ __device__ matrix(T* ptr, const size_t width, const size_t height)
:_values(ptr),
A brief explanation of my decisions:
- First things first: I can't use STL classes/functions in my CUDA code.
- I can't use C++ 14 (or newer) features.
- I'm using templates to give support to
floats,doubles and half precisionfloats.
matrixis meant to be the basic/bare bones implementation; that is a storage for a pointer to a contiguous block of memory, the width and the height of the matrix and the most basic get/set methods.
host_matrixis meant to be a "CPU + main memory" implementation. It's just a thin wrapper for matrix plus a constructor/destructor to manage memory.
device_matrixis meant to be a "GPU + GPU memory" implementation. Like the host version, it's just a thin wrapper plus some memory management code.
- The host and device versions don't inherit from
matrixwould because I can't think of an algorithm that would work properly for ahost_matrixand adevice_matrix. It would break the substitution principle.
And my most important doubts:
- Is everything well defined? I'm somewhat concerned that my
device_matrix's constructor is working by chance. I'm not sure that the GPU memory layout is compatible with the CPU memory layout and that it's UB to memcpy like I did
- Does the code make sense?
- Is the code easy to read/maintain (for C++ standards)?
How can I improve the code without making it bloated? By bloated I mean overloading all kinds of operators that won't be used.
```
#ifndef VK_MATRIX_CUH_INCLUDED
#define VK_MATRIX_CUH_INCLUDED
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include
#include
#include
namespace vk {
template
class matrix {
private:
T* _values;
const size_t _width;
const size_t _height;
public:
__host__ __device__ matrix(T* ptr, const size_t width, const size_t height)
:_values(ptr),
Solution
class matrix
The class matrix does not own the pointer passed in the constructor. So you should do something to make that clear in the interface so that people don't accidentally pass you dynamically allocated memory.
There is nothing in the standard so I would create a class that represents a raw pointer. Its not
This way you can not just pass a pointer. You need to construct a
Copying
Because you are not managing the memory you don't technically need to follow the rule of three or five. But this also results in unexpected behavior.
As a result I would disable the copy semantics.
Moving
This should work like normal. There is nothing special needed here.
Get and Set
That seems a bit obtuse. When not just one method that returns a reference to the value. You can then use normal assignment to the element.
With a small amount of work you can get the above to work using
host_matrix
Prefer to use the initializer list
Better if you writ it like this:
Off by one error:
Think of a 2 * 2 matrix. This is located at
device_matrix
You break the rule of three. You have an owned memory resource. But you don't define the correct copy semantics. Thus you will have multiple calls to
You should use a shared pointer and define the destructor just like you did in
class device_matrix
{
private:
std::shared_ptr _data_pointer;
std::shared_ptr> _matrix;
const size_t _width;
const size_t _height;
Use exceptions to kill an app.
The
If you throw an exception the stack is unwound (thus calling the destructor of all objects). If you have correctly used RAII then these destructors should release your
The class matrix does not own the pointer passed in the constructor. So you should do something to make that clear in the interface so that people don't accidentally pass you dynamically allocated memory.
There is nothing in the standard so I would create a class that represents a raw pointer. Its not
template
class raw_ptr
{
T* ptr
public:
explicit raw_ptr(T* ptr): ptr(ptr) {}
T* operator() const {return ptr;}
};
__host__ __device__
matrix(raw_ptr const& ptr, const size_t width, const size_t height)
:_values(ptr),
_width(width),
_height(height)
{
assert(ptr != nullptr);
}
// Note:
T* _values; // don't need to change this type.
// I would just change the interface.This way you can not just pass a pointer. You need to construct a
raw_ptr object that will make people understand that they are supposed to pass an unmanaged pointer across.int data[14] = {};
matrix a(raw_ptr(data), 2, 7);Copying
Because you are not managing the memory you don't technically need to follow the rule of three or five. But this also results in unexpected behavior.
matrix b(a);
a.set(1, 3, 5);
std::cout << b.get(1, 3) << "\n"; // prints 5 when you would expect 0As a result I would disable the copy semantics.
matrix(matrix const&) = delete;
matrix& operator=(matrix const&) = delete;Moving
This should work like normal. There is nothing special needed here.
matrix(matrix&&) = default;
matrix& operator=(matrix&&) = default;Get and Set
That seems a bit obtuse. When not just one method that returns a reference to the value. You can then use normal assignment to the element.
T& operator()(int x, int y) {
return values[y * _width + x];
}
T const& operator()(int x, int y) const{
return values[y * _width + x];
}
int x = a(1,2); // read a value (get)
a(1,3) = a(1,4); // write a value from a read value.With a small amount of work you can get the above to work using
[] rather than (). Personally I would do the extra work to get [] working because that is the natural way to use a matrix.host_matrix
Prefer to use the initializer list
host_matrix(const size_t width, const size_t height)
{
_data_pointer = std::shared_ptr(new T[width * height], std::default_delete());
_matrix = std::make_shared>(_data_pointer.get(), width, height);
}Better if you writ it like this:
host_matrix(const size_t width, const size_t height)
: _data_pointer(new T[width * height], std::default_delete())
, _matrix(raw_ptr(_data_pointer.get()), width, height)
{}Off by one error:
auto offset = _matrix->width * _matrix->height;
auto last = _matrix->values + offset;
return last + 1;Think of a 2 * 2 matrix. This is located at
[0], [1], [2], [3] so one past the end is at [4] but your function returns [5].device_matrix
You break the rule of three. You have an owned memory resource. But you don't define the correct copy semantics. Thus you will have multiple calls to
cudaFree() when copied objects go out of scope.You should use a shared pointer and define the destructor just like you did in
host_matrix.class device_matrix
{
private:
std::shared_ptr _data_pointer;
std::shared_ptr> _matrix;
const size_t _width;
const size_t _height;
public:
__host__ device_matrix(const size_t width, const size_t height)
: _data_pointer(allocate(width * heigt), [](T* m){cudaFree(m);}),
_matrix(allocate(), [](matrix* m){cudaFree(m);}),
_width(width),
_height(height)
{Use exceptions to kill an app.
The
abort() function exits the app. But does not unwind the call stack. I am not sure how the cuda functions affect the GPU if the resources are not freed. But I should not have to worry about that.If you throw an exception the stack is unwound (thus calling the destructor of all objects). If you have correctly used RAII then these destructors should release your
cuda resources. Thus placing your GPU back into a good state.Code Snippets
template<typename T>
class raw_ptr
{
T* ptr
public:
explicit raw_ptr(T* ptr): ptr(ptr) {}
T* operator() const {return ptr;}
};
__host__ __device__
matrix(raw_ptr<T> const& ptr, const size_t width, const size_t height)
:_values(ptr),
_width(width),
_height(height)
{
assert(ptr != nullptr);
}
// Note:
T* _values; // don't need to change this type.
// I would just change the interface.int data[14] = {};
matrix<T> a(raw_ptr<T>(data), 2, 7);matrix<int> b(a);
a.set(1, 3, 5);
std::cout << b.get(1, 3) << "\n"; // prints 5 when you would expect 0matrix(matrix const&) = delete;
matrix& operator=(matrix const&) = delete;matrix(matrix&&) = default;
matrix& operator=(matrix&&) = default;Context
StackExchange Code Review Q#160797, answer score: 2
Revisions (0)
No revisions yet.