CUDA Kernel - Neural Net
Problem
I'm building a spiking neural net (recurrent, integrate-and-fire), and I'd like to know how to reduce the warp divergence (and any other problems) my kernels may have.
Here's an example with a few hand-placed neurons and synapses to make it easier to follow. I uploaded the whole code to a Git repo for faster access; run make tui, then ./cudasnn, to try it.
The basic workflow is to execute the 4 kernels (explained in the comments) one after another.
After 5000 cycles, here's the time in ms for each kernel, in order:
- 1 - 0.196ms
- 2 - 3.558ms
- 3 - 0.038ms
- 4 - 4.416ms to 6.278ms
My code is split into three files, whose names are pretty explicit.
NN.hpp (which contains my structures)
```
#ifndef NN_HPP
# define NN_HPP
# include
/* Window Setting -- for SFML, no need here */
# define WIDTH 1280
# define HEIGHT 720
# define XOFFSET -0
# define YOFFSET -95
/* Network Settings */
# define STIMULUS 1
# define INHIBITION -1
# define STIMULUS_RATIO 80
# define SPIKE_BUFFER 4
# define INPUT 0
# define OUTPUT 1
# define HIDDEN 2
typedef struct s_neuron_info
{
    float x, y, z;
    char gid;
    unsigned char group; // hidden, input, output
} t_neuron_info;
typedef struct s_neuron
{
    short in_time;
    float in_value;
    int next_time;
    float action_potential;
    float threshold; // fire when threshold reached
    float weight;
    char type; // stimulus, inhibition
    char carry;
} t_neuron;
typedef struct s_synapse
{
    int id_in;
    int id_out;
    int axonal_delay; // in timestep
} t_synapse;
typedef struct s_spike
{
    int syn_id;
    int id_out;
    int start_t, end_t; /
```
Solution
As a first step, remove as many conditional branches as possible. Take a functional programming approach.
You added a lot of conditional returns for error checking that can be removed if your arrays are set up to accommodate all inputs.
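One way to read "set up to accommodate all inputs" is to round each buffer up to a multiple of the block size and fill the tail with inert entries, so a kernel would not need an `if (idx >= count) return;` style guard. Here is a minimal host-side sketch under that assumption. The helpers `padded_count` and `make_padded` are hypothetical names that do not appear in the original code, and `Neuron` is a trimmed stand-in for `t_neuron`:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Trimmed stand-in for t_neuron from NN.hpp (only the fields used here).
struct Neuron {
    float action_potential;
    float threshold;
    char  carry;
};

// Hypothetical helper: round count up to the next multiple of block_size.
std::size_t padded_count(std::size_t count, std::size_t block_size) {
    return (count + block_size - 1) / block_size * block_size;
}

// Hypothetical helper: copy the real neurons into a buffer whose tail is
// filled with inert neurons (infinite threshold, carry cleared), so every
// thread of every launched block has a valid element to read and write.
std::vector<Neuron> make_padded(const std::vector<Neuron>& real,
                                std::size_t block_size) {
    const Neuron inert{0.0f, std::numeric_limits<float>::infinity(), 0};
    std::vector<Neuron> padded(padded_count(real.size(), block_size), inert);
    for (std::size_t i = 0; i < real.size(); ++i) {
        padded[i] = real[i];
    }
    return padded;
}
```

With a buffer like this, the grid size is simply `padded.size() / block_size` with no remainder. The trade-off is a few wasted threads per launch, and it only works if the padding entries really are inert for every kernel that touches them, which has to be checked against the actual kernels.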
Conversion to functional programming example:
```
if (n[idx].carry)
{
    n[idx].action_potential = 0.0f;
    n[idx].carry = 0;
}
```
becomes:
```
n[idx].action_potential = n[idx].action_potential - (n[idx].carry * n[idx].action_potential);
n[idx].carry = 0;
```
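The arithmetic rewrite matches the branch only when `carry` is exactly 0 or 1 and the potential is finite, so it is worth checking the two forms side by side before swapping them in a kernel. Here is a small host-side sketch in plain C++ (it should also compile under nvcc). The names `reset_branch` and `reset_branchless` are hypothetical, and `Neuron` is a trimmed stand-in for `t_neuron`:

```cpp
// Trimmed stand-in for t_neuron (only the fields the reset touches).
struct Neuron {
    float action_potential;
    char  carry;
};

// Original form: zero the potential only when carry is set.
void reset_branch(Neuron& n) {
    if (n.carry) {
        n.action_potential = 0.0f;
        n.carry = 0;
    }
}

// Branch-free form from the answer. Assumes carry is exactly 0 or 1 and
// action_potential is finite (x - 1 * x is NaN when x is infinite or NaN).
void reset_branchless(Neuron& n) {
    n.action_potential = n.action_potential - (n.carry * n.action_potential);
    n.carry = 0;
}
```

Whether the branch-free form is actually faster should be measured: for a branch this short the compiler may already emit predicated instructions rather than a divergent jump, in which case the two versions could perform about the same.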
Context
StackExchange Code Review Q#95874, answer score: 2