AXI4-Stream module
Problem
I'm implementing an AXI4-Stream module. The module uses three DSP blocks (DSP48E1, see Xilinx UG479). To run the module at 150 MHz, the design is pipelined, with the data passing through each DSP in turn.
```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library ieee_proposed;
use ieee_proposed.fixed_pkg.all;

entity slave_AXIStream_RGBtoGray is
  port (
    -- Main signals
    CLK           : in  std_logic;
    RESETN        : in  std_logic;
    -- Ready signal for upstream block
    S_AXIS_TREADY : out std_logic;
    -- Data in
    S_AXIS_TDATA  : in  std_logic_vector(23 downto 0);
    -- Flag for first pixel of a frame
    S_AXIS_TUSER  : in  std_logic;
    -- Flag for last pixel of a line
    S_AXIS_TLAST  : in  std_logic;
    -- Valid data
    S_AXIS_TVALID : in  std_logic;
    -- Downstream blocks are ready
    M_AXIS_TREADY : in  std_logic;
    -- Data out
    M_AXIS_TDATA  : out std_logic_vector(7 downto 0);
    -- Flag for first pixel of a frame
    M_AXIS_TUSER  : out std_logic;
    -- Flag for last pixel of a line
    M_AXIS_TLAST  : out std_logic;
    -- Valid data
    M_AXIS_TVALID : out std_logic
  );
end slave_AXIStream_RGBtoGray;
architecture Behavioral o
```
In this code, I chose to implement the pipeline with a `for` loop inside a single process. I have simulated this design and tested it on a Xilinx 7 series FPGA, and it works perfectly fine so far.
Would it be better to create independent processes instead of a `for` loop inside a single process?
I like the way I coded the pipeline because the `for` loop and the arrays of `std_logic_vector` saved me time when writing the shift register.
However, is it a good way of coding it in terms of frequency, power, FPGA utilization, and so on? More generally, is it good design practice, or is it limiting for certain purposes? I would like to understand the whys and wherefores of my choice.
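For readers without the full architecture, the kind of `for`-loop shift register described in the question can be sketched as follows. This is only an illustration, not the code from the question (which is truncated here): the entity name `sideband_delay`, the `STAGES` generic, the `ENABLE` port, and the 3-bit bundle are all assumptions made for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: delays a small bundle of flags by STAGES clock cycles
-- so that it stays aligned with data travelling through a DSP pipeline.
entity sideband_delay is
  generic (
    STAGES : positive := 3  -- assumption: one register stage per DSP block
  );
  port (
    CLK    : in  std_logic;
    RESETN : in  std_logic;  -- synchronous, active-low reset
    ENABLE : in  std_logic;  -- assumption: clock enable, e.g. downstream ready
    D_IN   : in  std_logic_vector(2 downto 0);  -- e.g. TVALID & TLAST & TUSER
    D_OUT  : out std_logic_vector(2 downto 0)
  );
end entity sideband_delay;

architecture rtl of sideband_delay is
  type pipe_t is array (0 to STAGES - 1) of std_logic_vector(2 downto 0);
  signal pipe : pipe_t := (others => (others => '0'));
begin
  -- A single clocked process; the for loop unrolls into a chain of registers.
  process (CLK)
  begin
    if rising_edge(CLK) then
      if RESETN = '0' then
        pipe <= (others => (others => '0'));
      elsif ENABLE = '1' then
        pipe(0) <= D_IN;
        for i in 1 to STAGES - 1 loop
          -- Signal assignments take effect after the process suspends,
          -- so each stage samples the previous stage's old value.
          pipe(i) <= pipe(i - 1);
        end loop;
      end if;
    end if;
  end process;

  D_OUT <= pipe(STAGES - 1);
end architecture rtl;
```

The loop is unrolled at elaboration, so the hardware should be the same chain of registers that separate per-stage processes would describe; the difference is mainly in readability and in which wiring mistakes the tools can flag.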
Solution
I prefer `generate` statements over complex VHDL processes. A process description won't reveal pipeline wiring faults, because a process can't create multiple drivers (the last assignment wins). In a `generate` description, such a fault creates multiple drivers, which the tools can detect (in synthesis and in simulation).
You could try to write generic VHDL code for your MACC operation. If Xilinx ISE can infer the correct hardware (so that it uses the DSP48E* hard macro and its embedded adder), that is the better long-term solution. Each FPGA generation and family has its own `DSPxxEy` hard macros, so generic VHDL code can improve maintainability and portability. (On the other hand, synthesis tools are known for unlearning ... )
The component syntax is 'outdated'. You can drop the component declaration and instantiate the entity directly:
```
DSP_A: entity work.dsp48E1_macro
  port map (
    -- ...
  );
```
If the macro is compiled into a different library than the current design unit (`work`), replace `work` with the correct library name.
You might want to replace some of your magic numbers (12, 13, 14, 19, 30, ...) with constants or generic parameters, and reuse or calculate them. Then, if you decide to increase the ranges, you only need to modify a few constants rather than rethink the complete algorithm.
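As a rough sketch of the magic-number advice, the widths could be collected in a small package of named constants. The 24-bit input and 8-bit output widths come from the entity ports in the question; the package name `rgb2gray_cfg`, the constant names, and the coefficient width are hypothetical, since the meaning of the original magic numbers isn't visible in the truncated code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical configuration package: every width is either named once
-- or calculated from the names above it.
package rgb2gray_cfg is
  constant CHANNEL_BITS : positive := 8;   -- bits per colour channel (R, G, B)
  constant CHANNELS     : positive := 3;
  constant COEFF_BITS   : positive := 12;  -- assumption: width of a weight

  -- Derived values: change CHANNEL_BITS and these follow automatically.
  constant TDATA_IN_BITS  : positive := CHANNELS * CHANNEL_BITS;   -- 24
  constant TDATA_OUT_BITS : positive := CHANNEL_BITS;              -- 8
  constant PRODUCT_BITS   : positive := CHANNEL_BITS + COEFF_BITS; -- 20

  subtype tdata_in_t  is std_logic_vector(TDATA_IN_BITS - 1 downto 0);
  subtype tdata_out_t is std_logic_vector(TDATA_OUT_BITS - 1 downto 0);
end package rgb2gray_cfg;
```

With `use work.rgb2gray_cfg.all;` in the design unit, a port such as `S_AXIS_TDATA` could then be declared as `tdata_in_t` instead of repeating `23 downto 0`.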
Appendix - `generate` examples
Here are two `generate` examples that I wrote:
- An Odd-Even Sort sorting network. It has a generate loop for the stages and a simple odd/even decision to instantiate the correct stage element.
- An Odd-Even Mergesort sorting network. It has 5 nested generate loops to translate the recursive (software) description into a linear one. While doing so, I found that the proposed software algorithm is not "good": it is correct, but it produces more compare operations than required.
Constructing an odd-even mergesort sorting network
Without generate statements and a non-Xilinx tool, I would never have found out that my design had multiple drivers. Some tools don't issue a multiple-driver warning if all drivers generate the same value. In simulation, this fault also goes undetected, because drivers with equal values resolve to that same value.
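To make the `generate` style concrete, here is a minimal sketch of a register pipeline built from a `for ... generate` loop. It is not taken from the sorting networks mentioned above or from the question's code; the entities `pipe_stage` and `pipe_chain` and their generics are hypothetical names chosen for the illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stage element: a plain register.
entity pipe_stage is
  generic (WIDTH : positive := 8);
  port (
    CLK : in  std_logic;
    D   : in  std_logic_vector(WIDTH - 1 downto 0);
    Q   : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity pipe_stage;

architecture rtl of pipe_stage is
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      Q <= D;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical top level: chains STAGES stage elements together.
entity pipe_chain is
  generic (
    WIDTH  : positive := 8;
    STAGES : positive := 3
  );
  port (
    CLK   : in  std_logic;
    D_IN  : in  std_logic_vector(WIDTH - 1 downto 0);
    D_OUT : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity pipe_chain;

architecture rtl of pipe_chain is
  -- link(i) feeds stage i; link(i + 1) is driven by stage i.
  type link_t is array (0 to STAGES) of std_logic_vector(WIDTH - 1 downto 0);
  signal link : link_t;
begin
  link(0) <= D_IN;

  gen_stages : for i in 0 to STAGES - 1 generate
    stage : entity work.pipe_stage
      generic map (WIDTH => WIDTH)
      port map (
        CLK => CLK,
        D   => link(i),
        -- Each instance is a separate driver. Writing link(i) here by
        -- mistake would put two drivers on link(0), a wiring fault that
        -- a process-based description would hide.
        Q   => link(i + 1)
      );
  end generate gen_stages;

  D_OUT <= link(STAGES);
end architecture rtl;
```

Because every generated instance is its own concurrent driver, a mis-indexed connection becomes a multiple-driver condition rather than a silently overwritten assignment, which is the property the answer relies on.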
Code Snippets
```
DSP_A: entity work.dsp48E1_macro
  port map (
    -- ...
  );
```
Context
StackExchange Code Review Q#135868, answer score: 5