patternMinor
Increase performance of a spate of 2x2 matrix multiplication
Viewed 0 times
spate2x2multiplicationperformanceincreasematrix
Problem
I would like to improve the performance of the piece of code below (in Fortran). It gives good results but the profiling tells me that it is where it spends most of its running time.
Basically, it increments over a time window (
```
do j=1,nb_pts_t + 1
t = t + dt
! Compute Zgates for all pairs in up/down states
call Z_gate (eigen_ener_up, t / dble(2 * CP_seq), Zgate_u)
call Z_gate (eigen_ener_down, t / dble(2 * CP_seq), Zgate_d)
! Initialize identity matrices
Tfu%elements(1,1) = dcmplx(1.d0, 0.d0)
Tfu%elements(1,2) = dcmplx(0.d0, 0.d0)
Tfu%elements(2,1) = dcmplx(0.d0, 0.d0)
Tfu%elements(2,2) = dcmplx(1.d0, 0.d0)
Tfd%elements(1,1) = dcmplx(1.d0, 0.d0)
Tfd%elements(1,2) = dcmplx(0.d0, 0.d0)
Tfd%elements(2,1) = dcmplx(0.d0, 0.d0)
Tfd%elements(2,2) = dcmplx(1.d0, 0.d0)
! work out transition matrices in up/down states
! and compute the decoherence
do i=1,nb_pairs
! rotate to eigenbasis and propagate
Tu(i)%elements = matmul(Zgate_u(i)%elements, matrot_u(i)%elements)
! rotate back to bath basis
Tu(i)%elements = matmul(matrottrans_u(i)%elements, Tu(i)%elements)
! Idem down state
Td(i)%elements = matmul(Zgate_d(i)%elements, matrot_d(i)%elements)
Td(i)%elements = matmul(matrottrans_d(i)%elements, Td(i)%elements)
! Initialize Tud and Tdu
Tud(i)%elements = matmul(Tu(i)%elements, Td(i)%elements)
Tdu(i)%elements = matmul(Td(i)%elements, Tu(i)%elements)
do k=1,CP_seq
if(modulo(k, 2) .ne. 0) then
Tfu(i)%elements = matmul(Tfu(i)%elements, Tud(i)%elements)
Tfd(i)%elements = matmul(Tfd(i)%elements, Tdu(i)%elements)
else if(modulo(k, 2) == 0) then
Tfu(i)%elements = matmul(Tfu(i)%elements, Tdu(i)%elemen
Basically, it increments over a time window (
j loop) and performs a spate of 2x2 matrix multiplications. I was wondering where I can refactor it to make it much more efficient. For instance, is there a smarter and faster way to initialise the identity matrices?```
do j=1,nb_pts_t + 1
t = t + dt
! Compute Zgates for all pairs in up/down states
call Z_gate (eigen_ener_up, t / dble(2 * CP_seq), Zgate_u)
call Z_gate (eigen_ener_down, t / dble(2 * CP_seq), Zgate_d)
! Initialize identity matrices
Tfu%elements(1,1) = dcmplx(1.d0, 0.d0)
Tfu%elements(1,2) = dcmplx(0.d0, 0.d0)
Tfu%elements(2,1) = dcmplx(0.d0, 0.d0)
Tfu%elements(2,2) = dcmplx(1.d0, 0.d0)
Tfd%elements(1,1) = dcmplx(1.d0, 0.d0)
Tfd%elements(1,2) = dcmplx(0.d0, 0.d0)
Tfd%elements(2,1) = dcmplx(0.d0, 0.d0)
Tfd%elements(2,2) = dcmplx(1.d0, 0.d0)
! work out transition matrices in up/down states
! and compute the decoherence
do i=1,nb_pairs
! rotate to eigenbasis and propagate
Tu(i)%elements = matmul(Zgate_u(i)%elements, matrot_u(i)%elements)
! rotate back to bath basis
Tu(i)%elements = matmul(matrottrans_u(i)%elements, Tu(i)%elements)
! Idem down state
Td(i)%elements = matmul(Zgate_d(i)%elements, matrot_d(i)%elements)
Td(i)%elements = matmul(matrottrans_d(i)%elements, Td(i)%elements)
! Initialize Tud and Tdu
Tud(i)%elements = matmul(Tu(i)%elements, Td(i)%elements)
Tdu(i)%elements = matmul(Td(i)%elements, Tu(i)%elements)
do k=1,CP_seq
if(modulo(k, 2) .ne. 0) then
Tfu(i)%elements = matmul(Tfu(i)%elements, Tud(i)%elements)
Tfd(i)%elements = matmul(Tfd(i)%elements, Tdu(i)%elements)
else if(modulo(k, 2) == 0) then
Tfu(i)%elements = matmul(Tfu(i)%elements, Tdu(i)%elemen
Solution
Leave matmul in
You can test the speed of
Identity matrix initialization
This can be done with vector assignment and a simple loop
While Fortran defaults the imaginary component to zero when the 2nd argument is not present (e.g.,
Time variable in call to Z_gate
AFAIK, when there is math to be done in subroutine calls, Fortran will make a temporary copy of that variable. While it isn't much space for a double precision variable, you should still declare a new variable
It might also be worth declaring a new variable outside the j-loop
and making use of it as
As multiplication is faster than division, computing the inverse once (or at least once per call to this subprogram) should shave off some time.
Looping over k
Another thing to consider is that you have do loop increments at your disposal. That is to say, you can write your
Rather than wasting any time testing the two cases, do both in order and increment k by 2.
You can test the speed of
matmul versus do-loops versus dot_product for any size arrays via the program found in this SO question. You can test this with any compiler and optimization you want, but the results are that any matrix of size < 50 (probably more, but I didn't test it higher than that), the three methods are identical in execution time.Identity matrix initialization
This can be done with vector assignment and a simple loop
Tfu(i)%elements(:,:) = cmplx(0d0, 0d0)
do k=1,2
Tfu(i)%elements(k,k) = cmplx(1d0,0d0)
end doWhile Fortran defaults the imaginary component to zero when the 2nd argument is not present (e.g.,
cmplx(1d0)), it is probably better to be more clear by including it. Note that this won't change the run time, but it will make the declaration more compact.Time variable in call to Z_gate
AFAIK, when there is math to be done in subroutine calls, Fortran will make a temporary copy of that variable. While it isn't much space for a double precision variable, you should still declare a new variable
t_temp=t / dble(2 * CP_seq) and use that in place of your function call, especially since the math is done twice in subsequent calls to Z_gate. It might also be worth declaring a new variable outside the j-loop
cp_seq_inv = 1d0/dble(cp_seq)and making use of it as
t_temp = t * 0.5 * cp_seq_invAs multiplication is faster than division, computing the inverse once (or at least once per call to this subprogram) should shave off some time.
Looping over k
Another thing to consider is that you have do loop increments at your disposal. That is to say, you can write your
do k=1,CP_seq loop asdo k=1,CP_seq,2
! odd step
Tfu(i)%elements = matmul(Tfu(i)%elements, Tud(i)%elements)
Tfd(i)%elements = matmul(Tfd(i)%elements, Tdu(i)%elements)
! even step
Tfu(i)%elements = matmul(Tfu(i)%elements, Tdu(i)%elements)
Tfd(i)%elements = matmul(Tfd(i)%elements, Tud(i)%elements)
end doRather than wasting any time testing the two cases, do both in order and increment k by 2.
Code Snippets
Tfu(i)%elements(:,:) = cmplx(0d0, 0d0)
do k=1,2
Tfu(i)%elements(k,k) = cmplx(1d0,0d0)
end docp_seq_inv = 1d0/dble(cp_seq)t_temp = t * 0.5 * cp_seq_invdo k=1,CP_seq,2
! odd step
Tfu(i)%elements = matmul(Tfu(i)%elements, Tud(i)%elements)
Tfd(i)%elements = matmul(Tfd(i)%elements, Tdu(i)%elements)
! even step
Tfu(i)%elements = matmul(Tfu(i)%elements, Tdu(i)%elements)
Tfd(i)%elements = matmul(Tfd(i)%elements, Tud(i)%elements)
end doContext
StackExchange Code Review Q#70401, answer score: 4
Revisions (0)
No revisions yet.