patternMinor

Increase performance of a spate of 2x2 matrix multiplication

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

spate2x2multiplicationperformanceincreasematrix

Problem

I would like to improve the performance of the piece of code below (in Fortran). It gives good results but the profiling tells me that it is where it spends most of its running time.

Basically, it increments over a time window (j loop) and performs a spate of 2x2 matrix multiplications. I was wondering where I can refactor it to make it much more efficient. For instance, is there a smarter and faster way to initialise the identity matrices?

```
do j=1,nb_pts_t + 1
t = t + dt
! Compute Zgates for all pairs in up/down states
call Z_gate (eigen_ener_up, t / dble(2 * CP_seq), Zgate_u)
call Z_gate (eigen_ener_down, t / dble(2 * CP_seq), Zgate_d)

! Initialize identity matrices
Tfu%elements(1,1) = dcmplx(1.d0, 0.d0)
Tfu%elements(1,2) = dcmplx(0.d0, 0.d0)
Tfu%elements(2,1) = dcmplx(0.d0, 0.d0)
Tfu%elements(2,2) = dcmplx(1.d0, 0.d0)
Tfd%elements(1,1) = dcmplx(1.d0, 0.d0)
Tfd%elements(1,2) = dcmplx(0.d0, 0.d0)
Tfd%elements(2,1) = dcmplx(0.d0, 0.d0)
Tfd%elements(2,2) = dcmplx(1.d0, 0.d0)

! work out transition matrices in up/down states
! and compute the decoherence
do i=1,nb_pairs
! rotate to eigenbasis and propagate
Tu(i)%elements = matmul(Zgate_u(i)%elements, matrot_u(i)%elements)
! rotate back to bath basis
Tu(i)%elements = matmul(matrottrans_u(i)%elements, Tu(i)%elements)
! Idem down state
Td(i)%elements = matmul(Zgate_d(i)%elements, matrot_d(i)%elements)
Td(i)%elements = matmul(matrottrans_d(i)%elements, Td(i)%elements)

! Initialize Tud and Tdu
Tud(i)%elements = matmul(Tu(i)%elements, Td(i)%elements)
Tdu(i)%elements = matmul(Td(i)%elements, Tu(i)%elements)

do k=1,CP_seq
if(modulo(k, 2) .ne. 0) then
Tfu(i)%elements = matmul(Tfu(i)%elements, Tud(i)%elements)
Tfd(i)%elements = matmul(Tfd(i)%elements, Tdu(i)%elements)
else if(modulo(k, 2) == 0) then
Tfu(i)%elements = matmul(Tfu(i)%elements, Tdu(i)%elemen

Solution

Leave matmul in

You can test the speed of matmul versus do-loops versus dot_product for any size arrays via the program found in this SO question. You can test this with any compiler and optimization you want, but the results are that any matrix of size < 50 (probably more, but I didn't test it higher than that), the three methods are identical in execution time.

Identity matrix initialization

This can be done with vector assignment and a simple loop

Tfu(i)%elements(:,:) = cmplx(0d0, 0d0) 
do k=1,2
   Tfu(i)%elements(k,k) = cmplx(1d0,0d0)
end do

While Fortran defaults the imaginary component to zero when the 2nd argument is not present (e.g., cmplx(1d0)), it is probably better to be more clear by including it. Note that this won't change the run time, but it will make the declaration more compact.

Time variable in call to Z_gate

AFAIK, when there is math to be done in subroutine calls, Fortran will make a temporary copy of that variable. While it isn't much space for a double precision variable, you should still declare a new variable t_temp=t / dble(2 * CP_seq) and use that in place of your function call, especially since the math is done twice in subsequent calls to Z_gate.

It might also be worth declaring a new variable outside the j-loop

cp_seq_inv = 1d0/dble(cp_seq)

and making use of it as

t_temp = t * 0.5 * cp_seq_inv

As multiplication is faster than division, computing the inverse once (or at least once per call to this subprogram) should shave off some time.

Looping over k

Another thing to consider is that you have do loop increments at your disposal. That is to say, you can write your do k=1,CP_seq loop as

do k=1,CP_seq,2
! odd step
   Tfu(i)%elements = matmul(Tfu(i)%elements, Tud(i)%elements)
   Tfd(i)%elements = matmul(Tfd(i)%elements, Tdu(i)%elements)
! even step
   Tfu(i)%elements = matmul(Tfu(i)%elements, Tdu(i)%elements)
   Tfd(i)%elements = matmul(Tfd(i)%elements, Tud(i)%elements)
end do

Rather than wasting any time testing the two cases, do both in order and increment k by 2.

Code Snippets

Tfu(i)%elements(:,:) = cmplx(0d0, 0d0) 
do k=1,2
   Tfu(i)%elements(k,k) = cmplx(1d0,0d0)
end do

cp_seq_inv = 1d0/dble(cp_seq)

t_temp = t * 0.5 * cp_seq_inv

do k=1,CP_seq,2
! odd step
   Tfu(i)%elements = matmul(Tfu(i)%elements, Tud(i)%elements)
   Tfd(i)%elements = matmul(Tfd(i)%elements, Tdu(i)%elements)
! even step
   Tfu(i)%elements = matmul(Tfu(i)%elements, Tdu(i)%elements)
   Tfd(i)%elements = matmul(Tfd(i)%elements, Tud(i)%elements)
end do

Context

StackExchange Code Review Q#70401, answer score: 4

Revisions (0)

No revisions yet.