Reimplementing numpy.genfromtxt in Fortran for Python

Problem

I've found that the function genfromtxt from numpy in Python is very slow.

Therefore I decided to wrap a subroutine with f2py to read my data. The data is a matrix.

subroutine genfromtxt(filename, nx, ny, a)
implicit none
    character(100) :: filename
    integer :: nx, ny            ! declared before they are used as array bounds
    real, dimension(ny,nx) :: a  ! ny rows, nx columns
    integer :: row, col
    !f2py character(100), intent(in) :: filename
    !f2py integer, intent(in) :: nx
    !f2py integer, intent(in) :: ny
    !f2py real, intent(out), dimension(ny,nx) :: a

    ! Open the file (unit 10 rather than 5, which is usually preconnected to stdin)
    open(10, file=filename, status='old', action='read')

    ! Read the matrix one row per line
    do row = 1, ny
        read(10,*) (a(row,col), col = 1, nx)
    end do
    close(10)
end subroutine genfromtxt


The filename length is fixed at 100 characters because f2py can't deal with dynamic arrays. The code works for filenames shorter than 100 characters; with longer ones, the Python call crashes.

This is how I call the function in Python

import Fmodules as modules
w_map=modules.genfromtxt(filename,100, 50)


Any ideas on how to do this dynamically, without passing nx and ny as parameters?

Solution

NumPy's genfromtxt is indeed slow, and that is due to its flexibility: it tries very hard to figure out the layout of your data file. You will not get this flexibility by implementing it yourself in Fortran.

Have you tried using loadtxt instead?
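
As a minimal sketch of that suggestion: loadtxt infers the number of rows and columns from the file itself, so nx and ny never have to be passed. The file name and contents below are assumptions made only so the example runs on its own.

```python
import os
import tempfile

import numpy as np

# Write a small whitespace-delimited matrix to a temporary file.
# The path and the values are illustrative assumptions.
path = os.path.join(tempfile.mkdtemp(), "matrix.txt")
with open(path, "w") as f:
    f.write("1.0 2.0 3.0\n4.0 5.0 6.0\n")

# loadtxt works out the shape (rows, columns) from the file.
w_map = np.loadtxt(path)
print(w_map.shape)
```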

Depending on the conditions, for the same matrix genfromtxt may be more than 20 times slower than loadtxt.
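
To check the ratio on your own data, a rough timing sketch along these lines may help. The matrix size, the temporary file, and the repeat count are arbitrary assumptions, and the result will vary with the NumPy version and the file.

```python
import os
import tempfile
import timeit

import numpy as np

# Generate a test matrix and save it as whitespace-delimited text.
# The 200 x 50 size is an arbitrary choice for illustration.
path = os.path.join(tempfile.mkdtemp(), "matrix.txt")
np.savetxt(path, np.random.rand(200, 50))

# Time a few repeated reads with each function.
t_gen = timeit.timeit(lambda: np.genfromtxt(path), number=3)
t_load = timeit.timeit(lambda: np.loadtxt(path), number=3)

print(f"genfromtxt: {t_gen:.3f} s, loadtxt: {t_load:.3f} s")
```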

BTW, here is a simple implementation in Python that is, in my case, faster than both loadtxt and genfromtxt:

from numpy import array, fromstring

with open("matrix.txt", 'r') as f:
    # parse each line of the file into a row of floats
    a = array([fromstring(line, dtype=float, sep=' ') for line in f])


I guess the speed comes from the fact that I do not have to read in the whole file to check how many lines it has.

EDIT: I realise this does not really answer your question, but I believe that the speed-up you may get from using Fortran does not warrant the loss of flexibility.
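
If the Fortran routine is kept anyway, one option is to derive nx and ny on the Python side before calling it. The sketch below is only an illustration: matrix_shape is a hypothetical helper, it assumes a whitespace-delimited file in which every row has the same number of columns, and it costs one extra pass over the file.

```python
import os
import tempfile


def matrix_shape(path):
    """Return (nx, ny): columns in the first non-blank line, count of non-blank lines."""
    nx = 0
    ny = 0
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if ny == 0:
                nx = len(fields)  # column count taken from the first data row
            ny += 1
    return nx, ny


# Small illustrative file (an assumption, not the asker's data).
path = os.path.join(tempfile.mkdtemp(), "matrix.txt")
with open(path, "w") as f:
    f.write("1.0 2.0 3.0\n4.0 5.0 6.0\n")

nx, ny = matrix_shape(path)
print(nx, ny)

# The question's f2py wrapper could then be called without hard-coded sizes:
# w_map = modules.genfromtxt(path, nx, ny)
```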

Code Snippets

from numpy import array, fromstring

with open("matrix.txt", 'r') as f:
    # parse each line of the file into a row of floats
    a = array([fromstring(line, dtype=float, sep=' ') for line in f])

Context

StackExchange Code Review Q#94343, answer score: 4
