Finding the average of each row in a CSV file

Submitted by: @import:stackexchange-codereview

Problem

I am getting back into Python after a year of not doing it at all. I just wrote a bit of code to create a file that looks like this:

1,2
1,1,1,1
-1,0,1
42,17


Then I decided to take the average of each line and put it into a list. For this I wrote the following:

def line_averages():
    out_file = open("data.csv", "w") #create data file
    out_file.write("1,2\n1,1,1,1\n-1,0,1\n42,17") #write numbers into data file
    out_file.close() 

    f = open("data.csv", "r") #read data file
    lines = f.readlines() 
    avLines= [] #create a list to put averages into 
    for line in lines:
        line2 = line.replace("\n", "") #remove \n from lines
        line2 = line2.split(",") #split the lines by commas
        Nlines = [] #create a list to put floats into
        for i in line2:
            i = float(i)
            Nlines.append(i) #put the floats into the list
        sums = sum(Nlines) #total up each line from the file
        avLines.append(sums/len(line2)) #put the averages of each line into a list
    return avLines


I'm pretty happy as this does exactly what I wanted but I can't help but wonder if it can't be shortened/simplified. It seems a bit inefficient to me to have lots of "placeholders" (like Nlines or line2) which are used to do an operation rather than being able to do the operation directly on them. What would you guys do (if anything) to make this more compact? I'm sure it could be done more succinctly!

Solution

First, note that you never close the file you open for reading (f). A better way to work with files is the with statement, which closes the file automatically when the block ends. See the Stack Overflow question "What is the python “with” statement designed for?" to learn more about the with statement.

The code would be:

with open("data.csv") as filename:
    ...  # code follows


Other things to note here:

  • The default mode of open is 'r', so you don't need to specify it explicitly.
  • Use a meaningful variable name like filename, so that your future self doesn't have to ask what f is.
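
To make the point about closing concrete, here is a small sketch showing that the file is closed as soon as the with block ends, even when the body raises. The temporary file and the names path, closed_after_block and closed_after_error are assumptions made for illustration only; they are not part of the original code.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not touch a real data.csv.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as out_file:
    out_file.write("1,2\n1,1,1,1\n-1,0,1\n42,17")

with open(path) as filename:
    first_line = filename.readline()

# The with block has already closed the file at this point.
closed_after_block = filename.closed

try:
    with open(path) as filename:
        raise ValueError("simulated failure while reading")
except ValueError:
    pass

# The file is closed even though the body raised.
closed_after_error = filename.closed

os.remove(path)
```

With a bare open() call, the same guarantee would need an explicit try/finally around the read.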



Now to the core of the problem. You can iterate over the file object directly to get its lines; you do not need to read the entire file into a list first. This is helpful for large files. The code will be:

for line in filename:
    ...  # code follows
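
As a rough illustration of iterating over the file object, the sketch below writes the sample data to a temporary file and collects the lines one at a time. The temporary file and the list seen are assumptions added for the example. Note that each line still carries its trailing newline, except the last one, which has none in this data.

```python
import os
import tempfile

# Hypothetical scratch file holding the sample data from the question.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as out_file:
    out_file.write("1,2\n1,1,1,1\n-1,0,1\n42,17")

seen = []
with open(path) as filename:
    for line in filename:  # yields one line at a time, no readlines() needed
        seen.append(line)

os.remove(path)
```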


The line line2 = line.replace("\n", "") is fragile, because the line may carry other trailing whitespace such as a space or a \r. str.strip removes all leading and trailing whitespace in one call:

stripped_line = line.strip()
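
A quick comparison of the two approaches, using a made-up line (raw_line is a hypothetical input with a trailing space and a Windows-style line ending):

```python
# Hypothetical line with a trailing space and a "\r\n" ending.
raw_line = "42,17 \r\n"

# replace() only removes the "\n"; the space and "\r" are left behind.
only_newline_removed = raw_line.replace("\n", "")

# strip() removes all leading and trailing whitespace.
fully_stripped = raw_line.strip()
```

Here only_newline_removed is "42,17 \r", while fully_stripped is "42,17".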


Again, note the name of the variable. Now consider this code block:

line2 = line2.split(",")
Nlines = [] 
for i in line2:
    i = float(i)
    Nlines.append(i) 
sums = sum(Nlines)


You can replace all of this with split and a generator expression:

stripped_line = stripped_line.split(',')
line_sum = sum(float(i) for i in stripped_line)


Alternatively, you can use map:

line_sum = sum(map(float, stripped_line))
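
Both forms give the same result. A minimal check on one line of the sample data (the names sum_with_generator, sum_with_map and average are only for this illustration):

```python
# One already-stripped line from the sample file, split into fields.
stripped_line = "42,17".split(",")  # ['42', '17']

sum_with_generator = sum(float(i) for i in stripped_line)
sum_with_map = sum(map(float, stripped_line))

# Dividing by the number of fields gives the line average.
average = sum_with_generator / len(stripped_line)
```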


(You could also use ast.literal_eval and write sum(literal_eval(stripped_line)) directly on the unsplit line, but the string methods are the right, and preferred, way.)
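
For completeness, a small sketch of the literal_eval route and one reason it is less robust. A comma-separated line parses as a tuple, but a line holding a single value parses as a bare number, which sum cannot iterate over. The single-value line "42" is a hypothetical input, not part of the sample data.

```python
from ast import literal_eval

# "-1,0,1" is parsed as the tuple (-1, 0, 1).
parsed = literal_eval("-1,0,1")
total = sum(parsed)

# A hypothetical single-value line parses to an int, not a tuple,
# so sum() raises TypeError.
single = literal_eval("42")
try:
    sum(single)
    single_value_ok = True
except TypeError:
    single_value_ok = False
```

The split-based version handles the single-value case without any special treatment.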

Hence the complete code reduces to the following (excluding the data creation part):

def line_averages():
    with open("data.csv") as filename:
        avLines = []
        for line in filename:
            stripped_line = line.strip()
            stripped_line = stripped_line.split(',')
            line_sum = sum(float(i) for i in stripped_line)
            avLines.append(line_sum / len(stripped_line))
        return avLines


Code golfing the solution, we can use a list comprehension. Read up on list comprehensions to learn more.

def line_averages():
    with open("data.csv") as filename:
        lines = [[float(i) for i in line.strip().split(',')] for line in filename]
        return [sum(i) / len(i) for i in lines]
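
To try the compact version end to end, here is a hedged, self-contained sketch. It differs from the code above in two ways that are my own assumptions: the function takes the file path as a parameter instead of hard-coding it, and blank lines are skipped, since float("") would otherwise raise ValueError. The temporary file stands in for data.csv.

```python
import os
import tempfile


def line_averages(path):
    """Return the average of each non-empty comma-separated line in path."""
    with open(path) as filename:
        rows = [[float(i) for i in line.strip().split(",")]
                for line in filename
                if line.strip()]  # assumption: blank lines are skipped
    return [sum(row) / len(row) for row in rows]


# Hypothetical scratch file with the sample data plus a trailing blank line.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as out_file:
    out_file.write("1,2\n1,1,1,1\n-1,0,1\n42,17\n\n")

averages = line_averages(path)  # [1.5, 1.0, 0.0, 29.5]
os.remove(path)
```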


Context

StackExchange Code Review Q#131791, answer score: 4
