patternpythonMinor
Finding the average of each row in a CSV file
Problem
I am getting back into python after a year of not doing it at all. I just wrote a bit of code to create a file that looks like this:
1,2
1,1,1,1
-1,0,1
42,17

Then I decided to take the average of each line and put it into a list. For this I wrote the following:
    def line_averages():
        out_file = open("data.csv", "w") #create data file
        out_file.write("1,2\n1,1,1,1\n-1,0,1\n42,17") #write numbers into data file
        out_file.close()
        f = open("data.csv", "r") #read data file
        lines = f.readlines()
        avLines= [] #create a list to put averages into
        for line in lines:
            line2 = line.replace("\n", "") #remove /n from lines
            line2 = line2.split(",") #split the lines by commas
            Nlines = [] #create a list to put floats into
            for i in line2:
                i = float(i)
                Nlines.append(i) #put the floats into the list
            sums = sum(Nlines) #total up each line from the file
            avLines.append(sums/len(line2)) #put the averages of each line into a list
        return avLines

I'm pretty happy as this does exactly what I wanted, but I can't help but wonder if it can't be shortened/simplified. It seems a bit inefficient to me to have lots of "placeholders" (like Nlines or line2) which are used to do an operation rather than being able to do the operation directly on them. What would you guys do (if anything) to make this more compact? I'm sure it could be done more succinctly!
Solution
Firstly, note that you never close the file f after opening it for reading. A better way to read from a file is to use the with statement. Read "What is the python 'with' statement designed for?" to know more about the with statement. The code would be:
    with open("data.csv") as filename:
        # Code follows

Other things to note here are:
- The mode of the open function is 'r' by default, and hence you don't need to mention it explicitly.
- Use a meaningful variable name like filename. This saves your future self from asking "What is the f?"
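As a small illustration of the two points above, the sketch below checks that a file opened in a with block is closed once the block ends, and that the mode defaults to 'r'. The helper name inspect_with_block, the variable csv_file, and the temporary file are assumptions made for this example; they are not part of the original code.

```python
import os
import tempfile


def inspect_with_block(path):
    """Open path in a with block and report what happened to the file object."""
    with open(path) as csv_file:  # no mode given, so it defaults to 'r'
        mode = csv_file.mode
        first_line = csv_file.readline()
    # The with block has ended here, so the file has already been closed.
    return csv_file.closed, mode, first_line


# Hypothetical sample file, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "data.csv")
    with open(sample_path, "w") as out_file:
        out_file.write("1,2\n1,1,1,1\n-1,0,1\n42,17")
    closed, mode, first_line = inspect_with_block(sample_path)

print(closed, mode, repr(first_line))
```

The file is closed even if the body of the with block raises an exception, which is the main reason to prefer it over a manual close() call.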
Now coming to the core of the problem. You can iterate over the file object directly; you do not need to read the entire file into a list. This is helpful for large files. The code will be:

    for line in filename:
        # Code follows

The line line2 = line.replace("\n", "") is quite inadequate, as there might be other whitespace like a trailing space or a \r. To handle all of these, you can use str.strip:

    stripped_line = line.strip()

Again, note the name of the variable. Coming to this code block:
    line2 = line2.split(",")
    Nlines = []
    for i in line2:
        i = float(i)
        Nlines.append(i)
    sums = sum(Nlines)

You can simplify this using split and a generator expression:

    stripped_line = stripped_line.split(',')
    line_sum = sum(float(i) for i in stripped_line)

Alternatively, you can use map:

    line_sum = sum(map(float, stripped_line))

(You can also use ast.literal_eval to do sum(literal_eval(stripped_line)) directly on the unsplit string, but the string methods are the right and preferred way.)

Hence the complete code reduces to the following (excluding the data creation part):
    def line_averages():
        with open("data.csv") as filename:
            avLines = []
            for line in filename:
                stripped_line = line.strip()
                stripped_line = stripped_line.split(',')
                line_sum = sum(float(i) for i in stripped_line)
                avLines.append(line_sum/len(stripped_line))
        return avLines

Code golfing the solution, we can use a list comprehension (read up on list comprehensions if they are new to you):
    def line_averages():
        with open("data.csv") as filename:
            lines = [[float(i) for i in line.strip().split(',')] for line in filename]
        return [sum(i)/len(i) for i in lines]
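To try the strip and split approach without touching the disk, here is a minimal sketch that accepts any iterable of lines (an open file object or a plain list of strings). The function name row_averages and the choice to skip blank lines are assumptions added for this example; the code in the answer does neither.

```python
def row_averages(lines):
    """Return the average of the comma-separated numbers on each line."""
    averages = []
    for line in lines:
        stripped_line = line.strip()  # drops "\n", "\r" and stray spaces
        if not stripped_line:
            continue  # assumption: blank lines are skipped, not treated as errors
        numbers = [float(field) for field in stripped_line.split(",")]
        averages.append(sum(numbers) / len(numbers))
    return averages


# The sample data from the question, with a Windows line ending, a trailing
# space and a blank line mixed in to exercise strip().
sample_lines = ["1,2\r\n", "1,1,1,1 \n", "\n", "-1,0,1\n", "42,17"]
print(row_averages(sample_lines))  # [1.5, 1.0, 0.0, 29.5]
```

With a real file, the same function could be called on the file object inside the with block, since a file object yields its lines when iterated.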
Context
StackExchange Code Review Q#131791, answer score: 4