CSV demographics analyzer seems to waste memory/move slowly
Problem
I'm much more fluent in JS, but I needed to sort a lot of dates, ages, genders, etc. from a tab-delimited text file so I wrote this. Could I get some tips on how to make this more efficient and more Pythonic? The more Python I write the more I like it, but I definitely need some help.
One thing I noticed is that when I use a ~600 MB file Python uses up to 25% of my RAM. That seems like a bit much. Am I leaking somewhere? I couldn't make heads or tails of Guppy, which printed something like this:
```
Partition of a set of 8273952 objects. Total size = 1747556688 bytes.
 Index    Count   %        Size   %  Cumulative   %  Kind (class / dict of class)
     0   414691   5  1389344200  80  1389344200  80  dict (no owner)
     1  7427576  90   338925432  19  1728269632  99  str
     2   414362   5     9944688   1  1738214320  99  float
     3      216   0     6847872   0  1745062192 100  list
     4     7040   0      580152   0  1745642344 100  tuple
     5       95   0      288488   0  1745930832 100  dict of module
     6     1917   0      245376   0  1746176208 100  types.CodeType
     7      235   0      243592   0  1746419800 100  dict of type
     8     1840   0      220800   0  1746640600 100  function
     9      235   0      209104   0  1746849704 100  type
```
Which I'm assuming means that the dict is using 80% of my memory, and my variables are using 19%? The documentation is, uh, not incredibly user-friendly.
```
#!/usr/bin/env python
from __future__ import division
import csv
import datetime
import subprocess
import gc
'''
#from guppy import hpy # This is used only if you want to see where memory is allocated
#h = hpy() # I wouldn't uncomment unless you want to see your memory double
''' # Or if you want to see memory usage
vrdb = 'active.txt'
# Write headings to three output files
with open('legdata.txt', 'wb+') as myfile:
    myfile.write('LegDist,AvgAge,NumMales,PerMales,Q1,Q2,Q3,Q4,Q5,Q6,NumFemales,PerFemales,Q1,Q2,Q3,Q4,Q5,Q6' + '\r\n')
with open('citydata.txt', 'wb+') as myfile:
    myfi
```
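One way to read the Guppy table is to divide each Size by its Count, which gives an average cost per object. The sketch below does that for the top three rows, using numbers copied from the output above. It assumes Python 3 (true division), and the helper name `summarize` is made up for illustration.

```python
# Rows copied from the Guppy output: (kind, count, size_in_bytes).
guppy_rows = [
    ("dict (no owner)", 414691, 1389344200),
    ("str", 7427576, 338925432),
    ("float", 414362, 9944688),
]
total_size = 1747556688  # "Total size" from the first line of the output


def summarize(rows, total):
    """Return (kind, average bytes per object, share of total) tuples."""
    return [(kind, size / count, size / total) for kind, count, size in rows]


for kind, avg_bytes, share in summarize(guppy_rows, total_size):
    print("{:<16} {:>8.1f} bytes each  {:>5.1%} of heap".format(
        kind, avg_bytes, share))
```

On these figures the dicts average about 3.3 KB each while the strings average under 50 bytes, so the dict objects themselves, not the string values, account for most of the heap. The dict and float counts are nearly equal, which would be consistent with one dict and one float per input row, though the output itself does not say that.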
Solution
Things I would have done (some minor):
I would order the imports alphabetically. (not a huge deal)
-
You have a lot of "with open" statements in your code. I would have put this in a function. (again minor) i.e.
# Formatting your strings like this allows you to be pep8 compliant - 79 chars
legal_data = (
"LegDist,AvgAge,NumMales,PerMales,Q1,Q2,Q3,Q4,Q5,Q6,NumFemales,"
"PerFemales,Q1,Q2,Q3,Q4,Q5,Q6"
"\r\n"
)
def write_file(filename, mode, data):
with open(filename, mode) as f:
f.write(data)
write_file('legdata.txt', 'wb+', legal_data)-
I typically use join for lines like this:
precincts.append(str(row.get('CountyCode')) + '+' + str(row.get('PrecinctCode')) + '+' + str(row.get('PrecinctPart')))Code Snippets
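The answer quotes only the original concatenation, so here is a minimal sketch of what the join form of that line might look like. The names `precinct_key` and `PRECINCT_FIELDS` are hypothetical, and the sample row is made up to stand in for one record from the question's reader.

```python
# Hypothetical row, shaped like one record read from the input file.
row = {'CountyCode': 12, 'PrecinctCode': '0042', 'PrecinctPart': None}

# Field names taken from the line in the question.
PRECINCT_FIELDS = ('CountyCode', 'PrecinctCode', 'PrecinctPart')


def precinct_key(row, fields=PRECINCT_FIELDS, sep='+'):
    """Build the same 'a+b+c' key as the concatenation in the question."""
    # str() is kept so non-string or missing (None) values behave as before.
    return sep.join(str(row.get(field)) for field in fields)


precincts = []
precincts.append(precinct_key(row))
```

Keeping the field names in one tuple means adding or reordering a key component touches a single line, and the separator appears once instead of between every pair of terms.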
Context
StackExchange Code Review Q#56486, answer score: 3