CSV demographics analyzer seems to waste memory/move slowly

Submitted by: @import:stackexchange-codereview

Problem

I'm much more fluent in JS, but I needed to sort a lot of dates, ages, genders, etc. from a tab-delimited text file so I wrote this. Could I get some tips on how to make this more efficient and more Pythonic? The more Python I write the more I like it, but I definitely need some help.

One thing I noticed is that when I use a ~600 MB file Python uses up to 25% of my RAM. That seems like a bit much. Am I leaking somewhere? I couldn't make heads or tails of Guppy, which printed something like this:

Partition of a set of 8273952 objects. Total size = 1747556688 bytes.
 Index    Count   %        Size   %  Cumulative   % Kind (class / dict of class)
     0   414691   5  1389344200  80  1389344200  80 dict (no owner)
     1  7427576  90   338925432  19  1728269632  99 str
     2   414362   5     9944688   1  1738214320  99 float
     3      216   0     6847872   0  1745062192 100 list
     4     7040   0      580152   0  1745642344 100 tuple
     5       95   0      288488   0  1745930832 100 dict of module
     6     1917   0      245376   0  1746176208 100 types.CodeType
     7      235   0      243592   0  1746419800 100 dict of type
     8     1840   0      220800   0  1746640600 100 function
     9      235   0      209104   0  1746849704 100 type



Which I'm assuming means that the dict is using 80% of my memory, and my variables are using 19%? The documentation is, uh, not incredibly user-friendly.
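
The reading of the table can be checked with some quick arithmetic. The sketch below only re-uses the counts and sizes printed by Guppy above; the names `heap`, `TOTAL_BYTES` and `summarize` are made up for illustration.

```python
# Figures copied from the Guppy output: kind -> (object count, total bytes).
heap = {
    'dict (no owner)': (414691, 1389344200),
    'str': (7427576, 338925432),
    'float': (414362, 9944688),
}
TOTAL_BYTES = 1747556688  # "Total size" line of the Guppy output


def summarize(heap, total_bytes):
    """Return {kind: (share of heap in percent, average bytes per object)}."""
    return {kind: (100.0 * size / total_bytes, float(size) / count)
            for kind, (count, size) in heap.items()}


summary = summarize(heap, TOTAL_BYTES)
for kind in sorted(summary):
    share, average = summary[kind]
    print('%-16s %5.1f%% of heap, about %.0f bytes per object'
          % (kind, share, average))
```

Read this way, the plain dicts account for a little under 80% of the measured heap and the string objects for about 19%. Note that the `str` row counts string objects, not variables.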

```
#!/usr/bin/env python

from __future__ import division
import csv
import datetime
import subprocess
import gc

'''
#from guppy import hpy # This is used only if you want to see where memory is allocated
#h = hpy() # I wouldn't uncomment unless you want to see your memory double
''' # Or if you want to see memory usage

vrdb = 'active.txt'

# Write headings to three output files

with open('legdata.txt', 'wb+') as myfile:
    myfile.write('LegDist,AvgAge,NumMales,PerMales,Q1,Q2,Q3,Q4,Q5,Q6,NumFemales,PerFemales,Q1,Q2,Q3,Q4,Q5,Q6' + '\r\n')

with open('citydata.txt', 'wb+') as myfile:
    myfi
```

Solution

Things I would have done differently (some of them minor):

- I would order the imports alphabetically (not a huge deal).

- You have a lot of "with open" statements in your code. I would have put this in a function (again, minor), e.g.:

```
# Formatting your strings like this allows you to be pep8 compliant - 79 chars
legal_data = (
    "LegDist,AvgAge,NumMales,PerMales,Q1,Q2,Q3,Q4,Q5,Q6,NumFemales,"
    "PerFemales,Q1,Q2,Q3,Q4,Q5,Q6"
    "\r\n"
)


def write_file(filename, mode, data):
    with open(filename, mode) as f:
        f.write(data)


write_file('legdata.txt', 'wb+', legal_data)
```
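
Since the script already imports `csv`, the heading row could also be written with `csv.writer`, which emits `\r\n` line endings by default. This is only a sketch of that alternative, not what the answer proposes. It assumes Python 3 (text mode with `newline=''`), whereas the question's code targets Python 2 and opens the file with `'wb+'`. The names `write_header`, `Q_COLUMNS` and `LEG_COLUMNS` and the temporary output directory are hypothetical.

```python
import csv
import os
import tempfile

# Column names copied from the legdata.txt heading in the question.
Q_COLUMNS = ['Q%d' % i for i in range(1, 7)]  # Q1 .. Q6
LEG_COLUMNS = (['LegDist', 'AvgAge', 'NumMales', 'PerMales'] + Q_COLUMNS +
               ['NumFemales', 'PerFemales'] + Q_COLUMNS)


def write_header(path, columns):
    """Create (or truncate) path and write a single CSV heading row."""
    # newline='' keeps Python 3 from translating the '\r\n' line ending
    # that csv.writer produces by default.
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerow(columns)


# Hypothetical output location, so the sketch does not touch real data.
out_dir = tempfile.mkdtemp()
leg_path = os.path.join(out_dir, 'legdata.txt')
write_header(leg_path, LEG_COLUMNS)
```

Keeping the columns in a list also means the same names can be reused later when the data rows are written.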


- I typically use join for lines like this:

```
precincts.append(str(row.get('CountyCode')) + '+'  + str(row.get('PrecinctCode')) + '+' + str(row.get('PrecinctPart')))
```
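
The join-based version is not shown in the answer; a minimal sketch of what it might look like follows. The sample `row` and the helper name `precinct_key` are hypothetical, while the three field names come from the line quoted above. Like the original, it turns a missing field into the string `'None'`.

```python
# Hypothetical record, standing in for one row from csv.DictReader.
row = {'CountyCode': '12', 'PrecinctCode': '034', 'PrecinctPart': 'A'}

# Field names taken from the line in the question.
KEY_FIELDS = ('CountyCode', 'PrecinctCode', 'PrecinctPart')


def precinct_key(row, fields=KEY_FIELDS, sep='+'):
    """Build the 'county+precinct+part' key with a single join."""
    return sep.join(str(row.get(name)) for name in fields)


precincts = []
precincts.append(precinct_key(row))
```

Whether a helper is worth it depends on how often the key is built; an inline `'+'.join(...)` with the same generator expression works just as well.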

Context

StackExchange Code Review Q#56486, answer score: 3
