
CSV demographics analyzer seems to waste memory/move slowly

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·


Problem

I'm much more fluent in JS, but I needed to sort a lot of dates, ages, genders, etc. from a tab-delimited text file so I wrote this. Could I get some tips on how to make this more efficient and more Pythonic? The more Python I write the more I like it, but I definitely need some help.

One thing I noticed is that when I use a ~600 MB file Python uses up to 25% of my RAM. That seems like a bit much. Am I leaking somewhere? I couldn't make heads or tails of Guppy, which printed something like this:

Partition of a set of 8273952 objects. Total size = 1747556688 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 414691   5 1389344200  80 1389344200  80 dict (no owner)
     1 7427576  90 338925432  19 1728269632  99 str
     2 414362   5  9944688   1 1738214320  99 float
     3    216   0  6847872   0 1745062192 100 list
     4   7040   0   580152   0 1745642344 100 tuple
     5     95   0   288488   0 1745930832 100 dict of module
     6   1917   0   245376   0 1746176208 100 types.CodeType
     7    235   0   243592   0 1746419800 100 dict of type
     8   1840   0   220800   0 1746640600 100 function
     9    235   0   209104   0 1746849704 100 type

Which I'm assuming means that the dict is using 80% of my memory, and my variables are using 19%? The documentation is, uh, not incredibly user-friendly.
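One way to read the table above is to divide each kind's total size by its count. The sketch below does only that arithmetic, using figures copied from the Guppy output; the variable names are made up for illustration.

```python
# Figures copied from the Guppy output above: kind -> (count, total size in bytes).
total_bytes = 1747556688
kinds = {
    'dict (no owner)': (414691, 1389344200),
    'str': (7427576, 338925432),
    'float': (414362, 9944688),
}

for kind, (count, size) in kinds.items():
    share = 100.0 * size / total_bytes   # percentage of the whole heap
    average = float(size) / count        # mean bytes per object of this kind
    print('%-16s %5.1f%% of heap, about %.0f bytes each' % (kind, share, average))

# Derived ratios used in the note below.
dict_count, dict_size = kinds['dict (no owner)']
avg_dict_bytes = float(dict_size) / dict_count
strs_per_dict = float(kinds['str'][0]) / dict_count
```

On these numbers the dicts average a little over 3 KB each, with roughly 18 strings per dict. That would be consistent with one dict per input row being kept alive, though this is an assumption: the truncated listing does not show where the rows are stored.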

```
#!/usr/bin/env python

from __future__ import division
import csv
import datetime
import subprocess
import gc

'''
#from guppy import hpy # This is used only if you want to see where memory is allocated
#h = hpy() # I wouldn't uncomment unless you want to see your memory double
''' # Or if you want to see memory usage

vrdb = 'active.txt'

# Write headings to three output files

with open('legdata.txt', 'wb+') as myfile:
    myfile.write('LegDist,AvgAge,NumMales,PerMales,Q1,Q2,Q3,Q4,Q5,Q6,NumFemales,PerFemales,Q1,Q2,Q3,Q4,Q5,Q6' + '\r\n')

with open('citydata.txt', 'wb+') as myfile:
    myfi
```

Solution

Things I would have done differently (some of them minor):

- I would order the imports alphabetically (not a huge deal).

- You have a lot of "with open" statements in your code. I would have put this in a function (again, minor), e.g.:

```
# Formatting your strings like this keeps lines within PEP 8's 79-character limit
legal_data = (
    "LegDist,AvgAge,NumMales,PerMales,Q1,Q2,Q3,Q4,Q5,Q6,NumFemales,"
    "PerFemales,Q1,Q2,Q3,Q4,Q5,Q6"
    "\r\n"
)

def write_file(filename, mode, data):
    with open(filename, mode) as f:
        f.write(data)

write_file('legdata.txt', 'wb+', legal_data)
```
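The same helper could cover several output files in one loop. This is only a sketch: the shortened headers, the citydata column names and the temporary output directory are hypothetical placeholders, not taken from the question. The headers are encoded explicitly because a file opened in binary mode expects bytes on Python 3, whereas the plain strings in the question work in binary mode only on Python 2.

```python
import os
import tempfile


def write_file(filename, mode, data):
    with open(filename, mode) as f:
        f.write(data)


# Hypothetical, shortened headers; the real column lists are in the question.
headers = {
    'legdata.txt': 'LegDist,AvgAge,NumMales\r\n',
    'citydata.txt': 'City,AvgAge,NumMales\r\n',
}

# A temporary directory keeps the sketch from touching real output files.
out_dir = tempfile.mkdtemp()

for name, header in headers.items():
    # Encode so that binary mode works on Python 3 as well as Python 2.
    write_file(os.path.join(out_dir, name), 'wb', header.encode('ascii'))
```

Keeping the headers in one mapping means a new output file needs one new entry rather than another "with open" block.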

- I typically use join for lines like this:

```
precincts.append(str(row.get('CountyCode')) + '+' + str(row.get('PrecinctCode')) + '+' + str(row.get('PrecinctPart')))
```
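A join-based version of the line above might look like the following sketch. The sample row and its values are hypothetical stand-ins for a row read from the input file; only the three field names come from the original line.

```python
# Hypothetical sample row; in the question each row comes from the input file.
row = {'CountyCode': 'KI', 'PrecinctCode': 1234, 'PrecinctPart': 0}
precincts = []

# The three keys used in the original concatenation, in the same order.
fields = ('CountyCode', 'PrecinctCode', 'PrecinctPart')

# str() is applied to every value, as in the original, then joined with '+'.
precincts.append('+'.join(str(row.get(field)) for field in fields))
```

This keeps the separator in one place and makes it easy to add or remove a field.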

Context

StackExchange Code Review Q#56486, answer score: 3
