python: is my program optimal

Submitted by: @import:stackexchange-codereview

Problem

I wrote some Python code that runs slowly. Because I am new to Python, I am not sure that I am doing everything right. What can I do to make it more efficient?
About the problem: I have 25 *.json files, each about 80 MB. Each file contains just JSON strings. I need to make a histogram based on the data.

In this part I want to create a list of all dictionaries (one dictionary represents one JSON object):

import json

d = []  # filename is a list of file names
for x in filename:
    d.extend(map(json.loads, open(x)))


Then I want to create the list u:

u = []
for x in d:
  s = x['key_1']  # s is the string I use to get the useful value
  t1 = 60*int(s[11:13]) + int(s[14:16])  # t1 is the useful value
  u.append(t1)


Now I create the histogram:

plt.hist(u, bins = (max(u) - min(u)))
plt.show()


Any thoughts and suggestions are appreciated.
Thank you!

Solution

Parsing JSON into Python objects uses a surprisingly large amount of memory, often 3-4 times the actual file size. Your first loop keeps every parsed dictionary from all 25 files in d (and never closes the files it opens), so all of that memory is still in use later in the program.
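If you want to check the memory overhead on your own data, the standard-library tracemalloc module can compare the size of the JSON text with the memory held by the parsed objects. This is only a rough sketch: the records below are made-up stand-ins for the real files, and the exact ratio depends on the data and the Python version.

```python
import json
import tracemalloc

# Hypothetical sample data: 10,000 small records, one JSON object per line.
lines = [
    json.dumps({"key_1": "2012-02-13 14:%02d:00" % (i % 60), "n": i})
    for i in range(10000)
]
text_bytes = sum(len(line) for line in lines)

# Measure the memory allocated while the lines are parsed into dictionaries.
tracemalloc.start()
parsed = [json.loads(line) for line in lines]
parsed_bytes, _peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("JSON text: %d bytes, parsed objects: about %d bytes"
      % (text_bytes, parsed_bytes))
```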

Try changing the flow of your program to:

  • Open a file
  • Compute a histogram for that file
  • Close the file
  • Merge it with a "global" histogram
  • Repeat until there are no files left.

Something like

import json
import matplotlib.pyplot as plt

u = []
for f in filenames:
    with open(f) as file:
        # process individual file contents; as in the question's code,
        # each line of the file holds one JSON object
        for line in file:
            obj = json.loads(line)
            s = obj['key_1']
            t1 = 60 * int(s[11:13]) + int(s[14:16])
            u.append(t1)

# make the global histogram
plt.hist(u, bins=(max(u) - min(u)))
plt.show()
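
The loop above still stores one integer per record in u. A variant that follows the listed steps more literally keeps only a count per value and merges the per-file counts into a global one. The sketch below is one possible way to do that with collections.Counter; the function names to_minutes, file_histogram and merged_histogram are made up for illustration, and it assumes one JSON object per line with key_1 holding a string such as "2012-02-13 14:05:09".

```python
import json
from collections import Counter


def to_minutes(stamp):
    # Same arithmetic as the loop above: two fixed-position fields of the string.
    return 60 * int(stamp[11:13]) + int(stamp[14:16])


def file_histogram(path, key="key_1"):
    """Count the values in one file that holds one JSON object per line."""
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                counts[to_minutes(json.loads(line)[key])] += 1
    return counts


def merged_histogram(paths):
    """Merge the per-file counts into one global Counter."""
    total = Counter()
    for path in paths:
        total.update(file_histogram(path))
    return total
```

With total = merged_histogram(filenames), the counts could then be drawn with plt.bar(list(total), list(total.values())) instead of plt.hist, since the binning has already been done.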


The with open(...) as ... statement automatically closes the file when the block ends, even if an error is raised while the file is being read.
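
A small self-contained check of that behaviour, using a throwaway temporary file rather than the real data (the file contents and variable names are only for illustration):

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as out:
    out.write('{"key_1": "example"}\n')

# Normal exit: the file is closed as soon as the block ends.
with open(path) as fh:
    first_line = fh.readline()
closed_after_block = fh.closed

# Exit through an exception: the file is still closed.
try:
    with open(path) as fh2:
        raise ValueError("simulated failure while processing")
except ValueError:
    closed_after_error = fh2.closed

os.remove(path)
print(closed_after_block, closed_after_error)  # prints: True True
```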

Context

StackExchange Code Review Q#8963, answer score: 7
