Python: is my program optimal?
Problem
I wrote code in Python that runs slowly. Because I am new to Python, I am not sure that I am doing everything right. My question is: what can I do to make it more efficient?
About the problem: I have 25 *.json files, each about 80 MB. Each file contains just JSON strings. I need to make a histogram based on the data.

In this part I want to create a list of all the dictionaries (one dictionary represents one JSON object):

    d = []  # filename is a list of file names
    for x in filename:
        d.extend(map(json.loads, open(x)))

Then I want to create the list u:

    u = []
    for x in d:
        s = x['key_1']  # s is a string from which I extract the useful value
        t1 = 60*int(s[11:13]) + int(s[14:16])  # t1 is the useful value
        u.append(t1)

Now I am creating the histogram:

    plt.hist(u, bins = (max(u) - min(u)))
    plt.show()

Any thoughts and suggestions are appreciated.
Thank you!
Solution
Python uses a surprisingly large amount of memory when reading files, often 3-4 times the actual file size. You never close the files after you open them, and every parsed object stays in the list d, so all of that memory is still in use later in the program.
Try changing the flow of your program to
- Open a file
- Compute a histogram for that file
- Close the file
- Merge it with a "global" histogram
- Repeat until there are no files left.
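The merge step in the list above can be sketched with collections.Counter, which keeps one count per distinct value instead of one list entry per record. This is only a minimal sketch under stated assumptions: the helper names (minutes_since_midnight, histogram_for_file, merged_histogram) are hypothetical, each file is assumed to hold one JSON object per line (as the question's map(json.loads, open(x)) implies), and key_1 is assumed to carry the hour at positions 11-12 and the minute at positions 14-15.

```python
import json
from collections import Counter


def minutes_since_midnight(timestamp):
    """Convert the 'HH:MM' part at positions 11-15 of the string to minutes."""
    return 60 * int(timestamp[11:13]) + int(timestamp[14:16])


def histogram_for_file(path, key="key_1"):
    """Count the minute values in one file holding one JSON object per line."""
    counts = Counter()
    with open(path) as handle:  # the file is closed when this block ends
        for line in handle:
            if line.strip():  # skip blank lines
                obj = json.loads(line)
                counts[minutes_since_midnight(obj[key])] += 1
    return counts


def merged_histogram(paths):
    """Merge the per-file histograms into one global histogram."""
    total = Counter()
    for path in paths:
        # Only the small per-file Counter survives each iteration.
        total.update(histogram_for_file(path))
    return total
```

With this approach only one line of one file is parsed at a time, and the merged result could be drawn with plt.bar(list(total), list(total.values())) rather than plt.hist.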
Something like
    import json
    import matplotlib.pyplot as plt

    u = []
    for f in filenames:  # filenames is the list of file names
        with open(f) as file:
            # process the file one line at a time; each line is one JSON object
            for line in file:
                obj = json.loads(line)
                s = obj['key_1']
                t1 = 60 * int(s[11:13]) + int(s[14:16])
                u.append(t1)
    # make the global histogram
    plt.hist(u, bins = (max(u) - min(u)))
    plt.show()

The with open(f) as file: statement automatically closes the file when the block ends, even if an error occurs while the file is being read.

Code Snippets
    import json
    import matplotlib.pyplot as plt

    u = []
    for f in filenames:  # filenames is the list of file names
        with open(f) as file:
            # process the file one line at a time; each line is one JSON object
            for line in file:
                obj = json.loads(line)
                s = obj['key_1']
                t1 = 60 * int(s[11:13]) + int(s[14:16])
                u.append(t1)
    # make the global histogram
    plt.hist(u, bins = (max(u) - min(u)))
    plt.show()

Context
StackExchange Code Review Q#8963, answer score: 7