patternpythonMinor
Clean and tidy Python code for dictionary look up and API interactions
Viewed 0 times
lookinteractionscodepythontidyforanddictionaryapiclean
Problem
I am new to programming and entirely self-taught. Below is a program I wrote to retrieve text content via an API and scan the text for important words. If the word is found URI is returned. The API returns data in JSON. The important words and URIs are linked via a dictionary. It works. I'm at a point in my programming education where I need to start focusing on writing clean code. I have a book about it, but there's something about peer review that is so much more helpful. Here is the code with commenting.
Any ideas on how I can make this cleaner?
```
#!/usr/bin/env python
# coding: utf-8
from __future__ import division
import urllib, json
#bio_dict is the dictionary used to assign DBpedia URIs. When a specific key is
#found in the text, the value URI is returned. Any dictionary can be used here.
bio_dict = eval(open('biodict.txt').read())
in_file_name = "text_object_id1.txt"
in_file = open(in_file_name, 'r')
out_file_name = "all_uri.txt"
out_file = open(out_file_name, 'w')
#At this point, you have imported all the modules you need, read the dictionary
#you will use to the assign the URIs and opened all the files you will be using. The
#InFile should be a list of DataObject IDs. The OutFile will hold your results.
def data_object_url(line):
return 'http://eol.org/api/data_objects/1.0/' + line.strip() + '.json'
#data_object_url is a function that will create the URL needed to query the API
#for each data object ID in the list
for line in in_file:
results = urllib.urlopen(data_object_url(line)).read()
print line
print len(results)
data = json.loads(results)
descriptions = data ['dataObjects'][0]['description'].lower()
for k, v in bio_dict.iteritems():
if k in descriptions:
out_file.write(','.join([line.strip(),k,v + '\n']))
Any ideas on how I can make this cleaner?
```
#!/usr/bin/env python
# coding: utf-8
from __future__ import division
import urllib, json
#bio_dict is the dictionary used to assign DBpedia URIs. When a specific key is
#found in the text, the value URI is returned. Any dictionary can be used here.
bio_dict = eval(open('biodict.txt').read())
in_file_name = "text_object_id1.txt"
in_file = open(in_file_name, 'r')
out_file_name = "all_uri.txt"
out_file = open(out_file_name, 'w')
#At this point, you have imported all the modules you need, read the dictionary
#you will use to the assign the URIs and opened all the files you will be using. The
#InFile should be a list of DataObject IDs. The OutFile will hold your results.
def data_object_url(line):
return 'http://eol.org/api/data_objects/1.0/' + line.strip() + '.json'
#data_object_url is a function that will create the URL needed to query the API
#for each data object ID in the list
for line in in_file:
results = urllib.urlopen(data_object_url(line)).read()
print line
print len(results)
data = json.loads(results)
descriptions = data ['dataObjects'][0]['description'].lower()
for k, v in bio_dict.iteritems():
if k in descriptions:
out_file.write(','.join([line.strip(),k,v + '\n']))
Solution
I have a few pointers, a lot of them coming from Python's offical style guide:
-
ALWAYS use
This syntax prevents any errant bugs like forgetting to close a file, which you do here:
-
Use
-
Different module
They should have their own line. If you are importing submodules, then you can have multiple imports on the same line:
-
Pay attention to function and variable names; make them descriptive.
Function names should start with a verb and describe what they do: instead of
Also, make variable names descriptive; erring on the side of being too wordy than not wordy enough: instead of
-
When writing comments, only say what needs to be explained. Comments should not explain the obvious but instead should explain the complex or the reasons behind the code. A lot of the comments you have would actually be more fitting as docstrings.
-
Look into list comprehensions. In my code below I was able to take your for loop into a single statement. This also made writing to the output file much more Pythonic.
Here is what your code looks like with my suggestions implemented:
-
ALWAYS use
with syntax when dealing with files:with open('some_file.txt', 'r') as file:
# Do stuff
# file will be closed by now
foo = 'Hello World!'This syntax prevents any errant bugs like forgetting to close a file, which you do here:
bio_dict = eval(open('biodict.txt').read())-
Use
format to create strings with variables. This is the easiest way to create strings without having to use ugly string concatenation:# This functions...
def data_object_url(line):
return 'http://eol.org/api/data_objects/1.0/' + line.strip() + '.json'
# But this is better, more Pythonic
def data_object_url(line):
return 'http://eol.org/api/data_objects/1.0/{}.json'.format(line.strip())-
Different module
imports should not go on the same line:import urllib, jsonThey should have their own line. If you are importing submodules, then you can have multiple imports on the same line:
from my_module import some_func, other_func-
Pay attention to function and variable names; make them descriptive.
Function names should start with a verb and describe what they do: instead of
data_object_url use something like get_object_url. Also, make variable names descriptive; erring on the side of being too wordy than not wordy enough: instead of
for k, v in bio_dict.iteritems() use for key, value in bio_dict.iteritems().-
When writing comments, only say what needs to be explained. Comments should not explain the obvious but instead should explain the complex or the reasons behind the code. A lot of the comments you have would actually be more fitting as docstrings.
-
Look into list comprehensions. In my code below I was able to take your for loop into a single statement. This also made writing to the output file much more Pythonic.
Here is what your code looks like with my suggestions implemented:
#!/usr/bin/env python
# coding: utf-8
import urllib
import json
with open('biodict.txt', 'r') as file:
bio_dict = eval(file.read())
in_file_name = "text_object_id1.txt"
out_file_name = "all_uri.txt"
def get_object_url(line):
return 'http://eol.org/api/data_objects/1.0/{}.json'.format(line)
out_file_text = []
with open(in_file_name, 'r') as file:
for line in in_file:
line = line.strip()
results = urllib.urlopen(get_object_url(line)).read()
data = json.loads(results)
descriptions = data['dataObjects'][0]['description'].lower()
out_file_text += [','.join(line, key, bio_dict[key])
for key in bio_dict if key in descriptions]
with open(out_file_name, 'w') as file:
file.write('\n'.join(out_file_text))Code Snippets
with open('some_file.txt', 'r') as file:
# Do stuff
# file will be closed by now
foo = 'Hello World!'bio_dict = eval(open('biodict.txt').read())# This functions...
def data_object_url(line):
return 'http://eol.org/api/data_objects/1.0/' + line.strip() + '.json'
# But this is better, more Pythonic
def data_object_url(line):
return 'http://eol.org/api/data_objects/1.0/{}.json'.format(line.strip())import urllib, jsonfrom my_module import some_func, other_funcContext
StackExchange Code Review Q#51908, answer score: 6
Revisions (0)
No revisions yet.