Matching two lists of dicts, strictly and more loosely
Problem
Two lists of dictionaries. Dicts in A and B can be matched by a particular dict key that may be identical or similar. The result should be a list of the source key from A, the matching key from B, and an additional key from B.
Both lists are constructed using screen-scraping, so you normally don't get to see them (unless you print them explicitly). For clarity though: list A is called stoxlist and looks something like this:

    stoxlist = [{'stock': 'Apple',
                 'last_price': afloat,
                 'some-more': 'value',
                 ..
                },
                {'stock': 'Google',
                 'last_price': afloat,
                 'some-more': 'value',
                 ..
                },
                ...
               ]

List B looks like this:

    symbollist = [{'name': 'Apple', 'symbol': 'APPLE'},
                  {'name': 'Google', 'symbol': 'ALPHABET'},
                  ...
                 ]

The following does what it should do. Using Python 2.7.
```
# lookup helpers
def match(this, that):
    ''' returns True if this loosely matches with that '''
    if this.strip('.') in that or this.split()[0] in that:
        return True
    else:
        return False # this is not necessary, but explicit is better than implicit.

def fetch(name, symbols, key):
    ''' find name by dict[key] in list of dicts symbols '''
    return next((item for item in symbols if match(name, item[key])), None)

def adddict(entry, symbol):
    dict = {'source' : entry['stock'],
            'found' : symbol['name'],
            'symbol' : symbol['symbol']}
    return dict

def lookup(stox, symbols):
    '''lookup symbol for stock name'''
    hits = []
    misses = []
    for entry in stox:
        # try an exact match first
        sym = next((item for item in symbols if entry['stock'] == item['name']), None)
        try:
            # save a hit, if any
            hits.append(adddict(entry, sym))
        except TypeError:
            # if not
```
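To make the goal concrete, here is a minimal sketch of the strict (exact) pass only, run on a tiny version of the sample lists. The sample values and the helper name `exact_lookup` are assumptions for illustration and are not part of the original code; the looser matching passes are left out.

```python
# Hypothetical sample data mirroring the shapes of stoxlist and symbollist.
stoxlist = [
    {'stock': 'Apple', 'last_price': 1.0},
    {'stock': 'Unknown Corp', 'last_price': 2.0},
]
symbollist = [
    {'name': 'Apple', 'symbol': 'APPLE'},
    {'name': 'Google', 'symbol': 'ALPHABET'},
]


def exact_lookup(stox, symbols):
    """Strict pass only: match entry['stock'] against item['name'] exactly."""
    hits, misses = [], []
    for entry in stox:
        # First symbol whose name equals the stock name, or None if there is none.
        sym = next((item for item in symbols
                    if entry['stock'] == item['name']), None)
        if sym is None:
            misses.append(entry['stock'])
        else:
            hits.append({'source': entry['stock'],
                         'found': sym['name'],
                         'symbol': sym['symbol']})
    return hits, misses


hits, misses = exact_lookup(stoxlist, symbollist)
print(hits)
print(misses)
```

Each hit carries the source name from A, the matched name from B, and the symbol from B, which is the result shape described in the problem statement.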
Solution
Exceptions can be faster or slower than using dict.get.
If you don't care about performance, pick whichever you prefer.
However, if you do care, you have to take three timings into account: using get, using try without an exception, and using try with an exception.
Using get is slower than try without an exception: in my timings the try took about 37.8% of the time that get did, so it was roughly 2.6 times faster.
But try with an exception is slower than get: get took about 35.7% of the time that the failing try did, so it was roughly 2.8 times faster.
So you should pick the method based on whether there will be a sizeable number of exceptions or not.
In this case, as you expect exceptions up to four times, you should use dict.get.
Here are my timings in Python 3.6.0:

    >>> from timeit import timeit
    >>> timeit('''\
    try:
        v = d['key']
    except KeyError:
        pass
    else:
        pass
    ''', 'd = {}')
    0.25764639688429725
    >>> timeit('''\
    try:
        v = d['key']
    except KeyError:
        pass
    else:
        pass
    ''', 'd = {"key": "here"}')
    0.034827991199023245
    >>> timeit('''\
    v = d.get('key')
    if v is not None:
        pass
    ''', 'd = {}')
    0.0920266698828982

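To rerun this comparison on your own interpreter, a small harness along these lines may help. It is only a sketch: the function name `compare_lookups` and the iteration count are assumptions, and the absolute numbers will differ by machine and Python version.

```python
from timeit import timeit

# Statement for the try/except style; whether it raises depends on the setup dict.
TRY_STMT = '''\
try:
    v = d['key']
except KeyError:
    pass
'''

# Statement for the dict.get style.
GET_STMT = '''\
v = d.get('key')
if v is not None:
    pass
'''


def compare_lookups(number=100000):
    """Time a try that misses, a try that hits, and a dict.get that misses."""
    return {
        'try_miss': timeit(TRY_STMT, 'd = {}', number=number),
        'try_hit': timeit(TRY_STMT, 'd = {"key": "here"}', number=number),
        'get_miss': timeit(GET_STMT, 'd = {}', number=number),
    }


results = compare_lookups()
for name, seconds in sorted(results.items()):
    print('%-8s %.4f s' % (name, seconds))
```

Comparing `try_miss` against `get_miss` shows the cost of a raised exception, and `try_hit` shows the cost when nothing is raised.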
- Change match to return the condition directly, rather than wrapping it in an if.
- Move match into fetch. It makes the code easier to read.
- Change your nested trys to a function that yields the possible symbols. This makes the code more DRY and easier to read.
- Move adddict into lookup.
- Possibly remove the comments.

This makes swapping between try and dict.get simpler if one is faster than the other. It also makes the code a little more dense, whilst still being readable.
Personally, I find that slightly denser code is more readable, but it's not much of a change from your current code.
This can change your code to:

    # lookup helpers
    def fetch(name, symbols, key):
        ''' find name by dict[key] in list of dicts symbols '''
        return next((
            item
            for item in symbols
            if name.strip('.') in item[key]
            or name.split()[0] in item[key]
        ), None)

    def methods(entry, symbols):
        yield next((item for item in symbols if entry['stock'] == item['name']), None)
        yield fetch(entry['stock'], symbols, 'name')
        yield fetch(entry['stock'].upper(), symbols, 'symbol')
        yield fetch(entry['stock'], SPECIALS, 'name')

    def lookup(stox, symbols):
        '''lookup symbol for stock name'''
        hits = []
        misses = []
        for entry in stox:
            for symbol in methods(entry, symbols):
                try:
                    hits.append({'source': entry['stock'],
                                 'found' : symbol['name'],
                                 'symbol': symbol['symbol']})
                    break
                except TypeError:
                    continue
            else:
                misses.append(entry['stock'])

However, I don't think this should be done in Python.
It looks like you're extracting data from your database and filtering it in Python, rather than performing both in SQL.
From my limited knowledge of SQL, this would be much easier to write in SQL, assuming SQL has an equivalent of in.
But if you want to stay with Python, I'd change your code to use equality rather than in.
This can change your code to run in linear time, rather than quadratic.
If I were to do this, I'd change your code to something like:

    def methods(stox, symbols):
        methods = [
            {i['name']: i for i in symbols},
            {i['symbol']: i for i in symbols},
            {i['name']: i for i in SPECIALS}
        ]
        def inner(stock):
            strip = stock.strip('.')
            split = stock.split()[0]
            yield methods[0].get(stock)
            yield methods[0].get(strip)
            yield methods[0].get(split)
            yield methods[1].get(strip.upper())
            yield methods[1].get(split.upper())
            yield methods[2].get(strip)
            yield methods[2].get(split)
        return inner

    def lookup(stox, symbols):
        meth = methods(stox, symbols)
        hits = []
        misses = []
        for entry in stox:
            for symbol in meth(entry['stock']):
                if symbol is not None:
                    hits.append({'source': entry['stock'],
                                 'found' : symbol['name'],
                                 'symbol': symbol['symbol']})
                    break
            else:
                misses.append(entry['stock'])

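The idea behind the dictionary version can be shown in isolation: build an index once, then try a few candidate forms of each name with constant-time lookups. The sketch below is a stripped-down illustration, not the answer's code. The names `build_index`, `candidates` and `find` are hypothetical, the sample data is made up, and SPECIALS and the symbol index are left out.

```python
def build_index(symbols, key):
    """Map each item's value under `key` to the item itself (one pass over the list)."""
    return {item[key]: item for item in symbols}


def candidates(stock):
    """Yield the forms of a stock name to try, strictest first."""
    yield stock
    yield stock.strip('.')
    parts = stock.split()
    if parts:  # an empty name has no first word to try
        yield parts[0]


def find(stock, by_name):
    """Return the first indexed item whose name equals one of the candidate forms."""
    for form in candidates(stock):
        item = by_name.get(form)
        if item is not None:
            return item
    return None


symbollist = [
    {'name': 'Apple', 'symbol': 'APPLE'},
    {'name': 'Google', 'symbol': 'ALPHABET'},
]
by_name = build_index(symbollist, 'name')

print(find('Apple', by_name))        # exact match
print(find('Google Inc.', by_name))  # the first word 'Google' equals a key
print(find('App', by_name))          # only a substring of 'Apple', so no match
```

Note the last call: the substring test (`'App' in 'Apple'`) would have matched, so moving from in to equality is stricter and may turn some loose hits into misses. It is worth checking the misses list after making the switch.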
Context
StackExchange Code Review Q#154114, answer score: 8