
Matching two lists of dicts, strictly and more loosely

Submitted by: @import:stackexchange-codereview

Problem

I have two lists of dictionaries. Dicts in A and B can be matched on the value of a particular key, which may be identical or merely similar. The result should be a list containing the source value from A, the matching value from B, and an additional value from B.

Both lists are constructed using screen-scraping, so you normally don't get to see them (unless you print them explicitly). For clarity though: list A is called stoxlist and looks something like this:

stoxlist = [{'stock': 'Apple',  
             'last_price': afloat,  
             'some-more': 'value', 
             ..
             },
            {'stock': 'Google',  
             'last_price': afloat,  
             'some-more': 'value', 
             ..
             }, 
             ...
            ]


List B looks like this:

symbollist = [{'name': 'Apple',  'symbol': 'APPLE'},
              {'name': 'Google', 'symbol': 'ALPHABET'},
              ...
             ]


The following does what it should do. Using Python 2.7.

```
# lookup helpers
def match(this, that):
    ''' returns True if this loosely matches with that '''
    if this.strip('.') in that or this.split()[0] in that:
        return True
    else:
        return False # this is not necessary, but explicit is better than implicit.

def fetch(name, symbols, key):
    ''' find name by dict[key] in list of dicts symbols '''
    return next((item for item in symbols if match(name, item[key])), None)

def adddict(entry, symbol):
    dict = {'source' : entry['stock'],
            'found' : symbol['name'],
            'symbol' : symbol['symbol']}
    return dict

def lookup(stox, symbols):
    '''lookup symbol for stock name'''
    hits = []
    misses = []
    for entry in stox:
        # try an exact match first
        sym = next((item for item in symbols if entry['stock'] == item['name']), None)
        try:
            # save a hit, if any
            hits.append(adddict(entry, sym))
        except TypeError:
            # if not
```

Solution

Exceptions can be either faster or slower than using dict.get, depending on how often they are raised.

If you don't care about performance, pick whichever you prefer.
However, if you do care, you have to take three timings into account:
using dict.get, using try without an exception being raised, and using try with an exception being raised.
In my timings, try without an exception takes only about 37.8% of the time dict.get takes (about 2.6 times faster).
But dict.get takes only about 35.7% of the time that try with an exception takes (about 2.8 times faster).

So you should pick the method based on whether you expect a sizeable number of exceptions.
In this case, as you can expect an exception up to four times per entry, you should use dict.get.

Here are my timings in Python 3.6.0:

>>> from timeit import timeit
>>> timeit('''\
try:
    v = d['key']
except KeyError:
    pass
else:
    pass
''', 'd = {}')
0.25764639688429725
>>> timeit('''\
try:
    v = d['key']
except KeyError:
    pass
else:
    pass
''', 'd = {"key": "here"}')
0.034827991199023245
>>> timeit('''\
v = d.get('key')
if v is not None:
    pass
''', 'd = {}')
0.0920266698828982
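
If you want to repeat these measurements on your own machine, a small script along the following lines might help. It is only a sketch: the `measure` helper and the case labels are made up for illustration, and it adds the fourth combination (dict.get with the key present) that the session above leaves out. Absolute numbers will differ between machines and Python versions, so only the ratios are worth comparing.

```python
from timeit import timeit

# Setups for the two situations: key missing and key present.
SETUP_MISS = 'd = {}'
SETUP_HIT = 'd = {"key": "here"}'

TRY_STMT = '''
try:
    v = d['key']
except KeyError:
    v = None
'''
GET_STMT = "v = d.get('key')"


def measure(number=100000):
    """Time each statement/setup pair; returns {label: seconds}."""
    cases = {
        'try, key missing': (TRY_STMT, SETUP_MISS),
        'try, key present': (TRY_STMT, SETUP_HIT),
        'get, key missing': (GET_STMT, SETUP_MISS),
        'get, key present': (GET_STMT, SETUP_HIT),
    }
    return {label: timeit(stmt, setup, number=number)
            for label, (stmt, setup) in cases.items()}


if __name__ == '__main__':
    for label, seconds in sorted(measure().items()):
        print('{:<18} {:.4f}s'.format(label, seconds))
```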


  • Change match to return the condition directly, rather than testing it in an if.

  • Move match into fetch. It makes the code easier to read.

  • Change your nested trys into a function that yields the possible symbols. This makes the code more DRY and easier to read.

  • Move adddict into lookup.

  • Possibly remove the comments.

These changes make it simpler to swap between try and dict.get if one turns out to be faster than the other.
They also make the code a little denser, whilst still being readable.
Personally, I find slightly denser code more readable,
but it's not much of a change from your current code.
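
As a small illustration of the first point in the list, match could return its condition directly if you decide to keep it as a separate function. This is just a sketch of that one step, using the same test as your original:

```python
def match(this, that):
    """Return True if `this` loosely matches `that`."""
    # Both `in` tests already produce a bool, so the result of `or`
    # can be returned as-is without an if/else around it.
    return this.strip('.') in that or this.split()[0] in that
```

For example, match('Apple Inc.', 'Apple') is True because the first word matches, and match('Google', 'Alphabet') is False.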

With these changes, your code becomes:

# lookup helpers    
def fetch(name, symbols, key):
    ''' find name by dict[key] in list of dicts symbols  '''
    return next((
            item
            for item in symbols
            if name.strip('.') in item[key]
            or name.split()[0] in item[key]
        ), None)

def methods(entry, symbols):
    yield next((item for item in symbols if entry['stock'] == item['name']), None)
    yield fetch(entry['stock'], symbols, 'name')
    yield fetch(entry['stock'].upper(), symbols, 'symbol')
    yield fetch(entry['stock'], SPECIALS, 'name')

def lookup(stox, symbols):
    '''lookup symbol for stock name'''
    hits = []
    misses = []
    for entry in stox:
        for symbol in methods(entry, symbols):
            try:
                hits.append({'source': entry['stock'],
                             'found' : symbol['name'],
                             'symbol': symbol['symbol']})
                break
            except TypeError:
                continue
        else:
            misses.append(entry['stock'])
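
To see how the generator and the for/else fit together, here is a self-contained run of the code above on made-up data. Two things in it are my assumptions rather than something from your post: the contents of SPECIALS (your definition isn't shown here) and the final return of hits and misses, which I've added so the result can be inspected.

```python
# Hypothetical stand-in: SPECIALS is used above but its contents are not shown.
SPECIALS = [{'name': 'Big Blue', 'symbol': 'IBM'}]


def fetch(name, symbols, key):
    """Return the first item whose item[key] loosely contains name, else None."""
    return next((
        item
        for item in symbols
        if name.strip('.') in item[key]
        or name.split()[0] in item[key]
    ), None)


def methods(entry, symbols):
    """Yield candidates from strictest to loosest; each one may be None."""
    yield next((item for item in symbols if entry['stock'] == item['name']), None)
    yield fetch(entry['stock'], symbols, 'name')
    yield fetch(entry['stock'].upper(), symbols, 'symbol')
    yield fetch(entry['stock'], SPECIALS, 'name')


def lookup(stox, symbols):
    hits = []
    misses = []
    for entry in stox:
        for symbol in methods(entry, symbols):
            try:
                # Subscripting None raises TypeError, so a failed method
                # simply falls through to the next one.
                hits.append({'source': entry['stock'],
                             'found': symbol['name'],
                             'symbol': symbol['symbol']})
                break
            except TypeError:
                continue
        else:
            # Only reached when no method produced a match (no break).
            misses.append(entry['stock'])
    return hits, misses  # assumption: the caller wants both lists back


stoxlist = [{'stock': 'Apple'}, {'stock': 'Alphabet Inc.'},
            {'stock': 'Big Blue'}, {'stock': 'Nokia'}]
symbollist = [{'name': 'Apple', 'symbol': 'APPLE'},
              {'name': 'Google', 'symbol': 'ALPHABET'}]

hits, misses = lookup(stoxlist, symbollist)
```

Because methods is a generator, the looser lookups are only evaluated when the stricter ones before them have failed.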


However, I don't think this should be done in Python.
It looks like you're extracting data from your database and filtering it in Python, rather than performing both in SQL.
And from my limited knowledge of SQL, this would be much easier to write in SQL, assuming SQL has an equivalent of in.
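
For what it's worth, SQLite does have a substring test, instr, so the loose match could be sketched roughly as below. This is only an illustration under assumptions: the in-memory database, the symbols table and the find_symbol helper are all hypothetical, and it only applies if the symbol list really lives in a database rather than coming straight from scraping.

```python
import sqlite3

# Hypothetical schema and data, standing in for wherever the symbols are stored.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE symbols (name TEXT, symbol TEXT)')
conn.executemany('INSERT INTO symbols VALUES (?, ?)',
                 [('Apple', 'APPLE'), ('Google', 'ALPHABET')])


def find_symbol(conn, stock):
    """Return (name, symbol) for the first row whose name contains the
    stock name or its first word, or None if there is no such row."""
    stripped = stock.strip('.')
    first_word = stock.split()[0]
    # instr(X, Y) is the 1-based position of Y in X, or 0 if Y is absent,
    # so "> 0" plays the role of Python's case-sensitive `Y in X`.
    return conn.execute(
        'SELECT name, symbol FROM symbols '
        'WHERE instr(name, ?) > 0 OR instr(name, ?) > 0 '
        'ORDER BY rowid LIMIT 1',
        (stripped, first_word),
    ).fetchone()
```

Here find_symbol(conn, 'Apple Inc.') gives ('Apple', 'APPLE'), and a name with no match gives None.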

But if you want to stay with Python, I'd change your code to use equality rather than in.
This is because it can make your code run in linear time rather than quadratic time.
If I were to do this, I'd change your code to something like:

def methods(stox, symbols):
    methods = [
        {i['name']: i for i in symbols},
        {i['symbol']: i for i in symbols},
        {i['name']: i for i in SPECIALS}
    ]
    def inner(stock):
        strip = stock.strip('.')
        split = stock.split()[0]
        yield methods[0].get(stock)
        yield methods[0].get(strip)
        yield methods[0].get(split)
        yield methods[1].get(strip.upper())
        yield methods[1].get(split.upper())
        yield methods[2].get(strip)
        yield methods[2].get(split)
    return inner

def lookup(stox, symbols):
    meth = methods(stox, symbols)
    hits = []
    misses = []
    for entry in stox:
        for symbol in meth(entry['stock']):
            if symbol is not None:
                hits.append({'source': entry['stock'],
                             'found' : symbol['name'],
                             'symbol': symbol['symbol']})
                break
        else:
            misses.append(entry['stock'])
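
One thing to keep in mind with this version: equality is stricter than in, so a stock name that is only a fragment of the stored name no longer matches. The following sketch, with made-up data and hypothetical helper names (loose_scan, exact_probe), shows the difference, along with why the dict version is faster: the index is built once, and each lookup after that is a single hash probe instead of a scan over every symbol.

```python
symbols = [{'name': 'Apple', 'symbol': 'APPLE'},
           {'name': 'Google', 'symbol': 'ALPHABET'}]

# Built once, in a single pass over symbols.
by_name = {item['name']: item for item in symbols}


def loose_scan(stock):
    """Substring test against every name: one full scan per stock."""
    return next((item for item in symbols if stock in item['name']), None)


def exact_probe(stock):
    """Equality via the index: one hash lookup per stock."""
    return by_name.get(stock)


fragment_scan = loose_scan('Goog')      # finds the Google entry
fragment_probe = exact_probe('Goog')    # None, as 'Goog' != 'Google'
full_probe = exact_probe('Google')      # finds the Google entry
```

If fragments like this occur in your scraped names, you may still need a slower substring pass as a last resort for the entries that end up in misses.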

Context

StackExchange Code Review Q#154114, answer score: 8
