pattern | python | Minor
Retrieving lists of consecutive capitalised words from a list
Problem
Ok, so given the string:
s = "Born in Honolulu Hawaii Obama is a graduate of Columbia University and Harvard Law School"
I want to retrieve:
[ ["Born"], ["Honolulu", "Hawaii", "Obama"], ["Columbia", "University"] ...]
Assuming that we have successfully tokenised the original string, my first pseudocode-ish attempt was:
def retrieve(tokens):
    results = []
    i = 0
    while i < len(tokens):
        if tokens[i][0].isupper():
            group = [tokens[i]]
            j = i + 1
            # extend the group while the following tokens are capitalised
            while j < len(tokens):
                if tokens[j][0].isupper():
                    group.append(tokens[j])
                    j += 1
                else:
                    break
            results.append(group)
            i = j  # skip past the tokens already grouped
        else:
            i += 1
    return results
This is actually quite fast (well, compared to some of my trying-to-be-Pythonic attempts):
Timeit: 0.0160551071167 (1000 cycles)
Playing around with it, the quickest I can get is:
def retrieve(tokens):
    results = []
    group = []
    for i in xrange(len(tokens)):  # Python 2; use range in Python 3
        if tokens[i][0].isupper():
            group.append(tokens[i])
        else:
            results.append(group)
            group = []
    results.append(group)  # flush the final group
    # drop empty groups (in Python 3, wrap this in list())
    return filter(None, results)
Timeit: 0.0116229057312
Are there any more concise, pythonic ways to go about this (with similar execution times)?
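A timing like the ones quoted above could be reproduced roughly as follows with the standard `timeit` module. This is only a sketch: the function is a Python 3 rendering of the faster version, the name `retrieve_indexed` is chosen here for illustration, and the absolute numbers will vary by machine and interpreter.

```python
import timeit


def retrieve_indexed(tokens):
    # Python 3 rendering of the index-based loop (range instead of xrange).
    results = []
    group = []
    for i in range(len(tokens)):
        if tokens[i][0].isupper():
            group.append(tokens[i])
        else:
            results.append(group)
            group = []
    results.append(group)
    # Drop the empty groups left behind by lowercase tokens.
    return [g for g in results if g]


s = "Born in Honolulu Hawaii Obama is a graduate of Columbia University and Harvard Law School"
tokens = s.split()

# Total seconds for 1000 calls, matching the "1000 cycles" figure.
elapsed = timeit.timeit(lambda: retrieve_indexed(tokens), number=1000)
print("Timeit: %f (1000 cycles)" % elapsed)
```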
Solution
A trivial optimisation is to iterate over the tokens themselves instead of by index (remember that Python lists are iterable, so iterating a list by index is unPythonic):
def retrieve(tokens):
    results = []
    group = []
    for token in tokens:
        if token[0].isupper():
            group.append(token)
        else:
            if group:  # group is not empty
                results.append(group)
                group = []  # reset group
    if group:  # flush a trailing group when the input ends on a capitalised word
        results.append(group)
    return results
A solution like @JeremyK's, with a list comprehension and regular expressions, is always going to be more compact. I am only giving this answer to point out how lists should be iterated.
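For comparison, the same grouping can be sketched with `itertools.groupby` from the standard library. This is not the code from either post, and it has not been timed against the versions above; the names `retrieve_groupby` and `is_capitalised` are made up for this illustration.

```python
from itertools import groupby


def is_capitalised(token):
    # Hypothetical helper: True when the token starts with an uppercase letter.
    # Slicing (token[:1]) keeps an empty token from raising IndexError.
    return token[:1].isupper()


def retrieve_groupby(tokens):
    # groupby yields runs of consecutive tokens that share the same key value,
    # so each run whose key is True is one group of capitalised words.
    return [
        list(run)
        for capitalised, run in groupby(tokens, key=is_capitalised)
        if capitalised
    ]


s = "Born in Honolulu Hawaii Obama is a graduate of Columbia University and Harvard Law School"
groups = retrieve_groupby(s.split())
print(groups)
```

Because `groupby` only merges adjacent items with equal keys, a trailing run such as "Harvard Law School" is emitted without any special handling after the loop.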
Context
StackExchange Code Review Q#18965, answer score: 4