pattern, python, Minor
Function to split strings on multiple delimiters
function, split, multiple, strings, delimiters
Problem
I have this implementation of the split algorithm that, unlike the standard .split() method, can be used with multiple delimiters. Is this a good way of implementing it (in terms of performance)?

def split(str, delim=" "):
    index = 0
    string = ""
    array = []
    while index < len(str):
        if str[index] not in delim:
            string += str[index]
        else:
            if string:
                array.append(string)
                string = ""
        index += 1
    if string: array.append(string)
    return array

Using the standard .split() method:

>>> print "hello = 20".split()
['hello', '=', '20']
>>> print "one;two; abc; b ".split(";")
['one', 'two', ' abc', ' b ']

Using my implementation:

>>> print split("hello = 20")
['hello', '=', '20']
>>> print split("one;two; abc; b ", ";")
['one', 'two', ' abc', ' b ']

Multiple delimiters:

>>> print split("one;two; abc; b.e. b eeeeee.e.e;;e ;.", " .;")
['one', 'two', 'abc', 'b', 'e', 'b', 'eeeeee', 'e', 'e', 'e']
>>> print split("foo barfoo;bar;foo bar.foo", " .;")
['foo', 'barfoo', 'bar', 'foo', 'bar', 'foo']
>>> print split("foo*bar*foo.foo bar;", "*.")
['foo', 'bar', 'foo', 'foo bar;']

Obs: Something similar can be done using re.split().

Solution
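For reference, the re.split() alternative mentioned in the question could look roughly like the sketch below. The name re_split is hypothetical and not from the original post, and the code assumes the goal is to match the behaviour of split() above: any character in delim separates words, and empty pieces are dropped.

```python
import re

def re_split(s, delim=" "):  # hypothetical name, not from the original post
    """Split s on any character in delim, dropping empty pieces."""
    if not delim:
        # An empty character class is not a valid regex; with no delimiters
        # the whole string is a single word (or nothing if s is empty).
        return [s] if s else []
    # re.escape guards characters that are special inside a character
    # class, such as ']', '^', '-' or the backslash.
    pattern = "[" + re.escape(delim) + "]+"
    # re.split leaves '' at the ends when s starts or ends with a delimiter,
    # so those empty pieces are filtered out.
    return [piece for piece in re.split(pattern, s) if piece]

print(re_split("foo*bar*foo.foo bar;", "*."))
```

Whether this is faster than the hand-written loop depends on the input and the interpreter, so it is worth timing both on representative data before choosing one.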
There's no need to iterate using that while loop; a for loop is good enough.

Also, string concatenation (+=) is expensive. It's better to use a list and join its elements at the end [1].

def split(s, delim=" "):
    words = []
    word = []
    for c in s:
        if c not in delim:
            word.append(c)
        else:
            if word:
                words.append(''.join(word))
                word = []
    if word:
        words.append(''.join(word))
    return words

As Maarten Fabré suggested, you could also ditch the words list and transform the function into a generator that iterates over (yields) each word. This saves some memory if you're examining only one word at a time and don't need all of them at once, for example when you're counting word frequency (collections.Counter(isplit(s))).

def isplit(s, delim=" "):  # iterator version
    word = []
    for c in s:
        if c not in delim:
            word.append(c)
        else:
            if word:
                yield ''.join(word)
                word = []
    if word:
        yield ''.join(word)

def split(*args, **kwargs):  # only converts the iterator to a list
    return list(isplit(*args, **kwargs))

There's also a one-liner solution based on itertools.groupby:

import itertools

def isplit(s, delim=" "):  # iterator version
    # replace the outer parentheses (...) with brackets [...]
    # to transform the generator expression into a list comprehension
    # and return a list
    return (''.join(word)
            for is_word, word in itertools.groupby(s, lambda c: c not in delim)
            if is_word)

def split(*args, **kwargs):  # only converts the iterator to a list
    return list(isplit(*args, **kwargs))

[1] From https://wiki.python.org/moin/PythonSpeed: "String concatenation is best done with ''.join(seq) which is an O(n) process. In contrast, using the + or += operators can result in an O(n**2) process because new strings may be built for each intermediate step. The CPython 2.4 interpreter mitigates this issue somewhat; however, ''.join(seq) remains the best practice".

Code Snippets
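A minimal usage sketch of the word-frequency case the answer mentions, collections.Counter(isplit(s)). The generator is repeated here only so the snippet runs on its own; the sample string is taken from the question, and the variable names text and counts are illustrative.

```python
import collections

def isplit(s, delim=" "):
    """Yield each maximal run of characters in s that are not in delim."""
    word = []
    for c in s:
        if c not in delim:
            word.append(c)
        elif word:
            yield ''.join(word)
            word = []
    if word:
        yield ''.join(word)

text = "foo barfoo;bar;foo bar.foo"  # sample input from the question
# Counter consumes the generator one word at a time, so the full list of
# words is never built in memory.
counts = collections.Counter(isplit(text, " .;"))
print(counts.most_common(2))  # [('foo', 3), ('bar', 2)]
```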
def split(s, delim=" "):
    words = []
    word = []
    for c in s:
        if c not in delim:
            word.append(c)
        else:
            if word:
                words.append(''.join(word))
                word = []
    if word:
        words.append(''.join(word))
    return words

def isplit(s, delim=" "):  # iterator version
    word = []
    for c in s:
        if c not in delim:
            word.append(c)
        else:
            if word:
                yield ''.join(word)
                word = []
    if word:
        yield ''.join(word)

def split(*args, **kwargs):  # only converts the iterator to a list
    return list(isplit(*args, **kwargs))

import itertools

def isplit(s, delim=" "):  # iterator version
    # replace the outer parentheses (...) with brackets [...]
    # to transform the generator expression into a list comprehension
    # and return a list
    return (''.join(word)
            for is_word, word in itertools.groupby(s, lambda c: c not in delim)
            if is_word)

def split(*args, **kwargs):  # only converts the iterator to a list
    return list(isplit(*args, **kwargs))

Context
StackExchange Code Review Q#47627, answer score: 9