patternpythonMinor
Build a dictionary based on split string
Viewed 0 times
buildsplitbaseddictionarystring
Problem
I've written a simple script that takes a string filled with brackets and strings and builds them to a dictionary.
Example Input:
Output for
Example Input:
string_1 = '(()())'
string_2 = '((a)(b))'
string_3 = '(a(a)b(b)c)'
string_4 = 'a(a(a)b(b)c)b'
string_4 = 'a[a{a}b{b}c]b'
string_5 = 'beginning(content[word]{another word})outside'Output for
string5 and the other examples:[
('0-8', 'beginning'),
('9', '('),
('10-16', 'content'),
('17', '['),
('18-21', 'word'),
('22', ']'),
('23', '{'),
('24-35', 'another word'),
('36', '}'),
('37', ')'),
('38-44', 'outside')
]def sort(dictionary):
return sorted(dictionary.items(), key=lambda v: int(v[0].split("-")[0]))
def join(a, b):
return ''.join([repr(a), '-', repr(b - 1)]) if a != b else repr(b)
class Positions:
def __init__(self, name):
self.name = name
self.content = {} # creates a new empty list for each bracket
def push(self, pos, content):
self.content[pos] = content
def convert_string(string):
string_content = string
string = Positions(string)
pos = 0
start_of_str_pos = 0
internal_string = ''
for char in string_content:
if char in ['(', '{', '[', ')', '']:
string.push(repr(pos), repr(char))
if internal_string != '':
string.push(join(start_of_str_pos, pos), internal_string)
internal_string = ''
start_of_str_pos = pos + 1
else:
internal_string += char
pos += 1
if internal_string != '':
string.push(''.join([repr(start_of_str_pos),
'-', repr(pos - 1)]), internal_string)
print sort(string.content)Solution
Obvious but easy
There are some issues that are immediately obvious and easy to correct.
In this statement:
The multiple
and so is the
It would be less awkward, simpler and more readable by using string formatting:
This can be written a lot simpler:
Like this:
But actually, it's a bit disturbing that the opening and closing brackets are ordered inconsistently. For example, looking at this string, it's not immediately obvious that it includes all matching closing brackets for all opening brackets. This way would be immediately obvious and less error prone:
This is both confusing and disturbing:
Why shuffle around the
It's a bad practice to reassign the value of a parameter,
as it makes it confusing what it really stands for throughout the function.
Another problem here is calling an instance of
Later in the code when I read "string",
I think it's an actually string, but in fact it's not,
it's an instance of "Positions".
Very misleading.
Since empty strings are falsy, instead of this:
You can write simply:
Obvious but harder
There are some issues that are sort of obviously wrong,
but not so easy to correct.
This sorting function smells:
Instead of splitting a string and parsing to int,
my instinct says there should be a way to use the original int values directly.
In general, the code formats these strings from int values,
and it seems silly to later un-format the strings so you can sort.
There should be a better way.
The
It's called "Positions",
but its fields are "name" and "content".
Sounds more like a key-value pair.
Then there's this code with the comment:
The comment talks about "list", but the code uses a dictionary.
And there's the
But here the method puts a value into a dictionary,
an operation normally called "put".
Alternative implementation
Based on the above suggestions,
and reorganizing to more meaningful elements,
you could do something like this:
Although with the implementation of
I made it possible to sort simply by
actually there is no need to sort,
as the items are naturally sorted.
Note: the purpose of
There are some issues that are immediately obvious and easy to correct.
In this statement:
return ''.join([repr(a), '-', repr(b - 1)]) if a != b else repr(b)The multiple
repr are awkward,and so is the
join.It would be less awkward, simpler and more readable by using string formatting:
return '{}-{}'.format(a, b - 1) if a != b else repr(a)This can be written a lot simpler:
if char in ['(', '{', '[', ')', '']:Like this:
if char in '({[)':But actually, it's a bit disturbing that the opening and closing brackets are ordered inconsistently. For example, looking at this string, it's not immediately obvious that it includes all matching closing brackets for all opening brackets. This way would be immediately obvious and less error prone:
if char in '(){}[]<>':This is both confusing and disturbing:
def convert_string(string):
string_content = string
string = Positions(string)Why shuffle around the
string parameter?It's a bad practice to reassign the value of a parameter,
as it makes it confusing what it really stands for throughout the function.
Another problem here is calling an instance of
Positions a "string".Later in the code when I read "string",
I think it's an actually string, but in fact it's not,
it's an instance of "Positions".
Very misleading.
Since empty strings are falsy, instead of this:
if internal_string != '':
# ...You can write simply:
if internal_string:
# ...Obvious but harder
There are some issues that are sort of obviously wrong,
but not so easy to correct.
This sorting function smells:
def sort(dictionary):
return sorted(dictionary.items(), key=lambda v: int(v[0].split("-")[0]))Instead of splitting a string and parsing to int,
my instinct says there should be a way to use the original int values directly.
In general, the code formats these strings from int values,
and it seems silly to later un-format the strings so you can sort.
There should be a better way.
The
Positions class stinks.It's called "Positions",
but its fields are "name" and "content".
Sounds more like a key-value pair.
Then there's this code with the comment:
self.content = {} # creates a new empty list for each bracketThe comment talks about "list", but the code uses a dictionary.
And there's the
push method:def push(self, pos, content):
self.content[pos] = contentpush is a term commonly used with stacks, or sometimes with lists.But here the method puts a value into a dictionary,
an operation normally called "put".
Alternative implementation
Based on the above suggestions,
and reorganizing to more meaningful elements,
you could do something like this:
from functools import total_ordering
@total_ordering
class Item:
def __init__(self, start, end, content):
self.start = start
self.end = end
self.content = content
@property
def position(self):
a = self.start
b = self.end
return '{}-{}'.format(a, b - 1) if a != b else repr(a)
def __lt__(self, other):
return self.start ':
items.append(Item(pos, pos, char))
if internal_string:
items.append(Item(start, pos, internal_string))
internal_string = ''
start = pos + 1
else:
internal_string += char
if internal_string:
items.append(Item(start, len(string), internal_string))
return itemsAlthough with the implementation of
__lt__,I made it possible to sort simply by
sorted(items),actually there is no need to sort,
as the items are naturally sorted.
Note: the purpose of
@total_ordering is to implement the other rich comparison operators like __le__, as recommended by PEP8.Code Snippets
return ''.join([repr(a), '-', repr(b - 1)]) if a != b else repr(b)return '{}-{}'.format(a, b - 1) if a != b else repr(a)if char in ['(', '{', '[', ')', '<', '}', ']', '>']:if char in '({[)<}]>':if char in '(){}[]<>':Context
StackExchange Code Review Q#91613, answer score: 5
Revisions (0)
No revisions yet.