patternpythonMinor
Excel-formula-analysis state machine
Viewed 0 times
excelanalysisstatemachineformula
Problem
This is one of my first useful scripts. I run it on a highlighted text selection within Notepad++, but I've provided a sample formula and commented out the Scintilla-specific references.
I'm a rank amateur: Am I using good idiom? Is my approach to a state machine sensible?
```
# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'
# initialize the state table
EVENTS = 11
STATES = 7
state_table = [[[0,0] for x in range(EVENTS)] for x in range(STATES)]
# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table[0] = [[1,0], [0,1], [5,0], [0,3], [0,4], [0,5], [2,0], [0,0], [6,0], [0,0], [0,0]] # normal state
state_table[1] = [[0,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0]] # double-quote comment
state_table[2] = [[2,0], [2,0], [2,0], [2,0], [2,0], [2,0], [3,0], [0,0], [2,0], [2,0], [2,0]] # inside bracketed table reference
state_table[3] = [[3,0], [3,0], [3,0], [3,0], [3,0], [3,0], [4,0], [2,0], [3,0], [3,0], [3,0]] # inside double-bracketed table reference
state_table[4] = [[4,0], [4,0], [4,0], [4,0], [4,0], [4,0], [-1,0], [3,0], [4,0], [4,0], [4,0]] # inside triple-bracketed table reference (I don't think this exists)
state_table[5] = [[1,2], [0,2], [5,2], [0,0], [0,2], [0,5], [2,2], [0,2], [0,2], [0,2], [0,2]] # found left-paren; only wrap and insert if not empty, like =row()
state_table[6] = [[6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [0,0], [6,0]] # inside curly-braced array
# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""
# define the tab and new line characters
TAB_CHAR = "\t"
NEW_LINE = "\r\n"
for z in input_text:
I'm a rank amateur: Am I using good idiom? Is my approach to a state machine sensible?
```
# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'
# initialize the state table
EVENTS = 11
STATES = 7
state_table = [[[0,0] for x in range(EVENTS)] for x in range(STATES)]
# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table[0] = [[1,0], [0,1], [5,0], [0,3], [0,4], [0,5], [2,0], [0,0], [6,0], [0,0], [0,0]] # normal state
state_table[1] = [[0,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0]] # double-quote comment
state_table[2] = [[2,0], [2,0], [2,0], [2,0], [2,0], [2,0], [3,0], [0,0], [2,0], [2,0], [2,0]] # inside bracketed table reference
state_table[3] = [[3,0], [3,0], [3,0], [3,0], [3,0], [3,0], [4,0], [2,0], [3,0], [3,0], [3,0]] # inside double-bracketed table reference
state_table[4] = [[4,0], [4,0], [4,0], [4,0], [4,0], [4,0], [-1,0], [3,0], [4,0], [4,0], [4,0]] # inside triple-bracketed table reference (I don't think this exists)
state_table[5] = [[1,2], [0,2], [5,2], [0,0], [0,2], [0,5], [2,2], [0,2], [0,2], [0,2], [0,2]] # found left-paren; only wrap and insert if not empty, like =row()
state_table[6] = [[6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [0,0], [6,0]] # inside curly-braced array
# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""
# define the tab and new line characters
TAB_CHAR = "\t"
NEW_LINE = "\r\n"
for z in input_text:
Solution
You can use list unpacking to transform :
into the more concise :
. Also, it shows that the type you should be using to convey
The pythonic way to loop is not to use
becomes :
Remove whatever code is not needed.
is roughly the same as nothing. If you want to ensure the value is correct, you can use assert.
From PEP 8 :
For example, do not rely on CPython's efficient implementation of
in-place string concatenation for statements in the form a += b or a =
a + b. This optimization is fragile even in CPython (it only works for
some types) and isn't present at all in implementations that don't use
refcounting. In performance sensitive parts of the library, the
''.join() form should be used instead. This will ensure that
concatenation occurs in linear time across various implementations.
Remove duplicated logic :
and
are quite similar and probably can be factorised out.
Do not repeat yourself (bis) : you do not need to define the state table twice. Just do it once and for all and everything should be ok :
Final result :
This is what my code is like at the end. I am too lazy to change the string concatenations for a better solution. Also, I'm still not quite happy with the multiple
```
#!/usr/bin/python
# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'
# initialize the state table
# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table = [
[(1,0), (0,1), (5,0), (0,3), (0,4), (0,5), ( 2,0), (0,0), (6,0), (0,0), (0,0)], # normal state
[(0,0), (1,0), (1,0), (1,0), (1,0), (1,0), ( 1,0), (1,0), (1,0), (1,0), (1,0)], # double-quote comment
[(2,0), (2,0), (2,0), (2,0), (2,0), (2,0), ( 3,0), (0,0), (2,0), (2,0), (2,0)], # inside bracketed table reference
[(3,0), (3,0), (3,0), (3,0), (3,0), (3,0), ( 4,0), (2,0), (3,0), (3,0), (3,0)], # inside double-bracketed table reference
[(4,0), (4,0), (4,0), (4,0), (4,0), (4,0), (-1,0), (3,0), (4,0), (4,0), (4,0)], # inside triple-bracketed table reference (I don't think this exists)
[(1,2), (0,2), (5,2), (0,0), (0,2), (0,5), ( 2,2), (0,2), (0,2), (0,2), (0,2)], # found left-paren; only wrap and insert if not empty, like =row()
[(6,0), (6,0), (6,0), (6,0), (6,0), (6,0), ( 6,0), (6,0), (6,0), (0,0), (6,0)], # inside curly-braced array
]
# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""
# de
takeaction = state_table[current_state][a][1]
current_state = state_table[current_state][a][0]into the more concise :
current_state, takeaction = state_table[current_state][a]. Also, it shows that the type you should be using to convey
current_state and takeaction should probably be something more like a tuple than like a list.The pythonic way to loop is not to use
range and len but just to go through your iterable with for i in my_iterable:. If you do need the index, enumerate is what you need.for a in range(len(event_list)):
if z in event_list[a]:
takeaction = state_table[current_state][a][1]
current_state = state_table[current_state][a][0]becomes :
for i,event in enumerate(event_list):
if z in event:
current_state, takeaction = state_table[current_state][i]
breakRemove whatever code is not needed.
elif takeaction == 5: # strip whitespace from the input outside of quotes
pass
else:
pass # should raise error, since the state table includes more actions than are defineis roughly the same as nothing. If you want to ensure the value is correct, you can use assert.
From PEP 8 :
For example, do not rely on CPython's efficient implementation of
in-place string concatenation for statements in the form a += b or a =
a + b. This optimization is fragile even in CPython (it only works for
some types) and isn't present at all in implementations that don't use
refcounting. In performance sensitive parts of the library, the
''.join() form should be used instead. This will ensure that
concatenation occurs in linear time across various implementations.
Remove duplicated logic :
takeaction = state_table[current_state][a][1]
current_state = state_table[current_state][a][0]and
takeaction = state_table[current_state][-1][1] # this sets takeaction to the value when the test character does not match any event
current_state = state_table[current_state][-1][0] # this sets current_state to the value when the test character does not match any eventare quite similar and probably can be factorised out.
Do not repeat yourself (bis) : you do not need to define the state table twice. Just do it once and for all and everything should be ok :
state_table = [
[(1,0), (0,1), (5,0), (0,3), (0,4), (0,5), ( 2,0), (0,0), (6,0), (0,0), (0,0)], # normal state
[(0,0), (1,0), (1,0), (1,0), (1,0), (1,0), ( 1,0), (1,0), (1,0), (1,0), (1,0)], # double-quote comment
[(2,0), (2,0), (2,0), (2,0), (2,0), (2,0), ( 3,0), (0,0), (2,0), (2,0), (2,0)], # inside bracketed table reference
[(3,0), (3,0), (3,0), (3,0), (3,0), (3,0), ( 4,0), (2,0), (3,0), (3,0), (3,0)], # inside double-bracketed table reference
[(4,0), (4,0), (4,0), (4,0), (4,0), (4,0), (-1,0), (3,0), (4,0), (4,0), (4,0)], # inside triple-bracketed table reference (I don't think this exists)
[(1,2), (0,2), (5,2), (0,0), (0,2), (0,5), ( 2,2), (0,2), (0,2), (0,2), (0,2)], # found left-paren; only wrap and insert if not empty, like =row()
[(6,0), (6,0), (6,0), (6,0), (6,0), (6,0), ( 6,0), (6,0), (6,0), (0,0), (6,0)], # inside curly-braced array
]Final result :
This is what my code is like at the end. I am too lazy to change the string concatenations for a better solution. Also, I'm still not quite happy with the multiple
if statements but I have nothing better to suggest at the moment. ```
#!/usr/bin/python
# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'
# initialize the state table
# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table = [
[(1,0), (0,1), (5,0), (0,3), (0,4), (0,5), ( 2,0), (0,0), (6,0), (0,0), (0,0)], # normal state
[(0,0), (1,0), (1,0), (1,0), (1,0), (1,0), ( 1,0), (1,0), (1,0), (1,0), (1,0)], # double-quote comment
[(2,0), (2,0), (2,0), (2,0), (2,0), (2,0), ( 3,0), (0,0), (2,0), (2,0), (2,0)], # inside bracketed table reference
[(3,0), (3,0), (3,0), (3,0), (3,0), (3,0), ( 4,0), (2,0), (3,0), (3,0), (3,0)], # inside double-bracketed table reference
[(4,0), (4,0), (4,0), (4,0), (4,0), (4,0), (-1,0), (3,0), (4,0), (4,0), (4,0)], # inside triple-bracketed table reference (I don't think this exists)
[(1,2), (0,2), (5,2), (0,0), (0,2), (0,5), ( 2,2), (0,2), (0,2), (0,2), (0,2)], # found left-paren; only wrap and insert if not empty, like =row()
[(6,0), (6,0), (6,0), (6,0), (6,0), (6,0), ( 6,0), (6,0), (6,0), (0,0), (6,0)], # inside curly-braced array
]
# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""
# de
Code Snippets
takeaction = state_table[current_state][a][1]
current_state = state_table[current_state][a][0]current_state, takeaction = state_table[current_state][a]for a in range(len(event_list)):
if z in event_list[a]:
takeaction = state_table[current_state][a][1]
current_state = state_table[current_state][a][0]for i,event in enumerate(event_list):
if z in event:
current_state, takeaction = state_table[current_state][i]
breakelif takeaction == 5: # strip whitespace from the input outside of quotes
pass
else:
pass # should raise error, since the state table includes more actions than are defineContext
StackExchange Code Review Q#44699, answer score: 7
Revisions (0)
No revisions yet.