HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Excel-formula-analysis state machine

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
excelanalysisstatemachineformula

Problem

This is one of my first useful scripts. I run it on a highlighted text selection within Notepad++, but I've provided a sample formula and commented out the Scintilla-specific references.

I'm a rank amateur: Am I using good idiom? Is my approach to a state machine sensible?

```
# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'

# initialize the state table
EVENTS = 11
STATES = 7
state_table = [[[0,0] for x in range(EVENTS)] for x in range(STATES)]

# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table[0] = [[1,0], [0,1], [5,0], [0,3], [0,4], [0,5], [2,0], [0,0], [6,0], [0,0], [0,0]] # normal state
state_table[1] = [[0,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0]] # double-quote comment
state_table[2] = [[2,0], [2,0], [2,0], [2,0], [2,0], [2,0], [3,0], [0,0], [2,0], [2,0], [2,0]] # inside bracketed table reference
state_table[3] = [[3,0], [3,0], [3,0], [3,0], [3,0], [3,0], [4,0], [2,0], [3,0], [3,0], [3,0]] # inside double-bracketed table reference
state_table[4] = [[4,0], [4,0], [4,0], [4,0], [4,0], [4,0], [-1,0], [3,0], [4,0], [4,0], [4,0]] # inside triple-bracketed table reference (I don't think this exists)
state_table[5] = [[1,2], [0,2], [5,2], [0,0], [0,2], [0,5], [2,2], [0,2], [0,2], [0,2], [0,2]] # found left-paren; only wrap and insert if not empty, like =row()
state_table[6] = [[6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [0,0], [6,0]] # inside curly-braced array

# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""

# define the tab and new line characters
TAB_CHAR = "\t"
NEW_LINE = "\r\n"

for z in input_text:

Solution

You can use list unpacking to transform :

takeaction = state_table[current_state][a][1]
        current_state = state_table[current_state][a][0]


into the more concise :

current_state, takeaction = state_table[current_state][a]


. Also, it shows that the type you should be using to convey current_state and takeaction should probably be something more like a tuple than like a list.

The pythonic way to loop is not to use range and len but just to go through your iterable with for i in my_iterable:. If you do need the index, enumerate is what you need.

for a in range(len(event_list)):
        if z in event_list[a]:
            takeaction = state_table[current_state][a][1]
            current_state = state_table[current_state][a][0]


becomes :

for i,event in enumerate(event_list):
    if z in event:
        current_state, takeaction = state_table[current_state][i]
        break


Remove whatever code is not needed.

elif takeaction == 5: # strip whitespace from the input outside of quotes
        pass
    else:
        pass # should raise error, since the state table includes more actions than are define


is roughly the same as nothing. If you want to ensure the value is correct, you can use assert.

From PEP 8 :


For example, do not rely on CPython's efficient implementation of
in-place string concatenation for statements in the form a += b or a =
a + b. This optimization is fragile even in CPython (it only works for
some types) and isn't present at all in implementations that don't use
refcounting. In performance sensitive parts of the library, the
''.join() form should be used instead. This will ensure that
concatenation occurs in linear time across various implementations.

Remove duplicated logic :

takeaction = state_table[current_state][a][1]
        current_state = state_table[current_state][a][0]


and

takeaction = state_table[current_state][-1][1] # this sets takeaction to the value when the test character does not match any event
    current_state = state_table[current_state][-1][0] # this sets current_state to the value when the test character does not match any event


are quite similar and probably can be factorised out.

Do not repeat yourself (bis) : you do not need to define the state table twice. Just do it once and for all and everything should be ok :

state_table = [
    [(1,0), (0,1), (5,0), (0,3), (0,4), (0,5), ( 2,0), (0,0), (6,0), (0,0), (0,0)], # normal state
    [(0,0), (1,0), (1,0), (1,0), (1,0), (1,0), ( 1,0), (1,0), (1,0), (1,0), (1,0)], # double-quote comment
    [(2,0), (2,0), (2,0), (2,0), (2,0), (2,0), ( 3,0), (0,0), (2,0), (2,0), (2,0)], # inside bracketed table reference
    [(3,0), (3,0), (3,0), (3,0), (3,0), (3,0), ( 4,0), (2,0), (3,0), (3,0), (3,0)], # inside double-bracketed table reference
    [(4,0), (4,0), (4,0), (4,0), (4,0), (4,0), (-1,0), (3,0), (4,0), (4,0), (4,0)], # inside triple-bracketed table reference (I don't think this exists)
    [(1,2), (0,2), (5,2), (0,0), (0,2), (0,5), ( 2,2), (0,2), (0,2), (0,2), (0,2)], # found left-paren; only wrap and insert if not empty, like =row()
    [(6,0), (6,0), (6,0), (6,0), (6,0), (6,0), ( 6,0), (6,0), (6,0), (0,0), (6,0)], # inside curly-braced array
]


Final result :

This is what my code is like at the end. I am too lazy to change the string concatenations for a better solution. Also, I'm still not quite happy with the multiple if statements but I have nothing better to suggest at the moment.

```
#!/usr/bin/python

# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'

# initialize the state table
# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table = [
[(1,0), (0,1), (5,0), (0,3), (0,4), (0,5), ( 2,0), (0,0), (6,0), (0,0), (0,0)], # normal state
[(0,0), (1,0), (1,0), (1,0), (1,0), (1,0), ( 1,0), (1,0), (1,0), (1,0), (1,0)], # double-quote comment
[(2,0), (2,0), (2,0), (2,0), (2,0), (2,0), ( 3,0), (0,0), (2,0), (2,0), (2,0)], # inside bracketed table reference
[(3,0), (3,0), (3,0), (3,0), (3,0), (3,0), ( 4,0), (2,0), (3,0), (3,0), (3,0)], # inside double-bracketed table reference
[(4,0), (4,0), (4,0), (4,0), (4,0), (4,0), (-1,0), (3,0), (4,0), (4,0), (4,0)], # inside triple-bracketed table reference (I don't think this exists)
[(1,2), (0,2), (5,2), (0,0), (0,2), (0,5), ( 2,2), (0,2), (0,2), (0,2), (0,2)], # found left-paren; only wrap and insert if not empty, like =row()
[(6,0), (6,0), (6,0), (6,0), (6,0), (6,0), ( 6,0), (6,0), (6,0), (0,0), (6,0)], # inside curly-braced array
]

# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""

# de

Code Snippets

takeaction = state_table[current_state][a][1]
        current_state = state_table[current_state][a][0]
current_state, takeaction = state_table[current_state][a]
for a in range(len(event_list)):
        if z in event_list[a]:
            takeaction = state_table[current_state][a][1]
            current_state = state_table[current_state][a][0]
for i,event in enumerate(event_list):
    if z in event:
        current_state, takeaction = state_table[current_state][i]
        break
elif takeaction == 5: # strip whitespace from the input outside of quotes
        pass
    else:
        pass # should raise error, since the state table includes more actions than are define

Context

StackExchange Code Review Q#44699, answer score: 7

Revisions (0)

No revisions yet.