HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Parsing JSON in one go

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
jsononeparsing

Problem

I need to parse a simple JSON string (flat JSON, no hierarchy) for keys and values, and there is a system constraint that I cannot use any built-in JSON library and can only read a string once due to latency requirements. I need to use Python 2.7.x series and cannot use a higher version.

Currently, my major concern for below code is, I still do a bit more than one pass string parsing, since I need to go backward/forward to remove unnecessary characters around a word, which is what wordBeautify does.

This is a follow-up to "Json String Parsing"

```
def wordBeautify(word1, beginIndex, endIndex):
noMeaningChars=[' ', '"', '{', ',', ':','}']
while word1[beginIndex] in noMeaningChars:
beginIndex+=1
while word1[endIndex] in noMeaningChars:
endIndex-=1
return (beginIndex,endIndex+1)

def parseElegant2(str1):

keyStr=''
valueStr=''
beginWord = False
isKey=True
beginIndex=0
endIndex=0

for i in range(len(str1)):

if str1[i]==':':
endIndex=i-1
(x,y) = wordBeautify(str1, beginIndex, endIndex)
keyStr=str1[x:y]
print "key string " + keyStr
beginIndex=i+1
elif str1[i]==',' or str1[i]=='}':
endIndex=i
(x,y) = wordBeautify(str1, beginIndex, endIndex)
valueStr=str1[x:y]
beginIndex=i+1
print "value string " + valueStr
print 'key and value, '+ keyStr, valueStr

def parseElegant(str1):

keyStr=''
valueStr=''
beginWord = False
isKey=True
beginIndex=0
endIndex=0

for i in range(len(str1)):

if str1[i] == '"' and beginWord==False:
beginWord=True
beginIndex = i+1
elif str1[i] == '"' and beginWord==True:
beginWord=False
endIndex=i
if beginIndex<endIndex:
print "get word, " + str1[beginIndex:endIndex]
elif str1[i]==':':
keyStr=str1[beginIndex:e

Solution

First of all, let me say well done on the progress made from your first question to now!

Code Review is for reviewing code

As you seem to be seeking advice "about how to implement" an idea (i.e. streams), I think you may be better off asking on Programmers.SE, since that is a more appropriate place for questions about "software architecture and design" and "algorithm and data structure concepts".
Code Review is "not for questions about broken code, hypothetical code, or non-existent code, as such questions will be closed as off-topic." (taken from this help page).

I can see that you have code already, and are seeking to improve it (in a way that doesn't add any functionality) - so I consider your question to be on-topic. Although your desire to learn about other ways of implementing it entirely may not be best placed on this site.

A Quick Review

 
Remove old code

You have included the (now unused) code for parseElegant (version 1). While I can understand you not wanting to erase this, you have already linked to your previous question (which contains the old code). Including version 1 here just adds ~24 lines for reviewers to have to scroll through.

 
Can't you just use strip?

The wordBeautify function (which you are seeking to make redundant/remove) seems to be VERY similar to the Python built-in string.strip method?

Consider the following code:

test="{:foo }"
(x,y)=wordBeautify(test,0,len(test)-1)
a=test[x:y]
b=test.strip(' "{,:}')


Result:  a and b both work out to be "foo"!

 

 
Use better variable names!

At present, your parseElegant2 function takes an argument called str1. While it is obvious that this means something along the lines of the first string - it is not actually particularly helpful in knowing what that variable represents.

Something more like json, text_candidate, or candidate may be better - as it expresses that this is a long text containing json, or something that is a candidate to be processed into JSON.

You have a similar issue in wordBeautify, with word1. This could just be named word, or similar.

 
Return is better than print

I've noticed that in your function you are not return'ing any values. Instead you are just print'ing to the standard output (STDOUT).
Is this your goal? Or will your program eventually need to construct some kind of data structure (such as a dict) of the JSON values?

I would recommend refactoring your code to return a data structure, and have your function write the JSON values into that data structure.

A word on string stream processing

As @ferada commented, I think you should "consider using a stream, looking only at the (single) next character at a time and then build a state machine around it.", however your current program is not doing this.

Instead, your loop is going through the indices of the string. And is (at present), also calling your wordBeautify function to do the same (with indices).

Rather than using for i in range(len(str1)):, as you currently are - I would recommend that you loop through the characters themselves (one at a time, as @ferada suggested). Thus, your loop may look more like: for c in str1:

This way, you are essentially forced into only looking at the current character. (i.e. you are unable to increment/decrement any number value to go forwards or backwards in the string.)

Constructing a state machine

From the Wikipedia article about Finite-state machines:


A finite-state machine (FSM) or simply state machine, is a mathematical model of computation used to design computer programs. It is conceived as an abstract machine that can be in one of any finite number of states. The machine is in only one state at a time, called the current state. It can change from one state to another when triggered, this is called a transition. A particular state machine is defined by a list of its states, and the triggering condition for each transition.


(Emphasis mine)

You can read the article for more background information, but essentially your are seeking to construct a state machine. As you can read in the last sentence (that I've bolded), this requires:

  • a list of states



  • the trigger conditions for each state transition



What are the states?

Luckily, the official JSON website (JSON.org) actually gives us these exact things!
In the form of railroad diagrams. Specifically, they tell us that:


An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).




(source: json.org)

So we know that STRING is a state, for example. They also tell us how we get into (and out of) the STRING state:


A string is a sequence of zero or more Unicode characters, wrapped in double quotes...




(source: json.org)

So we know that a STRING state is triggered when we encounter a " (quote) character. (And th

Code Snippets

test="{:foo }"
(x,y)=wordBeautify(test,0,len(test)-1)
a=test[x:y]
b=test.strip(' "{,:}')
def parseElegant3(candidate):
    READY=0
    STRING=1
    NUMBER=3
    COLON=-1
    COMMA=-2

    search={'"':STRING,  ':':COLON,  ',':COMMA,  STRING:'"',  COMMA:','}
    NOP=" {\t}\r\n"
    stringIgnore='{,:}'
    numbers='0123456789.'

    result={}
    if not len(candidate):
        return result
    accumulator=""
    mostRecentKey=None

    state=READY
    for c in candidate:
        newState=state
        if state is READY:
            if c in NOP:continue
            if c in numbers:
                newState=NUMBER
            if c in search.keys():
                newState=search[c]
        if newState is COLON:
            newState=READY
            if len(accumulator):
                accumulator=accumulator.strip()
                result[accumulator]=None
                mostRecentKey=accumulator
                accumulator=""
            else:
                state=READY
                continue
        elif newState is COMMA:
            newState=READY
        elif newState is NUMBER and not state is NUMBER:
            accumulator+=c
        if state is STRING and newState is STRING:
            if c in stringIgnore:continue
            if c==search[STRING]:
                newState=READY
                if not mostRecentKey is None:
                    result[mostRecentKey]=accumulator
                    accumulator=""
                    mostRecentKey=None
            else:
                accumulator+=c
        elif state is NUMBER:
            if c in numbers:
                accumulator+=c
            else:
                if c in NOP:continue
                assert c == search[COMMA]
                try:
                    result[mostRecentKey]=int(accumulator)
                except ValueError:
                    result[mostRecentKey]=float(accumulator)
                accumulator=""
                mostRecentKey=None
                newState=READY
        state=newState
    return result

JSONString = '{ "id": 1, "name": "A green door", "price": 12.50, "tags": "home green"}'

parseElegant3(JSONString)

Context

StackExchange Code Review Q#121328, answer score: 7

Revisions (0)

No revisions yet.