Parsing JSON in one go
Problem
I need to parse a simple JSON string (flat JSON, no hierarchy) into keys and values. There is a system constraint that I cannot use any built-in JSON library, and due to latency requirements I can only read the string once. I also need to use the Python 2.7.x series and cannot use a higher version.
Currently, my major concern with the code below is that I still do slightly more than one pass of string parsing, since I need to go backward/forward to remove unnecessary characters around a word, which is what `wordBeautify` does.

This is a follow-up to "Json String Parsing".
```
def wordBeautify(word1, beginIndex, endIndex):
    noMeaningChars=[' ', '"', '{', ',', ':','}']
    while word1[beginIndex] in noMeaningChars:
        beginIndex+=1
    while word1[endIndex] in noMeaningChars:
        endIndex-=1
    return (beginIndex,endIndex+1)

def parseElegant2(str1):
    keyStr=''
    valueStr=''
    beginWord = False
    isKey=True
    beginIndex=0
    endIndex=0
    for i in range(len(str1)):
        if str1[i]==':':
            endIndex=i-1
            (x,y) = wordBeautify(str1, beginIndex, endIndex)
            keyStr=str1[x:y]
            print "key string " + keyStr
            beginIndex=i+1
        elif str1[i]==',' or str1[i]=='}':
            endIndex=i
            (x,y) = wordBeautify(str1, beginIndex, endIndex)
            valueStr=str1[x:y]
            beginIndex=i+1
            print "value string " + valueStr
            print 'key and value, '+ keyStr, valueStr

def parseElegant(str1):
    keyStr=''
    valueStr=''
    beginWord = False
    isKey=True
    beginIndex=0
    endIndex=0
    for i in range(len(str1)):
        if str1[i] == '"' and beginWord==False:
            beginWord=True
            beginIndex = i+1
        elif str1[i] == '"' and beginWord==True:
            beginWord=False
            endIndex=i
            if beginIndex<endIndex:
                print "get word, " + str1[beginIndex:endIndex]
        elif str1[i]==':':
            keyStr=str1[beginIndex:e
```
Solution
First of all, let me say well done on the progress made from your first question to now!
Code Review is for reviewing code
As you seem to be seeking advice "about how to implement" an idea (i.e. streams), I think you may be better off asking on Programmers.SE, since that is a more appropriate place for questions about "software architecture and design" and "algorithm and data structure concepts".
Code Review is "not for questions about broken code, hypothetical code, or non-existent code, as such questions will be closed as off-topic." (taken from this help page).
I can see that you already have code and are seeking to improve it (in a way that doesn't add any functionality), so I consider your question to be on-topic. However, your desire to learn about entirely different ways of implementing it may not be best placed on this site.
A Quick Review
Remove old code
You have included the (now unused) code for `parseElegant` (version 1). While I can understand you not wanting to erase it, you have already linked to your previous question, which contains the old code. Including version 1 here just adds ~24 lines for reviewers to scroll through.

Can't you just use `strip`?

The `wordBeautify` function (which you are seeking to make redundant) seems to be VERY similar to Python's built-in `str.strip` method. Consider the following code:
```
test="{:foo }"
(x,y)=wordBeautify(test,0,len(test)-1)
a=test[x:y]
b=test.strip(' "{,:}')
```

Result:
`a` and `b` both work out to be `"foo"`!

Use better variable names!
At present, your `parseElegant2` function takes an argument called `str1`. While it is obvious that this means something along the lines of "the first string", the name is not particularly helpful in knowing what that variable represents.

Something more like `json`, `text_candidate`, or `candidate` may be better, as it expresses that this is a piece of text containing JSON, or something that is a candidate to be processed as JSON.

You have a similar issue in `wordBeautify`, with `word1`. This could just be named `word`, or similar.

Return is better than print
I've noticed that your function is not `return`ing any values. Instead, it just `print`s to standard output (STDOUT).

Is this your goal? Or will your program eventually need to construct some kind of data structure (such as a `dict`) of the JSON values?

I would recommend refactoring your code to return a data structure, and have your function write the JSON values into that data structure.
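As a rough sketch of the three suggestions above combined (`strip` instead of `wordBeautify`, descriptive names, and returning a `dict`), `parseElegant2` might become something like the following. The name `parse_flat_json` is hypothetical; the splitting rules are unchanged from the original, so keys and values must not themselves contain `:`, `,` or `}`, and values are left as strings. It still slices the input by index, so it does not yet address the single-pass concern. It is written to run under both Python 2.7 and Python 3.

```python
def parse_flat_json(json_text):
    """Return a dict of key/value strings from a flat JSON object.

    Hypothetical refactor of parseElegant2: same splitting rules
    (':' ends a key, ',' or '}' ends a value), but it uses str.strip
    instead of wordBeautify and returns a dict instead of printing.
    Values are returned as raw strings (no number conversion).
    """
    # The same characters wordBeautify skipped at both ends of a word.
    no_meaning_chars = ' "{,:}'
    result = {}
    key = None
    start = 0  # index just after the previous delimiter
    for index, char in enumerate(json_text):
        if char == ':':
            key = json_text[start:index].strip(no_meaning_chars)
            start = index + 1
        elif char in ',}':
            if key is not None:
                result[key] = json_text[start:index].strip(no_meaning_chars)
            key = None
            start = index + 1
    return result
```

For the sample string used later in this answer, `parse_flat_json` returns `'1'` and `'12.50'` as strings; converting numbers is left out to keep the sketch short.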
A word on string stream processing
As @ferada commented, I think you should "consider using a stream, looking only at the (single) next character at a time and then build a state machine around it." However, your current program is not doing this.
Instead, your loop goes through the indices of the string, and (at present) also calls your `wordBeautify` function, which does the same thing (with indices).

Rather than using `for i in range(len(str1)):`, as you currently are, I would recommend that you loop through the characters themselves (one at a time, as @ferada suggested). Your loop would then look more like `for c in str1:`.

This way, you are essentially forced into looking only at the current character (i.e. you are unable to increment/decrement an index to go forwards or backwards in the string).
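If the input actually arrives as a stream rather than an in-memory string, the same one-character-at-a-time style still applies. A minimal sketch, assuming a file-like text object (an `io.StringIO` stands in for the real source here); the helper names `characters` and `count_quotes` are hypothetical. For a plain string, `for c in text:` already behaves the same way.

```python
import io


def characters(stream):
    """Yield one character at a time from a file-like text stream.

    iter(callable, sentinel) calls stream.read(1) until it returns ''
    (end of stream), so each character is read exactly once and there
    is no index to move backwards or forwards.
    """
    for char in iter(lambda: stream.read(1), ''):
        yield char


def count_quotes(stream):
    """Toy consumer: it can only ever see the current character."""
    count = 0
    for char in characters(stream):
        if char == '"':
            count += 1
    return count


print(count_quotes(io.StringIO(u'{"a": "b"}')))  # prints 4
```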
Constructing a state machine
From the Wikipedia article about Finite-state machines:
A finite-state machine (FSM) or simply state machine, is a mathematical model of computation used to design computer programs. It is conceived as an abstract machine that can be in one of any finite number of states. The machine is in only one state at a time, called the current state. It can change from one state to another when triggered, this is called a transition. **A particular state machine is defined by a list of its states, and the triggering condition for each transition.**
(Emphasis mine)
You can read the article for more background information, but essentially you are seeking to construct a state machine. As you can read in the last sentence (which I've bolded), this requires:
- a list of states
- the trigger conditions for each state transition
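Those two ingredients can be written down directly as data. A minimal sketch, assuming a flat object whose values are all strings without escape sequences (numbers are left out to keep it short); the state names, `TRANSITIONS` and `parse_string_object` are hypothetical. Because the table only reacts to `:` and `,` outside quotes, a key such as `"a:b"` is handled correctly, which the strip-based approach cannot do.

```python
# States for a flat object whose values are all strings (hypothetical names).
BEFORE_KEY, IN_KEY, AFTER_KEY, BEFORE_VALUE, IN_VALUE, AFTER_VALUE = range(6)

# (current state, trigger character) -> next state.
# Any character not listed leaves the state unchanged.
TRANSITIONS = {
    (BEFORE_KEY, '"'): IN_KEY,        # opening quote of a key
    (IN_KEY, '"'): AFTER_KEY,         # closing quote of a key
    (AFTER_KEY, ':'): BEFORE_VALUE,   # colon separates key from value
    (BEFORE_VALUE, '"'): IN_VALUE,    # opening quote of a value
    (IN_VALUE, '"'): AFTER_VALUE,     # closing quote of a value
    (AFTER_VALUE, ','): BEFORE_KEY,   # comma starts the next pair
}


def parse_string_object(text):
    """Single pass over text; returns a dict of string keys to string values."""
    result = {}
    state = BEFORE_KEY
    key_chars, value_chars = [], []
    for char in text:
        next_state = TRANSITIONS.get((state, char), state)
        if next_state == state:
            # No transition: accumulate only while inside quotes.
            if state == IN_KEY:
                key_chars.append(char)
            elif state == IN_VALUE:
                value_chars.append(char)
        elif next_state == AFTER_VALUE:
            # A pair is complete: store it and reset the buffers.
            result[''.join(key_chars)] = ''.join(value_chars)
            key_chars, value_chars = [], []
        state = next_state
    return result
```

Supporting numbers would mean adding a NUMBER state, entered on a digit and left on `,` or `}`.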
What are the states?
Luckily, the official JSON website (JSON.org) gives us exactly these things, in the form of railroad diagrams. Specifically, it tells us that:
An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).
(source: json.org)
So we know that STRING is a state, for example. They also tell us how we get into (and out of) the STRING state:
A string is a sequence of zero or more Unicode characters, wrapped in double quotes...
(source: json.org)
So we know that the STRING state is triggered when we encounter a `"` (quote) character. (And th…
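The string definition quoted above is abbreviated; json.org's full sentence continues "using backslash escapes", so the STRING state needs a small sub-state for the character after a backslash. A minimal sketch, assuming the caller has already consumed the opening quote and passes a character iterator; `read_string` is a hypothetical helper. It keeps the escaped character literally (enough for an escaped quote or backslash) but does not decode escape letters such as `n` or `u`.

```python
def read_string(chars):
    """Consume one JSON string from an iterator of characters.

    The iterator must be positioned just after the opening quote. Returns
    the string contents and leaves the iterator just after the closing
    quote. After a backslash the next character is kept as-is, so escaped
    quotes and backslashes work, but escape letters are not translated.
    """
    out = []
    escaped = False  # True in the "just saw a backslash" sub-state
    for char in chars:
        if escaped:
            out.append(char)      # take the escaped character literally
            escaped = False
        elif char == '\\':
            escaped = True        # enter the escape sub-state
        elif char == '"':
            return ''.join(out)   # closing quote: leave the STRING state
        else:
            out.append(char)
    raise ValueError('unterminated string')


remaining = iter('say \\"hi\\"" rest')
print(read_string(remaining))        # prints: say "hi"
print(repr(''.join(remaining)))      # prints ' rest'
```

Because the same iterator is shared, the caller continues exactly where the string ended, without re-reading anything.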
```
def parseElegant3(candidate):
    # State identifiers (compared with ==; `is` on ints only works by accident)
    READY=0
    STRING=1
    NUMBER=3
    COLON=-1
    COMMA=-2
    # Trigger character -> state, and state -> its terminating character
    search={'"':STRING, ':':COLON, ',':COMMA, STRING:'"', COMMA:','}
    NOP=" {\t}\r\n"       # skipped outside strings
    stringIgnore='{,:}'   # dropped inside strings
    numbers='0123456789.'
    result={}
    if not len(candidate):
        return result
    accumulator=""
    mostRecentKey=None
    state=READY
    for c in candidate:
        newState=state
        if state == READY:
            if c in NOP:continue
            if c in numbers:
                newState=NUMBER
            if c in search:
                newState=search[c]
            if newState == COLON:
                # The string accumulated so far was a key
                newState=READY
                if len(accumulator):
                    accumulator=accumulator.strip()
                    result[accumulator]=None
                    mostRecentKey=accumulator
                    accumulator=""
                else:
                    state=READY
                    continue
            elif newState == COMMA:
                newState=READY
            elif newState == NUMBER and state != NUMBER:
                accumulator+=c
        if state == STRING and newState == STRING:
            if c in stringIgnore:continue
            if c==search[STRING]:
                # Closing quote: store as a value if a key is waiting for one
                newState=READY
                if mostRecentKey is not None:
                    result[mostRecentKey]=accumulator
                    accumulator=""
                    mostRecentKey=None
            else:
                accumulator+=c
        elif state == NUMBER:
            if c in numbers:
                accumulator+=c
            else:
                if c in NOP:continue
                assert c == search[COMMA]
                try:
                    result[mostRecentKey]=int(accumulator)
                except ValueError:
                    result[mostRecentKey]=float(accumulator)
                accumulator=""
                mostRecentKey=None
                newState=READY
        state=newState
    # A number right before the closing '}' has no trailing comma, so store it here
    if state == NUMBER and accumulator:
        try:
            result[mostRecentKey]=int(accumulator)
        except ValueError:
            result[mostRecentKey]=float(accumulator)
    return result

JSONString = '{ "id": 1, "name": "A green door", "price": 12.50, "tags": "home green"}'
parseElegant3(JSONString)
# returns {'id': 1, 'name': 'A green door', 'price': 12.5, 'tags': 'home green'} (dict order may vary)
```

Context
StackExchange Code Review Q#121328, answer score: 7