JSON Parsing in Bash
Problem
I have a JSON file which needs to be restructured.
The following is the code.
while IFS='' read -r line || [[ -n "$line" ]]; do
COUNT=$(( $COUNT + 1 ))
#echo "[$COUNT]"
[ $COUNT -lt 5 ] && continue
sj=`echo $line | jq ._source`
index=`echo $line | jq ._index | tr -d '"'`
itype=`echo $line | jq ._type| tr -d '"'`
echo '{ "index" : { "_index" :"'$index'","_type":"'$itype'"}}' >> bulk_result.bulk
echo $sj >> bulk_result.bulk
#echo "$COUNT lines processed from file $1"
done < "$1"
echo "$COUNT lines processed from file $1"
Basically, the program reads a JSON record, for example,
{"_index":"index1","_type":"rm","_id":"AVPkyS9w","_score":1,"_source":{"timestamp":"2016-04-05T05:00:00","token":"8eb38d14","tag":"logs.rm","message":"CouchbaseConnectSuccess,bucket=srmobjects","logsource":"rm.log","RM_pw":"","component":"rm-01-NFR","RM_un":"","timeEpochMs":1459832400.248,"RM_bucket":"srmobjects","RM_eventName":"CouchbaseConnectSuccess"}}
and converts it to the following:
{ "index" : { "_index" :"index1","_type":"rm"}}
{ "RM_eventName": "FcgiClose", "timeEpochMs": 1459832435.293, "component": "rm-04-NFR", "logsource": "rm.log", "message": "FcgiClose,requestIndex=0", "tag": "logs.rm", "timestamp": "2016-04-05T05:00:35" }
The file is about 4 GB, and the code takes hours to process it. Is there a more efficient way to do this?
Solution
Bash is not well-suited for transforming JSON, but jq is. Even so, calling jq three times for each line of input is certainly going to be slow.
There are several other issues with the script too. The `...` syntax is obsolete in favor of $(...); the counting can be simplified or, even better, eliminated using tail -n +5; and the repeated bulk_result.bulk would be good to put in a variable.
But none of that matters much, as it seems the entire script can be replaced with a single line:
tail -n +5 "$1" | jq -rc '{index: {_index: ._index, _type: ._type}}, ._source'
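As a rough sketch of how that pipeline might be packaged for reuse, the function below wraps it so the input file and the number of skipped leading lines are parameters. The name to_bulk, the skip parameter, and the file names in the comments are hypothetical and not part of the original answer; it assumes jq is installed and that every line after the skipped ones is a complete JSON record.

```shell
#!/usr/bin/env bash
# Sketch only: to_bulk and its "skip" parameter are hypothetical names.

# Convert a line-delimited JSON dump into bulk format: for each record,
# print an action line followed by the record's _source object.
#   $1  input file
#   $2  number of leading lines to skip (default 4, same as tail -n +5)
to_bulk() {
    local input=$1
    local skip=${2:-4}
    # tail drops the leading lines; one jq process handles all the rest,
    # instead of three jq processes per input line.
    tail -n "+$((skip + 1))" "$input" |
        jq -rc '{index: {_index: ._index, _type: ._type}}, ._source'
}

# Example call (file names are placeholders):
#   out=bulk_result.bulk
#   to_bulk dump.json > "$out"
```

Because the function writes to standard output, the caller decides whether to overwrite the result file with > or append to it with >>, as the original script did.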
Context
StackExchange Code Review Q#129969, answer score: 15