patternjavaMinor
Parsing huge data coming from a URL
Viewed 0 times
hugeparsingfromdatacomingurl
Problem
I need to parse the data coming from the URL which looks like this:
After parsing the data I need to store each datacenter data in a
For example, the Key of
which is for primary.
Similarly another
For example, the Key of
which is for secondary.
And lastly, one more map for
For example, the Key of
which is for
And all the above map will have data for its datacenter as I have three datacenter in the above example. So each each map will have three data. And also I will parse the above response only when
This code takes more than 200 ms to parse the data and store it in its corresponding
hasProcess=true
version=1
DATACENTER=abc
TotalNumberOfServers:4
primary:{0=1, 1=2, 2=1, 3=2, 4=1, 5=2, 6=1, 7=2, 8=1, 9=2, 10=1, 11=2, 12=1, 13=2}
secondary:{0=0, 1=0, 2=0, 3=1, 4=0, 5=0, 6=0, 7=1, 8=0, 9=0, 10=0, 11=1, 12=0, 13=0}
hosttomachine:{3=machineA, 2=machineB, 1=machineC, 4=machineD}
DATACENTER=pqr
TotalNumberOfServers:2
primary:{0=1, 1=2, 2=1, 3=2, 4=1, 5=2, 6=1, 7=2, 8=1, 9=2, 10=1, 11=2, 12=1, 13=2, 14=1}
secondary:{0=0, 1=0, 2=0, 3=1, 4=0, 5=0, 6=0, 7=1, 8=0, 9=0, 10=0, 11=1, 12=0, 13=0, 14=0}
hosttomachine:{1=machineP, 4=machineQ}
DATACENTER=tuv
TotalNumberOfServers:0
primary:{}
secondary:{}
hosttomachine:{}After parsing the data I need to store each datacenter data in a
Map like this:HashMap> primaryDataFor example, the Key of
primaryData is abc and value is:{0=1, 1=2, 2=1, 3=2, 4=1, 5=2, 6=1, 7=2, 8=1, 9=2, 10=1, 11=2, 12=1, 13=2}which is for primary.
Similarly another
Map for secondary for each datacenter:HashMap> secondaryDataFor example, the Key of
secondaryData is abc and value is:{0=0, 1=0, 2=0, 3=1, 4=0, 5=0, 6=0, 7=1, 8=0, 9=0, 10=0, 11=1, 12=0, 13=0}which is for secondary.
And lastly, one more map for
hosttomachine mapping for each datacenter: HashMap> hostMachineMapping -For example, the Key of
hostMachineMapping is abc and value is:{3=machineA, 2=machineB, 1=machineC, 4=machineD}which is for
hosttomachine.And all the above map will have data for its datacenter as I have three datacenter in the above example. So each each map will have three data. And also I will parse the above response only when
hasProcess is equal to true. If it is not true, then I won't parse anything.This code takes more than 200 ms to parse the data and store it in its corresponding
HashMap. Is there any way to parse the above data efficientSolution
I started to write a comment, but it became too long.
Questions
Potential slowness causes
I guess your
for splitting and use split limit of 2. Probably faster could be using Guava's
Even better would be piece-wise processing instead of splitting.
Note also that
Review
As Mat's Mug said, this is wrong. I'm using Guava's
This makes it fail-fast rather than hiding the problem to bite you later.
This is pretty confusing by squeezing something else between the two
Summary
Overall, it's not bad. When speed is important, I'd go for something like
It's just an idea (damn similar to
An example method could look like
where
Filling the maps
Now, I'd probably move all the functionality into
The parser would have fields like
so that I could write
As an example see this simple method
I ignored quite some details like parsing numbers and I can't tell how many parsing methods you'll need, but basically it's all pretty simple. Just skip what's fixed and process what you want while looking for a terminator.
It could get a bit more complicated if you needed to be error-tolerant, like allowing multiple spaces, but then you could let some of your methods skip over them. Or, if it gets difficult, use regexes. But I don't think it gets complicated.
Note that I'm unsure why your parser is slow. There's a lot of splitting and this is the probable cause, but it's just a guess.
Questions
- 200 ms is pretty close to eternity, how long is your input?
- Can you measure how long
parseResponsealone takes?
- Could you post your data? Someone may try harder to optimize.
Potential slowness causes
I guess your
String.split could be the culprit (with more than one character a regex gets created; you should useprivate static final Pattern COMMA_BLANK_PATTERN = Pattern.compile(", ");for splitting and use split limit of 2. Probably faster could be using Guava's
private static final Splitter COMMA_BLANK_SPLITTER = Splitter.on(", ");Even better would be piece-wise processing instead of splitting.
Note also that
Integer.parseInt is internationalized and therefore slightly slower than necessary. Probably unimportant.Review
if (response != null) {As Mat's Mug said, this is wrong. I'm using Guava's
Preconditions with a static import writing simplycheckNotNull(response);This makes it fail-fast rather than hiding the problem to bite you later.
String tableString = map.split(":")[1];
Map table = new HashMap();
tableString = tableString.substring(1, tableString.length() - 1);This is pretty confusing by squeezing something else between the two
tableString defining lines.Summary
Overall, it's not bad. When speed is important, I'd go for something like
MyParser parser = new MyParser(response).skip("hasProcess=");
boolean hasProcess = parser.readBoolean();
parser.skip("\nversion=");
int version = parser.readInt();
while (parser.skipIfLookingAt("\nDATACENTER=")) {
parser.parseDatacenter(parser, whatever...);
}It's just an idea (damn similar to
java.util.Scanner, which isn't exactly known for speed), but I guess that string splitting is the problem (simply as I can't see anything else).An example method could look like
MyParser skip(String prefix) {
for (int i=0; i<prefix.length(); ++i) {
if (content.charAt(index++) != prefix.charAt(i)) {
throw ...
}
}
}where
content and index are instance variables. This part doesn't need any substring creation at all (and gets a bit unreadable because of this). Warning: Doing something like content = content.substring(index) (so you could use startsWith) would make it clearer but terribly slow as you'd copy a big part of the string a lot of times.Filling the maps
Now, I'd probably move all the functionality into
MyParser, so that the parsing would be justprivate void parseResponse(String response) {
new MyParser(response).parse();
}The parser would have fields like
private final Map> primaryData;so that I could write
void parseDatacenter() {
String datacenter = readTill('\n');
skip("\n\tTotalNumberOfServers:");
int totalNumberOfServers = readInt();
skip("\n\tprimary{");
primaryData.put(datacenter, readMap());
skip("\n\t");
}
Map readMap() {
Map result = new HashMap<>();
while (true) {
String key = readTillAndSkip('=');
String value = readTillOneOf("},");
map.put(key, value);
if (lookingAt('}') {
break;
}
skip(", ");
}
return result;
}As an example see this simple method
String readTillOneOf(String terminators) {
int start = index;
while (terminators.indexOf(content.charAt(index)) > -1) {
++index;
}
return content.substring(start, index);
}I ignored quite some details like parsing numbers and I can't tell how many parsing methods you'll need, but basically it's all pretty simple. Just skip what's fixed and process what you want while looking for a terminator.
It could get a bit more complicated if you needed to be error-tolerant, like allowing multiple spaces, but then you could let some of your methods skip over them. Or, if it gets difficult, use regexes. But I don't think it gets complicated.
Note that I'm unsure why your parser is slow. There's a lot of splitting and this is the probable cause, but it's just a guess.
Code Snippets
private static final Pattern COMMA_BLANK_PATTERN = Pattern.compile(", ");private static final Splitter COMMA_BLANK_SPLITTER = Splitter.on(", ");if (response != null) {checkNotNull(response);String tableString = map.split(":")[1];
Map<Integer, String> table = new HashMap<Integer, String>();
tableString = tableString.substring(1, tableString.length() - 1);Context
StackExchange Code Review Q#94390, answer score: 6
Revisions (0)
No revisions yet.