patternjavaMinor
XML parsing in Java using SAX parser
Viewed 0 times
saxparserjavaxmlparsingusing
Problem
I have started learning XML parsing in Java (currently reading Java & XML by Brett McLaughlin). Here is my first program to parse XML using SAX parser. Please review the code and let me know if that is correct way of parsing books.xml and storing into to a
books.xml: (only partial xml is presented)
Content Handler:
```
public class MyFirstContentHandler implements ContentHandler {
private List books = new ArrayList();
private Book book = null;
private String elementName = null;
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
// TODO Auto-generated method stub
String value = new String(ch, start, length);
if(null != elementName) {
switch(elementName) {
case "author":
book.setAuthor(value);
elementName = null;
break;
case "title":
book.setTitle(value);
elementName = null;
break;
case "genre":
book.setGenre(value);
elementName = null;
break;
case "price":
book.setPrice(Double.parseDouble(value));
elementName = null;
break;
case "publish_date":
DateFormat dateFormat = new SimpleDateFormat("yyyy-mm-dd");
List (The code is working fine. I just want to ensure I'm doing the right way, as per industry standards).books.xml: (only partial xml is presented)
Gambardella, Matthew
XML Developer's Guide
Computer
44.95
2000-10-01
An in-depth look at creating applications
with XML.
Ralls, Kim
Midnight Rain
Fantasy
5.95
2000-12-16
A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.
Content Handler:
```
public class MyFirstContentHandler implements ContentHandler {
private List books = new ArrayList();
private Book book = null;
private String elementName = null;
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
// TODO Auto-generated method stub
String value = new String(ch, start, length);
if(null != elementName) {
switch(elementName) {
case "author":
book.setAuthor(value);
elementName = null;
break;
case "title":
book.setTitle(value);
elementName = null;
break;
case "genre":
book.setGenre(value);
elementName = null;
break;
case "price":
book.setPrice(Double.parseDouble(value));
elementName = null;
break;
case "publish_date":
DateFormat dateFormat = new SimpleDateFormat("yyyy-mm-dd");
Solution
Your code is, in general well structured, and neat.
You have a bunch of auto-generated content, like comments, and TODO items. You should remove those to indicate they are handled.
As for the sax parsing, you have done pretty well, though there are some bugs and issues you have to address.... and it all boils down to the
The characters method can be called multiple times inside any element. Typically the SAX parser reads chunks of data, and, if the chunk ends half-way through the text of an element, you may end up with one
This makes it very hard to put decision logic inside the characters method like you have done.
Instead, you should put the logic you want inside the
Consider something like:
That sets up a set of data tags. Now, we create an instance StringBuilder... to cache the characters when they come in:
Then, we use that Stringbuilder to cache the characters, and append them... but we are smart about resetting it when needed.....
That makes the characters easy to handle... now, the startElement is also easy...
As an aside, note how I made the check
OK, so, the
You have a bunch of auto-generated content, like comments, and TODO items. You should remove those to indicate they are handled.
As for the sax parsing, you have done pretty well, though there are some bugs and issues you have to address.... and it all boils down to the
characters method.The characters method can be called multiple times inside any element. Typically the SAX parser reads chunks of data, and, if the chunk ends half-way through the text of an element, you may end up with one
characters call for the last part of one chunk, and another for the first part of the next chunk.This makes it very hard to put decision logic inside the characters method like you have done.
Instead, you should put the logic you want inside the
startElement() and endElement() methods, and do simple String concatenation inside the characters method.Consider something like:
private static final String[] dataTags = {"author", "title", "genre", "price", "publish_date", "description"};
private static Set dataTagSet = new HashSet<>(Arrays.asList(dataTags));That sets up a set of data tags. Now, we create an instance StringBuilder... to cache the characters when they come in:
private final StringBuilder characterCache = new StringBuilder(256);Then, we use that Stringbuilder to cache the characters, and append them... but we are smart about resetting it when needed.....
public void characters(char[] ch, int start, int length)
throws SAXException {
characterCache.append(ch, start, length);
}That makes the characters easy to handle... now, the startElement is also easy...
public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
if("book".equals(localName)) {
book = new Book(atts.getValue(atts.getLocalName(0)));
}
// every element resets the characterCache.
characterCache.setLength(0);
}As an aside, note how I made the check
if("book".equals(localName)) instead of if(localName.equals("book")) - that's a trick that is an easy habit that prevents null-pointer exceptions. In this case you will never have a null localname, but, if you always code your constant first, you will never have a null reference.OK, so, the
characters and startElement methods are now simpler. Let's put the logic in the endElement (let's use only one dateFormat instance too... it's faster):private final DateFormat dateFormat = new SimpleDateFormat("yyyy-mm-dd");
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if(book != null && dataTagSet.contains(localName)) {
// we have a data item for our current book.
String value = characterCache.toString();
switch(localName) {
case "author":
book.setAuthor(value);
break;
case "title":
book.setTitle(value);
break;
case "genre":
book.setGenre(value);
break;
case "price":
book.setPrice(Double.parseDouble(value));
break;
case "publish_date":
try {
book.setPublishDate(dateFormat.parse(value));
} catch (ParseException e) {
e.printStackTrace();
}
break;
case "description":
book.setDescription(value);
break;
}
} else if("book".equals(localName)) {
books.add(book);
book = null;
}
characterCache.setLength(0);
}Code Snippets
private static final String[] dataTags = {"author", "title", "genre", "price", "publish_date", "description"};
private static Set<String> dataTagSet = new HashSet<>(Arrays.asList(dataTags));private final StringBuilder characterCache = new StringBuilder(256);public void characters(char[] ch, int start, int length)
throws SAXException {
characterCache.append(ch, start, length);
}public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
if("book".equals(localName)) {
book = new Book(atts.getValue(atts.getLocalName(0)));
}
// every element resets the characterCache.
characterCache.setLength(0);
}private final DateFormat dateFormat = new SimpleDateFormat("yyyy-mm-dd");
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if(book != null && dataTagSet.contains(localName)) {
// we have a data item for our current book.
String value = characterCache.toString();
switch(localName) {
case "author":
book.setAuthor(value);
break;
case "title":
book.setTitle(value);
break;
case "genre":
book.setGenre(value);
break;
case "price":
book.setPrice(Double.parseDouble(value));
break;
case "publish_date":
try {
book.setPublishDate(dateFormat.parse(value));
} catch (ParseException e) {
e.printStackTrace();
}
break;
case "description":
book.setDescription(value);
break;
}
} else if("book".equals(localName)) {
books.add(book);
book = null;
}
characterCache.setLength(0);
}Context
StackExchange Code Review Q#96032, answer score: 5
Revisions (0)
No revisions yet.