Simple attribute parser for HTML

Submitted by: @import:stackexchange-codereview

Problem

I am studying Java and trying to write an HTML parser that extracts tag names and attributes. I wrote a class (code below) using the State pattern.

This is for a training project in which I currently use JSoup. JSoup is too slow for my needs, so I want better performance. Suggestions about conventions and best practices are also welcome, as are comments on the interface / API of my class.

```
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;

public class AttributeParser {
    public AttributeParser(BufferedReader reader) {
        this.reader = reader;
        states.put(AttrStat.NAME, new NameState());
        states.put(AttrStat.VALUE, new ValueState());
        states.put(AttrStat.VALUE_QUOTES, new ValueQuotesState());
        states.put(AttrStat.AFTER_NAME, new AfterNameState());
        states.put(AttrStat.NEW_ATTR, new NewAttrState());
        states.put(AttrStat.NEW_VALUE, new NewValueState());
        current = states.get(AttrStat.NEW_ATTR);
    }

    public String tag() throws IOException {
        int ch;
        ch = reader.read();
        while (ch > 0) {
            if (ch == '<') {
                // [the lines between this check and the loop below were lost in extraction]
                while ((ch > 0) && (" >\n\t".indexOf(ch) == -1)) {
                    reader.mark(1);
                    tagName.append((char) ch);
                    if (tagName.toString().equals("!--")) {
                        break;
                    }
                    ch = reader.read();
                }
                if (ch == '>') {
                    reader.reset();
                }
                return tagName.toString();
            }
            ch = reader.read();
        }
        return null;
    }

    public HashMap attribute() throws IOException {
        attr = new HashMap<>();
        while (current.read(reader.read())) {
            //without body
        }
        addAttribute();
        return attr;
    }

    private void addAttribute() {
        if ((name.length() > 0
        // [the listing is truncated here in the source]
```

Solution

I won't comment on bugs, but here are a few remarks:

  • I think you would find parsers easier to develop if you implemented the tokenization logic as a separate component.

  • If your parser accepts a BufferedReader, it should implement the Closeable interface and close the reader in its close method. That also lets you use the parser in a try-with-resources statement.

  • It is better to accept a more general type than BufferedReader (the abstract Reader class, for example).
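
The three remarks above can be combined in one small sketch. It is only an illustration under stated assumptions: `TagTokenizer`, `nextTagName`, and `TokenizerDemo` are hypothetical names that do not appear in the original code, and the tokenizer deliberately ignores attributes, comments, and quoting. It accepts any `Reader`, wraps it in a `BufferedReader` only when needed, and implements `Closeable` so it works in try-with-resources.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical tokenizer: yields tag names only; a parser would be built on top of it. */
final class TagTokenizer implements Closeable {
    private final BufferedReader reader;

    TagTokenizer(Reader source) {
        // Accept any Reader; add buffering only if the caller did not already.
        this.reader = (source instanceof BufferedReader)
                ? (BufferedReader) source
                : new BufferedReader(source);
    }

    /** Returns the next tag name (for example "a" or "/a"), or null at end of input. */
    String nextTagName() throws IOException {
        int ch;
        // Skip everything up to the next '<'; text content is ignored in this sketch.
        while ((ch = reader.read()) != -1 && ch != '<') {
            // nothing to do
        }
        if (ch == -1) {
            return null;
        }
        // Collect characters until whitespace, '>' or end of input.
        StringBuilder name = new StringBuilder();
        while ((ch = reader.read()) != -1 && " >\n\t".indexOf(ch) == -1) {
            name.append((char) ch);
        }
        return name.toString();
    }

    @Override
    public void close() throws IOException {
        // The tokenizer owns the reader, so closing the tokenizer closes the reader.
        reader.close();
    }
}

final class TokenizerDemo {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the tokenizer (and its reader) automatically.
        try (TagTokenizer tokens = new TagTokenizer(new StringReader("<a href=\"x\">hi</a>"))) {
            String name;
            while ((name = tokens.nextTagName()) != null) {
                System.out.println(name); // prints "a", then "/a"
            }
        }
    }
}
```

Wrapping the source in a `BufferedReader` also keeps `mark` and `reset` available, which a plain `Reader` is not guaranteed to support.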



P.S. If you want a really fast implementation and writing your own parser is not essential, consider using StAX.
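
As a rough illustration of the StAX suggestion, the sketch below pulls tag names and attributes with the standard `javax.xml.stream` API. StAX is an XML parser, so this assumes the input is well-formed (XHTML-style) markup; it will reject typical loose HTML. The class name `StaxAttributeDump` and the `tag@attr=value` output format are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: lists every attribute as "tag@attr=value". */
final class StaxAttributeDump {
    static List<String> attributes(String xml) throws XMLStreamException {
        List<String> out = new ArrayList<>();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                // Attributes are only available while positioned on a start tag.
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        out.add(r.getLocalName() + "@" + r.getAttributeLocalName(i)
                                + "=" + r.getAttributeValue(i));
                    }
                }
            }
        } finally {
            r.close(); // releases parser resources; does not close the underlying source
        }
        return out;
    }
}
```

Because the reader is a pull parser, the caller controls how far to advance, which fits a design that asks for one tag or one attribute set at a time.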

Context

StackExchange Code Review Q#71406, answer score: 2
