HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Optimizing simple xHTML parser

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
parsersimpleoptimizingxhtml

Problem

I'm writing a simple xHTML parser which parses a data without nested tags.

Example input code will look like:

Header 
 atsatat  
asdsaads32532235


I've wrote the following code:

```
#include
#include
#include
#include
#include

using namespace v8;

enum Tags {
Closing = 0,
Paragraph,
Heading
};

int tagDetect(char *ptr){
if (*ptr == '/') {
return Tags::Closing;
}

if (*ptr == 'p') {
return Tags::Paragraph;
}

if ((ptr + 1) == '2' || (ptr + 1) == '3')
return Tags::Heading;

return -1;
}

Handle Callback(const Arguments& args) {
HandleScope scope;

if (!args[0]->IsString() || !args[1]->IsFunction()) {
return ThrowException(Exception::TypeError(
String::New("Invalid arguments! First parameter must be [string], second must be a callback [function].")));
}

Local callback = Local::Cast(args[1]);
String::Utf8Value param1(args[0]->ToString());
std::string input = std::string(*param1);

static Persistent data_symbol = NODE_PSYMBOL("data");
static Persistent tag_symbol = NODE_PSYMBOL("tag");

std::string::size_type pos = 0, closePos = 0;

int openPos;
int tagID, lastTag, id = 0, lastAppended = 0, clearStartPos;

Local nodes = Array::New();
Local node_obj;

while ((pos = input.find(' 10 || lastTag != Tags::Paragraph)) {
/ Concat tags /
if (lastAppended == lastTag) {
if (lastTag == Tags::Heading) { / Dont concat heading tags /
continue;
}

node_obj = nodes->Get(id - 1)->ToObject();
node_obj->Set(data_symbol, String::Concat(node_obj->Get(data_symbol)->ToString(), String::New(input.substr(openPos + (lastTag > 1 ? 3 : 2), pos - openPos - (lastTag > 1 ? 3 : 2) - 1).c_str())));
continue;
}

node_obj = Object::New();
node_obj->

Solution

-
Although optional, you can keep your #includes organized (alphabetically, for instance) so that it's easier to locate an individual one or to determine if you've already included a particular one.

-
You've included `, but I don't see any use of stringstreams. If you're not utilizing this library, then remove it.

-
Avoid
using namespace X in global scope as it will expose it to the entire program or file, which can break code in cases such as name-clashing (using the same name for something in which something doesn't belong to the namespace). It could also make it difficult for readers to tell what is part of a particular namespace (I cannot tell what belongs to v8, so I'm having a bit more trouble reviewing this). If you insist on keeping a using, then keep it local, such as in a function.

-
For this
enum:

enum Tags {
    Closing = 0,
    Paragraph,
    Heading
};


You don't need to explicitly set it to
0. This is already the default starting value for enums.

-
You don't need a
char* here:

int tagDetect(char *ptr){


Looking at the calling code, you're passing in an
std::string. You should just pass this object by itself. Also, because it's not being modified, pass it by const& to avoid a needless copy and any accidental modification:

int tagDetect(std::string const& str){


In C++ in general, you should avoid passing raw pointers to functions as you cannot keep track of ownership with them (C++11 smart pointers helps with that for when points are necessary). But in this case, you shouldn't need to deal with pointers.

With this change, you will need to modify the function code (I'll use different names for clarity):

This is accessing the first character:

if (*ptr == 'p')


so use
front()`:

if (str.front() == 'p')


This is accessing the second one:

*(ptr + 1)


so you can use indices to make it more clear (while they are technically the same thing; yours is doing pointer arithmetic):

str[1]


-
I'd keep these on separate lines for readability:

int tagID, lastTag, id = 0, lastAppended = 0, clearStartPos;


They aren't entirely related to each other, and there could be some maintenance issues since some are declared and the others initialized.

int tagID;
int lastTag;
int id = 0;
int lastAppended = 0;
int clearStartPos;


You should also have these declared/initialized as close to their scope as possible. This will help determine if any variable is no longer used (keeping around unused variables will clutter the code). But if you cannot put them any closer, then do have each one on separate lines.

Code Snippets

enum Tags {
    Closing = 0,
    Paragraph,
    Heading
};
int tagDetect(char *ptr){
int tagDetect(std::string const& str){
if (*ptr == 'p')
if (str.front() == 'p')

Context

StackExchange Code Review Q#48275, answer score: 2

Revisions (0)

No revisions yet.