HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavascriptMinor

Syntax highlighter for HTML and PHP

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
phpsyntaxhighlighterforandhtml

Problem

I've created a JavaScript application to highlight the syntax of HTML and PHP. I know a lot of syntax highlighter are available nowadays, I just created to extend my knowledge in JS and regular expressions. I only wanted to know if its the right way to do this. (The code below works fine.)

js/codeHighlighter.js

function codeHighlighter(){
    var obj=document.getElementsByTagName("code");
    for(var i=0;i<$1>");
        data=data.replace(/"(.*?)"/g,""$1"");
        data=data.replace(/<\?(.*?)\s/g,"<?$1");
        data=data.replace(/\s\?>/g,"?>");
        data=data.replace(/\/\* (.*?) \*\//g,"/* $1 */");                   
        data=data.replace(/(new|echo|print|while|for|foreach|class|public|function|static|protected|private|return|required|required_once|include|include_once)[^=]/g," $1 ");
        data=data.replace(/\\n/g,"");
        data=data.replace(/\\t/g,"     ");
        obj[i].innerHTML=data;
    }
}


index.html


    
        Code Highlighter
        
        
        
        
            window.addEventListener("load", codeHighlighter);
        
        
            code{
              font-family: arial;
            }
            .html-tag{
              color:#090;
            }
            .string-value{
              color:#900;
            }
            .reserved{
              color:#009;
            }
            .php-tag{
              color:#f00;
            }
            .comment{
              color:#444;
            }
          
    
    
        This application highlights `php` and `html` code.
        
            /* A sample code. */\n
            <div class="code" >\n
            \t Hello!\n
            </div>\n
            <?php\n
            class Anish(){\n
            \n
            \t public function __construct(){\n
            \t\t return "Hello";\n
            \t }\n
            \n
            }\n
            $anish=new Anish();\n
            echo $anish;\n
            ?>\n
        
    

Solution

Really accurate highlghting is a big challenge, and even if your implementation is not totally falsy it's very incomplete.

First some obvious points, easy to correct:

  • your PHP reserved words list lacks a number of words, such as (not exhaustively) global, const, if, else, switch, case, default, do, exit, break, continue try, catch, finally, ...



  • you look for PHP multiline comments like /.../ but not for simple line ones like //..., nor for HTML comments `.



  • you look for double-quoted strings "..." but not for single-quoted ones '...'.



Now some harder issues:

  • you currently don't take care of escaped (single or double) quotes in a quoted string: "quote \" inside quoted string" breaks the highlighting.



  • you don't look for numbers (integer or float)



Lastly, not a lack but might be improved: you don't distinct between HTML tags and their attributes.

Please note that this is not to criticize your work! At the opposite, since you said it is:


to extend my knowledge in JS and regular expressions

I hope it encourages you to rise to the challenge :)

EDIT. Two points I forgot to mention above.

The first one comes from a preliminary general advice: try to follow best practices, notably in that, instead of:

for(var i=0;i<obj.length;i++){


you should write:

for (var i = 0; i < obj.length; i++) {


and so on evrywhere...

So here is the point since when looking for reserved words you wrote this:

data = data.replace(/(new|echo|...|include|include_once)[^=]/g, ...


There you added
[^=] to avoid selecting something like $new=....

Right but regarding the above advice you must realize that one may have written
$new = ... instead. Then you'll select new as a reserved word, while it's not!

So actually you'd better looking for a prepended
$ rather than an appended =:

data = data.replace(/[^\$(new|echo|...|include|include_once)/g, ...


The other point is only for convenience: currently you force
tab` to be arbitrarily replaced by 4 spaces, which may sometimes be undesired. So you might merely write something like this:

function codeHighlighter(tab) {
  tab = tab ? tab : 4;
  for (var i = 0; i < obj.length; i++) {
    ...
    data=data.replace(/\\t/g, repeat(" ", tab);
    ...
  }
}

Code Snippets

for(var i=0;i<obj.length;i++){
for (var i = 0; i < obj.length; i++) {
data = data.replace(/(new|echo|...|include|include_once)[^=]/g, ...
data = data.replace(/[^\$(new|echo|...|include|include_once)/g, ...
function codeHighlighter(tab) {
  tab = tab ? tab : 4;
  for (var i = 0; i < obj.length; i++) {
    ...
    data=data.replace(/\\t/g, repeat("&nbsp;", tab);
    ...
  }
}

Context

StackExchange Code Review Q#119116, answer score: 4

Revisions (0)

No revisions yet.