
Escape user input for use in JS regex

Submitted by: @import:stackexchange-codereview

Problem

I am building a highlight feature for search results on my website and need to escape the user's input so that I can highlight the matching string within the original text.

The function

function sanitize_for_regex(s){
    var escaped = '';
    for(var i = 0; i < s.length; ++i){
        switch(s[i]){
            case '{':
            case '}':
            case '[':
            case ']':
            case '-':
            case '/':
            case '\\':
            case '(':
            case ')':
            case '*':
            case '+':
            case '?':
            case '.':
            case '^':
            case '$':
            case '|':
                escaped+= '\\';
                // intentional fall-through: the character itself is appended next
            default:
                escaped+= s[i];
        }
    }
    return escaped;
}

How it's used

var input_from_user = 'Hey + wi$ll thith you?';

var highlighted_text = original_text.replace(new RegExp('('+sanitize_for_regex(input_from_user)+')', 'gi'), '$1');

My test cases show that it's working quite well but I would like to get some feedback from other professionals.

Does this function look like it will escape all injection "attempts"? (I quoted attempts because the user is usually not aware of attempting to break anything)

Could the performance be improved?

Solution

I'm not a fan of "sanitizing" data. There isn't a clear definition of what "sanitizing" means, other than that it takes untrustworthy input and somehow makes it valid. It might entail discarding the invalid parts of the input — which is not what you are doing here. Escaping is a clearer term to describe what you are doing.

Repeated string concatenation is considered poor practice for performance. Since JavaScript strings are immutable, it would be better to devise a solution that constructs the string "all at once".

The documentation on developer.mozilla.org suggests the following solution for escaping a regular expression:


Escaping user input to be treated as a literal string within a regular
expression can be accomplished by simple replacement:

function escapeRegExp(string){
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}


I strongly recommend replacing your custom solution with the standard recipe (including a citation).
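
As a rough illustration of how the recipe could slot into the highlight feature from the question, here is a minimal sketch. The helper name highlight and the <mark> wrapper are assumptions made for this example; the question does not show what markup it wraps matches in. The empty-input guard is also an addition, since an empty pattern would match at every position.

```javascript
// Escape every regex metacharacter so the input is matched literally (the MDN recipe).
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}

// Hypothetical helper: wrap each case-insensitive occurrence of `term` in <mark> tags.
function highlight(originalText, term) {
  if (term === '') {
    return originalText; // an empty pattern would match between every character
  }
  var pattern = new RegExp(escapeRegExp(term), 'gi');
  // $& in the replacement inserts the matched text, so no capture group is needed.
  return originalText.replace(pattern, '<mark>$&</mark>');
}

console.log(highlight('Price: $5 (approx.)', '$5 (approx.)'));
// prints: Price: <mark>$5 (approx.)</mark>
```

Because the replacement string uses $& rather than $1, the parentheses that the question adds around the escaped input are not required.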

Code Snippets

function escapeRegExp(string){
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}
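
To address the question of whether an escaping function covers every "attempt", one possible check is sketched below. It is not part of the original answer: the function name findMishandledCharacters and the choice of printable ASCII as the probe set are assumptions made for illustration. For each character, it compares a regex-based removal against a plain split/join removal, so any character that is still treated as a metacharacter (or fails to compile) is reported.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}

// Hypothetical check: return the printable ASCII characters that `escapeFn`
// fails to turn into a pattern matching exactly that character.
function findMishandledCharacters(escapeFn) {
  var printable = '';
  for (var code = 32; code < 127; ++code) {
    printable += String.fromCharCode(code);
  }
  var failures = [];
  for (var i = 0; i < printable.length; ++i) {
    var ch = printable[i];
    var expected = printable.split(ch).join(''); // literal removal, no regex involved
    try {
      var actual = printable.replace(new RegExp(escapeFn(ch), 'g'), '');
      if (actual !== expected) {
        failures.push(ch); // the pattern matched something other than the literal character
      }
    } catch (e) {
      failures.push(ch); // the pattern did not even compile
    }
  }
  return failures;
}

console.log(findMishandledCharacters(escapeRegExp).length); // prints 0
```

Passing an identity function such as function (s) { return s; } instead shows which characters need escaping in the first place. The same check can be pointed at sanitize_for_regex to compare it against the recipe.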

Context

StackExchange Code Review Q#153691, answer score: 24
