Using regex to replace strange characters
Problem
After importing some products from a CSV file, strange characters have shown up on the page. It would be too much work to go to each product manually and remove them, so I wrote this script to deploy on the product page and remove them.
My script is working fine, but I'm not sure whether it could be made better. Was this the best way to go about it?
$(function() {
    var p_desc = $(".rte").html();
    var re = /\?ÕÌ_|Š|š|Ž|ž|À|Á|Â|Ã|Ä|Å|Æ|Ç|È|É|Ê|Ë|Ì|Í|Î|Ï|Ñ|Ò|Ó|Ô|Õ|Ö|Ø|Ù|Ú|Û|Ü|Ý|Þ|ß|à|á|â|ã|ä|å|æ|ç|è|é|ê|ë|ì|í|î|ï|ð|ñ|ò|ó|ô|õ|ö|ø|ù|ú|û|ý|þ|ÿ|_Œ‚|__|_/g;
    var result = p_desc.replace(re, ' ');
    var new_p_desc = result.replace(/[^\x00-\x7F]/g, "").replace(/\?/g, '');
    $(".rte").html(new_p_desc);
});
Solution
RegEx Improvements
The regex can be shortened by using a case-insensitive match with the i flag. Characters that appear in the regex in both lowercase and uppercase form then only need to be listed once.
After removing the lowercase characters, the regex will be as below:
\?ÕÌ_|Š|Ž|À|Á|Â|Ã|Ä|Å|Æ|Ç|È|É|Ê|Ë|Ì|Í|Î|Ï|Ñ|Ò|Ó|Ô|Õ|Ö|Ø|Ù|Ú|Û|Ü|Ý|Þ|ß|ð|ÿ|_Œ‚|__|_
Here's a live demo of the regex.
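The case-insensitive claim can be checked directly on a few accented letters. The snippet below is only an illustrative sketch: the variable names and the sample characters are assumptions chosen for the demonstration and are not part of the original script.

```javascript
// Hypothetical check: with the i flag, an uppercase accented letter in the
// pattern should also match its lowercase counterpart.
var upperOnly = /À|É|Ñ|Š/i;

// Four lowercase accented letters plus one plain letter as a control.
var samples = ['à', 'é', 'ñ', 'š', 'x'];

// No g flag is used, so test() keeps no state between calls.
var results = samples.map(function (ch) {
    return upperOnly.test(ch);
});

console.log(results);
```

The four accented samples should report a match and the control letter should not, which is why the lowercase variants can be dropped from the pattern.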
The regex can be further improved by using a character class, which makes matching faster than a chain of OR (|) alternatives:
\?ÕÌ_|_Œ‚|[ŠŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÞßðÿ_]+
Adding the + quantifier also reduces the number of steps taken to match when characters from the character class are adjacent to each other in the input.
Here are the demos on RegEx101, without the + quantifier (screenshot) and with the + quantifier (screenshot), applied to the same data. Note that PHP is selected in these demos because the number of steps taken to match is not shown for JavaScript. The regex there is also different: it contains the lowercase counterparts of the special characters as well, because the i flag does not work on them in PHP, and I did not want to apply the u (Unicode) flag because it is not supported in every JavaScript environment.
These demos were created only to show the difference when + is applied to a character class. The effect should be similar in JavaScript.
Note that __ (two underscores) is redundant, as _ is already in the character class and, with the g flag, all occurrences will be removed.
Method Chaining
As replace returns a string, any other string method can be called on its result, so multiple calls to replace can be chained:
str.replace(someRegexOrString, someString)
    .replace(someOtherRegexOrString, someOtherString);
This is equivalent to
var temp = str.replace(someRegexOrString, someString);
var result = temp.replace(someOtherRegexOrString, someOtherString);
Replacing HTML
jQuery's html() accepts a function that receives the index of the element and its current innerHTML as parameters; the value returned by the function is set as the element's new content.
The code can be written as
$('.rte').html(function(index, currentHTML) {
    return doSomeOperationOn(currentHTML);
});
Complete Code
With the above changes, the code will be
$(document).ready(function() {
    var regex = /\?ÕÌ_|_Œ‚|[ŠŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÞßðÿ_]+/gi;
    $('.rte').html(function(i, oldHTML) {
        return oldHTML.replace(regex, ' ')
            .replace(/[^\x00-\x7F]|\?/g, '');
    });
});
$(document).ready(function() { is more readable than $(function() {, so you may also consider using the more expressive form.
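To try the combined regex and the chained replace calls without a page or jQuery, the same two steps can be wrapped in a plain function. This is a minimal sketch: the function name cleanDescription and the sample string are hypothetical and do not come from the original script.

```javascript
// Hypothetical helper: the same two-step cleanup as the complete code above,
// written as a plain function so it can be run on sample strings.
function cleanDescription(html) {
    // Step 1: known garbage sequences and accented letters become a space.
    var regex = /\?ÕÌ_|_Œ‚|[ŠŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÞßðÿ_]+/gi;
    return html
        .replace(regex, ' ')
        // Step 2: any remaining non-ASCII character or question mark is dropped.
        .replace(/[^\x00-\x7F]|\?/g, '');
}

// Made-up sample input containing an underscore, a garbage run and a symbol.
var sample = 'Blue_shirt?ÕÌ_100% cotton™';
console.log(cleanDescription(sample));
```

With jQuery available, the function could be passed along as $('.rte').html(function(i, oldHTML) { return cleanDescription(oldHTML); });. Keep in mind that the second step strips every question mark, including ones that legitimately belong to the description text.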
Context
StackExchange Code Review Q#150438, answer score: 11