Remove accents/diacritics in a string in JavaScript
Problem
How do I remove accented characters from a string, especially in IE6?
I had something like this:
var accentsTidy = function(s){
    var r = s.toLowerCase();
    r = r.replace(/\s/g, "");
    r = r.replace(/[àáâãäå]/g, "a");
    r = r.replace(/æ/g, "ae");
    r = r.replace(/ç/g, "c");
    r = r.replace(/[èéêë]/g, "e");
    r = r.replace(/[ìíîï]/g, "i");
    r = r.replace(/ñ/g, "n");
    r = r.replace(/[òóôõö]/g, "o");
    r = r.replace(/œ/g, "oe");
    r = r.replace(/[ùúûü]/g, "u");
    r = r.replace(/[ýÿ]/g, "y");
    r = r.replace(/\W/g, "");
    return r;
};
But IE6 bugs me; it seems it doesn't like my regular expression.
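For comparison, the same table idea can be written as one lookup table and a single regex pass. This is a hypothetical sketch, not the accepted answer's method: ACCENT_GROUPS, ACCENT_LOOKUP and ACCENT_RE are illustrative names, the syntax sticks to ES3-era features (var, plain objects, a replacement function) but has not been tested in IE6, and the accented characters in the data are written as \u escapes so they do not depend on the source file's encoding. Whether encoding was the real IE6 problem is an assumption.

```javascript
// Hypothetical single-pass variant of the question's accentsTidy.
// ES3-style syntax; the table data uses \u escapes instead of literal letters.
var ACCENT_GROUPS = {
  a: "\u00e0\u00e1\u00e2\u00e3\u00e4\u00e5", // àáâãäå
  ae: "\u00e6",                               // æ
  c: "\u00e7",                                // ç
  e: "\u00e8\u00e9\u00ea\u00eb",              // èéêë
  i: "\u00ec\u00ed\u00ee\u00ef",              // ìíîï
  n: "\u00f1",                                // ñ
  o: "\u00f2\u00f3\u00f4\u00f5\u00f6",        // òóôõö
  oe: "\u0153",                               // œ
  u: "\u00f9\u00fa\u00fb\u00fc",              // ùúûü
  y: "\u00fd\u00ff"                           // ýÿ
};

// Invert the groups into a one-character lookup table and collect every
// accented letter into a single character class.
var ACCENT_LOOKUP = {};
var accentChars = "";
for (var ascii in ACCENT_GROUPS) {
  if (ACCENT_GROUPS.hasOwnProperty(ascii)) {
    var group = ACCENT_GROUPS[ascii];
    for (var k = 0; k < group.length; k++) {
      ACCENT_LOOKUP[group.charAt(k)] = ascii;
      accentChars += group.charAt(k);
    }
  }
}
var ACCENT_RE = new RegExp("[" + accentChars + "]", "g");

function accentsTidy(s) {
  return s
    .toLowerCase()
    .replace(/\s/g, "")                 // drop whitespace
    .replace(ACCENT_RE, function (ch) { // map each accented letter to ASCII
      return ACCENT_LOOKUP[ch];
    })
    .replace(/\W/g, "");                // drop remaining non-word characters
}

// accentsTidy("Crème Brûlée") returns "cremebrulee"
```

As in the original, any letter missing from the table (for example ø) is not mapped and is then deleted by the final \W step, because without the u flag \W matches everything outside [A-Za-z0-9_].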
Solution
With ES2015/ES6 String.prototype.normalize():
const str = "Crème Brûlée"
str.normalize("NFD").replace(/[\u0300-\u036f]/g, "")
> "Creme Brulee"
Note: use NFKD if you want things like \uFB01 (ﬁ) normalized (to fi).
Two things are happening here:
- normalize()ing to the NFD Unicode normal form decomposes precomposed characters into a base character followed by combining marks. The è of Crème ends up expressed as e + U+0300 (COMBINING GRAVE ACCENT).
- A regex character class matching the U+0300 to U+036F range then makes it trivial to strip the diacritics globally, since the Unicode standard conveniently groups them in the Combining Diacritical Marks block.
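The two steps can be sketched as a small helper. removeDiacritics and its form parameter are hypothetical names for illustration; the helper only wraps the normalize() + replace() one-liner above, and form lets you switch to NFKD as the note suggests.

```javascript
// Hypothetical helper wrapping the answer's one-liner.
// form: "NFD" (canonical decomposition) or "NFKD" (also compatibility forms).
function removeDiacritics(str, form = "NFD") {
  // Step 1: split precomposed characters into base letter + combining marks.
  const decomposed = str.normalize(form);
  // Step 2: delete the combining marks in U+0300..U+036F.
  return decomposed.replace(/[\u0300-\u036f]/g, "");
}

// Step 1 on its own: one code point becomes two.
const eGrave = "\u00e8"; // è
console.log(eGrave.length, eGrave.normalize("NFD").length); // 1 2
console.log([...eGrave.normalize("NFD")].map((c) => c.codePointAt(0).toString(16)));
// [ '65', '300' ]

console.log(removeDiacritics("Cr\u00e8me Br\u00fbl\u00e9e")); // Creme Brulee

// U+FB01 (ﬁ) has only a compatibility decomposition: NFD keeps it, NFKD splits it.
console.log(removeDiacritics("\ufb01ne"));         // ﬁne
console.log(removeDiacritics("\ufb01ne", "NFKD")); // fine
```

Letters such as ø or ł have no decomposition in Unicode, so they pass through unchanged. NFKD also rewrites other compatibility characters (for example ² becomes 2), so pick it only when that is acceptable.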
As of 2021, one can also use Unicode property escapes:
str.normalize("NFD").replace(/\p{Diacritic}/gu, "")
See the comments for performance testing.
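A side-by-side sketch of the range-based and property-escape versions; stripByRange and stripByProperty are illustrative names. Property escapes need the u flag and an ES2018-capable engine.

```javascript
// Range version: removes the Combining Diacritical Marks block after NFD.
const stripByRange = (s) => s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");

// Property version: removes characters with the Unicode Diacritic property.
// The u flag is required for \p{...} escapes.
const stripByProperty = (s) => s.normalize("NFD").replace(/\p{Diacritic}/gu, "");

const sample = "Cr\u00e8me Br\u00fbl\u00e9e"; // Crème Brûlée
console.log(stripByRange(sample));    // Creme Brulee
console.log(stripByProperty(sample)); // Creme Brulee
```

Diacritic is a character property rather than a block, so it does not match exactly the same set of characters as the U+0300 to U+036F range; on less common input the two versions can give different results, and it is worth checking them against your own data.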
Alternatively, if you just want sorting:
Intl.Collator has sufficient support (~95%) right now; a polyfill is also available, but I haven't tested it.
const c = new Intl.Collator();
["creme brulee", "crème brûlée", "crame brulai", "crome brouillé",
"creme brulay", "creme brulfé", "creme bruléa"].sort(c.compare)
[ 'crame brulai', 'creme brulay', 'creme bruléa', 'creme brulee', 'crème brûlée', 'creme brulfé', 'crome brouillé']
["crème brûlée", "crame brulai", "creme brulee", "crexe brulee", "crome brouillé"].sort()
[ 'crame brulai', 'creme brulee', 'crexe brulee', 'crome brouillé', 'crème brûlée']
["crème brûlée", "crame brulai", "creme brulee", "crexe brulee", "crome brouillé"].sort((a,b) => a.localeCompare(b))
[ 'crame brulai', 'creme brulee', 'crème brûlée', 'crexe brulee', 'crome brouillé']
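If the goal is comparing or searching rather than producing a stripped string, Intl.Collator can ignore accents directly through its sensitivity option: with "base", strings that differ only in accents or case compare as equal. A sketch, where baseCollator, desserts and query are illustrative names; exact collation results can depend on the locale and on the engine's ICU data.

```javascript
// sensitivity "base": only base letters matter, so e = è = E.
const baseCollator = new Intl.Collator(undefined, { sensitivity: "base" });

console.log(baseCollator.compare("cr\u00e8me", "creme")); // 0
console.log(baseCollator.compare("Cr\u00e8me", "creme")); // 0
console.log(baseCollator.compare("creme", "crame") > 0);  // true

// localeCompare accepts the same options.
console.log("cr\u00e8me".localeCompare("creme", undefined, { sensitivity: "base" })); // 0

// Accent-insensitive exact match without modifying the stored strings.
const desserts = ["cr\u00e8me br\u00fbl\u00e9e", "cr\u00eape", "tarte tatin"];
const query = "creme brulee";
console.log(desserts.filter((d) => baseCollator.compare(d, query) === 0));
// [ 'crème brûlée' ]
```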
Context
Stack Overflow Q#990904, score: 2096