patternjavascriptMajor
Faster JavaScript fuzzy string matching function?
Viewed 0 times
javascriptfunctionfasterstringmatchingfuzzy
Problem
I'm using the following function to fuzzy match strings:
Example:
It's very slow, though. How can I optimize that function?
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};Example:
fuzzy_match("fogo","foo") //true
fuzzy_match("jquery.js","jqjs") //true
fuzzy_match("jquery.js","jr") //falseIt's very slow, though. How can I optimize that function?
Solution
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};-
String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that
reduce already exists, and it's called join:pattern = pattern.split("").join(".*")-
The regex itself can be optimised:
. initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });-
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
var cache = _.memoize(function(pattern){
return new RegExp(pattern.split("").reduce(function(a,b){
return a+'[^'+b+']*'+b;
})
})
function fuzzy_match(str,pattern){
return cache(pattern).test(str)
};-
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
var fuzzy_match = (function(){
var cache = _.memoize(function(str){
return new RegExp("^"+str.replace(/./g, function(x){
return /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/.test(x) ? "\\"+x+"?" : x+"?";
})+"$");
});
return function(str, pattern){
return cache(str).test(pattern)
});
})();Concerning the last regex:
Given some pattern
"ace", the regex you build (/a.c.e/) tests if the string contains the characters of the pattern in the string, in the correct order.If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order:
/^a?b?c?d?e?$/. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?), and flank the regex with string boundary anchors.Code Snippets
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};pattern = pattern.split("").join(".*")pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });var cache = _.memoize(function(pattern){
return new RegExp(pattern.split("").reduce(function(a,b){
return a+'[^'+b+']*'+b;
})
})
function fuzzy_match(str,pattern){
return cache(pattern).test(str)
};var fuzzy_match = (function(){
var cache = _.memoize(function(str){
return new RegExp("^"+str.replace(/./g, function(x){
return /[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/.test(x) ? "\\"+x+"?" : x+"?";
})+"$");
});
return function(str, pattern){
return cache(str).test(pattern)
});
})();Context
StackExchange Code Review Q#23899, answer score: 39
Revisions (0)
No revisions yet.