HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavascriptTip

Implementing fuzzy matching in JavaScript

Submitted by: @import:30-seconds-of-code··
0
Viewed 0 times
javascriptimplementingmatchingfuzzy

Problem

Last time, we explored prefix search to help match the start of words while the user types. While our search system is pretty robust, it still lacks error tolerance. For example, if a user types javscript instead of javascript, our search system won't return any results. This is the problem fuzzy matching aims to solve.
> [!IMPORTANT]
>
> It's highly recommended that you start with the previous article, or even the one before it, if you haven't already. We'll built on top of past code and concepts in this one, so you'll have an easier time following along if you're familiar with them.
Fuzzy matching is a technique that allows you to search for documents that contain a substring of the search term, even if the substring is not an exact match. This is particularly useful when users make typos or misspellings, which is quite common in the real world.

Solution

export const generateNgrams = (term, n = 3) => {
  const ngrams = [];

  for (let i = 0; i < term.length - n + 1; i++)
    ngrams.push(term.slice(i, i + n));

  return ngrams;
};

generateNgrams('javascript');
// ['jav', 'ava', 'vas', 'asc', 'scr', 'cri', 'rip', 'ipt']


>
> It's highly recommended that you start with the previous article, or even the one before it, if you haven't already. We'll built on top of past code and concepts in this one, so you'll have an easier time following along if you're familiar with them.
Fuzzy matching is a technique that allows you to search for documents that contain a substring of the search term, even if the substring is not an exact match. This is particularly useful when users make typos or misspellings, which is quite common in the real world.
Fuzzy matching can be implemented using various algorithms, but one of the most common approaches is to use n-grams. This algorithm works by breaking down a word into smaller substrings of a fixed length, called n-grams. The most common type of n-gram is the trigram, which consists of three characters.
Implementing a trigram algorithm is relatively simple:
As you can see, all that's needed is our trusty for loop and String.prototype.slice() to extract the n-grams. The n parameter is optional and defaults to 3, but you can change it to any value you want, depending on the use case.

Code Snippets

export const generateNgrams = (term, n = 3) => {
  const ngrams = [];

  for (let i = 0; i < term.length - n + 1; i++)
    ngrams.push(term.slice(i, i + n));

  return ngrams;
};

generateNgrams('javascript');
// ['jav', 'ava', 'vas', 'asc', 'scr', 'cri', 'rip', 'ipt']
We'll also have to slightly adjust the `prefixMatches` function implementation to accommodate the **new data structure**.
Nothing crazy so far, right? We've added a new `ngramInvertedIndex` to store the n-grams, and updated the `addDocument` function to populate it. The `prefixMatches` function now uses the `terms` property of the document object to filter for matches.

## Fuzzy search

Having set up the indexing, we can now implement the fuzzy search. Same as before, we'll rely on the inverted index to match results. We will, however, need to **score both based on the TF-IDF score and n-gram similarity**. _How should we go about it?_

### Scoring fuzzy matches

One method is to use a **weighing parameter** - let's call it `fuzzinness` - to control the balance between the two scores. The higher the value, the more weight is placed on n-gram matches, and the lower the value, the more weight is placed on exact matches. This allows us to control how fuzzy we want our search to be.

Context

From 30-seconds-of-code: n-gram-fuzzy-matching

Revisions (0)

No revisions yet.