HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavascriptTip

Implementing partial search matching in JavaScript

Submitted by: @import:30-seconds-of-code··
0
Viewed 0 times
javascriptimplementingpartialsearchmatching

Problem

In the previous article, we discussed how to implement TF-IDF and use an inverted index to optimize document searching. However, in most modern search systems, partial matching is also pretty common. Let's take a look at how to implement partial matching on top of the existing TF-IDF and inverted index implementation.
> [!IMPORTANT]
>
> If you haven't read the previous article, or are unfamiliar with the techniques discussed, I strongly recommend you do so before continuing. The code in this article builds on that article, and it will be much easier to understand if you are familiar with them.
Partial matching is a technique that allows you to search for documents that contain a substring of the search term. It's most often employed in search pages where the results are populated while you type. You may, for example, start typing _"JavaScript"_ and get results as soon a few characters have been typed (e.g. _"JavaSc"_).

Solution

const prefixMatches = (term, doc) => {
  const regex = new RegExp(`^${term}`, 'i');
  return doc.filter(term => regex.test(term));
};

const terms = [
  'javascript', 'program', 'languag', 'us', 'in', 'program', 'web',
];

prefixMatches('pro', terms); // ['program', 'program']
prefixMatches('langu', terms); // ['languag']
prefixMatches('java', terms); // ['javascript']


>
> If you haven't read the previous article, or are unfamiliar with the techniques discussed, I strongly recommend you do so before continuing. The code in this article builds on that article, and it will be much easier to understand if you are familiar with them.
Partial matching is a technique that allows you to search for documents that contain a substring of the search term. It's most often employed in search pages where the results are populated while you type. You may, for example, start typing _"JavaScript"_ and get results as soon a few characters have been typed (e.g. _"JavaSc"_).
As users type from the start of the word, we can use a simple prefix search. This can be very easily implemented with a regular expression.
As you may notice in the last example, partial matching doesn't always produce the expected result. If you were searching for Java, you'd get JavaScript documents, too. This can't necessarily be fixed by simple prefix matching, but you may employ some logic to decide whether to check for partial matches or not.
We'll have to add some conditions to check for partial matches. The first one of them is trivially simple: we need only partially match the last term in the search query. This matches how users naturally type, so it makes sense.

Code Snippets

const prefixMatches = (term, doc) => {
  const regex = new RegExp(`^${term}`, 'i');
  return doc.filter(term => regex.test(term));
};

const terms = [
  'javascript', 'program', 'languag', 'us', 'in', 'program', 'web',
];

prefixMatches('pro', terms); // ['program', 'program']
prefixMatches('langu', terms); // ['languag']
prefixMatches('java', terms); // ['javascript']
> [!NOTE]
>
> These **heuristics** seem to work well in my **limited experience**. Take them with a pinch of salt, as you may need to tweak them to find the best solution for your use case.

As you can see, this solution ensures that if a term is found in the inverted index, it won't be partially matched to avoid false positive. However, we've only managed to find the relevant document. We now have to score them.

## TF-IDF of partial matches

Luckily, **scoring partial matches** is pretty simple. While there are various ways to go about it, I think the most straightforward approach is to use the same **TF-IDF formula** as before, but with a few modifications.

A partial match can result in **more than one term**. This means we can't simply use the term frequency (**TF**) of the term in the document, like before. However, we can **sum the frequencies** of all the terms that matched. This will give us a score that is proportional to the number of terms that matched, which is what we want.

Similarly, the inverse document frequency (**IDF**) can be calculated in the same way as before. The only difference is that we need to **divide the number of documents by the number of documents** that contain any of the terms that matched, matching the calculation we did for the term frequency.

Context

From 30-seconds-of-code: partial-search-matching

Revisions (0)

No revisions yet.