snippetjavascriptTip
Implementing partial search matching in JavaScript
Viewed 0 times
javascriptimplementingpartialsearchmatching
Problem
In the previous article, we discussed how to implement TF-IDF and use an inverted index to optimize document searching. However, in most modern search systems, partial matching is also pretty common. Let's take a look at how to implement partial matching on top of the existing TF-IDF and inverted index implementation.
> [!IMPORTANT]
>
> If you haven't read the previous article, or are unfamiliar with the techniques discussed, I strongly recommend you do so before continuing. The code in this article builds on that article, and it will be much easier to understand if you are familiar with them.
Partial matching is a technique that allows you to search for documents that contain a substring of the search term. It's most often employed in search pages where the results are populated while you type. You may, for example, start typing _"JavaScript"_ and get results as soon a few characters have been typed (e.g. _"JavaSc"_).
> [!IMPORTANT]
>
> If you haven't read the previous article, or are unfamiliar with the techniques discussed, I strongly recommend you do so before continuing. The code in this article builds on that article, and it will be much easier to understand if you are familiar with them.
Partial matching is a technique that allows you to search for documents that contain a substring of the search term. It's most often employed in search pages where the results are populated while you type. You may, for example, start typing _"JavaScript"_ and get results as soon a few characters have been typed (e.g. _"JavaSc"_).
Solution
const prefixMatches = (term, doc) => {
const regex = new RegExp(`^${term}`, 'i');
return doc.filter(term => regex.test(term));
};
const terms = [
'javascript', 'program', 'languag', 'us', 'in', 'program', 'web',
];
prefixMatches('pro', terms); // ['program', 'program']
prefixMatches('langu', terms); // ['languag']
prefixMatches('java', terms); // ['javascript']>
> If you haven't read the previous article, or are unfamiliar with the techniques discussed, I strongly recommend you do so before continuing. The code in this article builds on that article, and it will be much easier to understand if you are familiar with them.
Partial matching is a technique that allows you to search for documents that contain a substring of the search term. It's most often employed in search pages where the results are populated while you type. You may, for example, start typing _"JavaScript"_ and get results as soon a few characters have been typed (e.g. _"JavaSc"_).
As users type from the start of the word, we can use a simple prefix search. This can be very easily implemented with a regular expression.
As you may notice in the last example, partial matching doesn't always produce the expected result. If you were searching for
Java, you'd get JavaScript documents, too. This can't necessarily be fixed by simple prefix matching, but you may employ some logic to decide whether to check for partial matches or not.We'll have to add some conditions to check for partial matches. The first one of them is trivially simple: we need only partially match the last term in the search query. This matches how users naturally type, so it makes sense.
Code Snippets
const prefixMatches = (term, doc) => {
const regex = new RegExp(`^${term}`, 'i');
return doc.filter(term => regex.test(term));
};
const terms = [
'javascript', 'program', 'languag', 'us', 'in', 'program', 'web',
];
prefixMatches('pro', terms); // ['program', 'program']
prefixMatches('langu', terms); // ['languag']
prefixMatches('java', terms); // ['javascript']> [!NOTE]
>
> These **heuristics** seem to work well in my **limited experience**. Take them with a pinch of salt, as you may need to tweak them to find the best solution for your use case.
As you can see, this solution ensures that if a term is found in the inverted index, it won't be partially matched to avoid false positive. However, we've only managed to find the relevant document. We now have to score them.
## TF-IDF of partial matches
Luckily, **scoring partial matches** is pretty simple. While there are various ways to go about it, I think the most straightforward approach is to use the same **TF-IDF formula** as before, but with a few modifications.
A partial match can result in **more than one term**. This means we can't simply use the term frequency (**TF**) of the term in the document, like before. However, we can **sum the frequencies** of all the terms that matched. This will give us a score that is proportional to the number of terms that matched, which is what we want.
Similarly, the inverse document frequency (**IDF**) can be calculated in the same way as before. The only difference is that we need to **divide the number of documents by the number of documents** that contain any of the terms that matched, matching the calculation we did for the term frequency.Context
From 30-seconds-of-code: partial-search-matching
Revisions (0)
No revisions yet.