Split a JavaScript string into words
Problem
Up until a few years ago, the go-to method for splitting a string into words was String.prototype.split(). It can still work just fine, but it's a hassle to get right, especially for longer bodies of text, because punctuation and spacing have to be handled by hand. JavaScript now offers a simpler option that takes care of these nuances for us: Intl.Segmenter.

Using the Intl.Segmenter() constructor, we can create a segmenter for a given locale, with a specific granularity. In this case, we want to split a string into words, so we'll use the word granularity. Then, we can use the Intl.Segmenter.prototype.segment() method to split the string into segments.

A segment is an object with a handful of properties. The ones that are interesting for this task are segment and isWordLike. The former is the text of the segment, while the latter is a boolean indicating whether the segment is word-like or not. This allows us to easily filter out non-word segments such as spaces and punctuation.

Solution
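Before the full function, it may help to look at the raw segments. The following is a minimal sketch, not part of the original snippet; the variable names segmenter and segments are chosen here for illustration. It spreads the iterable returned by segment() into an array and keeps the segment, index, and isWordLike properties of each entry.

```javascript
// Illustrative sketch: inspect what segment() yields before any filtering.
// The names `segmenter` and `segments` are assumptions made for this example.
const segmenter = new Intl.Segmenter('en-US', { granularity: 'word' });

// segment() returns an iterable, so it can be spread into an array.
const segments = [...segmenter.segment('I love javaScript!!')].map(
  ({ segment, index, isWordLike }) => ({ segment, index, isWordLike })
);

// Each entry holds the segment text, its start index in the input,
// and whether the segmenter considers it word-like.
console.log(segments);

// Keeping only the word-like entries leaves the words themselves.
console.log(segments.filter(s => s.isWordLike).map(s => s.segment));
// ['I', 'love', 'javaScript']
```

Spaces and punctuation come back as segments too, only with isWordLike set to false, which is why filtering on that flag is enough to isolate the words.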
const splitIntoWords = (str, locale) =>
  [...new Intl.Segmenter(locale, { granularity: 'word' }).segment(str)].reduce(
    (acc, { segment, isWordLike }) => {
      if (isWordLike) acc.push(segment);
      return acc;
    },
    []
  );

splitIntoWords('I love javaScript!!', 'en-US');
// ['I', 'love', 'javaScript']
splitIntoWords('python, javaScript & coffee', 'en-US');
// ['python', 'javaScript', 'coffee']

Putting everything together, the function above splits a string into words using the Intl.Segmenter API.
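To make the hassle of String.prototype.split() concrete, here is a small comparison sketch. Both helper names, splitOnWhitespace and splitIntoWordsWithFilter, are hypothetical and do not appear in the original snippet; the second is simply one possible filter/map spelling of the same Intl.Segmenter idea.

```javascript
// Hypothetical helper: the naive approach, splitting on runs of whitespace.
const splitOnWhitespace = str => str.split(/\s+/);

// Hypothetical helper: the segmenter approach written with filter/map
// instead of reduce. Same idea as splitIntoWords, different spelling.
const splitIntoWordsWithFilter = (str, locale) =>
  [...new Intl.Segmenter(locale, { granularity: 'word' }).segment(str)]
    .filter(({ isWordLike }) => isWordLike)
    .map(({ segment }) => segment);

const text = 'python, javaScript & coffee';

// Whitespace splitting keeps punctuation attached or as separate tokens.
console.log(splitOnWhitespace(text));
// ['python,', 'javaScript', '&', 'coffee']

// The segmenter drops anything that is not word-like.
console.log(splitIntoWordsWithFilter(text, 'en-US'));
// ['python', 'javaScript', 'coffee']
```

Getting the split() version to match would need extra cleanup for the trailing comma and the standalone ampersand, which is the kind of per-case handling the segmenter makes unnecessary.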
Context
From 30-seconds-of-code: string-to-words