HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavascriptTip

Split a JavaScript string into words

Submitted by: @import:30-seconds-of-code··
0
Viewed 0 times
javascriptwordsintosplitstring

Problem

<baseline-support featureId="intl-segmenter">
</baseline-support>
Up until a few years ago, the go-to method for splitting a string into words was String.prototype.split(). While it can still work just fine, it's a bit of a hassle to get right, especially for longer bodies of text. Yet, JavaScript has come up with a simpler way that takes care of all the nuances for us - Intl.Segmenter.
Using the Intl.Segmenter() constructor, we can create a segmenter for a given locale, with a specific granularity. In this case, we want to split a string into words, so we'll use the word granularity. Then, we can use the Intl.Segmenter.prototype.segment() method to split the string into segments.
A segment is an object with a handful of properties. The ones that are interesting for this task are segment and isWordLike. The former is the actual segment, while the latter is a boolean indicating whether the segment is word-like or not. This allows us to easily filter out non-word segments.

Solution

const splitIntoWords = (str, locale) =>
  [...new Intl.Segmenter(locale, { granularity: 'word' }).segment(str)].reduce(
    (acc, { segment, isWordLike }) => {
      if (isWordLike) acc.push(segment);
      return acc;
    },
    []
  );

splitIntoWords('I love javaScript!!', 'en-US');
// ['I', 'love', 'javaScript']
splitIntoWords('python, javaScript & coffee', 'en-US');
// ['python', 'javaScript', 'coffee']


Up until a few years ago, the go-to method for splitting a string into words was String.prototype.split(). While it can still work just fine, it's a bit of a hassle to get right, especially for longer bodies of text. Yet, JavaScript has come up with a simpler way that takes care of all the nuances for us - Intl.Segmenter.
Using the Intl.Segmenter() constructor, we can create a segmenter for a given locale, with a specific granularity. In this case, we want to split a string into words, so we'll use the word granularity. Then, we can use the Intl.Segmenter.prototype.segment() method to split the string into segments.
A segment is an object with a handful of properties. The ones that are interesting for this task are segment and isWordLike. The former is the actual segment, while the latter is a boolean indicating whether the segment is word-like or not. This allows us to easily filter out non-word segments.
Putting everything together, we can create a function that splits a string into words, using the Intl.Segmenter API.
> [!NOTE]
>

Code Snippets

const splitIntoWords = (str, locale) =>
  [...new Intl.Segmenter(locale, { granularity: 'word' }).segment(str)].reduce(
    (acc, { segment, isWordLike }) => {
      if (isWordLike) acc.push(segment);
      return acc;
    },
    []
  );

splitIntoWords('I love javaScript!!', 'en-US');
// ['I', 'love', 'javaScript']
splitIntoWords('python, javaScript & coffee', 'en-US');
// ['python', 'javaScript', 'coffee']

Context

From 30-seconds-of-code: string-to-words

Revisions (0)

No revisions yet.