HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavascriptMinor

Make a summary from a larger text-file

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filelargertextmakesummaryfrom

Problem

This code makes summaries from larger texts.

I have searched around for an algorithm and found the following:



-
Associate words with their grammatical counterparts. (e.g. "city"
and "cities")

-
Calculate the occurrence of each word in the text.

-
Assign each word with points depending on their popularity.

-
Detect which periods represent the end of a sentence. (e.g "Mr."
does not).

-
Split up the text into individual sentences.

-
Rank sentences by the sum of their words' points.

-
Return X of the most highly ranked sentences in chronological
order.


Based on this, I've written some code.

I'm fairly new to javascript so basically I'd really appreciate tips on the following:

  • Possible improvements on algorithm



  • Coding style



summary.js

```
var stopwords = require('stopwords').english; //https://github.com/huned/node-stopwords
var natural = require('natural'); //https://github.com/NaturalNode/natural

function Summary(text) {

if (typeof text !== 'string') throw new Error("Argument must be a string");

this.text = text
this.sentences = [];
this.sentencesScore = {};
this.wordPoints = {};
}

Summary.prototype.wordFrequency = function () {

return this.text.split(/\s+/)
.filter(word => { return stopwords.indexOf(word.toLowerCase()) { return natural.PorterStemmer.stem(word); })
.reduce(function(map, word) {

map[word] = (map[word] || 0) + 1;
return map;

}, this.wordPoints);
}

Summary.prototype.splitTextIntoSentences = function () {
this.sentences = this.text.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|")

return this.sentences;
}

Summary.prototype.rankSentences = function () {
for(var i = 0, count = 0; i { return stopwords.indexOf(word.toLowerCase()) { return natural.PorterStemmer.stem(word); })
.forEach(word => {
cou

Solution

Use a linter

The first thing I do when reviewing JavaScript code is run it through a linter. There several online linters, JSHint for example. JSHint will complain about ES6 features:


'arrow function syntax (=>)' is only available in ES6 (use 'esversion: 6').

To get around this, add this comment line at the top:

// jshint esversion: 6


It's recommended to clear out all the reported warnings.

I will address the more interesting ones in sub-sections below.

"Don't make functions within a loop"

In the rankSentences function, you use lambda expressions in a loop:

for(var i = 0, count = 0; i  { return stopwords.indexOf(word.toLowerCase())  { return natural.PorterStemmer.stem(word); })
            .forEach(word => {
                count += this.wordPoints[word] || 0;
             });

    this.sentencesScore[sentence] = count; 
    count = 0;
}


A lambda expression is a function, so effectively you're redefining the same function repeatedly, in each iteration.
This is akin to computing the same value repeatedly.
The solution is to define the function once before the loop,
which is akin to moving a computation into a local variable before the loop begins:

function isStopWord(word) {
    return stopwords.indexOf(word.toLowerCase())  {
                  count += this.wordPoints[word] || 0;
               });

    this.sentencesScore[sentence] = count; 
    count = 0;
}


"The body of a for in should be wrapped in an if statement to filter unwanted properties from the prototype."

This answer on Stack Overflow explains well the problem and the solution for this warning:

What does the JSLint error 'body of a for in should be wrapped in an if statement' mean?

Don't repeat yourself

Notice that the functions we extracted in isStopWord and setBaseWord are also used in wordFrequency. So these functions could be defined in the global namespace and reused.

You could go even further.
It could be a good idea to move the entire sentence.split(...).filter(...).map(...) to a common function,
as this entire snippet appears twice.

Another example where you repeated yourself is the use of the count variable in the rankSentences function: it's set to 0 in the for system, and then again at the end of the loop body. It would be better to set it to 0 once, at the beginning of the loop body:

for(var i = 0; i  {
                  count += this.wordPoints[word] || 0;
               });

    this.sentencesScore[sentence] = count; 
}


Performance

Note that stopwords.indexOf is an \$O(N)\$ operation.
If performance is important, you could build a map, to make this operation \$O(1)\$, at the expense of using more memory (the storage of the map).

Code Snippets

// jshint esversion: 6
for(var i = 0, count = 0; i < this.sentences.length; i++) {
    var sentence = this.sentences[i];

    sentence.split(/\s+/)
            .filter(word => { return stopwords.indexOf(word.toLowerCase()) < 0; })
            .map(word => { return natural.PorterStemmer.stem(word); })
            .forEach(word => {
                count += this.wordPoints[word] || 0;
             });

    this.sentencesScore[sentence] = count; 
    count = 0;
}
function isStopWord(word) {
    return stopwords.indexOf(word.toLowerCase()) < 0;
}

function setBaseWord(word) {
    return natural.PorterStemmer.stem(word);
}

for(var i = 0, count = 0; i < this.sentences.length; i++) {
    var sentence = this.sentences[i];

    sentence.split(/\s+/)
              .filter(isStopWord)
              .map(setBaseWord)
              .forEach(word => {
                  count += this.wordPoints[word] || 0;
               });

    this.sentencesScore[sentence] = count; 
    count = 0;
}
for(var i = 0; i < this.sentences.length; i++) {
    var sentence = this.sentences[i];
    var count = 0;

    sentence.split(/\s+/)
              .filter(isStopWord)
              .map(setBaseWord)
              .forEach(word => {
                  count += this.wordPoints[word] || 0;
               });

    this.sentencesScore[sentence] = count; 
}

Context

StackExchange Code Review Q#124407, answer score: 7

Revisions (0)

No revisions yet.