patternjavascriptMinor

RSS parser for Node.JS

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

parsernodeforrss

Problem

I would like someone to review this code and tell if it can be done better, or if the code is elegant and simple enough for the task at hand.

I used code climate and I got 4/4 and my test coverage is 96%, yet I would like a professional opinion about it.

```
'use strict';

var cheerio = require('cheerio');
var FeedParser = require('feedparser');
var http = require('http');
var PromisePolyfill = require('promise');
var urlEncode = require('urlencode');

/**
* Extracts a valid url from the RSS Feed Item, taking into account exception urls
* @param {string} content - the item to be analyzed
* @returns {string} - the valid url inside the item
*/
function getValidURL(content) {
var $ = cheerio.load(content),
responseUrl;

$('a').each(function(i, e) {
if ( $(e).attr('href').indexOf('reddit.com') === -1 &&
$(e).attr('href').indexOf('imgur.com') === -1 ) {
responseUrl = $(e).attr('href');
}
});
return responseUrl;
}

/**
* Parses a RSS stream into an Object
* @param {string} rssUrl - RSS url to fetch the stream
* @returns {Object} - the Object with meta and item information
*/
exports.parseRss = function(rssUrl) {
var responseObject = {
title: '',
link: '',
image: '',
items: []
};

return new PromisePolyfill(function(resolve, reject) {
http.get(rssUrl, function(resGet) {
resGet.pipe(new FeedParser({}))
.on('error', function(error) {
reject(error.message);
})
.on('meta', function(meta) {
responseObject.title = meta.title;
responseObject.link = meta.link;
})
.on('readable', function() {
var stream = this,
item,
validUrl = '';

while ((item = stream.read())) {
validUrl = getValidURL(item.description);

Solution

I admit I haven't used cheerio before, but it looks like a good tool! The code looks decent - it appears that it will resolve and reject the promise when appropriate.

I see a couple places that can be simplified. For example:

$('a').each(function(i, e) {
    if ( $(e).attr('href').indexOf('reddit.com') === -1 &&
      $(e).attr('href').indexOf('imgur.com') === -1 ) {
        responseUrl = $(e).attr('href');
    }
});

Instead of selecting all anchor tags and then checking to see if the href attribute doesn't contain one of two strings, those conditions could be used to only select such anchor tags. This can be achieved using the :not() selector with an attribute selector *=:

const links = $('a:not([href*="reddit.com"]):not([href*="imgur.com"])');
if (links.length) {
    responseUrl = links.attr('href');
}

When I tested this with the sample data there was only one such anchor tag, but if there happen to be multiple, should responseUrl be set to the first, last or other?

The only other simplification I see is this:

on('end', function() {
   resolve(responseObject);
});

can be simplified to:

on('end', resolve.bind(null, responseObject));

using a partially applied function.

Code Snippets

$('a').each(function(i, e) {
    if ( $(e).attr('href').indexOf('reddit.com') === -1 &&
      $(e).attr('href').indexOf('imgur.com') === -1 ) {
        responseUrl = $(e).attr('href');
    }
});

const links = $('a:not([href*="reddit.com"]):not([href*="imgur.com"])');
if (links.length) {
    responseUrl = links.attr('href');
}

on('end', function() {
   resolve(responseObject);
});

on('end', resolve.bind(null, responseObject));

Context

StackExchange Code Review Q#91361, answer score: 2

Revisions (0)

No revisions yet.