
Wikipedia Table Scraper

Submitted by: @import:stackexchange-codereview
scraper, wikipedia, table

Problem

I created this small script to strip the data out of tables whose `<th>` elements contain hyperlinks. I was hoping to get input on code clarity and possible simplification. Efficiency is not much of an issue for me, as there are no huge tables on Wikipedia.

var arr = $('tbody').children('tr').map(function(idx) {
  return(idx === 0) ?
  {
    rows: $('th', this).map(function(idx) {
      return {
        href: $('a', this).attr('href'),
        title: $('a', this).text()
      }
    }).get()
  } : {
    href: $('th a', this).attr('href'),
    title: $('th a', this).text(),
    rows: $('td', this).map(function(idx) {
      return $(this).text();
    }).get()
  }
}).get();


Here is an example of a table that this scraper can be run on.

Solution

This looks pretty nice; I only have a few nit-picks.

What you call "rows" are actually "columns", which is a bit misleading.
The actual rows are the elements of the `arr` array that you're building.

The ternary is a bit confusing.
If clarity is important to you (as you mentioned),
then I think a good old-fashioned if-else would be better.

Lastly, the `idx` function parameters in the inner `map` calls are unused, so you could just as well drop them.
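
Taken together, the three suggestions might look something like the sketch below. The function name `scrapeTable` and the key name `columns` are only illustrative choices, not something from the original post, and passing `$` in as a parameter is an assumption made here so the snippet does not depend on a global.

```javascript
// Sketch only: `scrapeTable` and the `columns` key are hypothetical names.
// `$` is assumed to be jQuery, passed in explicitly rather than read from a global.
function scrapeTable($) {
  return $('tbody').children('tr').map(function (idx) {
    if (idx === 0) {
      // Header row: one { href, title } object per column heading.
      return {
        columns: $('th', this).map(function () {
          var link = $('a', this);
          return {
            href: link.attr('href'),
            title: link.text()
          };
        }).get()
      };
    } else {
      // Data row: the linked row heading plus the text of each cell.
      var heading = $('th a', this);
      return {
        href: heading.attr('href'),
        title: heading.text(),
        columns: $('td', this).map(function () {
          return $(this).text();
        }).get()
      };
    }
  }).get();
}
```

On a page where jQuery is loaded, `var rows = scrapeTable(jQuery);` would then give one object per table row, which also makes the name of the outer array match what it holds.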

Context

StackExchange Code Review Q#94136, answer score: 4
