HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Filtering Pinboard API results by tags

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
tagspinboardresultsapifiltering

Problem

I've got a simple sinatra webapp that pulls data from Pinboard's RSS api, parses the info, and re-displays it. There are 4 tasks I need to perform with the data:

  • I need to remove all instances of a specific tag from all items (in this case, want)



  • I need to create a list of all unique tags across all the items returned



  • I need to dynamically remove certain tags from the list if they are already being used to filter the list



  • I need to add an attribute to each item that is the host of the url associated with that item (i.e. amazon.com for whatever long amazon url may have been added)



  • I need to return a list of unique locations across all the items returned



Here is the way I am currently handling this:

items = []
tags = []
locations = []
begin
  JSON.parse(data).each do |item|
    # Remove the tag want from the list of tags,
    # strip any extra whitespace, and add a location attribute
    item['t'].each { |t| t.strip! }
    item['t'].delete_if { |tag| tag == 'want' }

    # Try like hell to parse the url. Assign an error string as a last resort
    begin
      item['l'] = /https?:\/\/(?:[-\w\d]*\.)?([-\w\d]*\.[a-zA-Z]{2,3}(?:\.[a-zA-Z]{2})?)/i.match(item['u'])[1]
    rescue
      item['l'] = "URL Parse error"
    end

    # Add the item's tags
    item['t'].each do |tag|
      tags << tag unless tags.include? tag or filter_tags.include? tag
    end

    locations << item['l'] unless locations.include? item['l']

    items << item
  end
rescue
end
return items, tags.sort, locations.sort


I really don't like this. It just doesn't feel clean to me, what with the creation of the empty arrays, and adding stuff in, etc. But my thought was that it's better to do a single iteration over the data returned than to do multiple loops. But now I am considering this instead:

```
items = JSON.parse(data)
# Remove the want tag from all items
items.each { |i| i['t'].delete_if { |t| t == 'want' }}

#generate a list of tags
tags = items.map { |i| i['t']}.fla

Solution

The second version is undoubtedly better, more readable, even if you iterate multiple times over the data.

The problem is: both versions, specially the first, suffer from a common problem: imperative style ("do this, do that"). Are you familiar with the difference between imperative (focus on statements and state change) and functional programming (focus on expressions and immutable data)? it's easy, just try to code without in-place updates (no each, delete_if, bang! methods, ...) and you'll see that you'll end up writing less code, easier to test, to debug and to understand. You'll pay a bit of memory/speed performance, but it's usually worth it.

Now, using a functional approach and taking advantage of existing libraries (uri), that's what I'd write (some details may not be right, but you get the idea):

require 'uri'

def parse(data, filter_tags)
  items, nested_tags, nested_locations = JSON.parse(data).map do |item|
    location = URI.parse(item["l"]).host rescue "URL Parse error"
    tags = item["t"].map(&:strip) - filter_tags
    [item, tags, location]
  end.transpose

  [items, nested_tags.flatten(1).uniq.sort, nested_locations.flatten(1).uniq.sort]
end


More on functional programming with Ruby: http://code.google.com/p/tokland/wiki/RubyFunctionalProgramming

Code Snippets

require 'uri'

def parse(data, filter_tags)
  items, nested_tags, nested_locations = JSON.parse(data).map do |item|
    location = URI.parse(item["l"]).host rescue "URL Parse error"
    tags = item["t"].map(&:strip) - filter_tags
    [item, tags, location]
  end.transpose

  [items, nested_tags.flatten(1).uniq.sort, nested_locations.flatten(1).uniq.sort]
end

Context

StackExchange Code Review Q#14283, answer score: 4

Revisions (0)

No revisions yet.