HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Craigslist search-across-regions script

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
scriptsearchcraigslistacrossregions

Problem

I'm a JavaScript developer. I'm pretty sure that will be immediately apparent in the below code if for no other reason than the level/depth of chaining that I'm comfortable with. However, I'm learning Ruby, and so I'd love to write beautiful Ruby code too. My simple first project is a Craigslist search-across-regions script.

The full code is on GitHub, but broken down into snippets with questions below.

def get_us_regions()
  # Accumulator for building up the returned object.
  results = {}

  # Important URLs
  sitelist = URI('http://www.craigslist.org/about/sites')
  geospatial = URI('http://www.craigslist.org/about/areas.json')

  # Get a collection of nodes for the US regions out of craigslist's site list.
  usregions =  Nokogiri::HTML(Net::HTTP.get(sitelist)).search("a[name=US]").first().parent().next_element().search('a')

  # Parse out the information to build a usable representation.
  usregions.each { |usregion|
    hostname = usregion.attr('href').gsub('http://','').gsub('.craigslist.org','')
    results[hostname] = { name: usregion.content, state: usregion.parent().parent().previous_element().content }
  }

  # Merge that information with craigslist's geographic information.
  areas = JSON.parse(Net::HTTP.get(geospatial))
  areas.each { |area|
    if results[area["hostname"]]
      results[area["hostname"]][:stateabbrev] = area["region"]
      results[area["hostname"]][:latitude] = area["lat"]
      results[area["hostname"]][:longitude] = area["lon"]
    end
  }

  # This is a complete list of the US regions, keyed off of their hostname.
  return results
end


Bootstrapping

  • How should I get procedural information I need to get started?



  • Does this change if I were doing this on bootstrap for a long-running application and wanted to refresh, say, monthly?



  • Should I shove bootstrapping this into some really abstract class?



This Isn't JS

  • How should I chain calls to class methods?



  • Why hashes with string keys versus the weird named key th

Solution

Long, long question. I'll take your first snippet and leave others to tackle the rest. First some comments about your code:

-
def get_us_regions(): It's not idiomatic to puts those () on methods without arguments.

-
first(): It's not idiomatic to write them on calls without arguments either.

-
results = {}: I've already written a lot about this subject in CR, so I'll just give the link: functional programming with Ruby.

-
# Important URLs: Not sure if important enough to deserve a comment :-)

-
Nokogiri::HTML(...) ... long expression. Expressions can be chained without end, you must decide when to break and give meaningful names. I'd broke it down at least into two subexpressions.

-
gsub('http://','').gsub('.craigslist.org',''): Use module URI instead of performing manual manipulation of URLs.

-
results[area["hostname"]][:stateabbrev]: Again, a functional approach on this kind of expressions make them more concise and clear.

-
return results: Explicit returns are not idiomatic.

-
def get_us_regions. When a method is so trivial to make it configurable, do so, here give the country as argument -> def get_regions(country_code).

Now how I'd write the method. First I'd use Facets, an excellent library with lots of cool abstractions not present in the core:

require 'uri'
require 'nokogiri'
require 'net/http'
require 'json'
require 'facets'

module CraigsList
  SitelistUrl = 'http://www.craigslist.org/about/sites'
  GeospatialUrl = 'http://www.craigslist.org/about/areas.json'

  # Return hash of pairs (hostname, {:name, :state, :stateabbr, :latitude, :longitude})
  # for US regions in craigslist.
  def self.get_regions(country_code)
    doc = Nokogiri::HTML(Net::HTTP.get(URI(SitelistUrl)))
    usregions = doc.search("a[name=#{country_code}]").first.parent.next_element.search('a')
    state_info = usregions.mash do |usregion|
      hostname = URI.parse(usregion.attr('href')).host.split(".").first
      state = usregion.parent.parent.previous_element.content
      info = {name: usregion.content, state: state}
      [hostname, info] 
    end

    areas = JSON.parse(Net::HTTP.get(URI(GeospatialUrl)))
    geo_info = areas.slice(*state_info.keys).mash do |area|
      info = {stateabbrev: area["region"], latitude: area["lat"], longitude: area["lon"]}
      [area["hostname"], info] 
    end

    state_info.deep_merge(geo_info)
  end
end


You mention that you write Javascript code. The good thing about a functional approach is that code is (syntactic differences aside) identical in any language (if it has minimal functional capabilities). In JS (although FP style is more friendly with Coffeescript) and underscore (+ custom abstractions as mixins) you could write the same code.

Code Snippets

require 'uri'
require 'nokogiri'
require 'net/http'
require 'json'
require 'facets'

module CraigsList
  SitelistUrl = 'http://www.craigslist.org/about/sites'
  GeospatialUrl = 'http://www.craigslist.org/about/areas.json'

  # Return hash of pairs (hostname, {:name, :state, :stateabbr, :latitude, :longitude})
  # for US regions in craigslist.
  def self.get_regions(country_code)
    doc = Nokogiri::HTML(Net::HTTP.get(URI(SitelistUrl)))
    usregions = doc.search("a[name=#{country_code}]").first.parent.next_element.search('a')
    state_info = usregions.mash do |usregion|
      hostname = URI.parse(usregion.attr('href')).host.split(".").first
      state = usregion.parent.parent.previous_element.content
      info = {name: usregion.content, state: state}
      [hostname, info] 
    end

    areas = JSON.parse(Net::HTTP.get(URI(GeospatialUrl)))
    geo_info = areas.slice(*state_info.keys).mash do |area|
      info = {stateabbrev: area["region"], latitude: area["lat"], longitude: area["lon"]}
      [area["hostname"], info] 
    end

    state_info.deep_merge(geo_info)
  end
end

Context

StackExchange Code Review Q#27832, answer score: 2

Revisions (0)

No revisions yet.