patternrubyMinor
Craigslist search-across-regions script
Viewed 0 times
scriptsearchcraigslistacrossregions
Problem
I'm a JavaScript developer. I'm pretty sure that will be immediately apparent in the below code if for no other reason than the level/depth of chaining that I'm comfortable with. However, I'm learning Ruby, and so I'd love to write beautiful Ruby code too. My simple first project is a Craigslist search-across-regions script.
The full code is on GitHub, but broken down into snippets with questions below.
Bootstrapping
This Isn't JS
The full code is on GitHub, but broken down into snippets with questions below.
def get_us_regions()
# Accumulator for building up the returned object.
results = {}
# Important URLs
sitelist = URI('http://www.craigslist.org/about/sites')
geospatial = URI('http://www.craigslist.org/about/areas.json')
# Get a collection of nodes for the US regions out of craigslist's site list.
usregions = Nokogiri::HTML(Net::HTTP.get(sitelist)).search("a[name=US]").first().parent().next_element().search('a')
# Parse out the information to build a usable representation.
usregions.each { |usregion|
hostname = usregion.attr('href').gsub('http://','').gsub('.craigslist.org','')
results[hostname] = { name: usregion.content, state: usregion.parent().parent().previous_element().content }
}
# Merge that information with craigslist's geographic information.
areas = JSON.parse(Net::HTTP.get(geospatial))
areas.each { |area|
if results[area["hostname"]]
results[area["hostname"]][:stateabbrev] = area["region"]
results[area["hostname"]][:latitude] = area["lat"]
results[area["hostname"]][:longitude] = area["lon"]
end
}
# This is a complete list of the US regions, keyed off of their hostname.
return results
endBootstrapping
- How should I get procedural information I need to get started?
- Does this change if I were doing this on bootstrap for a long-running application and wanted to refresh, say, monthly?
- Should I shove bootstrapping this into some really abstract class?
This Isn't JS
- How should I chain calls to class methods?
- Why hashes with string keys versus the weird named key th
Solution
Long, long question. I'll take your first snippet and leave others to tackle the rest. First some comments about your code:
-
-
-
-
-
-
-
-
-
Now how I'd write the method. First I'd use Facets, an excellent library with lots of cool abstractions not present in the core:
You mention that you write Javascript code. The good thing about a functional approach is that code is (syntactic differences aside) identical in any language (if it has minimal functional capabilities). In JS (although FP style is more friendly with Coffeescript) and underscore (+ custom abstractions as mixins) you could write the same code.
-
def get_us_regions(): It's not idiomatic to puts those () on methods without arguments.-
first(): It's not idiomatic to write them on calls without arguments either.-
results = {}: I've already written a lot about this subject in CR, so I'll just give the link: functional programming with Ruby.-
# Important URLs: Not sure if important enough to deserve a comment :-)-
Nokogiri::HTML(...) ... long expression. Expressions can be chained without end, you must decide when to break and give meaningful names. I'd broke it down at least into two subexpressions.-
gsub('http://','').gsub('.craigslist.org',''): Use module URI instead of performing manual manipulation of URLs.-
results[area["hostname"]][:stateabbrev]: Again, a functional approach on this kind of expressions make them more concise and clear.-
return results: Explicit returns are not idiomatic.-
def get_us_regions. When a method is so trivial to make it configurable, do so, here give the country as argument -> def get_regions(country_code).Now how I'd write the method. First I'd use Facets, an excellent library with lots of cool abstractions not present in the core:
require 'uri'
require 'nokogiri'
require 'net/http'
require 'json'
require 'facets'
module CraigsList
SitelistUrl = 'http://www.craigslist.org/about/sites'
GeospatialUrl = 'http://www.craigslist.org/about/areas.json'
# Return hash of pairs (hostname, {:name, :state, :stateabbr, :latitude, :longitude})
# for US regions in craigslist.
def self.get_regions(country_code)
doc = Nokogiri::HTML(Net::HTTP.get(URI(SitelistUrl)))
usregions = doc.search("a[name=#{country_code}]").first.parent.next_element.search('a')
state_info = usregions.mash do |usregion|
hostname = URI.parse(usregion.attr('href')).host.split(".").first
state = usregion.parent.parent.previous_element.content
info = {name: usregion.content, state: state}
[hostname, info]
end
areas = JSON.parse(Net::HTTP.get(URI(GeospatialUrl)))
geo_info = areas.slice(*state_info.keys).mash do |area|
info = {stateabbrev: area["region"], latitude: area["lat"], longitude: area["lon"]}
[area["hostname"], info]
end
state_info.deep_merge(geo_info)
end
endYou mention that you write Javascript code. The good thing about a functional approach is that code is (syntactic differences aside) identical in any language (if it has minimal functional capabilities). In JS (although FP style is more friendly with Coffeescript) and underscore (+ custom abstractions as mixins) you could write the same code.
Code Snippets
require 'uri'
require 'nokogiri'
require 'net/http'
require 'json'
require 'facets'
module CraigsList
SitelistUrl = 'http://www.craigslist.org/about/sites'
GeospatialUrl = 'http://www.craigslist.org/about/areas.json'
# Return hash of pairs (hostname, {:name, :state, :stateabbr, :latitude, :longitude})
# for US regions in craigslist.
def self.get_regions(country_code)
doc = Nokogiri::HTML(Net::HTTP.get(URI(SitelistUrl)))
usregions = doc.search("a[name=#{country_code}]").first.parent.next_element.search('a')
state_info = usregions.mash do |usregion|
hostname = URI.parse(usregion.attr('href')).host.split(".").first
state = usregion.parent.parent.previous_element.content
info = {name: usregion.content, state: state}
[hostname, info]
end
areas = JSON.parse(Net::HTTP.get(URI(GeospatialUrl)))
geo_info = areas.slice(*state_info.keys).mash do |area|
info = {stateabbrev: area["region"], latitude: area["lat"], longitude: area["lon"]}
[area["hostname"], info]
end
state_info.deep_merge(geo_info)
end
endContext
StackExchange Code Review Q#27832, answer score: 2
Revisions (0)
No revisions yet.