HiveBrain v1.2.0
Get Started
← Back to all entries
patternrubyMinor

Refactor an XML to JSON parser class

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
classparserxmljsonrefactor

Problem

I used Nokogiri and a piece of ActiveSupport to parse an xml file from a given URL, format the data properly and return a JSON string. The script works as expected, so I'm only wondering if there are ways to make the code better from architecture/naming perspectives.

#!/usr/bin/env ruby

$:.unshift File.dirname(__FILE__)

require 'open-uri'
require 'nokogiri'
require 'active_support/core_ext'

class Scadenzario
  def initialize(url)
    @url = url
    @events = parse_events
  end

  def xml
    Nokogiri::XML::Document.parse(open(@url))
  end

  def to_json
    @events.to_json
  end

private

  def parse_events
    result = []
    xml.xpath('//item').each do |node|
      result << build_event(node)
    end

    result
  end

  def build_event(node)
    event = Hash.new
    event[:title] = node.at_xpath('title').content
    event[:url] = node.at_xpath('link').content
    event[:start] = format_event_time(node)
    event[:allDay] = false

    event
  end

  def format_event_time(node)
    Time.parse(node.at_xpath('pubDate').content).iso8601
  end
end

puts Scadenzario.new('servizi.seac.it/documenti/rss/rss_scadenzario_annuale.xml').to_json


The parse_events method can be written like this, but I think it would make the code rather unclear.

def parse_events
  xml.xpath('//item').inject([]) do |result, node|
    result << build_event(node)
  end
end

Solution

It all seems really nice, I would change few things:

Firstly, your format_event_time takes a node object as a param, which is counter-intuitive as I would expect it to accept Time object. I would rename it to 'parse_xml_date or similar and give it text instead:

def parse_xml_time(time_string)
  Time.parse(time_string).iso8601
end


The main purpose of this change is to make your
build_event method the only method specifying the exact xml locations of the required properties, so it is easy to find if those changed.

Secondly,your
build_event method can be slightly simplified to (including first change):

def build_event(node)
  {
    title: node.at_xpath('title').content
    url: node.at_xpath('link').content
    start: parse_xml_time(node.at_xpath('pubDate').content)
    allDay: false
  }
end


And finally I would add a small private method
get_param:

def get_xml_param(attribute, node)
  node.at_xpath(attribute).content
end


Which would simplify
build_event` to:

def build_event(node)
  {
    title: get_xml_param('title', node)
    url: get_xml_param('url', node)
    start: parse_xml_time(get_xml_param('pubTime', node))
    allDay: false
  }
end


UPDATE:

Actually there might be few more changes.

Your parse_events method can be written like this:

def parse_events
  xml.xpath('//item').map {|node| build_event(node) }
end


And I would consider caching an xml method:

def xml
  @xml ||= Nokogiri::XML::Document.parse(open(@url))
end

Code Snippets

def parse_xml_time(time_string)
  Time.parse(time_string).iso8601
end
def build_event(node)
  {
    title: node.at_xpath('title').content
    url: node.at_xpath('link').content
    start: parse_xml_time(node.at_xpath('pubDate').content)
    allDay: false
  }
end
def get_xml_param(attribute, node)
  node.at_xpath(attribute).content
end
def build_event(node)
  {
    title: get_xml_param('title', node)
    url: get_xml_param('url', node)
    start: parse_xml_time(get_xml_param('pubTime', node))
    allDay: false
  }
end
def parse_events
  xml.xpath('//item').map {|node| build_event(node) }
end

Context

StackExchange Code Review Q#49512, answer score: 4

Revisions (0)

No revisions yet.