Clojure MapReduce Reducer
Tags: reducer, mapreduce, clojure
Problem
This program is the reducer of a Hadoop MapReduce job. It reads tab-delimited data from stdin, for example

foo 1
foo 1
bar 1

and outputs

foo 2
bar 1

Any suggestions for improvements?
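(use '[clojure.string :only [split]])

;; Mutable accumulator: key -> number of lines seen with that key
(def reducer (atom {}))

;; Return the map with the count for key incremented by one
(defn update-map [map key]
  (merge-with + map {key 1}))

;; The key is the first tab-separated field of each stdin line
(doseq [line (line-seq (java.io.BufferedReader. *in*))]
  (let [k (first (split line #"\t"))]
    (swap! reducer update-map k)))

;; Emit each key and its count, separated by a tab
(doseq [kv @reducer]
  (println (format "%s\t%s" (first kv) (second kv))))

Solution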
Probably a bit too late to help the OP, but in case anyone else stumbles upon this question, here's a nice, succinct way of doing it using the frequencies function:

(doseq [[word freq] (frequencies
                      (map
                        #(re-find #"^[^\t]+" %) ;; just get the first non-tab characters
                        (line-seq (java.io.BufferedReader. *in*))))]
  (println (str word "\t" freq)))
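A further variation, offered as a hedged sketch rather than as part of the original answer: Hadoop Streaming hands the reducer its input sorted by key, so equal keys arrive on consecutive lines. That allows each key's total to be computed one run at a time with partition-by, instead of building a map of every key in memory (as both the atom version and the frequencies version do). This sketch also sums the value column rather than counting lines, which would stay correct if a combiner ever pre-aggregated values (e.g. a line like foo 3). Assumptions: every line is a key, a tab, then an integer; the names parse-line, reduce-sorted and run-reducer are hypothetical.

```clojure
(require '[clojure.string :as str])

;; Assumption: each line is "key<TAB>integer".
;; Split on the first tab only and parse the value as a long.
(defn parse-line [line]
  (let [[k v] (str/split line #"\t" 2)]
    [k (Long/parseLong (str/trim v))]))

;; Assumption: lines arrive grouped by key (Hadoop Streaming sorts reducer input).
;; partition-by yields one run per key, so roughly one key's lines are held
;; in memory at a time instead of a map of all keys.
(defn reduce-sorted [lines]
  (->> lines
       (map parse-line)
       (partition-by first)
       (map (fn [group]
              [(ffirst group) (reduce + (map second group))]))))

;; Read stdin and write "key<TAB>total" lines to stdout.
(defn run-reducer []
  (doseq [[k total] (reduce-sorted (line-seq (java.io.BufferedReader. *in*)))]
    (println (str k "\t" total))))
```

To use it as the streaming reducer script, end the file with (run-reducer). From a REPL, (reduce-sorted ["foo\t1" "foo\t1" "bar\t1"]) returns (["foo" 2] ["bar" 1]).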
Context
StackExchange Code Review Q#9221, answer score: 4