Clojure MapReduce Reducer

Submitted by: @import:stackexchange-codereview
Tags: reducer, mapreduce, clojure

Problem

This program is the reducer of a Hadoop MapReduce job. It reads tab-delimited lines from stdin, such as

foo    1
foo    1
bar    1


and outputs

foo    2
bar    1


Any suggestions for improvements?

(use '[clojure.string :only [split]])
(def reducer (atom {}))

(defn update-map [map key]
  (merge-with + map {key 1}))

(doseq [line (line-seq (java.io.BufferedReader. *in*))]
  (let [k (first (split line #"\t"))]
    (swap! reducer update-map k)))

(doseq [kv @reducer]
  (println (format "%s\t%s" (first kv) (second kv))))

Solution

Probably a bit too late to help the OP, but in case anyone else stumbles upon this question, here's a succinct way of doing it using the frequencies function:

(doseq [[word freq] (frequencies
                      (map
                        #(re-find #"^[^\t]+" %) ;; just get the first non-tab characters
                        (line-seq (java.io.BufferedReader. *in*))))]
  (println (str word "\t" freq)))
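
If you want to test the counting logic without touching stdin, one option is to pull it into a pure function. The following is only a sketch: count-keys, format-entry and run-reducer are names I made up for illustration, and unlike the snippet above it uses keep instead of map, so blank lines and lines that start with a tab are skipped rather than counted under a nil key.

```clojure
;; Hypothetical helper (not part of the original answer): count how often
;; each key, i.e. the text before the first tab, occurs in a seq of lines.
(defn count-keys [lines]
  (->> lines
       ;; re-find returns nil when a line is empty or starts with a tab;
       ;; keep drops those nils instead of counting them.
       (keep #(re-find #"^[^\t]+" %))
       frequencies))

;; Format one [key count] pair as a tab-delimited output line.
(defn format-entry [[k n]]
  (str k "\t" n))

;; Hypothetical entry point: read lines from stdin and print one
;; tab-delimited line per distinct key.
(defn run-reducer []
  (doseq [entry (count-keys (line-seq (java.io.BufferedReader. *in*)))]
    (println (format-entry entry))))
```

Calling (run-reducer) at the end of the script should behave like the one-liner above for well-formed input, while (count-keys ["foo\t1" "foo\t1" "bar\t1"]) can be checked at the REPL. Note that frequencies returns a map, so the order of the output lines is not guaranteed.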

Context

StackExchange Code Review Q#9221, answer score: 4
