HiveBrain v1.2.0

Second-to-last word

Submitted by: @import:stackexchange-codereview

Problem

Read the input file and output the second-to-last word on every line.

Can I make it shorter and/or more efficient?
I'm surprised by the performance. The runtime of over 2 s is probably dominated by JVM startup, but the program also needs over 40 MB of memory.

(use 'clojure.java.io)
(use '[clojure.string :only (join split)])

(with-open [rdr (reader (first *command-line-args*))]
  (doseq [line (line-seq rdr)]
    (println
      (nth (split line #" ") (- (count (split line #" ")) 2)))))

Solution

Are you running it with Lein, or directly with java? Either way, I think that memory use is normal; even a hello world program uses about as much.
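If you want to see how much of that footprint is Java heap actually in use, you can ask the JVM from inside the program. A minimal sketch using only java.lang.Runtime; the function names used-heap-mb and max-heap-mb are hypothetical:

```clojure
(defn used-heap-mb
  "Approximate Java heap currently in use by this JVM, in megabytes."
  []
  (let [rt (Runtime/getRuntime)]
    ;; totalMemory is the heap reserved so far; freeMemory is the unused part of it
    (/ (- (.totalMemory rt) (.freeMemory rt)) 1048576.0)))

(defn max-heap-mb
  "Maximum heap the JVM will attempt to use, in megabytes."
  []
  (/ (.maxMemory (Runtime/getRuntime)) 1048576.0))

;; Print both figures, e.g. at the end of -main.
(println (format "heap used: %.1f MB (max %.1f MB)" (used-heap-mb) (max-heap-mb)))
```

This counts only the Java heap. A process-level measurement also includes non-heap memory (loaded classes, JIT-compiled code, thread stacks), and lein run by default starts a separate JVM for the project in addition to Leiningen's own, so expect an external figure to be noticeably larger.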

Below are some naive timings (I didn't run them many times). My computer is quite slow, by the way :). Here is my baseline hello world (it uses about 30 MB):

C:\george\test\secondtolast>timemem cmd /c lein run
Helloooooooooo world.
Exit code      : 0
Elapsed time   : 8.31
Kernel time    : 0.03 (0.4%)
User time      : 0.03 (0.4%)
page fault #   : 762
Working set    : 2864 KB
Paged pool     : 69 KB
Non-paged pool : 2 KB
Page file size : 2088 KB


Next is a baseline with my working corpus (Paradise Lost: 11,404 lines of text, about 500 KB), where I just print the first line (again about 30 MB):

C:\george\test\secondtolast>timemem cmd /c lein run
?The Project Gutenberg EBook of Paradise Lost, by John Milton
Exit code      : 0
Elapsed time   : 8.45
Kernel time    : 0.11 (1.3%)
User time      : 0.03 (0.4%)
page fault #   : 762
Working set    : 2864 KB
Paged pool     : 69 KB
Non-paged pool : 2 KB
Page file size : 2088 KB


As you can see, the time doesn't change much. Here is your program (about the same memory):

[11k+ lines of output]
Exit code      : 0
Elapsed time   : 10.98
Kernel time    : 0.05 (0.4%)
User time      : 0.02 (0.1%)
page fault #   : 762
Working set    : 2864 KB
Paged pool     : 69 KB
Non-paged pool : 2 KB
Page file size : 2088 KB


That takes a little more time, but not much. Finally, here is a slightly different version of that code (about the same memory):

Exit code      : 0
Elapsed time   : 9.13
Kernel time    : 0.13 (1.4%)
User time      : 0.02 (0.2%)
page fault #   : 762
Working set    : 2864 KB
Paged pool     : 69 KB
Non-paged pool : 2 KB
Page file size : 2088 KB


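I'm not convinced it's actually faster (it would need many more trials); I think it is really about the same. Still, you could consider using subvec (http://clojuredocs.org/clojure_core/clojure.core/subvec ) or something similar. Keep in mind that split returns a vector: nth is O(n) on lists and lazy sequences, but on a vector it is effectively O(1), and subvec is O(1) as well. last, however, walks its argument element by element even on a vector, so peek is the constant-time way to read a vector's final element. In practice these lines are short, so none of this matters much.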
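To illustrate the complexity point, here is a small REPL sketch using words from the first line of the corpus. It shows that split hands back a vector and compares several ways of reading the second-to-last element:

```clojure
(require '[clojure.string :refer [split]])

(def words (split "of Paradise Lost, by John Milton" #" "))

;; clojure.string/split returns a persistent vector, not a lazy seq.
(vector? words)                              ;=> true

;; On a vector, each of these reads the second-to-last word in
;; (effectively) constant time.
(nth words (- (count words) 2))              ;=> "John"
(peek (pop words))                           ;=> "John"
(peek (subvec words 0 (dec (count words))))  ;=> "John"

;; last gives the same answer but steps through every element first,
;; and nth on a seq is likewise linear in the index.
(last (subvec words 0 (dec (count words))))  ;=> "John"
(nth (seq words) (- (count words) 2))        ;=> "John"
```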

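(ns secondtolast.core
  (:require [clojure.java.io :as io]
            [clojure.string :refer [split]])
  (:gen-class))

(defn -main
  "Prints the second-to-last space-separated word of every line of a file.
  The path comes from the first command-line argument, defaulting to pg20.txt."
  [& args]
  (with-open [rdr (io/reader (or (first args) "pg20.txt"))]
    (doseq [line (line-seq rdr)]
      (let [words (split line #" ")   ; split returns a vector
            c     (count words)]
        ;; Skip lines with fewer than two words (this also covers empty lines).
        (when (>= c 2)
          ;; subvec takes an O(1) view without the last word; peek reads its final element in O(1).
          (println (peek (subvec words 0 (dec c)))))))))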
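A possible further refinement, sketched under the assumption that a "word" is a run of non-whitespace characters (the code above splits on single spaces): move the per-line logic into a pure function so it can be tested without a file. The names second-to-last-word and print-second-to-last-words are hypothetical, not part of the code above.

```clojure
(require '[clojure.string :as str]
         '[clojure.java.io :as io])

(defn second-to-last-word
  "Returns the second-to-last whitespace-separated word of line,
  or nil when the line has fewer than two words."
  [line]
  ;; trim first so leading whitespace doesn't yield an empty first "word"
  (let [words (str/split (str/trim line) #"\s+")]
    (when (>= (count words) 2)
      (peek (pop words)))))

(defn print-second-to-last-words
  "Prints the second-to-last word of each line read from path, skipping short lines."
  [path]
  (with-open [rdr (io/reader path)]
    (doseq [line (line-seq rdr)
            :let [w (second-to-last-word line)]
            :when w]
      (println w))))
```

Keeping the per-line logic pure makes it easy to check at the REPL. The trade-off versus splitting on #" " is that runs of spaces count as a single separator, which changes the output for lines with leading or repeated spaces.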


Context

StackExchange Code Review Q#45567, answer score: 3
