Second to last word
Problem
Grab the input file, and output the second to last word on every line.
Can I make it shorter and/or more efficient?
I'm surprised by the performance. While the runtime of 2+ s is likely dominated by startup, the program also requires 40+ MB of memory.
(use 'clojure.java.io)
(use '[clojure.string :only (join split)])
(with-open [rdr (reader (first *command-line-args*))]
  (doseq [line (line-seq rdr)]
    (println
      (nth (split line #" ") (- (count (split line #" ")) 2)))))
Solution
Are you using Lein to run? Or straight Java? Either way, I think the memory use is normal even for a hello world program.
Following are some naive timings (I didn't run them many times), and my computer is quite slow btw :). Here is my baseline hello world (uses about 30 MB):
C:\george\test\secondtolast>timemem cmd /c lein run
Helloooooooooo world.
Exit code : 0
Elapsed time : 8.31
Kernel time : 0.03 (0.4%)
User time : 0.03 (0.4%)
page fault # : 762
Working set : 2864 KB
Paged pool : 69 KB
Non-paged pool : 2 KB
Page file size : 2088 KB
Next is a baseline with my working corpus (Paradise Lost, 11404 lines of text and about 500 KB). I just output the first line (uses about 30 MB):
C:\george\test\secondtolast>timemem cmd /c lein run
?The Project Gutenberg EBook of Paradise Lost, by John Milton
Exit code : 0
Elapsed time : 8.45
Kernel time : 0.11 (1.3%)
User time : 0.03 (0.4%)
page fault # : 762
Working set : 2864 KB
Paged pool : 69 KB
Non-paged pool : 2 KB
Page file size : 2088 KB
As you can see, the time doesn't change much. Here is your program (about the same memory):
[11k+ lines of output]
Exit code : 0
Elapsed time : 10.98
Kernel time : 0.05 (0.4%)
User time : 0.02 (0.1%)
page fault # : 762
Working set : 2864 KB
Paged pool : 69 KB
Non-paged pool : 2 KB
Page file size : 2088 KB
A little more time, but not much. Finally, here is a slightly different version of that code (about the same memory):
Exit code : 0
Elapsed time : 9.13
Kernel time : 0.13 (1.4%)
User time : 0.02 (0.2%)
page fault # : 762
Working set : 2864 KB
Paged pool : 69 KB
Non-paged pool : 2 KB
Page file size : 2088 KB
I'm not convinced it's actually faster (that would need many more trials); I think it is really about the same. But maybe you can think about using subvec (http://clojuredocs.org/clojure_core/clojure.core/subvec) or something similar. nth is O(n) on sequences, while subvec is O(1). Note, though, that split returns a vector, on which nth is also effectively constant time, and in reality these are not very long lines anyway.
(ns secondtolast.core
  (:require [clojure.java.io :as io]
            [clojure.string :refer [split]])
  (:gen-class))
(defn -main
  "Prints the second-to-last word of every line of pg20.txt that has at least two words."
  []
  (with-open [rdr (io/reader "pg20.txt")]
    (doseq [line (line-seq rdr)]
      (let [words (split line #" ")   ; split returns a vector
            c     (count words)]
        ;; Skip empty and one-word lines: they have no second-to-last word.
        (when-not (or (empty? line) (= 1 c))
          (println (last (subvec words 0 (- c 1)))))))))
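If you want to go one step further, here is a rough sketch (not something I timed) that pulls the per-line work into a pure function and uses peek and pop instead of last and subvec. The namespace secondtolast.sketch and the function names are made up for illustration; the only library calls are clojure.string/split, clojure.java.io/reader and core functions. It assumes words are separated by single spaces, like the code above.

```clojure
(ns secondtolast.sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn second-to-last-word
  "Returns the second-to-last space-separated word of line,
  or nil when the line has fewer than two words."
  [line]
  (let [words (str/split line #" ")]   ; str/split returns a vector
    (when (>= (count words) 2)
      ;; pop drops the last element and peek reads the new last one;
      ;; both are effectively constant-time on vectors.
      (peek (pop words)))))

(defn print-second-to-last-words
  "Prints the second-to-last word of every line in the file at path,
  skipping lines that have fewer than two words."
  [path]
  (with-open [rdr (io/reader path)]
    (doseq [line (line-seq rdr)]
      (when-let [w (second-to-last-word line)]
        (println w)))))
```

Having the work in a function also makes it easier to get the "many more trials": in a running REPL something like (time (dotimes [_ 100000] (second-to-last-word "a b c"))) measures only the per-line work and leaves JVM startup out of the picture.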
Context
StackExchange Code Review Q#45567, answer score: 3