String-splitting function

Submitted by: @import:stackexchange-codereview
function, string, splitting

Problem

This function was hard to write as a Clojure newbie, and I don't like the result.
Can you help me find a better (more readable) way to do it?

(defn split-seq
  "Splits a seq into blocks defined by start-fn and stop-fn.
  Returns a lazy seq of seqs"
  [start-fn stop-fn lines]
  (let [step (fn [c state]
               (when-let [s (seq c)]
                 (if (stop-fn (first s))
                   (cons state (split-seq start-fn stop-fn (rest s)))
                   (recur (rest s)
                          (if (start-fn (first s))
                            '()
                            (cons (first s) state))))))]
    (lazy-seq (step lines '()))))

(defn post-start? [l] (.startsWith l "#ENTRY_START"))
(defn post-end? [l] (.startsWith l "#ENTRY_END"))

(defn split-lines
    "Split a line-seq into entries based on #ENTRY_START and #ENTRY_END."
    [data]
    (split-seq post-start? post-end? data))

;
; Test
;
(def test-data [
    "Header line"
    "#ENTRY_START"
    "entry line 1 "
    "entry line 2"
    "#ENTRY_END"
    "This line should be filtered out"
    "#ENTRY_START Having data here shouldn't make a difference."
    "entry line 1 "
    "entry line 2"
    "#ENTRY_END"
    "This should be gone too"])

(split-lines test-data)

; yields (("entry line 2" "entry line 1 ") ("entry line 2" "entry line 1 "))
; The order of elements doesn't matter in my case because I'm making a map with this data


With Arthur's help, the final code looks like this:

(defn entry-seq [data]
  (let [[f r] (split-with #(not (post-end? %))
                          (rest (drop-while #(not (post-start? %)) data)))]
    (when (not-empty f) (lazy-seq (cons f (entry-seq r))))))


Still lazy and a lot more readable!
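
As a quick check of the "still lazy" claim, here is a minimal sketch that runs the final function against the original test data and then against an infinite input. It assumes the recursive call is meant to refer to `entry-seq` itself; the use of `cycle` to build an endless input is only for illustration.

```clojure
;; Predicates copied from the question.
(defn post-start? [l] (.startsWith l "#ENTRY_START"))
(defn post-end? [l] (.startsWith l "#ENTRY_END"))

;; The final function, with the recursive call assumed to be entry-seq.
(defn entry-seq [data]
  (let [[f r] (split-with #(not (post-end? %))
                          (rest (drop-while #(not (post-start? %)) data)))]
    (when (not-empty f) (lazy-seq (cons f (entry-seq r))))))

(def test-data
  ["Header line"
   "#ENTRY_START"
   "entry line 1 "
   "entry line 2"
   "#ENTRY_END"
   "This line should be filtered out"
   "#ENTRY_START Having data here shouldn't make a difference."
   "entry line 1 "
   "entry line 2"
   "#ENTRY_END"
   "This should be gone too"])

(entry-seq test-data)
;; => (("entry line 1 " "entry line 2") ("entry line 1 " "entry line 2"))

;; cycle repeats test-data forever; only the entries that are
;; actually requested get computed, so this call returns.
(count (take 3 (entry-seq (cycle test-data))))
;; => 3
```

Unlike the first version, the lines inside each entry now come out in their original order.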

Solution

So the idea in this new approach is to write two expressions:

  • one that extracts the next entry:

(take-while #(not= "#ENTRY_END" %)
            (rest (drop-while #(not= "#ENTRY_START" %) data)))


  • one that extracts everything after the next entry:

(rest (drop-while #(not= "#ENTRY_END" %)
                  (rest (drop-while #(not= "#ENTRY_START" %) data))))


and then wrap them up into a lazy sequence.

First, let's expand the test data to include some additional edge cases:

user> (def test-data [
       "Header line"
       "#ENTRY_START"
       "entry line 1 "
       "entry line 2"
       "#ENTRY_END"
       "not part of an entry"
       "also not part of an entry"
       "#ENTRY_START"
       "entry line 1 "
       "entry line 2"
       "#ENTRY_END"
       "footer1"
       "footer2"])


Then wrap our two expressions into a function:

user> (defn entry-seq [data]
        (let [f (take-while #(not= "#ENTRY_END" %)
                            (rest (drop-while #(not= "#ENTRY_START" %) data)))
              r (rest (drop-while #(not= "#ENTRY_END" %)
                                  (rest (drop-while #(not= "#ENTRY_START" %) data))))]
          (when (not-empty f) (lazy-seq (cons f (entry-seq r))))))
#'user/entry-seq


And test it:

user> (take 4 (entry-seq test-data))
(("entry line 1 " "entry line 2") ("entry line 1 " "entry line 2"))
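
Two edge cases are worth noting with the function above. Because it compares lines with `not=`, a start line that carries trailing text (as in the question's test data) is not recognized. And because it stops when `f` is empty, an entry with no lines between its markers ends the whole sequence. The following is a minimal sketch of one way to handle both, not the answer's method: `entry-seq*` is a hypothetical name, and the sketch assumes Clojure 1.8 or later for `clojure.string/starts-with?`.

```clojure
(require '[clojure.string :as str])

;; Prefix-based predicates, mirroring the ones in the question.
(defn post-start? [l] (str/starts-with? l "#ENTRY_START"))
(defn post-end? [l] (str/starts-with? l "#ENTRY_END"))

(defn entry-seq*
  "Hypothetical variant: returns a lazy seq of entries. Iteration stops
  when no further start marker is found, not when an entry is empty."
  [data]
  (lazy-seq
    ;; Skip everything up to the next start marker; stop if there is none.
    (when-let [s (seq (drop-while (complement post-start?) data))]
      ;; entry = lines up to the end marker; more = end marker and the rest.
      (let [[entry more] (split-with (complement post-end?) (rest s))]
        (cons entry (entry-seq* (rest more)))))))

(def lines
  ["Header line"
   "#ENTRY_START"
   "#ENTRY_END"
   "#ENTRY_START trailing text"
   "entry line 1 "
   "entry line 2"
   "#ENTRY_END"
   "footer"])

(entry-seq* lines)
;; => (() ("entry line 1 " "entry line 2"))
```

One design consequence of this sketch: an entry whose end marker is missing is still emitted, containing every line up to the end of the input.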

Context

StackExchange Code Review Q#18071, answer score: 4
