patternMinor
Idiomatic clojure code in a markdown parser
Viewed 0 times
clojureparsermarkdownidiomaticcode
Problem
Some time ago I created a markdown parser in clojure and I would like to get some feedback, since I'm a clojure noob in the first place (is the code understandable?/is it idiomatic?/can some things be improved?).
So I'm looking for feedback on best practices and design pattern usage (performance isn't my main concern).
The most relevant parts are:
blocks.clj
```
(ns mdclj.blocks
(:use [clojure.string :only [blank? split]]
[mdclj.spans :only [parse-spans]]
[mdclj.misc]))
(defn- collect-prefixed-lines [lines prefix]
(when-let [[prefixed remaining] (partition-while #(startswith % prefix) lines)]
[(map #(to-string (drop (count prefix) %)) prefixed) remaining]))
(defn- line-seperated [lines]
(when-let [[par [r & rrest :as remaining]] (partition-while (complement blank?) lines)]
(list par rrest)))
(declare parse-blocks)
(defn- create-block-map [type content & extra]
(into {:type type :content content} extra))
(defn- clean-heading-string [line]
(-> line (to-string)
(clojure.string/trim)
(clojure.string/replace #" #*$" "") ;; match space followed by any number of #s
(clojure.string/trim)))
(defn match-heading [[head & remaining :as text]]
(let [headings (map vector (range 1 6) (iterate #(str \# %) "#")) ;; ([1 "#"] [2 "##"] [3 "###"] ...)
[size rest] (some (fn [[index pattern]]
(let [rest (startswith head pattern)]
(when (seq rest)
[index rest]))) headings)]
(when (not (nil? rest))
[(create-block-map ::heading (parse-spans (clean-heading-string rest)) {:size size}) remaining])))
(defn- match-underline-heading [[caption underline & remaining :as text]]
(let [current (set underline)
marker [\- \=]
markers (mapcat #(list #{\space %} #{%}) marker)]
(when (and (some #(= % current) markers)
(some #(startswith underline [%]) marker)
( (so
So I'm looking for feedback on best practices and design pattern usage (performance isn't my main concern).
The most relevant parts are:
blocks.clj
```
(ns mdclj.blocks
(:use [clojure.string :only [blank? split]]
[mdclj.spans :only [parse-spans]]
[mdclj.misc]))
(defn- collect-prefixed-lines [lines prefix]
(when-let [[prefixed remaining] (partition-while #(startswith % prefix) lines)]
[(map #(to-string (drop (count prefix) %)) prefixed) remaining]))
(defn- line-seperated [lines]
(when-let [[par [r & rrest :as remaining]] (partition-while (complement blank?) lines)]
(list par rrest)))
(declare parse-blocks)
(defn- create-block-map [type content & extra]
(into {:type type :content content} extra))
(defn- clean-heading-string [line]
(-> line (to-string)
(clojure.string/trim)
(clojure.string/replace #" #*$" "") ;; match space followed by any number of #s
(clojure.string/trim)))
(defn match-heading [[head & remaining :as text]]
(let [headings (map vector (range 1 6) (iterate #(str \# %) "#")) ;; ([1 "#"] [2 "##"] [3 "###"] ...)
[size rest] (some (fn [[index pattern]]
(let [rest (startswith head pattern)]
(when (seq rest)
[index rest]))) headings)]
(when (not (nil? rest))
[(create-block-map ::heading (parse-spans (clean-heading-string rest)) {:size size}) remaining])))
(defn- match-underline-heading [[caption underline & remaining :as text]]
(let [current (set underline)
marker [\- \=]
markers (mapcat #(list #{\space %} #{%}) marker)]
(when (and (some #(= % current) markers)
(some #(startswith underline [%]) marker)
( (so
Solution
This is overall very impressive! Here are my thoughts:
It looks like you've pretty much written your parser "from scratch" -- you have it set up so that it takes text as an input and dissects it line by line, looking for blocks and spans, and labeling and converting them appropriately. This is great, but a much easier way to go about this would be to use a parsing library like instaparse. Using this approach, you define a simple grammar as either a string or a separate text file, and then use instaparse to turn it into a custom parser, then you just use the parser as a function on the text, returning a parse tree that contains all of the information you need, in either hiccup or enlive format. There's a little bit of a learning curve if you've never defined a grammar before, but I found it pretty easy to learn, and instaparse is one of the most intuitive parsing libraries out of the handful that I tried. I would recommend this method -- it lets you worry more about defining your grammar and leave the implementation details to instaparse. At this point you've already done most (all?) of the work manually, so you might want to stick with the structure you have in place, but you should at least consider re-doing it with a parsing library -- it would at least make it easier to add new Markdown features.
I think your
I think your
But you have so many different
Lastly, I saw this bit in your
You could simplify this to just:
Since you're only using
Hope that helps. You're off to a great start!
It looks like you've pretty much written your parser "from scratch" -- you have it set up so that it takes text as an input and dissects it line by line, looking for blocks and spans, and labeling and converting them appropriately. This is great, but a much easier way to go about this would be to use a parsing library like instaparse. Using this approach, you define a simple grammar as either a string or a separate text file, and then use instaparse to turn it into a custom parser, then you just use the parser as a function on the text, returning a parse tree that contains all of the information you need, in either hiccup or enlive format. There's a little bit of a learning curve if you've never defined a grammar before, but I found it pretty easy to learn, and instaparse is one of the most intuitive parsing libraries out of the handful that I tried. I would recommend this method -- it lets you worry more about defining your grammar and leave the implementation details to instaparse. At this point you've already done most (all?) of the work manually, so you might want to stick with the structure you have in place, but you should at least consider re-doing it with a parsing library -- it would at least make it easier to add new Markdown features.
I think your
block-matcher/parse-blocks section is elegant and idiomatic as far as I'm concerned. It's a nice demonstration of first-class functions in Clojure. The only thing is, I'm not sure that you need to wrap it in a lazy-seq, since don't you need to realize the entire sequence? I haven't looked super thoroughly at your code, but I'm assuming you would use this function to parse all the lines of text from the input, so this sequence might not necessarily need to be lazy. It all depends on how you're using that function, though.I think your
create-block-map function is nice and idiomatic, too. It's such a simple function that you could potentially do without it, and just have all of your match-* functions return something like this:[{:type ::heading
:content (parse-spans (clean-heading-string rest))
:size size}
remaining]But you have so many different
match-* functions, it would get tedious having that show up under every single one of them, so I think you did the right thing by pulling it out into a separate function, thereby enabling you to express the above as just, e.g., (create-block-map ::heading (parse-spans (clean-heading-string rest)) {:size size}. My only suggestion would be to consider renaming it to just block-map for the sake of simplicity -- that's just a minor aesthetic preference, though.Lastly, I saw this bit in your
match-heading function:(when (not (nil? rest))
[(create-block-map ...You could simplify this to just:
(when rest
[(create-block-map ...Since you're only using
rest in a boolean context (it's either nil or it's a non-nil value) within a when expression, you can just use the value of rest itself. If it's not nil or false, the rest of the expression will be returned, otherwise nil will be returned. The only reason you might not want to do it this way is if you still want the rest of the when expression to be returned if the value of rest is false -- i.e., literally anything but nil. If that's not a concern, though, (when rest ... is more concise and idiomatic.Hope that helps. You're off to a great start!
Code Snippets
[{:type ::heading
:content (parse-spans (clean-heading-string rest))
:size size}
remaining](when (not (nil? rest))
[(create-block-map ...(when rest
[(create-block-map ...Context
StackExchange Code Review Q#37736, answer score: 2
Revisions (0)
No revisions yet.