HiveBrain v1.2.0

Converting Pandoc Markdown images from captioned to inline

Submitted by: @import:stackexchange-codereview

Problem

After writing a rather long document in Markdown and using pandoc to convert it to a PDF, I found, to my dismay, that many of the images were out of place, and that they all had their alternate text proudly displayed underneath them as captions. My document is rather instructional, so this rearrangement was harmful to its readability.

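I eventually found a way to display the images inline. Pandoc renders an image as a captioned figure only when it is alone in its paragraph, so ending the image line with a backslash (a hard line break in Pandoc's Markdown) keeps it inline. I still wanted to write the document in standard Markdown, though, so I wrote a Python script to convert all the standalone images in a document to this inline form.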

pandoc_images.py:

import sys

# Convert standalone images in standard Markdown
# to inline images in Pandoc's Markdown
# (see http://pandoc.org/README.html#images)
with open(sys.argv[1], 'r') as markdown:
    lines = markdown.read().splitlines()
    for index, line in enumerate(lines):
        is_first_line = index == 0
        preceding_blank = True if is_first_line else not lines[index - 1]

        is_last_line = index == len(lines) - 1
        following_blank = True if is_last_line else not lines[index + 1]

        is_standalone = preceding_blank and following_blank
        is_image = line.startswith('![') and '](' in line and line.endswith(')')
        # A trailing backslash makes the image share its paragraph with a
        # hard line break, so Pandoc no longer turns it into a figure.
        # print() already adds the newline, so only the backslash is appended.
        print(line + ('\\' if is_standalone and is_image else ''))
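The is_image test in the script accepts any line that starts with "![", contains "](" and ends with ")", so a line like "![a](b.png) see the note (c)" is also treated as a standalone image. A stricter check could require the whole line to be a single image. The sketch below does that with re.fullmatch; the names IMAGE_LINE and is_image_line are hypothetical, and it assumes the alt text contains no "]" and the target contains no ")".

```python
import re

# Hypothetical stricter check (not part of the original script): the whole
# line must be one image, ![alt](target), with nothing before or after it.
# Assumes the alt text has no "]" and the target has no ")".
IMAGE_LINE = re.compile(r"!\[[^\]]*\]\([^)]*\)")

def is_image_line(line):
    """Return True if the line, minus surrounding whitespace, is a single image."""
    return IMAGE_LINE.fullmatch(line.strip()) is not None

print(is_image_line("![An image.](image.png)"))       # True
print(is_image_line("![a](b.png) see the note (c)"))  # False
```

In the script it could stand in for the existing expression, as is_image = is_image_line(line).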


Example (text.md):

This is some text.

![This is an image.](...)

### This is a header.


Running python3 pandoc_images.py text.md would produce:

This is some text.

![This is an image.](...)\

### This is a header.


It seems like a lot of mess (enumerate, bounds checking, etc.) for such a simple job, though. Is there any way I can improve any of this code?
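One way to drop the enumerate and bounds checks is to pad the list with a blank sentinel line at each end, so every real line has a neighbour on both sides, and then walk the lines in overlapping triples with zip. A minimal sketch of that idea, keeping the original image heuristic; convert_lines is a hypothetical name.

```python
import sys

def convert_lines(lines):
    """Yield the lines, appending a backslash to each image line that has a
    blank line (or the start/end of the file) directly above and below it."""
    # Sentinel blank lines stand in for the file edges, so no bounds checks.
    padded = [''] + lines + ['']
    for before, line, after in zip(padded, padded[1:], padded[2:]):
        is_standalone = not before and not after
        is_image = line.startswith('![') and '](' in line and line.endswith(')')
        yield (line + '\\') if is_standalone and is_image else line

if __name__ == '__main__' and len(sys.argv) > 1:
    with open(sys.argv[1]) as markdown:
        for line in convert_lines(markdown.read().splitlines()):
            print(line)
```

As in the original, a line containing only spaces is not treated as blank here.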

Solution

How about a regular expression?

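import re

def convert(s):
    # Group 1: start of text or a blank line, then an image line.
    # The "\n" after the image is matched outside the groups, so it is
    # written back after the backslash to keep the following blank line.
    # Group 2: the following blank line, or the end of the text.
    return re.sub(r"((?:\A|^ *\n)!\[.*\]\(.*\))\n(^ *\n|\Z)", r"\1\\\n\2", s, flags=re.M)

def test1():
    print(convert("""![foo](bar)\n\nthis is a test\n"""))

def test2():
    print(convert("""line 1\n\n![foo](asd)\n\nanother test\n"""))

def test3():
    print(convert("""line 1\n\n![foo](asd)\n"""))

def test4():
    print(convert("""line 1\n\n![foo](asd)\nNot blank\n"""))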
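One limitation of the pattern above: the blank line after an image is consumed by the match, so when two standalone images are separated by a single blank line, only the first one is converted. A possible variant, sketched here rather than taken from the answer, checks the following blank line with a lookahead instead of consuming it; STANDALONE_IMAGE is a hypothetical name.

```python
import re

# Sketch of a variant (not from the original answer): the blank line before
# the image is consumed as before, but the one after it is only checked with
# a lookahead, so it can still serve as the blank line before a next image.
STANDALONE_IMAGE = re.compile(
    r"(\A|^ *\n)"             # start of text, or a blank (spaces-only) line
    r"(!\[.*\]\(.*\))"        # a line starting with "![" and ending with ")"
    r"(?=\n *(?:\n|\Z)|\Z)",  # then a blank line, or the end of the text
    re.M,
)

def convert(s):
    # Re-emit the blank line and the image, then append the backslash.
    return STANDALONE_IMAGE.sub(r"\1\2\\", s)

print(convert("![a](x.png)\n\n![b](y.png)\n\ntext\n"))
```

For that input both image lines end up with a trailing backslash, whereas the pattern above converts only the first. Because the lookahead also accepts the end of the text, an image on the last line is converted even without a trailing newline.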


Note: I am using ^ *\n to match a blank line, i.e. a line that may also contain spaces.
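To see why the pattern uses " *" rather than "\s*": " *" can only match spaces, while "\s" also matches newlines, so "^\s*\n" can swallow several consecutive blank lines in one match. A small check with re.findall; the sample text is made up for illustration.

```python
import re

# Made-up sample: a blank line, a spaces-only line, another blank line.
text = "line 1\n\n   \n\n![foo](asd)\n"

# " *" only matches spaces, so each blank line is a separate match.
print(re.findall(r"^ *\n", text, flags=re.M))   # ['\n', '   \n', '\n']
# "\s" also matches "\n", so one match spans all three blank lines.
print(re.findall(r"^\s*\n", text, flags=re.M))  # ['\n   \n\n']
```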


Context

StackExchange Code Review Q#102335, answer score: 2
