Removing nested blocks from a string

Problem

I wrote this function in Scala. It uses tail recursion to remove nested blocks from a text.

Usage examples:

removeBlocks("123{456}789", "{", "}")                            yields "123789"
removeBlocks("123{a{b{c}b}a}789", "{", "}")                      yields "123789"
removeBlocks("123456789", "", "")          yields "123789"
removeBlocks("123>>789", "", "")      yields "123>789"


This is one of my first attempts in a functional language, and I'm deeply dissatisfied with my code. It's too long, too nested, and could probably be made more readable. I'm coming from C# and hope to be forgiven for this bad functional code. I'd be happy to hear how it can be improved.

def removeBlocks(text: String, startMarker: String, endMarker: String) = {
  val startMarkerSize = startMarker.size
  val endMarkerSize = endMarker.size
  val startMarkerHead = startMarker.head
  val endMarkerHead = endMarker.head

  def removeBlocksAcc(s: String, acc: String, nestingLevel: Int): String =
    if (s.isEmpty) acc
    else {
      val (startMarkerCandidate, tail1) = s.splitAt(startMarkerSize)
      if (startMarkerCandidate == startMarker)
        removeBlocksAcc(tail1, acc, nestingLevel + 1)
      else {
        val (endMarkerCandidate, tail2) = s.splitAt(endMarkerSize)
        if (endMarkerCandidate == endMarker)
          removeBlocksAcc(tail2, acc, math.max(nestingLevel - 1, 0))
        else {
          val (safePart, candidate) = s.tail.span(c => c != startMarkerHead && c != endMarkerHead)
          if (nestingLevel == 0)
            removeBlocksAcc(candidate, acc + s.head + safePart, nestingLevel)
          else
            removeBlocksAcc(candidate, acc, nestingLevel)
        }
      }
    }

  removeBlocksAcc(text, "", 0)
}

Solution

I think you are trying to do some pattern matching on strings.

So I did this:

import scala.annotation.tailrec

// Escapes only the listed strings; "whatever" stands in for further regex metacharacters.
def escape(s: String): String = Seq("{", "}", "whatever").foldLeft(s)((x, y) => x.replaceAllLiterally(y, "\\" + y))

def removeBlocks(text: String, open: String, close: String): String = {
    // Inside an s-interpolated string, "$$" produces a literal "$".
    val startReg = (s"${escape(open)}(.*)$$").r
    val endReg = (s"${escape(close)}(.*)$$").r
    val otherText = "(.)(.*)$".r

    @tailrec
    def removeBlocksAux(text: String, acc: String = "", lvl: Int = 0): String = {
        // Fails on an unclosed block at end of input, or on a close marker with no matching open.
        assume((text.length() > 0 || lvl == 0) && lvl >= 0)
        (text, lvl) match {
            case (startReg(text), lvl)     => removeBlocksAux(text, acc, lvl + 1)
            case (endReg(text), lvl)       => removeBlocksAux(text, acc, lvl - 1)
            case (otherText(c, text), 0)   => removeBlocksAux(text, acc + c)
            case (otherText(c, text), lvl) => removeBlocksAux(text, acc, lvl)
            case ("", _)                   => acc
        }
    }
    removeBlocksAux(text)
}


I tried to define the regular expressions inline in the match cases, but could not get that to work. It may not be a good idea anyway, though I would still like a more compact and efficient representation. The escape function should escape every regex metacharacter, not just the ones listed. That is tedious to write by hand, so there is probably an existing method for it.
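On the escaping point, the standard library does appear to cover it: `scala.util.matching.Regex.quote` wraps a string so that every character in it is matched literally. The following is a minimal sketch of the same state machine built on it, not a drop-in replacement. The name `removeBlocksQuoted` and the helper `loop` are made up for illustration. Unlike the code above, it clamps the nesting level at zero (borrowed from the question's code) instead of asserting, it rejects empty markers, and it adds `(?s)` so that `.` also matches line breaks.

```scala
import scala.annotation.tailrec
import scala.util.matching.Regex

// Hypothetical variant: Regex.quote replaces the hand-written escape function.
def removeBlocksQuoted(text: String, open: String, close: String): String = {
  // Assumption: an empty marker would match everywhere and never make progress.
  require(open.nonEmpty && close.nonEmpty, "markers must be non-empty")

  // Regex.quote makes the marker literal; (?s) lets '.' match line breaks as well.
  val startReg  = s"(?s)${Regex.quote(open)}(.*)".r
  val endReg    = s"(?s)${Regex.quote(close)}(.*)".r
  val otherText = "(?s)(.)(.*)".r

  @tailrec
  def loop(rest: String, acc: String, lvl: Int): String = rest match {
    case ""                 => acc
    case startReg(tail)     => loop(tail, acc, lvl + 1)
    // Clamp at zero so an unmatched close marker is simply dropped.
    case endReg(tail)       => loop(tail, acc, math.max(lvl - 1, 0))
    // Keep the character only when we are outside every block.
    case otherText(c, tail) => loop(tail, if (lvl == 0) acc + c else acc, lvl)
  }

  loop(text, "", 0)
}
```

Because a regex used as an extractor has to match the whole input, the trailing `$` anchor is not needed here. The sketch still re-scans the remaining text on every step, so it addresses the escaping concern rather than the efficiency one.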

I'm also not a big fan of having a function inside a function, but I kept that design decision. Finally, I'd simply use parser combinators for this, but I guess that's not the point.
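If the nested function is the main objection, one option is to lift the helper to the top level and pass the markers in as parameters. The sketch below does that and, as a side effect, drops the regexes in favour of `String.startsWith` with an offset and a `StringBuilder`. The names `skipBlocks` and `removeBlocksFlat` are hypothetical, and the clamping of the nesting level at zero follows the question's code rather than the answer above.

```scala
import scala.annotation.tailrec

// Hypothetical top-level helper: walks the text by index instead of re-slicing it.
@tailrec
def skipBlocks(text: String, open: String, close: String,
               i: Int, lvl: Int, out: StringBuilder): String =
  if (i >= text.length) out.toString
  else if (text.startsWith(open, i))
    skipBlocks(text, open, close, i + open.length, lvl + 1, out)
  else if (text.startsWith(close, i))
    // Clamp at zero so an unmatched close marker is dropped rather than going negative.
    skipBlocks(text, open, close, i + close.length, math.max(lvl - 1, 0), out)
  else {
    // Copy the character only when we are outside every block.
    if (lvl == 0) out.append(text.charAt(i))
    skipBlocks(text, open, close, i + 1, lvl, out)
  }

def removeBlocksFlat(text: String, open: String, close: String): String = {
  // Assumption: empty markers are rejected, since they would match at every position.
  require(open.nonEmpty && close.nonEmpty, "markers must be non-empty")
  skipBlocks(text, open, close, 0, 0, new StringBuilder)
}
```

The trade-off is a longer parameter list and a mutable buffer threaded through the recursion, in exchange for a helper that can be tested on its own.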

Context

StackExchange Code Review Q#71031, answer score: 2
