"Intelligent" removal of carriage returns that preserves paragraph breaks, headings, etc
Problem
It's often useful to strip carriage returns out of a plain text document, for example when copying and pasting into a field that automatically wraps lines. However, it's usually a good idea to leave some of the carriage returns in — most obviously at paragraph breaks, but also around bullet lists, section headings, etc. So one needs an algorithm to decide which carriage returns to retain.
In my attempt to address this problem, I came up with the following:
import Data.Char (isAlphaNum)

shouldMerge :: [Char] -> [Char] -> Bool
-- Rule 1: never merge blank lines.
shouldMerge "" _ = False
shouldMerge _ "" = False
-- Rule 2: don't merge with a line that starts with a non-alphanumeric character.
shouldMerge _ nextline | (not . isAlphaNum . head) nextline = False
-- Rule 3: the current line looks intentionally short (e.g. a heading).
shouldMerge line nextline | length (line ++ " " ++ (head . words) nextline) <
                            length nextline = False
shouldMerge _ _ = True
where shouldMerge is a function that attempts to guess whether a line should be merged with its successor. The rules state, roughly: 1) never merge blank lines; 2) don't merge with a line beginning with a non-alphanumeric character; and 3) if placing the first word of the next line at the end of the current line would result in a line shorter than the next line, don't merge, as the current line was probably cut short intentionally (this catches things like section headings). This set of rules "seems to work" :) much of the time.
My question is: how would you improve on this algorithm (perhaps by considering more than one line of text at a time)?
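The predicate only answers a yes/no question for one pair of lines; something still has to apply it across a whole document. A minimal sketch of such a driver follows. The name unwrapWith is hypothetical (it is not part of the original code), and it takes the predicate as a parameter so that shouldMerge or any variant can be plugged in. Decisions are made on the original neighbouring lines, not on the already-joined text, so the length comparison in rule 3 still sees the physical lines.

```haskell
-- Hypothetical helper (not from the original post): join each line with its
-- successor whenever the supplied predicate says the pair belongs together.
unwrapWith :: (String -> String -> Bool) -> [String] -> [String]
unwrapWith merge = go
  where
    go (l1:l2:rest)
      | merge l1 l2 =
          -- Process the tail first, then glue l1 onto its first result, so
          -- the predicate is always asked about two original lines.
          case go (l2 : rest) of
            (m:ms) -> (l1 ++ " " ++ m) : ms
            []     -> [l1]  -- unreachable: go never returns [] for a non-empty list
      | otherwise = l1 : go (l2 : rest)
    go ls = ls  -- zero or one line: nothing to merge
```

Something like unlines . unwrapWith shouldMerge . lines would then turn wrapped text into unwrapped text, assuming shouldMerge is in scope.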
Solution
Two things:
- Try using String instead of [Char]; it reads a lot easier (imho).
- Don't use head; use pattern matching instead. It reads better, and it's good practice, because head throws an error if it is passed an empty list (though you will not encounter that in this program, since you already check for empty lists).
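Applied to the code in the question, those two suggestions might look roughly like the sketch below. It is meant to keep the same three rules and only change how the first character and first word are obtained; the local names next, c and firstWord are illustrative choices, not taken from the original.

```haskell
import Data.Char (isAlphaNum)

-- Same rules as the shouldMerge in the question, written with String and
-- pattern matching instead of head.
shouldMerge :: String -> String -> Bool
shouldMerge "" _ = False
shouldMerge _ "" = False
shouldMerge line next@(c:_)
  | not (isAlphaNum c) = False
  | otherwise =
      case words next of
        -- Merge unless "line + first word" would be shorter than the next line.
        (firstWord:_) -> length (line ++ " " ++ firstWord) >= length next
        []            -> False  -- unreachable: next starts with an alphanumeric character
```

The case on words next replaces head . words, so the compiler can check that every shape of input is handled.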
If you wanted to have your program consider multiple lines, a quick and dirty solution would be something like this:
function :: [String] -> [Bool]
function [] = []
function [_] = []  -- a single line has no successor to compare against
function (line1:line2:morelines) = shouldMerge line1 line2 : function (line2:morelines)
NOTE: I haven't tested the above code for bugs.
Another solution would be to make shouldMerge recursive, with a type of [String] -> [Bool].
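As an alternative to explicit recursion (a different technique from the one suggested above), the same list of per-pair decisions can be sketched with zipWith. The name mergeDecisions is hypothetical, and the pairwise predicate is passed in as an argument so the snippet stands on its own; in practice it would be shouldMerge.

```haskell
-- Hypothetical helper: one Bool per adjacent pair of lines, so the result
-- has one element fewer than the input (and is empty for zero or one line).
mergeDecisions :: (String -> String -> Bool) -> [String] -> [Bool]
mergeDecisions decide ls = zipWith decide ls (drop 1 ls)
```

Using drop 1 instead of tail keeps the function total on an empty list.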
Context
StackExchange Code Review Q#8687, answer score: 2