pattern · go · Minor

Un-wrap lines in a text file

Submitted by: @import:stackexchange-codereview
text · file · wrap · lines

Problem

I have a function which takes a string representing a text file, joins lines which were wrapped, and returns a slice of the unwrapped lines. I'm interested in making my code maintainable, idiomatic, and fast (in roughly that order).

import (
    "html"
    "regexp"
    "strings"
)

var startLine = "^([A-Z][A-Za-z ]+[0-9]+)- "
var startLineRegex = regexp.MustCompile(startLine)

// Given a block of text, split it into lines delimited by startLineRegex.
// This is specialized for the format used in CHANGES.
func splitIntoLines(text string) []string {
    lines := strings.Split(html.EscapeString(text), "\n")
    out := []string{}
    cur := ""
    for _, line := range lines {
        if cur == "" {
            cur = line
        } else if startLineRegex.MatchString(line) {
            out = append(out, cur)
            cur = line
        } else {
            // Line continuations start with many spaces, remove them.
            cur += " " + strings.TrimSpace(line)
        }
    }
    if cur != "" {
        out = append(out, cur)
    }
    return out
}


On request I've added input, output, and a simple driver.

Sample input:

75- bnfnewprec could return a corrupt bnf structure:
K=bnfinit(x^3-15667x^2-88630960x-1836105977032,1);
bnfisprincipal(K,[29,14,15;0,1,0;0,0,1],3) -> oo loop
76- agm(1,2+O(5)) -> SEGV [#1645]
BA 77- [cygwin64] ellap(ellinit([0,0,1,-1,0]),10007) broken
78- primes([-5,5]) -> [5] (spurious absolute values)
79- matqr([;]) -> crash
80- Fp_rem_mBarrett could return a non-normalized result
p=436^56-35;Mod(271,p)^((p-1)/2) -> p+1
81- plotcopy would corrupt "string" objects (ROt_ST)
BA 82- [GP] default arguments to GP functions could cause corruption [#1658]
VBr83- [darwin] remove obsolete linker options that cause crashes [#1623]
84- divisors([2,1]) -> SEGV [#1664]
85- acos([Pol(1)]) -> GC bug [#1663]
86- matsolve(a,b) and a^(-1) gave wrong results [or SEGV] when t_MAT a
was not square and a,b "modular

Solution

Seeing your input and output really helps to make your code more understandable. Thanks for adding that detail.

It also makes it apparent that your logic is a little reversed: you split the text into lines and then join them back together where needed. Instead, split on the "keys" (the change prefixes such as "BA 77-") and then replace the newlines inside each change. Let me explain.

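import (
    "html"
    "regexp"
    "strings"
)

// (?m) makes ^ match at the start of every line, not just the start of the text.
// No capture group is needed: regexp.Split never returns the separators.
var startChange = `(?m)^[A-Z][A-Za-z ]+[0-9]+- `
var startChangeRegex = regexp.MustCompile(startChange)
// Match each newline together with the spaces or tabs around it.
var trimRegex = regexp.MustCompile(`[ \t]*\n[ \t]*`)

// Given a block of text, split it into changes delimited by startChangeRegex
// and unwrap each change onto one line. Split drops the matched prefixes
// (e.g. "BA 77- "). This is specialized for the format used in CHANGES.
func splitIntoLines(text string) []string {
    out := []string{}
    for _, change := range startChangeRegex.Split(html.EscapeString(text), -1) {
        change = strings.TrimSpace(trimRegex.ReplaceAllString(change, " "))
        if change == "" {
            continue // Split yields "" for the empty text before a leading prefix
        }
        out = append(out, change)
    }
    return out
}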
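A natural first attempt at the end-of-line pattern is (?m) *$ *, but in Go's RE2 syntax $ only asserts a position: it matches the spaces before a newline and never consumes the \n itself, so replacing the match with a space leaves the line break in place. A minimal comparison, with the newline-consuming pattern [ \t]*\n[ \t]* chosen here for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// End of line via $: zero-width, so the "\n" itself is never consumed.
	dollarTrim = regexp.MustCompile(`(?m) *$ *`)
	// Alternative: consume the newline together with the blanks around it.
	newlineTrim = regexp.MustCompile(`[ \t]*\n[ \t]*`)
)

func main() {
	change := "first change  \n      wrapped part"
	// The dollarTrim result still contains the newline.
	fmt.Printf("%q\n", dollarTrim.ReplaceAllString(change, " "))
	// Prints "first change wrapped part".
	fmt.Printf("%q\n", newlineTrim.ReplaceAllString(change, " "))
}
```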


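Splitting on the change prefixes first means each piece is already one complete change, so unwrapping it is just a matter of replacing each newline (and the blanks around it) with a single space. The result is simpler: there is no cur accumulator and no special case for the first and last lines.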

Context

StackExchange Code Review Q#138957, answer score: 4
