Regular expression matching with string slice in Go

Submitted by: @import:stackexchange-codereview

Problem

I have a slice of strings, and each string contains multiple key=value formatted messages. I want to pull all the keys out of the strings so I can collect them to use as the header for a CSV file. I do not know all the potential key fields in advance, so I have to use regular expression matching to find them.

Here is my code.

package main

import (
    "fmt"
    "regexp"
)
func GetKeys(logs []string) []string {
        // topMatches is the final array to be returned.
        // midMatches contains no duplicates, but the data is `key=`.
        // subMatches contains all initial matches.
        // initialRegex matches anything of the form `key=`. this is because the matching patterns.
        // cleanRegex massages `key=` to `key`
        topMatches := []string{}
        midMatches := []string{}
        subMatches := []string{}
        initialRegex := regexp.MustCompile(`([a-zA-Z]{1,}\=)`)
        cleanRegex := regexp.MustCompile(`([a-zA-Z]{1,})`)

        // the nested loop for matches is because FindAllString
        // returns []string
        for _, i := range logs {
                matches := initialRegex.FindAllString(i, -1)
                for _, m := range matches {
                        subMatches = append(subMatches, m)
                }
        }

        // remove duplicates.
        seen := map[string]string{}
        for _, x := range subMatches {
                if _, ok := seen[x]; !ok {
                        midMatches = append(midMatches, x)
                        seen[x] = x
                }
        }
        // this is where I remove the `=` character.
        for _, y := range midMatches {
                clean := cleanRegex.FindAllString(y, 1)
                topMatches = append(topMatches, clean[0])
        }
        return topMatches
}

func main() {
    y := []string{"key=value", "msg=payload", "test=yay", "msg=payload"}
    y = GetKeys(y)
    fmt.Println(y)
}


I think my code is inefficient because I cannot determine how to

Solution

You're not making good use of regular expressions. A single regex can do the job:

pattern := regexp.MustCompile(`([a-zA-Z]+)=`)


The parentheses (...) capture the interesting part for you.

You can use result := pattern.FindAllStringSubmatch(s, -1) to match a string against the regex pattern; the second argument, -1, asks for all matches rather than a limited number. The return value is a [][]string, where in each []string slice the first element is the entire matched text and the second, third, ... elements hold the contents of the capture groups. In this example there is one capture group (...), so the key will be in item[1] of each []string slice.

Instead of a map[string]string for seen, a map[string]bool would be more efficient: the value only needs to record that a key has been seen, not store a second copy of it.

Putting it together:

func GetKeys(logs []string) []string {
    var keys []string
    pattern := regexp.MustCompile(`([a-zA-Z]+)=`)

    seen := make(map[string]bool)
    for _, log := range logs {
        result := pattern.FindAllStringSubmatch(log, -1)
        for _, item := range result {
            key := item[1]
            if !seen[key] {
                keys = append(keys, key)
                seen[key] = true
            }
        }
    }

    return keys
}


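If the input strings are not guaranteed to match the pattern, note that FindAllStringSubmatch returns nil when nothing matches, so the inner loop simply does not run for that string. Every item it does return has exactly two elements here (the full match plus the one capture group), so item[1] is always safe to read. If you still want a defensive guard, check each item inside the inner loop rather than the length of result, which is the number of matches in the string:

if len(item) != 2 {
        continue
    }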

Context

StackExchange Code Review Q#121924, answer score: 6
