patterngoMinor
Extract title with regex
Viewed 0 times
titleregexwithextract
Problem
I'm quite new to Go and I feel like this code could be smaller and cleaner.
I would love any suggestions and/or hints about mistakes and conventional go things!
I would love any suggestions and/or hints about mistakes and conventional go things!
func getBookTitle(client *http.Client) {
rsp, err := client.Get(bookSite)
if err != nil {
panic(err)
}
html, _ := ioutil.ReadAll(rsp.Body)
//Get div with title
regTitle := regexp.MustCompile("()[\n+\\s]*()[a-zA-Z–\\-\n\\s:]*()[\n+\\sdd]*()")
//remove linebreaks regex
regFormatTitle := regexp.MustCompile("[\r\n]*")
//apply regex
title := regFormatTitle.ReplaceAllString(string(regTitle.Find(html)),"")
//Remove remove html tags and remove whitespaces
title = strings.TrimSpace(title[strings.Index(title,"")+len(""):strings.Index(title,"")])
fmt.Printf("Book title:%s\n",title)
rsp.Body.Close()
}Solution
Go practices
It's recommended to close things using
so that you won't forget them later.
For example you have:
Using
Another thing, it's recommended to not ignore errors.
This example ignores the
and that may lead to crashes later in the program.
Single responsibility principle
It's recommended for functions to do just one thing.
The
The name implies only the 3rd task. It would be better if it did only that,
and the other tasks should be performed in other functions.
Strange regex
This regex looks strange:
For several reasons:
-
All in all, it seems the regex is mistakenly over-complicated,
this simpler regex is probably good enough and much more clear:
It's recommended to close things using
defer as early as possible,so that you won't forget them later.
For example you have:
html, _ := ioutil.ReadAll(rsp.Body)
// some code
defer rsp.Body.Close()Using
defer, you should write like this instead:html, _ := ioutil.ReadAll(rsp.Body)
defer rsp.Body.Close()
// some codeAnother thing, it's recommended to not ignore errors.
This example ignores the
err value returned by ioutil.ReadAll,and that may lead to crashes later in the program.
Single responsibility principle
It's recommended for functions to do just one thing.
The
getBookTitle function does do many things:- Download a web page from
bookSite(a free variable defined elsewhere)
- Read content from a stream
- Extract title from an HTML string
The name implies only the 3rd task. It would be better if it did only that,
and the other tasks should be performed in other functions.
Strange regex
This regex looks strange:
()[\n+\\s]*()[a-zA-Z–\\-\n\\s:]*()[\n+\\sdd]*()For several reasons:
-
[\n+\\s] is odd. \\s already includes \n. You could write simpler as [+\\s]. But are you really expecting a literal + between the starting ` and ? I seriously doubt it. In which case the expression could be reduced to just \\s.
-
[\n+\\sdd] is similarly odd. Again, the \n is unnecessary. Two d are also unnecessary, and if there are literal d between and , that would be invalid HTML.
-
Why the groupings with (...)`? Those parentheses don't serve any purpose in this program.All in all, it seems the regex is mistakenly over-complicated,
this simpler regex is probably good enough and much more clear:
\\s*[a-zA-Z–\\s:]*\\s*Code Snippets
html, _ := ioutil.ReadAll(rsp.Body)
// some code
defer rsp.Body.Close()html, _ := ioutil.ReadAll(rsp.Body)
defer rsp.Body.Close()
// some code(<div class=\"dotd-title\">)[\n+\\s]*(<h2>)[a-zA-Z–\\-\n\\s:]*(</h2>)[\n+\\sdd]*(</div>)<div class=\"dotd-title\">\\s*<h2>[a-zA-Z–\\s:]*</h2>\\s*</div>Context
StackExchange Code Review Q#158837, answer score: 2
Revisions (0)
No revisions yet.