patterngoMinor
Web Crawler (A Tour of Go #71)
Viewed 0 times
crawlertourweb
Problem
So I was trying to do this exercise from Tour of Go. I managed to get it working. I am not at all sure that it is correctly concurrent or idiomatic (I started learning Go, like, 4 hours ago).
I would really appreciate feedback about my solution. Here's the relevant meat of the code. Full code available below.
The full source is at: http://pastie.org/8443147
I would really appreciate feedback about my solution. Here's the relevant meat of the code. Full code available below.
type urlToFetch struct {
Url string
Depth int
}
// Group stuff together, so that it's easier to pass around.
type fetchContext struct {
Queue chan urlToFetch
History map[string]int
Quit chan int
Fetcher Fetcher
}
func Crawl(url string, depth int, fetcher Fetcher) {
context := fetchContext{
Queue: make(chan urlToFetch, 500),
History: make(map[string]int),
Quit: make(chan int, 50),
Fetcher: fetcher,
}
go fetchOne(&context)
context.Queue 0 {
body, urls, err := ctx.Fetcher.Fetch(utf.Url)
ctx.History[utf.Url] = 1
if err != nil {
fmt.Println(err)
return
}
fmt.Printf("found: %s %q\n", utf.Url, body)
for _, u := range urls {
go fetchOne(ctx)
if _, ok := ctx.History[u]; !ok {
ctx.Queue <- urlToFetch{u, utf.Depth - 1}
} else {
}
}
}
case <-timeout:
ctx.Quit <- 1
}
}The full source is at: http://pastie.org/8443147
Solution
There are two things I see.
-
Don't store a reference to a time out. I think the count down starts when After() is called.
Most code I've seen writes it like this.
-
Use a lock when accessing a map from multiple goroutines because maps aren't thread safe.
Better solution:
Useful link:
https://stackoverflow.com/questions/12224962/exercise-web-crawler-concurrency-not-working
-
Don't store a reference to a time out. I think the count down starts when After() is called.
timeout := time.After(2000 * time.Millisecond)
...
case <- timeout:Most code I've seen writes it like this.
case <- time.After(2000 * time.Millisecond):-
Use a lock when accessing a map from multiple goroutines because maps aren't thread safe.
Better solution:
func Crawl(url string, depth int, fetcher Fetcher) {
m := map[string]bool{url: true}
var mx sync.Mutex
var wg sync.WaitGroup
var c2 func(string, int)
c2 = func(url string, depth int) {
defer wg.Done()
if depth <= 0 {
return
}
body, urls, err := fetcher.Fetch(url)
if err != nil {
fmt.Println(err)
return
}
fmt.Printf("found: %s %q\n", url, body)
mx.Lock()
for _, u := range urls {
if !m[u] {
m[u] = true
wg.Add(1)
go c2(u, depth-1)
}
}
mx.Unlock()
}
wg.Add(1)
c2(url, depth)
wg.Wait()
}Useful link:
https://stackoverflow.com/questions/12224962/exercise-web-crawler-concurrency-not-working
Code Snippets
timeout := time.After(2000 * time.Millisecond)
...
case <- timeout:case <- time.After(2000 * time.Millisecond):func Crawl(url string, depth int, fetcher Fetcher) {
m := map[string]bool{url: true}
var mx sync.Mutex
var wg sync.WaitGroup
var c2 func(string, int)
c2 = func(url string, depth int) {
defer wg.Done()
if depth <= 0 {
return
}
body, urls, err := fetcher.Fetch(url)
if err != nil {
fmt.Println(err)
return
}
fmt.Printf("found: %s %q\n", url, body)
mx.Lock()
for _, u := range urls {
if !m[u] {
m[u] = true
wg.Add(1)
go c2(u, depth-1)
}
}
mx.Unlock()
}
wg.Add(1)
c2(url, depth)
wg.Wait()
}Context
StackExchange Code Review Q#33524, answer score: 3
Revisions (0)
No revisions yet.