patterngoModerate
Bruteforce MD5 Password cracker
Viewed 0 times
crackermd5passwordbruteforce
Problem
I just started learning Go, and I wanted to created a project to learn more about concurrency in go. I heard about Go's lightweight threads, so I wanted to give them a try.
This program uses backtracking to brute-force a list of passwords loaded from a file. It tries from password of length 2 and go ahead until all passwords are been found.
It works well until password length doesn't come to 6: then my RAM will get full. I've already optimized my code in some ways e.g. in the first iteration of the program I used to create a chan for every thread, and every thread would wait for the spawned thread to terminate. Now it has a barrier.
I would need some suggestion about my code and spatial optimization tips.
```
package main
import (
"fmt"
"io/ioutil"
"strings"
"sync"
"log"
"time"
"crypto/md5"
)
var alfabeto = []rune {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9'}
func compute(prefix int, n int, a string, wgFather *sync.WaitGroup){
defer wgFather.Done()
if prefix == n-1 {
for _, d := range alfabeto {
password := fmt.Sprintf("%s%c", a, d)
if searchPassword(password) {
if len(passwords) == 0{
return
}
}
}
}else {
for i := range alfabeto{
wgFather.Add(1)
go compute(prefix+1, n, fmt.Sprintf("%s%c", a, alfabeto[i]), wgFather)
}
}
}
var passwords []string
func main(){
if loadPasswords() {
return
}
fmt.Println("File with passwords loaded. We're gonna crack", len(passwords),"passwords!")
start := time.Now()
cont := 2
for len(passwords) > 0 {
fmt.Println("Searching for passwords at length: ", cont)
var wg sync.WaitGroup
wg.Add(1)
go compute(0, cont, "", &wg)
wg.Wait()
cont++
}
This program uses backtracking to brute-force a list of passwords loaded from a file. It tries from password of length 2 and go ahead until all passwords are been found.
It works well until password length doesn't come to 6: then my RAM will get full. I've already optimized my code in some ways e.g. in the first iteration of the program I used to create a chan for every thread, and every thread would wait for the spawned thread to terminate. Now it has a barrier.
I would need some suggestion about my code and spatial optimization tips.
```
package main
import (
"fmt"
"io/ioutil"
"strings"
"sync"
"log"
"time"
"crypto/md5"
)
var alfabeto = []rune {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9'}
func compute(prefix int, n int, a string, wgFather *sync.WaitGroup){
defer wgFather.Done()
if prefix == n-1 {
for _, d := range alfabeto {
password := fmt.Sprintf("%s%c", a, d)
if searchPassword(password) {
if len(passwords) == 0{
return
}
}
}
}else {
for i := range alfabeto{
wgFather.Add(1)
go compute(prefix+1, n, fmt.Sprintf("%s%c", a, alfabeto[i]), wgFather)
}
}
}
var passwords []string
func main(){
if loadPasswords() {
return
}
fmt.Println("File with passwords loaded. We're gonna crack", len(passwords),"passwords!")
start := time.Now()
cont := 2
for len(passwords) > 0 {
fmt.Println("Searching for passwords at length: ", cont)
var wg sync.WaitGroup
wg.Add(1)
go compute(0, cont, "", &wg)
wg.Wait()
cont++
}
Solution
First let's mention a bug / issue:
Primary problem
Your memory problem arises from launching a tremendous amount of goroutines. For example when you call
(36 is the length of your alphabet;
60 million goroutines!
Goroutines are lightweight, but not that lightweight! Even if managing 1 goroutine would cost only 1 KB memory (it has its own stack etc.), this would require 60 GB memory! Understandable you run out of it. Your code spawns goroutines at a much quicker rate than they complete. (It should be noted that nothing in your code prevents spawning these new goroutines before any would be completed, so this is kind of worst case but still...)
An easy fix!
But the good news is that there is a really easy fix to this tremendous number of goroutines: simply do not spawn many goroutines.
A trivial way to limit spawning goroutines is that when you would spawn them, bind it to a condition to keep them at bay. For example launch 36 goroutines to process passwords starting with the different letters, but after that let 1 goroutine try all the combinations with that starting letter.
We can test this "first letter" condition by comparing
By inserting this condition and the
There is a minor "downside": you have no control how these 36 goroutines finish compared to each other, there may be a "relaxed" period when only 1 or 2 goroutines are running and others are finished, in this relaxed period CPU utilization will not be 100%. More formally CPU utilization will be
-
In your
-
It is also an unnecessary round-trip to convert a calculated MD5 to
-
Also note that your algorithm generates passwords multiple times. For example if you want to check passwords with a length of 3, these 3-letter passwords are essentially all the 2-letter passwords +1. But you don't make use of this, you always generate all passwords with a given length from "scratch".
Utilizing these tips would speed up your algorithm big time; not just because we got rid of lots of needless computation / conversion, but also because much less "garbage" will be generated for the GC.
Alternative
An alternative way of implementing this brute-force cracker would be to use the producer-consumer pattern. You could have a designated producer goroutine that would generate the possible passwords, and send them on a channel. You could have a fixed pool of consumer goroutines (e.g. 5 of them) which would loop over the channel on which generated passwords are delivered, and each would do the same: receive passwords, hash them (MD5) and check if it matches a crackable one.
The producer goroutine could simply close the channel when all combinations were generated, properly signalling consumers that no more passwords will be coming. The
This would result in a clean design, would result in fixed number of goroutines, and it would always utilize 100% CPU. It also has the advantage that it can be "throttled" with the proper selection of the channel capacity (buffered channel) and the number of consumer goroutines.
Here is how this producer-consumer could look like in Go if someone wants to play with it (also note that I elaborated th
- Your
passwordsvariable is accessed (read and modified) from multiple goroutines without any syncronization: this is a race condition!
Primary problem
Your memory problem arises from launching a tremendous amount of goroutines. For example when you call
compute() with n = 6 (to try passwords with a length of 6), it will create:36^5 = pow(36, 5) = 60466176(36 is the length of your alphabet;
5 is the prefix value for the condition prefix == n-1 when compute() stops spawning new goroutines)60 million goroutines!
Goroutines are lightweight, but not that lightweight! Even if managing 1 goroutine would cost only 1 KB memory (it has its own stack etc.), this would require 60 GB memory! Understandable you run out of it. Your code spawns goroutines at a much quicker rate than they complete. (It should be noted that nothing in your code prevents spawning these new goroutines before any would be completed, so this is kind of worst case but still...)
An easy fix!
But the good news is that there is a really easy fix to this tremendous number of goroutines: simply do not spawn many goroutines.
A trivial way to limit spawning goroutines is that when you would spawn them, bind it to a condition to keep them at bay. For example launch 36 goroutines to process passwords starting with the different letters, but after that let 1 goroutine try all the combinations with that starting letter.
We can test this "first letter" condition by comparing
prefix to 0:for i := range alfabeto {
wgFather.Add(1)
if prefix == 0 {
go compute(prefix+1, n, fmt.Sprintf("%s%c", a, alfabeto[i]), wgFather)
} else {
compute(prefix+1, n, fmt.Sprintf("%s%c", a, alfabeto[i]), wgFather)
}
}By inserting this condition and the
else branch just calling compute() on the same goroutine, you keep your goroutine number and memory usage at bay! But still, you utilize multiple goroutines and multiple CPU cores.There is a minor "downside": you have no control how these 36 goroutines finish compared to each other, there may be a "relaxed" period when only 1 or 2 goroutines are running and others are finished, in this relaxed period CPU utilization will not be 100%. More formally CPU utilization will be
-
In your
searchPassword() when you have the MD5 of the potential password, you always iterate over all crackable MD5 strings and you compare to all. This is a waste, you could sort the crackable MD5 values and use binary search to find if the potential is in it (sorting and binary search is implemented in the sort package). Or even better: you could build a map from the crackable MD5 strings, and just check if the candidate is in the map (now this check would be O(1) complexity instead of O(log n) of the binary search).-
It is also an unnecessary round-trip to convert a calculated MD5 to
string in order to check if it is a crackable one. Best would be to convert the crackable MD5 values to a byte array (note: array and not slice), and when you have the MD5 of a potential password as an array, you can check if it is a crackable one without converting it to string. Arrays are comparable in Go (unlike slices!), so you could also build a map from the MD5 arrays to check if a potential MD5 is in the map.-
Also note that your algorithm generates passwords multiple times. For example if you want to check passwords with a length of 3, these 3-letter passwords are essentially all the 2-letter passwords +1. But you don't make use of this, you always generate all passwords with a given length from "scratch".
Utilizing these tips would speed up your algorithm big time; not just because we got rid of lots of needless computation / conversion, but also because much less "garbage" will be generated for the GC.
Alternative
An alternative way of implementing this brute-force cracker would be to use the producer-consumer pattern. You could have a designated producer goroutine that would generate the possible passwords, and send them on a channel. You could have a fixed pool of consumer goroutines (e.g. 5 of them) which would loop over the channel on which generated passwords are delivered, and each would do the same: receive passwords, hash them (MD5) and check if it matches a crackable one.
The producer goroutine could simply close the channel when all combinations were generated, properly signalling consumers that no more passwords will be coming. The
for ... range construct on a channel handles the "close" event and terminates properly.This would result in a clean design, would result in fixed number of goroutines, and it would always utilize 100% CPU. It also has the advantage that it can be "throttled" with the proper selection of the channel capacity (buffered channel) and the number of consumer goroutines.
Here is how this producer-consumer could look like in Go if someone wants to play with it (also note that I elaborated th
Code Snippets
36^5 = pow(36, 5) = 60466176for i := range alfabeto {
wgFather.Add(1)
if prefix == 0 {
go compute(prefix+1, n, fmt.Sprintf("%s%c", a, alfabeto[i]), wgFather)
} else {
compute(prefix+1, n, fmt.Sprintf("%s%c", a, alfabeto[i]), wgFather)
}
}var wg sync.WaitGroup
func produce(ch chan<- []byte) {
// Now generate all passwords:
for {
if noMore { // If no more passwords
close(ch)
break
}
pass := ... // Here generate next password
ch <- pass // send it for processing
}
}
func consume(ch <-chan []byte) {
defer wg.Done()
for pass := range ch {
// Hash, check
}
}
func main() {
ch := make(chan []byte, 100) // Buffered channel
// Start consumers:
for i := 0; i < 5; i++ { // 5 consumers
wg.Add(1)
go consume(ch)
}
// Start producing: we can run this in the main goroutine
produce(ch)
wg.Wait() // Wait all consumers to finish processing passwords
}Context
StackExchange Code Review Q#114376, answer score: 15
Revisions (0)
No revisions yet.