patterngoMinor
Byte conversion in my Caesar's cipher
Viewed 0 times
bytecaesarconversioncipher
Problem
I was wondering if I am doing any useless conversions byte←→int, for example:
Converting
byte((int(ch-'A')+shift)%26 + 'A')Converting
ch-'A' to an int is because the shift argument can be negative (to implement the decode function). I couldn't figure out a simpler way to negate the operation using bytes.package main
import (
"bufio"
"flag"
"fmt"
"os"
)
func main() {
shift := flag.Int("shift", 13, "Cipher shift")
decode := flag.Bool("decode", false, "Decode input")
flag.Parse()
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
if *decode {
fmt.Println(Decode(scanner.Text(), *shift))
} else {
fmt.Println(Encode(scanner.Text(), *shift))
}
}
}
func Encode(s string, shift int) string {
return cipher(s, shift)
}
func Decode(s string, shift int) string {
return cipher(s, -shift+26)
}
func cipher(s string, shift int) string {
var line string
for _, ch := range []byte(s) {
if ch >= 'A' && ch = 'a' && ch <= 'z' {
ch = byte((int(ch-'a')+shift)%26 + 'a')
}
line += string(ch)
}
return line
}Solution
Flags
It's not very problematic for very short programs like this, but better get good habits: flags should be outside of all code blocks; so that you detect immediately if you have a flag naming conflict between different files.
Code organization
You're checking for the value of the
Type conversions
The underlying array may point to data that will be overwritten by a subsequent call to Scan. It does no allocation.
while
a newly allocated string.
So here, you allocate a string to copy bytes; and then you allocate a new slice of type
So you're saving two conversions by not dealing with strings at all:
Note the slight change in the way you print the result: using
You're also wondering whether the conversions from
However, this is just a simple type conversion: the performance gain from it will be much lower than when avoiding memory allocations, so you shouldn't worry about it too much =)
Optimizations
Initializing a
okay, now let's go even further: calling
We can go even further by noticing that the
Final words
The code we obtain is more readable, definitely more efficient; and has the same behavior as yours. However, there are still two potential issues with it:
It's not very problematic for very short programs like this, but better get good habits: flags should be outside of all code blocks; so that you detect immediately if you have a flag naming conflict between different files.
var (
shiftF = flag.Int("shift", 13, "Cipher shift")
decodeF = flag.Bool("decode", false, "Decode input")
)
func main() {
…
}Code organization
You're checking for the value of the
-decode flag every time you scan a line, and recomputing -shift+26 every time as well: not very efficient, nor super readable. As you noticed, the only difference between encoding and decoding is the value of shift. Why not do something like:func main() {
flag.Parse()
// we could also directly change the value of the flag, but I find it
// less readable — it's better to treat flag values as immutable.
var shift int
if *decode {
shift = 26 - *shiftF
} else {
shift = *shiftF
}
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
cipher(scanner.Text(), shift)
}
}Type conversions
bufio.Scanner holds bytes internally: the doc for Scanner.Bytes says:The underlying array may point to data that will be overwritten by a subsequent call to Scan. It does no allocation.
while
Scanner.Text indicates that it returns:a newly allocated string.
So here, you allocate a string to copy bytes; and then you allocate a new slice of type
[]byte to copy this string, which is also an expensive operation.So you're saving two conversions by not dealing with strings at all:
func main() {
…
for scanner.Scan() {
fmt.Printf("%s\n", cipher(scanner.Bytes(), shift))
}
}
func cipher(bytes []byte, shift int) []byte {
var line []byte
for _, b := range bytes {
if b >= 'A' && b = 'a' && b <= 'z' {
b = byte((int(b-'a')+shift)%26 + 'a')
}
line = append(line, b)
}
return line
}Note the slight change in the way you print the result: using
fmt.Println on a []byte variable will print the numeric value of each byte, something like [116 104 101 32 103 97 109 101], so you have to use fmt.Printf to tell it to print it like a string.You're also wondering whether the conversions from
byte to int are inefficient: kind of, yes. Arithmetic operations work on bytes, so you could do instead:shiftB := byte(shift)
b = ((b-'A')+shiftB)%26 + 'A'However, this is just a simple type conversion: the performance gain from it will be much lower than when avoiding memory allocations, so you shouldn't worry about it too much =)
Optimizations
Initializing a
[]byte variable and appending char by char to it is not great: the slice is going to be re-sized every time it hits its maximum default size. It's even worse in your code: you're initializing a string variable and appending char by char to it, which will reallocate the string each time. In cipher, you know in advance which size line is going to be: the same as the original slice. So you can initialize directly with the correct allocated capacity:line := make([]byte, 0, len(bytes))okay, now let's go even further: calling
append at each iteration means that under the hood, there will be a check to see if the slice over capacity. We know it's not going to happen, because the sizes are the same: why not always allocate the exact amount of memory that we need and write at the right place directly?func cipher(bytes []byte, shift int) []byte {
shiftB := byte(shift)
line := make([]byte, len(bytes))
for i, b := range bytes {
if b >= 'A' && b = 'a' && b <= 'z' {
b = ((b-'a')+shiftB)%26 + 'a'
}
line[i] = b
}
return line
}We can go even further by noticing that the
bytes argument won't be used after we call cipher. So we can re-use this slice instead of allocating a new one!func cipher(bytes []byte, shift int) []byte {
shiftB := byte(shift)
for i, b := range bytes {
// shift b…
bytes[i] = b
}
return bytes // optional: callers could simply reuse the argument
}Final words
The code we obtain is more readable, definitely more efficient; and has the same behavior as yours. However, there are still two potential issues with it:
- Encoding: we're considering every byte of the input separately, as if it were ASCII; so it will probably not do what you want if the input is in UTF-8.
- Input size: if the input has very long lines, then using
bufio.NewScanneris a bad idea; see for example this SO question.
Code Snippets
var (
shiftF = flag.Int("shift", 13, "Cipher shift")
decodeF = flag.Bool("decode", false, "Decode input")
)
func main() {
…
}func main() {
flag.Parse()
// we could also directly change the value of the flag, but I find it
// less readable — it's better to treat flag values as immutable.
var shift int
if *decode {
shift = 26 - *shiftF
} else {
shift = *shiftF
}
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
cipher(scanner.Text(), shift)
}
}func main() {
…
for scanner.Scan() {
fmt.Printf("%s\n", cipher(scanner.Bytes(), shift))
}
}
func cipher(bytes []byte, shift int) []byte {
var line []byte
for _, b := range bytes {
if b >= 'A' && b <= 'Z' {
b = byte((int(b-'A')+shift)%26 + 'A')
} else if b >= 'a' && b <= 'z' {
b = byte((int(b-'a')+shift)%26 + 'a')
}
line = append(line, b)
}
return line
}shiftB := byte(shift)
b = ((b-'A')+shiftB)%26 + 'A'line := make([]byte, 0, len(bytes))Context
StackExchange Code Review Q#131677, answer score: 3
Revisions (0)
No revisions yet.