HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Why is this RC4 code in Racket so slow?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thiswhyslowrc4coderacket

Problem

According to shootout.alioth.debian.org, Racket is not much slower than C#, at least in the same order of magnitude due to its JIT compiler.

However, I'm seeing at least two degrees of magnitude slowdown in my implementation of the RC4 cipher in C# vs. Racket.

(Disclaimer: I am not using RC4, in this case a hilariously puny 16-bit version, to ensure any type of confidentiality. It is only used as a fast traffic obfuscating layer - that's why I need it to be fast!)

My (Typed) Racket code is here: https://github.com/quantum1423/Kirisurf-Official/blob/master/src/arcfour.rkt

The relevant portion:

;Returns a stateful function which encrypts one byte of data with the given key
(define (make-rc4 keyd)

  (define key keyd)

  (define S
     (make-bytes 256 0))
  (for ([i 256])
    (bytes-set! S i i))
  (define j 0)
  (for ([i 256])
    (set! j
          (unsafe-fxmodulo
           (unsafe-fx+ j 
                       (unsafe-fx+ (bytes-ref S i)
                                   (bytes-ref key (unsafe-fxmodulo i (unsafe-bytes-length key)))))
         256))
    (define tmp (unsafe-bytes-ref S i))
    (unsafe-bytes-set! S i (bytes-ref S j))
    (unsafe-bytes-set! S j tmp))

  (set! j 0)
  (define i 0)

  (: Sr (Integer -> Integer))
  (define (Sr x) (unsafe-bytes-ref S x))

  (: toret (Integer -> Integer))
  (define (toret c)
    (set! i (unsafe-fxmodulo (unsafe-fx+ 1 i) 256))
    (set! j (unsafe-fxmodulo (unsafe-fx+ j (Sr i)) 256))
    (define temp (Sr i))
    (unsafe-bytes-set! S i (Sr j))
    (unsafe-bytes-set! S j temp)
    (bitwise-xor (Sr (unsafe-fxmodulo (unsafe-fx+ (Sr i) (Sr j)) 256)) c))
  toret)


My old C# code is here: https://github.com/quantum1423/Kirisurf-Official/blob/a04136569025f046872c3cf7fd07955f60bf49dc/Tunnel.cs (scroll to line 398 for the RC4State object)

To put things in perspective, my C# code is always I/O-bound and uses imperceptible CPU. My Racket code vrooms my CPU full and pulls an incredibly slow 700 KiB each second.

Solution

Here are some tips that may help. Note that these are collected from feedback on #racket on irc.freenode.net (which you may find helpful too).

  • The length of the bytestring is counted inside the loop, but it's a loop invariant that you can hoist out of the loop.



  • Folding over j may be faster than mutating it as you are doing. See Mutation and Performance in the Guide for more details.



  • Your for loops use generic sequences. Using specialized sequences like in-range would be faster.



You might also benefit from trying the Optimization Coach plugin for DrRacket. It's a tool that identifies performance pitfalls in your program along with possible solutions. For Typed Racket, it can also find when the type-driven optimizer was not able to optimize the program but could potentially do so with some changes.

Also try the Racket mailing list: http://www.racket-lang.org/community.html

Context

StackExchange Code Review Q#31521, answer score: 3

Revisions (0)

No revisions yet.