HiveBrain v1.2.0
Get Started
← Back to all entries
patternrustModerate

Rust Brainfuck interpreter

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rustinterpreterbrainfuck

Problem

I took the code from kostyas benchmarks for the Rust Brainfuck interpreter and tried to optimize it. There is also a discussion on Reddit about the poor performance of Rust in the Benchmark.

Before my improvements the code needed 16.81s to complete the benchmark and used 6.2Mb of memory, after that the code only needs 4.89s to run but the memory consumption is nearly unchanged.

How can I improve the code and maybe reduce memory usage?

The file with my changes is also on Github.

```
use std::fs::File;
use std::path::Path;
use std::io::prelude::*;
use std::vec::Vec;
use std::io;
use std::env;
use std::collections::BTreeMap;

struct Tape {
pos: usize,
tape: Vec
}

impl Tape {
fn new() -> Tape { Tape { pos: 0, tape: vec![0] } }
fn get(&self) -> isize { self.tape[self.pos] }
fn getc(&self) -> u8 { self.get() as u8 }
fn inc(&mut self) { self.tape[self.pos] += 1; }
fn dec(&mut self) { self.tape[self.pos] -= 1; }
fn advance(&mut self) { self.pos += 1; if self.tape.len() 0 { self.pos -= 1; } }
}

struct Program {
code: Vec,
bracket_map: BTreeMap
}

impl Program {
fn new(content: Vec) -> Program {
let mut code = Vec::new();
let mut bracket_map = BTreeMap::new();
let mut leftstack = Vec::new();

for (pc, b) in content.iter().filter(|&&x| x == b'+' || x == b'-' || x == b'.' || x == b','
|| x == b'' || x == b'[' || x == b']').map(|&x| x).enumerate() {
if b == b'[' {
leftstack.push(pc);
} else if b == b']' {
if let Some(left) = leftstack.pop() {
bracket_map.insert(left, pc);
bracket_map.insert(pc, left);
}
}
code.push(b);
}
Program{ code: code, bracket_map: bracket_map }
}

fn run(&self) {
let mut pc: usize = 0;
let mut tape = Tape::new();
let mut stdout = io::stdout();

while pc tape.inc(),
b'-' => tape.dec(),
b'>' => tape.advance(),
b' tape.devance(),
b'[' => { if tape.get() == 0 { pc = self.brack

Solution

Profile

You can only improve what you can measure. So first of all let us run callgrind to check where we spent most of our time:

$ rustc -C opt-level=3 -g brainfuck.rs
$ valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --simulate-cache=yes ./brainfuck ahpla.bf
$ callgrind_annotate callgrind.out.*


We will end up with something similar to the following profile:

--------------------------------------------------------------------------------
Profile data file 'callgrind.out.29071' (creator: callgrind-3.11.0)
--------------------------------------------------------------------------------
I1 cache: 32768 B, 64 B, 8-way associative
D1 cache: 32768 B, 64 B, 8-way associative
LL cache: 16777216 B, 64 B, 16-way associative
Timerange: Basic block 0 - 10458028484
Trigger: Program termination
Profiled target: ./brainfuck ahpla.bf (PID 29071, part 1)
Events recorded: Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
Events shown: Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
Event sort order: Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
Thresholds: 99 0 0 0 0 0 0 0 0
Include dirs:
User annotated:
Auto-annotation: off

--------------------------------------------------------------------------------
Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
--------------------------------------------------------------------------------
49,487,884,719 13,086,961,954 6,269,022,579 2,411 3,281 1,553 2,167 2,109 1,422 PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw file:function
--------------------------------------------------------------------------------
16,380,206,099 5,113,395,424 1,588,909,048 12 4 1 12 4 . brainfuck.rs:brainfuck::main::h21788f29898deec2 [/home/zeta/misc/brainfuck]
14,242,423,355 2,080,022,568 3,380,036,673 2 0 0 2 . . /checkout/src/liballoc/btree/search.rs:alloc::btree::search::search_tree::h9cb21dc425abeb49 [/home/zeta/misc/brainfuck]
4,593,384,291 895,564,901 241 6 0 0 6 . . /checkout/src/liballoc/vec.rs:brainfuck::main::h21788f29898deec2
2,917,813,479 . . . . . . . . /checkout/src/libcore/slice/mod.rs:alloc::btree::search::search_tree::h9cb21dc425abeb49
2,426,711,760 808,903,920 0 1 1 0 1 . . /checkout/src/libcore/cmp.rs:alloc::btree::search::search_tree::h9cb21dc425abeb49
1,848,909,595 1,848,909,577 0 3 0 0 3 . . /checkout/src/liballoc/raw_vec.rs:brainfuck::main::h21788f29898deec2
1,560,016,932 780,008,466 260,002,823 1 1 0 1 . . /checkout/src/libcore/option.rs:brainfuck::main::h21788f29898deec2
1,560,016,926 520,005,642 0 1 0 0 1 . . /checkout/src/liballoc/btree/node.rs:alloc::btree::search::search_tree::h9cb21dc425abeb49
1,300,014,107 260,002,821 260,002,823 . . . . . . /checkout/src/liballoc/btree/map.rs:brainfuck::main::h21788f29898deec2
1,097,796,554 . . . . . . . . /checkout/src/libcore/iter/mod.rs:alloc::btree::search::search_tree::h9cb21dc425abeb49
1,040,011,284 260,002,821 780,008,463 . . . . . . /checkout/src/liballoc/btree/node.rs:brainfuck::main::h21788f29898deec2
260,003,033 260,002,823 207 0 0 3 0 0 1 /checkout/src/libcore/ptr.rs:brainfuck::main::h21788f29898deec2

Your program executes a total of 49,487,884,719 instructions. Only 16,380,206,099 of those originate from your main, the rest originates from other functions. We only have (semi-)direct control over 30% of the used instructions. That's bad.

Bottlenecks

Unfortunately rustc inlines Program::new, Program::run and all Tape functions. Just to make sure that new() doesn't take a large part of the program let's remove run for a second and check again:

$ sed -i 's/.run();/;/'
$ rustc -C opt-level=3 -g brainfuck.rs
$ time ./brainfuck ahpla.bf

real 0m0.002s
user 0m0.004s
sys 0m0.000s


As we could have guessed, new() takes almost no time. run() and the BTreeMap's are the culprits.

Memory usage

What about memory usage? Well, that part has improved on its own. Your code uses 1.6 MB on my system. Apparently BTreeMap's implementation has been greatly improved regarding memory usage.

Use rustfmt and clippy

While rustfmt's output is up to personal preference, clippy can provide valuable input. For example map(|&x| x) is cloned.

Use easy to read code for non-bottleneck functions

Your filter dominates new:

for (pc, b) in content.iter().filter(|&&x| x == b'+' || x == b'-' || x == b'.' || x == b','
            || x == b'' || x == b'[' || x == b']').map(|&x| x).enumerate()


That's mouthful and hard to maintain. Since this isn't the performance critical part, let's change it:

`

Code Snippets

for (pc, b) in content.iter().filter(|&&x| x == b'+' || x == b'-' || x == b'.' || x == b','
            || x == b'<' || x == b'>' || x == b'[' || x == b']').map(|&x| x).enumerate()
for (pc, b) in content.iter().filter(|x| b"+-,.[]<>".contains(x)).cloned().enumerate()
struct Program {
    code: Vec<u8>,
    jumps: Vec<usize>,
}

impl Program {
    fn new(content: &[u8]) -> Program {
        let code: Vec<u8> = content
            .iter()
            .filter(|x| b"-+,.[]<>".contains(x))
            .cloned()
            .collect();

        let mut jumps = vec![0; code.len() + 1];
        let mut leftstack = Vec::new();

        for (pc, &b) in code.iter().enumerate() {
            if b == b'[' {
                leftstack.push(pc);
            } else if b == b']' {
                if let Some(left) = leftstack.pop() {
                    jumps[left] = pc;
                    jumps[pc] = left;
                }
            }
        }
        Program { code, jumps }
    }

    fn run(&self) {
        let mut pc: usize = 0;
        let mut tape = Tape::new();
        let mut stdout = io::stdout();

        while pc < self.code.len() {
            match self.code[pc] {
                b'+' => tape.inc(),
                b'-' => tape.dec(),
                b'>' => tape.advance(),
                b'<' => tape.devance(),
                b'[' => {
                    if tape.get() == 0 {
                        pc = self.jumps[pc];
                    }
                }
                b']' => {
                    if tape.get() != 0 {
                        pc = self.jumps[pc];
                    }
                }
                b'.' => {
                    stdout.write(&[tape.getc()]).unwrap();
                    stdout.flush().unwrap()
                }
                _ => unreachable!(),
            }
            pc += 1;
        }
    }
}

Context

StackExchange Code Review Q#92843, answer score: 10

Revisions (0)

No revisions yet.